Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.

targets:
  - name: azure-base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}
  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base
  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt {PROMPT}'
    judge_target: azure-base

Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files.
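
As an illustration of the substitution step, the sketch below resolves ${{ VARIABLE_NAME }} placeholders against a mapping of environment values. It is not AgentV's implementation: the interpolate function, its regular expression, and the choice to raise an error for an unset variable are assumptions made for this example.

```python
import os
import re

# Matches ${{ NAME }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env=None) -> str:
    """Replace each ${{ NAME }} in value with env[NAME]; raise if NAME is unset."""
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable is treated as an error.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, value)


# Demo with an explicit mapping instead of the real environment.
demo_env = {"ANTHROPIC_MODEL": "example-model"}
resolved_model = interpolate("${{ ANTHROPIC_MODEL }}", demo_env)
print(resolved_model)
```

Passing the mapping explicitly keeps the function easy to test; calling interpolate(value) with no second argument reads from the process environment.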

| Provider | Type | Description |
| --- | --- | --- |
| azure | LLM | Azure OpenAI |
| anthropic | LLM | Anthropic Claude API |
| gemini | LLM | Google Gemini |
| claude | Agent | Claude Agent SDK |
| codex | Agent | Codex CLI |
| pi-coding-agent | Agent | Pi Coding Agent |
| vscode | Agent | VS Code with Copilot |
| vscode-insiders | Agent | VS Code Insiders |
| cli | Agent | Any CLI command |
| mock | Testing | Mock provider for dry runs |

Set the default target at the top level or override per case:

# Top-level default
execution:
  target: azure-base
tests:
  - id: test-1
    # Uses azure-base
  - id: test-2
    execution:
      target: vscode_dev # Override for this case
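
The override rule can be sketched as a small lookup: a test's own execution.target wins, otherwise the top-level default applies. The resolve_target helper and the dictionary layout below are hypothetical and only mirror the YAML above; they are not part of AgentV.

```python
from typing import Optional


def resolve_target(suite: dict, test: dict) -> Optional[str]:
    """Return the test's own execution.target if set, else the suite default."""
    test_target = (test.get("execution") or {}).get("target")
    if test_target is not None:
        return test_target
    return (suite.get("execution") or {}).get("target")


# The eval file above, expressed as a plain dictionary.
suite = {
    "execution": {"target": "azure-base"},
    "tests": [
        {"id": "test-1"},
        {"id": "test-2", "execution": {"target": "vscode_dev"}},
    ],
}

resolved = {test["id"]: resolve_target(suite, test) for test in suite["tests"]}
print(resolved)  # {'test-1': 'azure-base', 'test-2': 'vscode_dev'}
```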

Agent targets that need LLM-based evaluation specify a judge_target, which names the LLM target used to run LLM judge evaluators:

targets:
  - name: codex_target
    provider: codex
    judge_target: azure-base # LLM used for judging

For agent targets, workspace_template specifies a directory that gets copied to a temporary location before each test runs. This provides isolated, reproducible workspaces.

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

When workspace_template is set:

  • The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/shared/
  • The .git directory is skipped during copy
  • Tests share the workspace; use hooks.after_each to reset state between tests
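
The copy step described above can be approximated with the standard library. This is a minimal sketch, not AgentV's code: copy_template is a hypothetical helper, and the demo uses a throwaway temporary directory rather than ~/.agentv/workspaces.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(template_dir: Path, dest_dir: Path) -> Path:
    """Copy template_dir to dest_dir, leaving out anything named .git."""
    shutil.copytree(template_dir, dest_dir, ignore=shutil.ignore_patterns(".git"))
    return dest_dir


# Demo: build a throwaway template that contains a .git directory.
root = Path(tempfile.mkdtemp())
template = root / "my-project"
(template / ".git").mkdir(parents=True)
(template / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
(template / "README.md").write_text("hello\n")

workspace = copy_template(template, root / "workspaces" / "run-123" / "shared")
copied = sorted(entry.name for entry in workspace.iterdir())
print(copied)  # ['README.md']
```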

Use workspace.hooks to run commands and apply reset or cleanup policies at different lifecycle points. Hooks can be defined at the suite level (applies to all tests) or per test (overrides suite-level).

workspace:
  template: ./workspace-templates/my-project
  hooks:
    before_all:
      command: ["bun", "run", "setup.ts"]
      timeout_ms: 120000
      cwd: ./scripts
    after_each:
      command: ["bun", "run", "reset.ts"]
      timeout_ms: 5000
      reset: fast
    after_all:
      command: ["bun", "run", "cleanup.ts"]
      timeout_ms: 30000

| Field | Description |
| --- | --- |
| template | Directory to copy as workspace (alternative to target-level workspace_template) |
| hooks.before_all | Runs once after workspace creation, before the first test |
| hooks.after_all | Runs once after the last test, before cleanup |
| hooks.before_each | Runs before each test |
| hooks.after_each | Runs after each test (supports both command and reset) |

Each hook config accepts:

| Field | Description |
| --- | --- |
| command | Command array (e.g., ["bun", "run", "setup.ts"]) |
| reset | Reset mode: none, fast, strict |
| clean | Cleanup mode: always, on_success, on_failure, never |
| timeout_ms | Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks) |
| cwd | Working directory (relative paths resolved against eval file directory) |
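
To make the command, timeout_ms, and cwd fields concrete, here is a rough sketch of how a runner might execute one hook. It is an illustration only: run_hook is a hypothetical function, and treating before_* hooks as "setup" and after_* hooks as "teardown" for the default timeouts is an assumption.

```python
import json
import os
import subprocess
import sys

# Assumption: before_* hooks count as setup, after_* hooks as teardown.
DEFAULT_TIMEOUT_MS = {
    "before_all": 60000,
    "before_each": 60000,
    "after_each": 30000,
    "after_all": 30000,
}


def run_hook(name: str, config: dict, context: dict, eval_dir: str = "."):
    """Run one hook command, feeding the context as JSON on stdin."""
    timeout_ms = config.get("timeout_ms", DEFAULT_TIMEOUT_MS[name])
    # Relative cwd values are resolved against the eval file directory.
    cwd = os.path.join(eval_dir, config["cwd"]) if "cwd" in config else eval_dir
    return subprocess.run(
        config["command"],
        input=json.dumps(context),
        text=True,
        capture_output=True,
        cwd=cwd,
        timeout=timeout_ms / 1000,  # subprocess expects seconds
    )


# Demo: a tiny inline "hook" that echoes the test id it received on stdin.
echo_hook = {
    "command": [sys.executable, "-c", "import json, sys; print(json.load(sys.stdin)['test_id'])"],
    "timeout_ms": 5000,
}
result = run_hook("after_each", echo_hook, {"test_id": "case-01"})
print(result.returncode, result.stdout.strip())
```

If the command runs longer than the timeout, subprocess.run raises subprocess.TimeoutExpired, which a runner would need to translate into a hook failure.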

Lifecycle order: template copy → hooks.before_all → git baseline → (hooks.before_each → agent runs → file changes captured → hooks.after_each) × N tests → hooks.after_all → cleanup

Shared workspace: The workspace is created once and shared across all tests in a suite. Use hooks.after_each.reset to reset state between tests (e.g., fast/strict).

Error handling:

  • hooks.before_all / hooks.before_each command failure aborts the test with an error result
  • hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
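
The lifecycle order and error-handling rules above can be simulated in a few lines. This sketch covers only the hook calls (template copy, git baseline, and cleanup are omitted), and run_suite is a hypothetical function. Two details are assumptions the text does not settle: a before_all failure marks every test as an error, and after_each is skipped for a test whose before_each failed.

```python
from typing import Callable, Dict, List

Hook = Callable[[], bool]  # returns True on success, False on failure


def run_suite(test_ids: List[str], hooks: Dict[str, Hook], log: List[str]) -> Dict[str, str]:
    """Call hooks in the documented order and return one result per test."""

    def call(name: str) -> bool:
        log.append(name)
        hook = hooks.get(name)
        return True if hook is None else hook()

    if not call("before_all"):
        # Fatal: setup failed, so no test can run (assumed scope).
        return {test_id: "error" for test_id in test_ids}

    results: Dict[str, str] = {}
    for test_id in test_ids:
        if not call("before_each"):
            results[test_id] = "error"  # fatal for this test
            continue
        log.append(f"run:{test_id}")
        results[test_id] = "ran"
        if not call("after_each"):
            log.append("warning:after_each")  # non-fatal
    if not call("after_all"):
        log.append("warning:after_all")  # non-fatal
    return results


# Demo: a failing after_each hook produces warnings but does not stop the run.
events: List[str] = []
outcome = run_suite(["case-01", "case-02"], {"after_each": lambda: False}, events)
print(outcome)
print(events)
```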

Script context: All scripts receive a JSON object on stdin with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
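
On the script side, a hook only needs to parse that object from stdin. The examples in this page use bun scripts; the sketch below shows the same idea in Python. The helper names read_context and describe are hypothetical, and the demo feeds the sample payload through an in-memory stream instead of real stdin.

```python
import io
import json
import sys


def read_context(stream=None) -> dict:
    """Parse the JSON context object a hook script receives on stdin."""
    stream = sys.stdin if stream is None else stream
    return json.load(stream)


def describe(context: dict) -> str:
    """Summarise which test and workspace this hook invocation is for."""
    return (
        f"test {context['test_id']} of run {context['eval_run_id']} "
        f"in {context['workspace_path']}"
    )


# Demo with the sample payload; a real hook would call read_context() with no argument.
sample = io.StringIO(json.dumps({
    "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
    "test_id": "case-01",
    "eval_run_id": "run-123",
    "case_input": "Fix the bug",
    "case_metadata": {"repo": "sympy/sympy", "base_commit": "abc123"},
}))
context = read_context(sample)
print(describe(context))
```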

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.

Clone git repositories into the workspace automatically, with caching for fast repeat runs. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: main
        ancestor: 1 # check out the parent commit
      clone:
        depth: 10 # shallow clone
    - path: ./local-copy
      source:
        type: local
        path: /home/user/projects/my-project
  hooks:
    after_each:
      reset: fast # none | fast | strict
  isolation: shared # shared (default) | per_test
  mode: pooled # pooled | temp | static
  path: /tmp/my-ws # workspace path for mode=static

| Field | Description |
| --- | --- |
| repos[].path | Directory within the workspace to clone into |
| repos[].source.type | git (remote URL) or local (absolute path) |
| repos[].checkout.ref | Branch, tag, or SHA to check out (default: HEAD) |
| repos[].checkout.resolve | remote (ls-remote, default for git) or local |
| repos[].checkout.ancestor | Walk N commits back from ref (e.g., 1 for parent) |
| repos[].clone.depth | Shallow clone depth |
| repos[].clone.filter | Partial clone filter (e.g., blob:none) |
| repos[].clone.sparse | Sparse checkout paths |
| hooks.after_each.reset | Reset policy after each test: none, fast, strict |
| isolation | shared reuses one workspace; per_test creates a fresh copy per test |
| mode | Workspace mode: pooled, temp, static |
| path | Workspace path for mode=static. When empty or missing, the workspace is auto-materialised (template copied + repos cloned). Populated directories are reused as-is. |
| hooks.enabled | Boolean (default: true). Set false to skip all lifecycle hooks. |
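
One way to read the clone and checkout fields is as options to ordinary git commands. The sketch below builds such command lines for a single git repo entry. The mapping is an assumption for illustration, not a description of what AgentV runs internally; git_commands is a hypothetical helper, and it ignores resolve, sparse, caching, and pooling.

```python
from typing import List


def git_commands(repo: dict) -> List[List[str]]:
    """Build git command lines roughly matching one repos[] entry of type git."""
    source = repo["source"]
    if source.get("type") != "git":
        raise ValueError("only git sources are handled in this sketch")
    clone = repo.get("clone") or {}
    checkout = repo.get("checkout") or {}

    clone_cmd = ["git", "clone"]
    if "depth" in clone:
        clone_cmd += ["--depth", str(clone["depth"])]
    if "filter" in clone:
        clone_cmd += [f"--filter={clone['filter']}"]
    clone_cmd += [source["url"], repo["path"]]

    ref = checkout.get("ref", "HEAD")
    ancestor = checkout.get("ancestor", 0)
    # <ref>~N is git's revision syntax for the Nth first-parent ancestor of ref.
    revision = f"{ref}~{ancestor}" if ancestor else ref
    checkout_cmd = ["git", "-C", repo["path"], "checkout", revision]
    return [clone_cmd, checkout_cmd]


# Demo: the first repo entry from the configuration above.
commands = git_commands({
    "path": "./my-repo",
    "source": {"type": "git", "url": "https://github.com/org/repo.git"},
    "checkout": {"ref": "main", "ancestor": 1},
    "clone": {"depth": 10},
})
for command in commands:
    print(" ".join(command))
```

Under this reading, a shallow clone needs a depth greater than ancestor, because a depth of 1 fetches no parent commit to walk back to.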

Pooling: mode: pooled (or the default shared repo mode) reuses pool slots between runs. Use mode: temp to disable pooling and get fresh clones and checkouts on each run.

Static auto-materialisation: When mode: static and path points to an empty or missing directory, AgentV automatically copies the template and clones repos into it. If the directory already exists and is populated, it is reused as-is.
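
The decision described above comes down to one check on the target directory. A minimal sketch, assuming a hypothetical needs_materialising helper and using a temporary directory for the demo:

```python
import tempfile
from pathlib import Path


def needs_materialising(path: Path) -> bool:
    """True when path is missing or an empty directory; False when it has entries."""
    if not path.exists():
        return True
    return not any(path.iterdir())


# Demo: one missing, one empty, and one populated directory.
root = Path(tempfile.mkdtemp())
missing = root / "missing"
empty = root / "empty"
empty.mkdir()
populated = root / "populated"
populated.mkdir()
(populated / "file.txt").write_text("existing content\n")

states = {p.name: needs_materialising(p) for p in (missing, empty, populated)}
print(states)  # {'missing': True, 'empty': True, 'populated': False}
```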

Pool management commands:

  • agentv workspace list — list all pool entries with size and repo info
  • agentv workspace clean — remove all pool entries

Common patterns:

# Pinned commit with shallow clone (fast CI runs)
workspace:
  repos:
    - path: ./repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: abc123def
      clone:
        depth: 1

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      source: { type: git, url: https://github.com/org/frontend.git }
    - path: ./backend
      source: { type: git, url: https://github.com/org/backend.git }
  hooks:
    after_each:
      reset: fast

Default finish behavior:

  • Success: cleanup
  • Failure: keep

CLI overrides:

  • --retain-on-success keep|cleanup
  • --retain-on-failure keep|cleanup
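
The defaults and overrides above combine into a simple rule. The sketch below is illustrative only; should_keep_workspace is a hypothetical function whose two optional arguments stand in for the values passed to --retain-on-success and --retain-on-failure.

```python
from typing import Optional


def should_keep_workspace(
    succeeded: bool,
    retain_on_success: Optional[str] = None,
    retain_on_failure: Optional[str] = None,
) -> bool:
    """Return True if the workspace is kept after the run, False if it is cleaned up."""
    default = "cleanup" if succeeded else "keep"
    override = retain_on_success if succeeded else retain_on_failure
    policy = override or default
    if policy not in ("keep", "cleanup"):
        raise ValueError(f"unknown retention policy: {policy}")
    return policy == "keep"


print(should_keep_workspace(True))   # False: successful runs are cleaned up by default
print(should_keep_workspace(False))  # True: failed runs are kept by default
print(should_keep_workspace(True, retain_on_success="keep"))  # True
```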

| Option | Use Case |
| --- | --- |
| cwd | Run in an existing directory (shared across tests) |
| workspace_template | Copy template to temp location (isolated per case) |

These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.