Targets Configuration

Targets define which agent or LLM provider to evaluate. They are configured in .agentv/targets.yaml to decouple eval files from provider details.

Structure

targets:
  - name: azure-base
    provider: azure
    endpoint: ${{ AZURE_OPENAI_ENDPOINT }}
    api_key: ${{ AZURE_OPENAI_API_KEY }}
    model: ${{ AZURE_DEPLOYMENT_NAME }}

  - name: vscode_dev
    provider: vscode
    workspace_template: ${{ WORKSPACE_PATH }}
    judge_target: azure-base

  - name: local_agent
    provider: cli
    command: 'python agent.py --prompt {PROMPT}'
    judge_target: azure-base

Environment Variables

Use ${{ VARIABLE_NAME }} syntax to reference values from your .env file:

targets:
  - name: my_target
    provider: anthropic
    api_key: ${{ ANTHROPIC_API_KEY }}
    model: ${{ ANTHROPIC_MODEL }}

This keeps secrets out of version-controlled files.
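
The substitution can be pictured with a small Python sketch. It is illustrative only: the `interpolate` helper, its error behaviour, and the sample values are assumptions, not AgentV's actual resolver.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${{ NAME }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${{ NAME }} in `value` with env[NAME].

    Hypothetical helper: raising KeyError for a missing variable is an
    assumption made for this sketch.
    """
    source = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _PLACEHOLDER.sub(substitute, value)


# Placeholder values standing in for a loaded .env file.
example_env = {"ANTHROPIC_API_KEY": "sk-test", "ANTHROPIC_MODEL": "example-model"}
print(interpolate("${{ ANTHROPIC_MODEL }}", example_env))
```

Passing an explicit mapping keeps the sketch testable; with no mapping it falls back to the process environment.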

Supported Providers

Provider	Type	Description
`azure`	LLM	Azure OpenAI
`anthropic`	LLM	Anthropic Claude API
`gemini`	LLM	Google Gemini
`claude`	Agent	Claude Agent SDK
`codex`	Agent	Codex CLI
`pi-coding-agent`	Agent	Pi Coding Agent
`vscode`	Agent	VS Code with Copilot
`vscode-insiders`	Agent	VS Code Insiders
`cli`	Agent	Any CLI command
`mock`	Testing	Mock provider for dry runs

Referencing Targets in Evals

Set the default target at the top level or override per case:

# Top-level default
execution:
  target: azure-base

tests:
  - id: test-1
    # Uses azure-base

  - id: test-2
    execution:
      target: vscode_dev  # Override for this case
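
The override rule amounts to "test-level target if present, otherwise the top-level default". A minimal Python sketch of that lookup, where `effective_target` is a hypothetical helper and the dictionaries mirror the YAML above:

```python
from typing import Any, Dict, Optional


def effective_target(suite: Dict[str, Any], test: Dict[str, Any]) -> Optional[str]:
    """Return the test's execution.target if set, else the suite default."""
    test_target = (test.get("execution") or {}).get("target")
    if test_target is not None:
        return test_target
    return (suite.get("execution") or {}).get("target")


suite = {
    "execution": {"target": "azure-base"},
    "tests": [
        {"id": "test-1"},
        {"id": "test-2", "execution": {"target": "vscode_dev"}},
    ],
}
resolved = {t["id"]: effective_target(suite, t) for t in suite["tests"]}
print(resolved)
```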

Judge Target

Agent targets that need LLM-based evaluation specify a judge_target, the name of the LLM target that runs the LLM judge evaluators:

targets:
  - name: codex_target
    provider: codex
    judge_target: azure-base  # LLM used for judging
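
Because judge_target refers to another target by name, a typo can leave it pointing at nothing. The sketch below is a hypothetical consistency check written for this page; whether AgentV performs an equivalent validation is not stated here.

```python
from typing import Any, Dict, List


def dangling_judge_targets(targets: List[Dict[str, Any]]) -> List[str]:
    """Return names of targets whose judge_target is not a defined target."""
    names = {t["name"] for t in targets}
    return [
        t["name"]
        for t in targets
        if "judge_target" in t and t["judge_target"] not in names
    ]


targets = [
    {"name": "azure-base", "provider": "azure"},
    {"name": "codex_target", "provider": "codex", "judge_target": "azure-base"},
    # Deliberately broken entry: the judge name uses "_" instead of "-".
    {"name": "broken_target", "provider": "cli", "judge_target": "azure_base"},
]
dangling = dangling_judge_targets(targets)
print(dangling)
```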

Workspace Template

For agent targets, workspace_template specifies a directory that is copied to a separate workspace location before the tests run, so the agent works on a copy and never on the template itself. This provides isolated, reproducible workspaces.

targets:
  - name: claude_agent
    provider: claude
    workspace_template: ./workspace-templates/my-project
    judge_target: azure-base

When workspace_template is set:

The template directory is copied to ~/.agentv/workspaces/<eval-run-id>/shared/
The .git directory is skipped during copy
Tests share the workspace; use hooks.after_each to reset state between tests
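
The copy step can be approximated in Python with `shutil.copytree` and an ignore pattern for `.git`. This is a sketch of the described behaviour, not AgentV's code; `copy_template` and the demo paths are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(template: Path, destination: Path) -> Path:
    """Copy `template` to `destination`, skipping anything named .git."""
    shutil.copytree(template, destination, ignore=shutil.ignore_patterns(".git"))
    return destination


# Demo with a throwaway template directory.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp) / "template"
    (template / ".git").mkdir(parents=True)
    (template / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
    (template / "README.md").write_text("hello\n")

    workspace = copy_template(template, Path(tmp) / "run-123" / "shared")
    copied = sorted(p.name for p in workspace.iterdir())
    print(copied)
```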

Workspace Lifecycle Hooks

Run commands and apply reset/cleanup policies at different lifecycle points using workspace.hooks. Hooks can be defined at the suite level (applies to all tests) or per test (overrides the suite-level definition).

workspace:
  template: ./workspace-templates/my-project
  hooks:
    before_all:
      command: ["bun", "run", "setup.ts"]
      timeout_ms: 120000
      cwd: ./scripts
    after_each:
      command: ["bun", "run", "reset.ts"]
      timeout_ms: 5000
      reset: fast
    after_all:
      command: ["bun", "run", "cleanup.ts"]
      timeout_ms: 30000

Field	Description
`template`	Directory to copy as workspace (alternative to target-level `workspace_template`)
`hooks.before_all`	Runs once after workspace creation, before the first test
`hooks.after_all`	Runs once after the last test, before cleanup
`hooks.before_each`	Runs before each test
`hooks.after_each`	Runs after each test (supports both `command` and `reset`)

Each hook config accepts:

Field	Description
`command`	Command array (e.g., `["bun", "run", "setup.ts"]`)
`reset`	Reset mode: `none`, `fast`, `strict`
`clean`	Cleanup mode: `always`, `on_success`, `on_failure`, `never`
`timeout_ms`	Timeout in milliseconds (default: 60000 for setup hooks, 30000 for teardown hooks)
`cwd`	Working directory (relative paths resolved against eval file directory)

Lifecycle order: template copy → hooks.before_all → git baseline → (hooks.before_each → agent runs → file changes captured → hooks.after_each) × N tests → hooks.after_all → cleanup
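
To make the ordering concrete, the following Python sketch expands the sequence for a given list of tests. It only models the order of steps stated above; `lifecycle_steps` is a hypothetical helper and runs nothing.

```python
from typing import List


def lifecycle_steps(test_ids: List[str]) -> List[str]:
    """Expand the documented lifecycle order for the given tests."""
    steps = ["template copy", "hooks.before_all", "git baseline"]
    for test_id in test_ids:
        steps += [
            f"{test_id}: hooks.before_each",
            f"{test_id}: agent runs",
            f"{test_id}: file changes captured",
            f"{test_id}: hooks.after_each",
        ]
    steps += ["hooks.after_all", "cleanup"]
    return steps


for step in lifecycle_steps(["test-1", "test-2"]):
    print(step)
```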

Shared workspace: By default, the workspace is created once and shared across all tests in a suite. Set hooks.after_each.reset to fast or strict to reset state between tests.

Error handling:

hooks.before_all / hooks.before_each command failure aborts the test with an error result
hooks.after_all / hooks.after_each command failure is non-fatal (warning only)
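
The fatal versus non-fatal split can be sketched in Python as follows. `run_hook` and `HookError` are hypothetical names, and treating a timeout like any other failure is an assumption of this sketch rather than documented behaviour.

```python
import subprocess
import sys
from typing import List


class HookError(RuntimeError):
    """Raised when a before_all / before_each hook fails."""


def run_hook(name: str, command: List[str], timeout_ms: int) -> bool:
    """Run a hook command and return True on success.

    before_* failures raise HookError; after_* failures only warn.
    Assumption: a timeout counts as a failure.
    """
    fatal = name.startswith("before_")
    try:
        subprocess.run(command, check=True, timeout=timeout_ms / 1000)
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:
        if fatal:
            raise HookError(f"{name} failed: {exc}") from exc
        print(f"warning: {name} failed: {exc}", file=sys.stderr)
        return False


# A command that always exits with status 1, used to exercise both paths.
failing = [sys.executable, "-c", "raise SystemExit(1)"]
after_ok = run_hook("after_each", failing, timeout_ms=30000)
print(after_ok)
```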

Script context: All scripts receive a JSON object on stdin with case context:

{
  "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
  "test_id": "case-01",
  "eval_run_id": "run-123",
  "case_input": "Fix the bug",
  "case_metadata": { "repo": "sympy/sympy", "base_commit": "abc123" }
}
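
A hook written in Python could consume this payload as sketched below. The helper names are illustrative; the demo feeds the sample payload through an in-memory stream, whereas a real hook would pass `sys.stdin`.

```python
import io
import json
from typing import Any, Dict, TextIO


def read_context(stream: TextIO) -> Dict[str, Any]:
    """Parse the JSON case context that a hook receives on stdin."""
    return json.load(stream)


def describe(context: Dict[str, Any]) -> str:
    """Build a one-line summary from the fields shown in the sample payload."""
    metadata = context.get("case_metadata") or {}
    commit = metadata.get("base_commit", "unknown")
    return f"{context['test_id']} in {context['workspace_path']} at {commit}"


# Stand-in for stdin, carrying the sample payload.
sample = io.StringIO(json.dumps({
    "workspace_path": "/home/user/.agentv/workspaces/run-123/case-01",
    "test_id": "case-01",
    "eval_run_id": "run-123",
    "case_input": "Fix the bug",
    "case_metadata": {"repo": "sympy/sympy", "base_commit": "abc123"},
}))
message = describe(read_context(sample))
print(message)
```

Such a script could be wired in with a command array like `["python", "reset.py"]`, where `reset.py` is a hypothetical file name.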

Suite vs per-test: When both are defined, test-level fields replace suite-level fields. See Per-Test Workspace Config for examples.
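
One way to read "test-level fields replace suite-level fields" is a shallow merge in which each top-level key set on the test wins outright. The Python sketch below shows that reading only; whether AgentV merges at the top level or per hook is not specified here, so treat the granularity as an assumption.

```python
from typing import Any, Dict


def merge_workspace(suite: Dict[str, Any], test: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow merge: any top-level key set on the test replaces the suite's."""
    merged = dict(suite)
    merged.update(test)
    return merged


suite_workspace = {
    "template": "./workspace-templates/my-project",
    "hooks": {"after_each": {"reset": "fast"}},
}
test_workspace = {"hooks": {"after_each": {"reset": "strict"}}}

merged = merge_workspace(suite_workspace, test_workspace)
print(merged)
```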

Repository Lifecycle

Clone git repositories into the workspace automatically, with caching for fast repeat runs. Define repos at the suite level or per test:

workspace:
  repos:
    - path: ./my-repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: main
        ancestor: 1          # check out the parent commit
      clone:
        depth: 10             # shallow clone
    - path: ./local-copy
      source:
        type: local
        path: /home/user/projects/my-project
  hooks:
    after_each:
      reset: fast             # none | fast | strict
  isolation: shared           # shared (default) | per_test
  mode: pooled                # pooled | temp | static
  path: /tmp/my-ws            # workspace path for mode=static

Field	Description
`repos[].path`	Directory within the workspace to clone into
`repos[].source.type`	`git` (remote URL) or `local` (absolute path)
`repos[].checkout.ref`	Branch, tag, or SHA to check out (default: `HEAD`)
`repos[].checkout.resolve`	`remote` (ls-remote, default for git) or `local`
`repos[].checkout.ancestor`	Walk N commits back from ref (e.g., `1` for parent)
`repos[].clone.depth`	Shallow clone depth
`repos[].clone.filter`	Partial clone filter (e.g., `blob:none`)
`repos[].clone.sparse`	Sparse checkout paths
`hooks.after_each.reset`	Reset policy after each test: `none`, `fast`, `strict`
`isolation`	`shared` reuses one workspace; `per_test` creates a fresh copy per test
`mode`	Workspace mode: `pooled`, `temp`, `static`
`path`	Workspace path for `mode=static`. When empty or missing, the workspace is auto-materialised (template copied + repos cloned). Populated directories are reused as-is.
`hooks.enabled`	Boolean (default: `true`). Set `false` to skip all lifecycle hooks.
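
In git's own revision syntax, walking N commits back from a ref is written `<ref>~<N>`. The Python sketch below builds that expression from checkout.ref and checkout.ancestor; `revision_spec` is a hypothetical helper, and AgentV may resolve ancestors differently internally.

```python
def revision_spec(ref: str = "HEAD", ancestor: int = 0) -> str:
    """Return a git revision expression for `ref` walked back `ancestor` commits."""
    if ancestor < 0:
        raise ValueError("ancestor must be zero or a positive integer")
    return ref if ancestor == 0 else f"{ref}~{ancestor}"


print(revision_spec("main", 1))
```

When combined with a shallow clone, the depth has to be larger than the ancestor count, otherwise the requested commit is not present in the clone.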

Pooling: mode: pooled (or the default shared repo mode) reuses pool slots between runs. Use mode: temp to disable pooling and get a fresh clone and checkout on every run.

Static auto-materialisation: When mode: static and path points to an empty or missing directory, AgentV automatically copies the template and clones repos into it. If the directory already exists and is populated, it is reused as-is.
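
The "empty or missing" test can be expressed in a few lines of Python. This is a sketch of the rule as worded above; `needs_materialisation` is a hypothetical name, and treating a path that exists but is not a directory as "do not materialise" is an assumption.

```python
import tempfile
from pathlib import Path


def needs_materialisation(path: Path) -> bool:
    """True when `path` is missing or is an empty directory."""
    if not path.exists():
        return True
    return path.is_dir() and not any(path.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    missing = needs_materialisation(root / "absent")
    empty = needs_materialisation(root)
    (root / "file.txt").write_text("x")
    populated = needs_materialisation(root)

print(missing, empty, populated)
```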

Pool management commands:

agentv workspace list — list all pool entries with size and repo info
agentv workspace clean — remove all pool entries

Common patterns:

# Pinned commit with shallow clone (fast CI runs)
workspace:
  repos:
    - path: ./repo
      source:
        type: git
        url: https://github.com/org/repo.git
      checkout:
        ref: abc123def
      clone:
        depth: 1

# Multi-repo shared workspace with reset
workspace:
  repos:
    - path: ./frontend
      source: { type: git, url: https://github.com/org/frontend.git }
    - path: ./backend
      source: { type: git, url: https://github.com/org/backend.git }
  hooks:
    after_each:
      reset: fast

Cleanup Behavior

Default finish behavior:

Success: cleanup
Failure: keep

CLI overrides:

--retain-on-success keep|cleanup
--retain-on-failure keep|cleanup
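
The defaults and overrides combine into a simple decision, sketched here in Python. `should_keep` is a hypothetical helper whose parameters mirror the two CLI flags.

```python
def should_keep(
    success: bool,
    retain_on_success: str = "cleanup",
    retain_on_failure: str = "keep",
) -> bool:
    """Return True when the workspace is kept after the run finishes."""
    policy = retain_on_success if success else retain_on_failure
    if policy not in ("keep", "cleanup"):
        raise ValueError(f"unknown retain policy: {policy}")
    return policy == "keep"


print(should_keep(success=True), should_keep(success=False))
```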

cwd vs workspace_template

Option	Use Case
`cwd`	Run in an existing directory (shared across tests)
`workspace_template`	Copy template to temp location (isolated per case)

These options are mutually exclusive. If neither is set, the eval file’s directory is used as the working directory.
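
The selection rule can be sketched in Python as follows. `working_location` and the eval file path are hypothetical; the sketch only encodes the mutual exclusivity and the fallback to the eval file's directory.

```python
from pathlib import Path
from typing import Any, Dict, Tuple


def working_location(target: Dict[str, Any], eval_file: Path) -> Tuple[str, Path]:
    """Return ("template", path) or ("cwd", path) for a target definition."""
    cwd = target.get("cwd")
    template = target.get("workspace_template")
    if cwd and template:
        raise ValueError("cwd and workspace_template are mutually exclusive")
    if template:
        return ("template", Path(template))
    if cwd:
        return ("cwd", Path(cwd))
    # Neither option set: fall back to the eval file's directory.
    return ("cwd", eval_file.parent)


eval_file = Path("evals/suite.yaml")  # hypothetical eval file location
print(working_location({}, eval_file))
```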