
Configuration

Experiments are defined as YAML config files. The harness validates each config with Pydantic, so errors are caught before any session runs.

Full example

model: "claude-sonnet-4-20250514"
provider: anthropic
hypothesis: "The agent preserves hedging across sessions"
work_dir: "./repos/my_project"
session_mode: chained
tags: ["experiment-1"]

system_prompt: |
  You are exploring a Python codebase. Use MEMORY.md to keep notes.

allowed_tools:
  - Read
  - Grep
  - Glob
  - Bash
  - Write
  - Edit

max_turns: 30
permission_mode: bypassPermissions
max_budget_usd: 1.00

memory_file: "MEMORY.md"
memory_seed: "# Project Notes\n"

capture_api_requests: true

sessions:
  - session_index: 1
    prompt: "Explore the project structure. Take notes in MEMORY.md."
  - session_index: 2
    prompt: "Read the main module in detail. Update your notes."
  - session_index: 3
    prompt: "Summarize what you know about this project."
    max_turns: 10

Config reference

Top-level fields

| Field | Required | Default | Description |
|---|---|---|---|
| engine | no | claude_code | Coding-agent runtime: claude_code or codex |
| model | yes | | Model identifier (Anthropic name for claude_code; Codex/OpenRouter slug for codex) |
| provider | no | anthropic (openai for codex) | claude_code: anthropic, openrouter, bedrock, vertex. codex: openai or openrouter. |
| base_url | no | | Custom API base URL (overrides provider default) |
| sandbox_mode | no | workspace-write | Codex only: read-only, workspace-write, or danger-full-access |
| sandbox_workspace_network_access | no | Codex default | Codex only: override sandbox_workspace_write.network_access for workspace-write runs |
| codex_multi_agent | no | false | Codex only: enable features.multi_agent so Codex can spawn subagents |
| codex_goal_token_budget | no | | Codex only: ask Codex to create a goal with this token budget before substantive work |
| codex_goal_objective | no | session prompt | Codex only: objective text used with codex_goal_token_budget |
| hypothesis | no | | What this experiment tests. Shown in the web UI. |
| work_dir | yes | | Working directory the agent operates in (any directory) |
| repo_name | no | | Human-readable name for the working directory |
| sessions | yes | | List of session configs |
| session_mode | no | isolated | isolated, chained, or forked |
| system_prompt | no | | System prompt for all sessions |
| pre_run_commands | no | [] | Shell commands to run before agent sessions |
| post_run_commands | no | [] | Shell commands to run after agent sessions, even if a session errors |
| allowed_tools | no | Read, Grep, Glob, Bash, Write, Edit | Tools the agent can use |
| max_turns | no | 50 | Max agent turns per session |
| permission_mode | no | bypassPermissions | acceptEdits or bypassPermissions |
| memory_file | no | MEMORY.md | File to auto-seed in the working directory |
| memory_seed | no | # Notes\n | Initial content for the memory file |
| max_budget_usd | no | | Per-session spend cap |
| agents | no | [] | Subagent definitions (see Subagents) |
| capture_subagent_trajectories | no | true | Save separate ATIF trajectories per subagent |
| capture_api_requests | no | true | Capture raw API requests (enables resampling) |
| run_name | no | auto | Custom name for the run directory |
| tags | no | [] | Metadata tags |
| revert_work_dir | no | false | Reset the working directory to its pre-run state after the run completes |
| load_project_settings | no | false | Load the repo's CLAUDE.md and .claude/settings.json |
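To show how the defaults in the table combine, here is a minimal sketch that fills in omitted top-level fields for a config held as a plain dict. The names `apply_defaults`, `COMMON_DEFAULTS`, and `CODEX_DEFAULTS` are hypothetical; the harness itself uses Pydantic models, and this stdlib version only encodes the defaults listed above, including the engine-dependent provider default.

```python
from copy import deepcopy
from typing import Any

# Defaults copied from the top-level fields table (hypothetical helper).
# Fields without a default (model, work_dir, sessions, base_url, ...) are omitted.
COMMON_DEFAULTS: dict[str, Any] = {
    "engine": "claude_code",
    "session_mode": "isolated",
    "pre_run_commands": [],
    "post_run_commands": [],
    "allowed_tools": ["Read", "Grep", "Glob", "Bash", "Write", "Edit"],
    "max_turns": 50,
    "permission_mode": "bypassPermissions",
    "memory_file": "MEMORY.md",
    "memory_seed": "# Notes\n",
    "agents": [],
    "capture_subagent_trajectories": True,
    "capture_api_requests": True,
    "tags": [],
    "revert_work_dir": False,
    "load_project_settings": False,
}

# Codex-only fields that have a fixed default in the table.
CODEX_DEFAULTS: dict[str, Any] = {
    "sandbox_mode": "workspace-write",
    "codex_multi_agent": False,
}

REQUIRED = ("model", "work_dir", "sessions")


def apply_defaults(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `raw` with the table defaults filled in."""
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Deep copies so mutable defaults (lists) are never shared between configs.
    cfg = deepcopy(COMMON_DEFAULTS)
    cfg.update(deepcopy(raw))
    # provider defaults to anthropic, or to openai when engine is codex.
    cfg.setdefault("provider", "openai" if cfg["engine"] == "codex" else "anthropic")
    if cfg["engine"] == "codex":
        for key, value in CODEX_DEFAULTS.items():
            cfg.setdefault(key, value)
    return cfg
```

Keeping every default in one place makes the one conditional default (provider) easy to spot; values set explicitly in the YAML always win over the table defaults.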

Session fields

| Field | Required | Default | Description |
|---|---|---|---|
| session_index | yes | | Sequential index starting at 1 |
| prompt | yes | | The user prompt for this session |
| system_prompt | no | | Per-session system prompt override |
| max_turns | no | | Per-session max turns override |
| fork_from | no | | Session index to fork from (must be lower) |
| count | no | 1 | Run N independent replicates of this session |
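The override fields above can be read as "use the session value if set, otherwise the top-level value". The sketch below illustrates that reading and expands `count` into replicates. `Session`, `ResolvedRun`, and `resolve_sessions` are hypothetical names, not harness APIs, and numbering replicates from 0 is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    # Mirrors the session fields table (hypothetical model).
    session_index: int
    prompt: str
    system_prompt: Optional[str] = None
    max_turns: Optional[int] = None
    fork_from: Optional[int] = None
    count: int = 1


@dataclass
class ResolvedRun:
    session_index: int
    replicate: int  # 0-based replicate number within `count` (assumption)
    prompt: str
    system_prompt: Optional[str]
    max_turns: int


def resolve_sessions(
    sessions: list[Session],
    top_system_prompt: Optional[str],
    top_max_turns: int = 50,
) -> list[ResolvedRun]:
    """Apply per-session overrides and expand `count` into replicates."""
    runs: list[ResolvedRun] = []
    for s in sorted(sessions, key=lambda sess: sess.session_index):
        # A per-session value wins; otherwise fall back to the top-level one.
        system_prompt = s.system_prompt if s.system_prompt is not None else top_system_prompt
        max_turns = s.max_turns if s.max_turns is not None else top_max_turns
        for rep in range(s.count):
            runs.append(ResolvedRun(s.session_index, rep, s.prompt, system_prompt, max_turns))
    return runs
```

Applied to the full example above (top-level max_turns: 30, session 3 setting max_turns: 10), sessions 1 and 2 would resolve to 30 turns and session 3 to 10.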

Lifecycle hook fields

pre_run_commands and post_run_commands are lists of shell command objects. Each command receives HARNESS_RUN_DIR and HARNESS_WORK_DIR in its environment. These hooks are useful for starting local services, setting up fixtures, and running grading scripts.

| Field | Required | Default | Description |
|---|---|---|---|
| command | yes | | Shell command to execute |
| cwd | no | harness process cwd | Working directory for the command |
| timeout_seconds | no | 30 | Command timeout in seconds |
| check | no | true | Whether a non-zero exit should fail the run |
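The four hook fields map naturally onto Python's `subprocess.run`. This is a minimal sketch assuming each command runs through a shell; `run_hook` is a hypothetical name and not necessarily how the harness executes hooks internally.

```python
import os
import subprocess
from typing import Optional


def run_hook(
    command: str,
    run_dir: str,
    work_dir: str,
    cwd: Optional[str] = None,
    timeout_seconds: float = 30,
    check: bool = True,
) -> subprocess.CompletedProcess:
    """Run one lifecycle hook command (hypothetical sketch)."""
    env = dict(os.environ)
    # The two variables every hook receives, per the docs.
    env["HARNESS_RUN_DIR"] = run_dir
    env["HARNESS_WORK_DIR"] = work_dir
    # cwd=None inherits the harness process cwd, matching the table default.
    # With check=True a non-zero exit raises CalledProcessError;
    # exceeding the timeout raises TimeoutExpired regardless of check.
    return subprocess.run(
        command,
        shell=True,
        cwd=cwd,
        env=env,
        timeout=timeout_seconds,
        check=check,
        capture_output=True,
        text=True,
    )
```

Because post_run_commands run even when a session errors, a grading script placed there still executes; setting check: false keeps a failing grader from failing the whole run.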

Providers

| Provider | Config value | Env var | Notes |
|---|---|---|---|
| Anthropic | anthropic (default) | ANTHROPIC_API_KEY | Direct Anthropic API. Falls back to Claude Code subscription if no key is set. |
| OpenRouter | openrouter | OPENROUTER_API_KEY | Routes through OpenRouter |
| AWS Bedrock | bedrock | AWS credentials | Sets CLAUDE_CODE_USE_BEDROCK=1 |
| GCP Vertex AI | vertex | GCP credentials | Sets CLAUDE_CODE_USE_VERTEX=1 |
| Claude Code subscription | anthropic | (none needed) | If no ANTHROPIC_API_KEY is set, the SDK uses your Claude Code subscription credentials from ~/.claude/credentials.json. Usage is covered by your subscription (Pro/Max) with rate limits rather than per-token billing. |

The table above applies to the claude_code engine, where provider selects how the Anthropic Messages API is routed.
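As a rough summary of the claude_code provider table, the sketch below maps each provider value to the extra environment variables it sets and the credential source it relies on. `claude_code_provider_env` is a hypothetical helper written for illustration, not the harness's own code.

```python
from typing import Mapping


def claude_code_provider_env(
    provider: str, env: Mapping[str, str]
) -> tuple[dict[str, str], str]:
    """Return (env vars to add, credential source) for a claude_code provider.

    Hypothetical sketch of the providers table above.
    """
    if provider == "anthropic":
        if env.get("ANTHROPIC_API_KEY"):
            return {}, "ANTHROPIC_API_KEY"
        # No API key: the SDK falls back to ~/.claude/credentials.json.
        return {}, "Claude Code subscription"
    if provider == "openrouter":
        return {}, "OPENROUTER_API_KEY"
    if provider == "bedrock":
        return {"CLAUDE_CODE_USE_BEDROCK": "1"}, "AWS credentials"
    if provider == "vertex":
        return {"CLAUDE_CODE_USE_VERTEX": "1"}, "GCP credentials"
    raise ValueError(f"unknown claude_code provider: {provider!r}")
```

A caller would merge the returned variables into the environment of the agent process, for example with `claude_code_provider_env(cfg["provider"], os.environ)`.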

Codex providers

For the codex engine, provider selects the Codex model provider instead:

| Provider | Config value | Env var | Notes |
|---|---|---|---|
| OpenAI | openai (default) | codex login or OPENAI_API_KEY | Codex's built-in provider. |
| OpenRouter | openrouter | OPENROUTER_API_KEY | Routes Codex through OpenRouter (Responses API). |

OpenRouter lets you point Codex at any OpenRouter model without changing your Codex install. AgentLens injects the required model_providers block for you (base_url=https://openrouter.ai/api/v1, wire_api=responses), so you only need to set provider: openrouter and export OPENROUTER_API_KEY:

engine: codex
provider: openrouter
model: "openai/gpt-5.3-codex"   # exact OpenRouter slug, vendor prefix required
  • model must be the full OpenRouter slug including the vendor prefix (e.g. openai/gpt-5.3-codex). A bare gpt-5.3-codex would 404 at OpenRouter, so AgentLens rejects a prefix-less slug at config-load time.
  • wire_api: responses is mandatory and set automatically, since Codex's older chat/completions path was removed in Feb 2026.
  • Requests route through and bill on OpenRouter; no OpenAI plan is required.
  • base_url overrides the OpenRouter base if you front it with a gateway.
  • API capture and resampling route through the capture proxy, which forwards your OPENROUTER_API_KEY upstream. As on the OpenAI path, subscription-only auth is not enough for capture.
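The slug rule and the injected settings described in these notes can be sketched as follows. `openrouter_codex_provider` is a hypothetical helper; it returns only the two keys the docs say AgentLens injects (base_url and wire_api), and the exact shape of the real model_providers block, as well as how AgentLens detects a missing prefix, are assumptions.

```python
from typing import Optional

OPENROUTER_DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_codex_provider(model: str, base_url: Optional[str] = None) -> dict[str, str]:
    """Validate the model slug and build provider settings (hypothetical)."""
    vendor, sep, name = model.partition("/")
    # A bare slug such as "gpt-5.3-codex" has no vendor prefix: reject it early.
    if not sep or not vendor or not name:
        raise ValueError(
            f"model {model!r} must be a full OpenRouter slug like 'openai/gpt-5.3-codex'"
        )
    return {
        # A configured base_url (e.g. a gateway) overrides the OpenRouter default.
        "base_url": base_url or OPENROUTER_DEFAULT_BASE_URL,
        # Always the Responses API; Codex no longer has a chat/completions path.
        "wire_api": "responses",
    }
```

Failing at config-load time is cheaper than discovering the problem as a 404 partway through a run.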

Cost reporting

Cost figures shown in run_meta.json, harness inspect, and the web UI come from the Claude Agent SDK's total_cost_usd field, which is calculated using Anthropic's list pricing regardless of which provider you use. This means:

  • OpenRouter: reported cost reflects Anthropic list prices, not your actual OpenRouter bill (which may differ)
  • Bedrock / Vertex: reported cost may not match AWS or GCP billing
  • Claude Code subscription: cost is reported, but you're not actually billed per token

Treat cost figures as rough estimates, not authoritative billing data.

Automatic behaviors

  • Memory file is auto-seeded. The harness creates the memory file with seed content if it doesn't already exist.
  • Working directory path is injected into the system prompt, so the agent knows where to read and write.
  • The agent's cwd is set to the working directory.
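The auto-seeding rule ("create the memory file with seed content if it doesn't already exist") fits in a few lines. This is a minimal sketch; `seed_memory_file` is a hypothetical name, and its defaults mirror the memory_file and memory_seed defaults from the config reference.

```python
from pathlib import Path


def seed_memory_file(
    work_dir: str, memory_file: str = "MEMORY.md", memory_seed: str = "# Notes\n"
) -> bool:
    """Create the memory file with seed content unless it already exists.

    Returns True if the file was created (hypothetical sketch).
    """
    path = Path(work_dir) / memory_file
    if path.exists():
        return False  # leave an existing file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(memory_seed, encoding="utf-8")
    return True
```

Checking for existence first means notes written by an earlier session are never overwritten by the seed.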

Validation rules

  • Session indices must be unique and contiguous starting at 1
  • fork_from must reference a session with a lower index
  • count must be >= 1
  • session_index must be >= 1
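The validation rules above can be restated as a plain-Python check. The harness enforces them through Pydantic; `validate_sessions` is a hypothetical stdlib-only sketch that raises `ValueError` on the first violation and treats each session as a dict.

```python
def validate_sessions(sessions: list[dict]) -> None:
    """Check the session validation rules (hypothetical sketch)."""
    indices = [s["session_index"] for s in sessions]
    for idx in indices:
        if idx < 1:
            raise ValueError(f"session_index must be >= 1, got {idx}")
    if len(set(indices)) != len(indices):
        raise ValueError(f"duplicate session_index values: {indices}")
    # Unique, starting at 1, and contiguous means the sorted list is 1..N.
    if sorted(indices) != list(range(1, len(indices) + 1)):
        raise ValueError(f"session indices must be contiguous from 1: {sorted(indices)}")
    index_set = set(indices)
    for s in sessions:
        if s.get("count", 1) < 1:
            raise ValueError(f"session {s['session_index']}: count must be >= 1")
        fork_from = s.get("fork_from")
        # fork_from must name an existing session with a lower index.
        if fork_from is not None and (fork_from not in index_set or fork_from >= s["session_index"]):
            raise ValueError(
                f"session {s['session_index']}: fork_from={fork_from} must reference a lower session"
            )
```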