Configuration
Experiments are defined as YAML config files. The harness validates each config with Pydantic, so errors are caught before any session runs.
Full example
```yaml
model: "claude-sonnet-4-20250514"
provider: anthropic
hypothesis: "The agent preserves hedging across sessions"
work_dir: "./repos/my_project"
session_mode: chained
tags: ["experiment-1"]
system_prompt: |
  You are exploring a Python codebase. Use MEMORY.md to keep notes.
allowed_tools:
  - Read
  - Grep
  - Glob
  - Bash
  - Write
  - Edit
max_turns: 30
permission_mode: bypassPermissions
max_budget_usd: 1.00
memory_file: "MEMORY.md"
memory_seed: "# Project Notes\n"
capture_api_requests: true
sessions:
  - session_index: 1
    prompt: "Explore the project structure. Take notes in MEMORY.md."
  - session_index: 2
    prompt: "Read the main module in detail. Update your notes."
  - session_index: 3
    prompt: "Summarize what you know about this project."
    max_turns: 10
```
Config reference
Top-level fields
| Field | Required | Default | Description |
|---|---|---|---|
| `engine` | no | `claude_code` | Coding-agent runtime: `claude_code` or `codex` |
| `model` | yes | - | Model identifier (Anthropic name for `claude_code`; Codex/OpenRouter slug for `codex`) |
| `provider` | no | `anthropic` (`openai` for `codex`) | `claude_code`: `anthropic`, `openrouter`, `bedrock`, `vertex`. `codex`: `openai` or `openrouter`. |
| `base_url` | no | - | Custom API base URL (overrides provider default) |
| `sandbox_mode` | no | `workspace-write` | Codex only: `read-only`, `workspace-write`, or `danger-full-access` |
| `sandbox_workspace_network_access` | no | Codex default | Codex only: override `sandbox_workspace_write.network_access` for `workspace-write` runs |
| `codex_multi_agent` | no | `false` | Codex only: enable `features.multi_agent` so Codex can spawn subagents |
| `codex_goal_token_budget` | no | - | Codex only: ask Codex to create a goal with this token budget before substantive work |
| `codex_goal_objective` | no | session prompt | Codex only: objective text used with `codex_goal_token_budget` |
| `hypothesis` | no | - | What this experiment tests. Shown in the web UI. |
| `work_dir` | yes | - | Working directory the agent operates in (any directory) |
| `repo_name` | no | - | Human-readable name for the working directory |
| `sessions` | yes | - | List of session configs |
| `session_mode` | no | `isolated` | `isolated`, `chained`, or `forked` |
| `system_prompt` | no | - | System prompt for all sessions |
| `pre_run_commands` | no | `[]` | Shell commands to run before agent sessions |
| `post_run_commands` | no | `[]` | Shell commands to run after agent sessions, even if a session errors |
| `allowed_tools` | no | `Read`, `Grep`, `Glob`, `Bash`, `Write`, `Edit` | Tools the agent can use |
| `max_turns` | no | `50` | Max agent turns per session |
| `permission_mode` | no | `bypassPermissions` | `acceptEdits` or `bypassPermissions` |
| `memory_file` | no | `MEMORY.md` | File to auto-seed in working directory |
| `memory_seed` | no | `# Notes\n` | Initial content for the memory file |
| `max_budget_usd` | no | - | Per-session spend cap |
| `agents` | no | `[]` | Subagent definitions (see Subagents) |
| `capture_subagent_trajectories` | no | `true` | Save separate ATIF trajectories per subagent |
| `capture_api_requests` | no | `true` | Capture raw API requests (enables resampling) |
| `run_name` | no | auto | Custom name for the run directory |
| `tags` | no | `[]` | Metadata tags |
| `revert_work_dir` | no | `false` | Reset working directory to pre-run state after the run completes |
| `load_project_settings` | no | `false` | Load the repo's `CLAUDE.md` and `.claude/settings.json` |
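As a rough illustration of how the Codex-only fields in the table combine, a minimal config might look like the sketch below. The model name, working directory, prompt, objective text, and token budget are placeholders chosen for illustration; the integer form of `codex_goal_token_budget` is an assumption, since the table does not state its type.

```yaml
# Sketch only: every value below is an illustrative placeholder.
engine: codex
provider: openai                  # documented default for the codex engine
model: "gpt-5.3-codex"            # placeholder; use a model your provider serves
work_dir: "./repos/my_project"
sandbox_mode: read-only           # read-only, workspace-write, or danger-full-access
codex_goal_token_budget: 200000   # assumed to be an integer token count
codex_goal_objective: "Map the module layout without modifying files"
sessions:
  - session_index: 1
    prompt: "Explore the project structure and report what you find."
```

Leaving out `codex_goal_objective` should fall back to the session prompt, per the default listed in the table.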
Session fields
| Field | Required | Default | Description |
|---|---|---|---|
| `session_index` | yes | - | Sequential index starting at 1 |
| `prompt` | yes | - | The user prompt for this session |
| `system_prompt` | no | - | Per-session system prompt override |
| `max_turns` | no | - | Per-session max turns override |
| `fork_from` | no | - | Session index to fork from (must be lower) |
| `count` | no | `1` | Run N independent replicates of this session |
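The per-session fields can be combined roughly as in the fragment below. It is a sketch based only on the field descriptions above: the prompts are made up, and `model` and `work_dir` are omitted for brevity even though a complete config requires them.

```yaml
# Fragment only: a full config also needs model and work_dir.
session_mode: forked
sessions:
  - session_index: 1
    prompt: "Explore the project structure. Take notes in MEMORY.md."
  - session_index: 2
    prompt: "Summarize the notes so far."
    fork_from: 1      # must reference a lower session_index
    count: 3          # run three independent replicates of this session
  - session_index: 3
    prompt: "List open questions about the codebase."
    fork_from: 1
    max_turns: 10     # overrides the top-level max_turns for this session only
```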
Lifecycle hook fields
pre_run_commands and post_run_commands are lists of shell command objects.
Each command receives HARNESS_RUN_DIR and HARNESS_WORK_DIR in its environment.
This is useful for local services, fixture setup, and grading scripts.
| Field | Required | Default | Description |
|---|---|---|---|
| `command` | yes | - | Shell command to execute |
| `cwd` | no | harness process cwd | Working directory for the command |
| `timeout_seconds` | no | `30` | Command timeout |
| `check` | no | `true` | Whether a non-zero exit should fail the run |
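A hook configuration using these fields might look like the following sketch. The script names `setup_fixtures.sh` and `grade.py` are hypothetical, as are the timeout and directory values; the only behavior relied on is that each command sees `HARNESS_RUN_DIR` and `HARNESS_WORK_DIR` in its environment.

```yaml
# Sketch only: the scripts referenced here are hypothetical.
pre_run_commands:
  - command: "./scripts/setup_fixtures.sh"
    cwd: "./repos/my_project"   # otherwise runs in the harness process cwd
    timeout_seconds: 120        # raise the 30-second default for slow setup
post_run_commands:
  - command: "python grade.py"  # can read HARNESS_RUN_DIR from its environment
    check: false                # a non-zero exit here will not fail the run
```

Setting `check: false` on a grading step is one way to keep a grader failure from marking an otherwise complete run as failed; leave it at the default when the run should not count without a successful hook.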
Providers
| Provider | Config value | Env var | Notes |
|---|---|---|---|
| Anthropic | `anthropic` (default) | `ANTHROPIC_API_KEY` | Direct Anthropic API. Falls back to Claude Code subscription if no key set. |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | Routes through OpenRouter |
| AWS Bedrock | `bedrock` | AWS credentials | Sets `CLAUDE_CODE_USE_BEDROCK=1` |
| GCP Vertex AI | `vertex` | GCP credentials | Sets `CLAUDE_CODE_USE_VERTEX=1` |
| Claude Code subscription | `anthropic` | (none needed) | If no `ANTHROPIC_API_KEY` is set, the SDK uses your Claude Code subscription credentials from `~/.claude/credentials.json`. Usage is covered by your subscription (Pro/Max) with rate limits rather than per-token billing. |
The table above applies to the claude_code engine, where provider selects how
the Anthropic Messages API is routed.
Codex providers
For the codex engine, provider selects the Codex model provider instead:
| Provider | Config value | Env var | Notes |
|---|---|---|---|
| OpenAI | `openai` (default) | `codex login` or `OPENAI_API_KEY` | Codex's built-in provider. |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | Routes Codex through OpenRouter (Responses API). |
OpenRouter lets you point Codex at any OpenRouter model without changing your
Codex install. AgentLens injects the required model_providers block for you
(base_url=https://openrouter.ai/api/v1, wire_api=responses), so you only need
to set provider: openrouter and export OPENROUTER_API_KEY:
```yaml
engine: codex
provider: openrouter
model: "openai/gpt-5.3-codex"  # exact OpenRouter slug, vendor prefix required
```
- `model` must be the full OpenRouter slug including the vendor prefix (e.g. `openai/gpt-5.3-codex`). A bare `gpt-5.3-codex` returns a 404; AgentLens rejects a prefix-less slug at config-load time.
- `wire_api: responses` is mandatory and set automatically; Codex's older chat/completions path was removed in Feb 2026.
- Requests route through and bill on OpenRouter; no OpenAI plan is required.
- `base_url` overrides the OpenRouter base if you front it with a gateway.
- API capture/resample routes through the capture proxy, which forwards your `OPENROUTER_API_KEY` upstream (subscription-only auth is not enough for capture, same as the OpenAI path).
Cost reporting
Cost figures shown in run_meta.json, harness inspect, and the web UI come from the Claude Agent SDK's total_cost_usd field, which is calculated using Anthropic's list pricing regardless of which provider you use. This means:
- OpenRouter — reported cost reflects Anthropic list prices, not your actual OpenRouter bill (which may differ)
- Bedrock / Vertex — reported cost may not match AWS or GCP billing
- Claude Code subscription — cost is reported but you're not actually billed per-token
Treat cost figures as rough estimates, not authoritative billing data.
Automatic behaviors
- Memory file is auto-seeded. The harness creates the memory file with seed content if it doesn't already exist.
- Working directory path is injected into the system prompt. The agent knows where to read/write.
- The agent's cwd is set to the working directory.
Validation rules
- Session indices must be unique and contiguous starting at 1
- `fork_from` must reference a session with a lower index
- `count` must be >= 1
- `session_index` must be >= 1
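To make the rules concrete, the fragment below is deliberately invalid: each annotated line breaks one of them, so a config containing this `sessions` list should be rejected before any session runs. It is an illustration only; the prompts are made up, and the exact error text Pydantic reports is not shown because it is not documented here.

```yaml
# Deliberately invalid: do not copy as-is.
sessions:
  - session_index: 1
    prompt: "Explore the project."
    count: 0            # invalid: count must be >= 1
  - session_index: 3    # invalid: indices must be contiguous (2 is missing)
    prompt: "Summarize."
    fork_from: 3        # invalid: fork_from must be lower than this session's index
```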