CLI Reference

All functionality is exposed through a single harness command with the subcommands described below.

harness run

Run a multi-session experiment from a config file.

harness run <config.yaml> [OPTIONS]

| Option | Description |
| --- | --- |
| --model TEXT | Override the model from config |
| --tag TEXT | Add tags (can be repeated) |
| --session-mode MODE | Override session mode (isolated, chained, forked) |
| --run-name TEXT | Custom name for the run directory |
| --runs-dir PATH | Output directory (default: runs) |
| --no-capture | Disable API request capture (disables resampling) |

Examples:

# Basic run
harness run examples/isolated.yaml

# With overrides
harness run examples/isolated.yaml \
  --model claude-sonnet-4-20250514 \
  --tag baseline \
  --session-mode chained \
  --run-name my-experiment-01

# Custom output directory
harness run config.yaml --runs-dir ./output
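
Because every override is a plain flag, a small sweep over session modes can be scripted around harness run. The sketch below only prints the commands so they can be reviewed before anything executes. The helper name sweep_commands, the sweep tag, and the sweep- run-name prefix are hypothetical choices for illustration, and the sketch assumes the config path contains no spaces.

```shell
# sweep_commands CONFIG MODE...
# Print one `harness run` command per session mode (dry run, nothing is executed).
# Only flags documented above are used; the tag and run-name values are illustrative.
sweep_commands() (
  config=$1
  shift
  for mode in "$@"; do
    printf 'harness run %s --session-mode %s --tag sweep --run-name sweep-%s\n' \
      "$config" "$mode" "$mode"
  done
)

# Review the commands first:
#   sweep_commands examples/isolated.yaml isolated chained
# Then execute them one after another:
#   sweep_commands examples/isolated.yaml isolated chained | sh
```

Passing the modes as arguments keeps the choice with the caller, since not every config is necessarily valid in every session mode.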

harness list

List all completed runs.

harness list [OPTIONS]

| Option | Description |
| --- | --- |
| --runs-dir PATH | Runs directory (default: runs) |
| --json | Output as JSON |

Example output:

  my-run-01  |  claude-sonnet-4  |  isolated  |  2 sessions, 30 steps  |  $0.1234
  my-run-02  |  claude-sonnet-4  |  chained   |  3 sessions, 45 steps  |  $0.2345
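
When --json is not in use, the text listing can still be summarized with standard tools. The sketch below assumes every row follows the pipe-separated layout shown above, with the cost in the last column. total_cost is a hypothetical helper, not part of the harness.

```shell
# total_cost: sum the cost column of `harness list` text output read from stdin.
# Assumes pipe-separated rows with at least five columns, cost last (e.g. "$0.1234").
total_cost() {
  awk -F'|' '
    NF >= 5 {
      cost = $NF
      gsub(/[^0-9.]/, "", cost)   # drop spaces and the leading "$"
      total += cost
    }
    END { printf "%.4f\n", total }
  '
}

# Usage:
#   harness list | total_cost
```

For the two sample rows above this prints 0.3579. For anything beyond a quick check, the --json output is likely the more robust input, although its field names are not documented here.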

harness inspect

Show details of a completed run.

harness inspect <run_dir> [OPTIONS]

| Option | Description |
| --- | --- |
| --json | Output as JSON (includes file changes) |

Example output:

Run: smoke-test-01
Model: claude-sonnet-4 (openrouter)
Mode: isolated
Tags: smoke-test
Total: 15 steps, 5 tool calls
Cost: $0.0596
File writes: 1

  Session 1: 15 steps, 5 tool calls  $0.0596

File changes:
  session 1, step 15: MEMORY.md (+9/-0)

harness resample

Resample a specific API turn N times (no tool execution).

For concepts and method comparison, see Resampling & Replay.

harness resample <run_dir> [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --session INT | 1 | Session index |
| --request INT | 1 | Request index to resample |
| --count INT | 5 | Number of resamples |
| --model TEXT | original | Override model |
| --replicate INT | | Replicate number (for session_NN_rNN dirs) |
| --list-requests | | List available requests and exit |

Discovering requests

harness resample runs/my-run --session 1 --list-requests

Resampling

# Resample request 5 ten times
harness resample runs/my-run --session 1 --request 5 --count 10

# Resample from a replicate session
harness resample runs/my-run --session 2 --replicate 3 --request 5 --count 5

Results are saved to session_NN/resamples/request_NNN/ (and request_NNN_vNN/ for edited variants).
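
To locate the output of a particular resample from a script, the path can be built from the session and request indices. This is a sketch under two assumptions: that NN and NNN in the pattern above mean two-digit and three-digit zero-padded indices, and that session directories sit directly under the run directory. resample_dir is a hypothetical helper.

```shell
# resample_dir RUN_DIR SESSION REQUEST
# Build the expected results path for an unedited resample.
# Assumes zero-padded indices (session: 2 digits, request: 3 digits).
# Pass plain decimal numbers: a leading zero such as 08 would be read as octal by printf.
resample_dir() (
  run_dir=$1
  session=$2
  request=$3
  printf '%s/session_%02d/resamples/request_%03d\n' "$run_dir" "$session" "$request"
)

# Usage:
#   ls "$(resample_dir runs/my-run 1 5)"
```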

harness resample-edit

Edit a captured API request and resample with the modified version.

For intervention strategy and output details, see Resampling & Replay.

harness resample-edit <run_dir> [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --session INT | 1 | Session index |
| --request INT | 1 | Request index |
| --dump | | Dump the request JSON to stdout for editing |
| --input PATH | | Path to edited request JSON (use - for stdin) |
| --label TEXT | cli-edit | Human-readable label for this variant |
| --count INT | 5 | Number of resamples |
| --model TEXT | original | Override model |
| --replicate INT | | Replicate number |

Two-step workflow

Step 1 — Dump the original request:

harness resample-edit runs/my-run --session 1 --request 5 --dump > edit.json

Step 2 — Edit the JSON file (change assistant text, tool results, system prompt, etc.), then resample.

Do not edit thinking blocks. They carry cryptographic signatures validated by the API — any modification will cause a 400 error. See Thinking blocks for details.

harness resample-edit runs/my-run --session 1 --request 5 \
  --input edit.json --label "removed hedging" --count 5
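
Since an accidental change to a thinking block only surfaces as a 400 error at resample time, it can help to compare the thinking blocks of the dumped and edited files before submitting. The sketch below assumes the dumped request has a messages array whose content entries are typed blocks (the shape the jq examples in this section also rely on) and that thinking blocks carry the type thinking or redacted_thinking. Both helper names are hypothetical.

```shell
# thinking_blocks FILE
# Print every thinking-type content block in a request file,
# one compact JSON object per line, with keys sorted for stable comparison.
thinking_blocks() {
  jq -cS '.messages[]?.content
          | arrays
          | .[]
          | select(type == "object"
                   and (.type == "thinking" or .type == "redacted_thinking"))' "$1"
}

# thinking_unchanged ORIGINAL EDITED
# Exit 0 if both files contain identical thinking blocks in the same order,
# 1 if they differ, 2 if either file cannot be parsed.
thinking_unchanged() (
  before=$(thinking_blocks "$1") || exit 2
  after=$(thinking_blocks "$2") || exit 2
  [ "$before" = "$after" ]
)

# Usage, with original.json holding an untouched --dump and edit.json the edited copy:
#   thinking_unchanged original.json edit.json || echo "thinking blocks were modified"
```

String-valued message content is skipped by the arrays filter, so plain-text messages do not cause errors.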

Piping from stdin

harness resample-edit runs/my-run --session 1 --request 5 --dump \
  | jq '.system = "You are a cautious engineer. Always check for edge cases."' \
  | harness resample-edit runs/my-run --session 1 --request 5 \
      --input - --label "cautious prompt" --count 10

Batch interventions

for req in 3 5 7 9; do
  harness resample-edit runs/my-run --session 1 --request "$req" --dump \
    | jq '(.messages[] | select(.role == "user") | .content[] | select(.type == "tool_result")).content = "Error: file not found"' \
    | harness resample-edit runs/my-run --session 1 --request "$req" \
        --input - --label "tool-error" --count 5
done

harness resample-session

Re-run a forked session N times to study behavioral variance.

For behavioral semantics and output expectations, see Resampling & Replay.

harness resample-session <run_dir> [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --session INT | 2 | Session index to resample |
| --count INT | 5 | Number of new replicates |

Example:

harness resample-session runs/my-run --session 2 --count 5

The command finds the session's fork_from target, resolves the session ID, and runs N new replicates, where N is the --count value. The new replicate directories receive auto-incrementing replicate numbers, and run_meta.json is updated.
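
To see which replicate numbers already exist for a session, before or after running resample-session, the run directory can be scanned directly. This sketch assumes replicate directories follow the session_NN_rNN naming mentioned in the option tables, with two-digit zero-padded fields, and that they sit directly under the run directory. list_replicates is a hypothetical helper; the harness assigns the numbers itself.

```shell
# list_replicates RUN_DIR SESSION
# Print the replicate suffix of every session_NN_rNN directory for one session,
# one per line, in glob (lexicographic) order. Prints nothing if there are none.
list_replicates() (
  run_dir=$1
  prefix=$(printf 'session_%02d_r' "$2")
  for d in "$run_dir"/"$prefix"*; do
    [ -d "$d" ] || continue        # also skips the unexpanded glob when nothing matches
    printf '%s\n' "${d##*_r}"
  done
)

# Usage:
#   list_replicates runs/my-run 2
```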

harness replay

Replay a session from any API turn with full tool execution. Each replicate runs in an isolated git worktree, so multiple replicates execute in parallel. Each replay becomes a new independent run with complete provenance.

For replay internals and data model details, see Resampling & Replay.

harness replay <run_dir> [OPTIONS]

| Option | Default | Description |
| --- | --- | --- |
| --session INT | 1 | Session index to replay from |
| --turn INT | | Turn index to replay from (1-based, required unless --list-turns) |
| --count INT | 1 | Number of replay replicates |
| --prompt TEXT | | Additional prompt after tool results |
| --list-turns | | List available turns and exit |
| --continue-sessions | | After replaying the selected session, run sessions N+1..end using the source config |
| --runs-dir PATH | runs | Output directory |
| --replicate INT | | Replicate number (for session_NN_rNN dirs) |

Listing turns

$ harness replay runs/my-run --session 1 --list-turns

Turns in session 1 (12 total):

  Turn 1: Read  (1 results)
  Turn 2: Read, Grep  (2 results)  [_step_1_3]
  Turn 3: Edit, Write  (2 results)  [_step_1_5]
  ...
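
When a session has many turns, the listing can be filtered to find candidate replay points, for example every turn that called a particular tool. The sketch below assumes each turn line has the form shown above (Turn N: comma-separated tool names, then a parenthesized result count). turns_using is a hypothetical helper.

```shell
# turns_using TOOL
# Read `harness replay ... --list-turns` output on stdin and print the index
# of every turn whose tool list contains TOOL (exact name match).
turns_using() {
  awk -v tool="$1" '
    /^[ \t]*Turn [0-9]+:/ {
      line = $0
      sub(/^[ \t]*Turn /, "", line)        # "2: Read, Grep  (2 results) ..."
      turn = line
      sub(/:.*/, "", turn)                 # "2"
      tools = line
      sub(/^[0-9]+:[ \t]*/, "", tools)     # "Read, Grep  (2 results) ..."
      sub(/[ \t]*\(.*/, "", tools)         # "Read, Grep"
      n = split(tools, names, /,[ \t]*/)
      for (i = 1; i <= n; i++)
        if (names[i] == tool) { print turn; break }
    }
  '
}

# Usage:
#   harness replay runs/my-run --session 1 --list-turns | turns_using Edit
```

A printed index can then be passed to --turn.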

Replaying

By default, only the targeted session is replayed. Use --continue-sessions to also run sessions after it.

# Replay from turn 5, three times (only session 1 runs)
harness replay runs/my-run --session 1 --turn 5 --count 3

# Replay session 1 turn 5, then continue with sessions 2, 3, etc.
harness replay runs/my-run --session 1 --turn 5 --continue-sessions

# Replay with an additional prompt
harness replay runs/my-run --session 1 --turn 5 --prompt "Try a different approach"

# Replay from turn 1 (re-run from scratch)
harness replay runs/my-run --session 1 --turn 1 --count 2

Each replay creates a new run directory (e.g. replay_my-run_s1_t5_r01_2026-03-16T00-00-00/) with full artifacts including replay_meta.json for provenance tracking. The source working directory is never modified — each replicate operates in its own git worktree.
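
For quick grouping of replay runs in a script, the directory name can be split back into its parts. This sketch assumes the layout replay_<source>_s<session>_t<turn>_r<replicate>_<timestamp>, inferred from the single example name above. parse_replay_dir is a hypothetical helper, and replay_meta.json is presumably the more reliable source of provenance.

```shell
# parse_replay_dir PATH
# Split a replay run directory name into labeled fields on one line.
# Prints nothing if the name does not match the assumed layout.
parse_replay_dir() (
  name=$(basename "$1")
  printf '%s\n' "$name" | sed -n -E \
    's/^replay_(.+)_s([0-9]+)_t([0-9]+)_r([0-9]+)_(.+)$/source=\1 session=\2 turn=\3 replicate=\4 timestamp=\5/p'
)

# Usage:
#   parse_replay_dir runs/replay_my-run_s1_t5_r01_2026-03-16T00-00-00/
```

For the example directory above this prints source=my-run session=1 turn=5 replicate=01 timestamp=2026-03-16T00-00-00.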