AgentLens
A harness for running multi-session agent trajectories using the Claude Agent SDK, capturing them in ATIF (Agent Trajectory Interchange Format), and tracking file state changes across sessions.
Built for AI alignment and interpretability research — studying how LLM agents behave across multi-turn, multi-session, multi-agent interactions.
Note
AgentLens currently supports Claude Code via the Claude Agent SDK. Support for additional agents and frameworks is planned — see Roadmap. Some features (especially turn-level replay) are experimental. We welcome PRs and contributions — open an issue if you run into bugs.
What it does
The harness takes a YAML config describing a sequence of sessions (prompts to an agent), runs each session against a working directory via the Claude Agent SDK, and produces structured outputs:
- ATIF trajectories — standardized JSON capturing every agent step, tool call, observation, and thinking block
- Shadow git change tracking — automatic tracking of all file changes via an invisible git repo, with per-step write attribution and full unified diffs
- Session chaining — three modes for controlling how sessions relate to each other (isolated, chained, forked)
- Resampling & replay — four methods for studying behavioral variance, from quick API resampling to full trajectory replay with tool execution. Edit assistant text, tool results, or system prompts to test counterfactuals.
- Subagent capture — separate ATIF trajectories for each subagent invocation, linked to the parent via SubagentTrajectoryRef
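To make the shadow git idea above concrete, here is a minimal sketch of the general technique: keep the repository's metadata outside the working directory and point git at both locations with `--git-dir` and `--work-tree`, so the agent never sees a `.git` folder. This is not AgentLens's actual implementation. The helper names (`init_shadow`, `snapshot`, `diff_between`) and the commit identity are hypothetical, chosen only for illustration, and the sketch assumes the `git` CLI is on the PATH.

```python
import subprocess
from pathlib import Path


def _git(shadow_dir: Path, work_dir: Path, *args: str) -> str:
    """Run git against a repo whose metadata lives outside work_dir."""
    cmd = [
        "git",
        f"--git-dir={shadow_dir.resolve()}",
        f"--work-tree={work_dir.resolve()}",
        # Inline identity (illustrative) so commits work without global config.
        "-c", "user.name=shadow",
        "-c", "user.email=shadow@example.invalid",
        *args,
    ]
    result = subprocess.run(
        cmd, cwd=work_dir, check=True, capture_output=True, text=True
    )
    return result.stdout


def init_shadow(shadow_dir: Path) -> None:
    """Create the hidden repository; nothing is written into the work dir."""
    subprocess.run(
        ["git", "init", "--quiet", "--bare", str(shadow_dir)],
        check=True,
        capture_output=True,
    )


def snapshot(shadow_dir: Path, work_dir: Path, label: str) -> str:
    """Record the current state of work_dir and return the commit hash."""
    _git(shadow_dir, work_dir, "add", "-A")
    # --allow-empty keeps one commit per step even if no files changed.
    _git(shadow_dir, work_dir, "commit", "--quiet", "--allow-empty", "-m", label)
    return _git(shadow_dir, work_dir, "rev-parse", "HEAD").strip()


def diff_between(shadow_dir: Path, work_dir: Path, before: str, after: str) -> str:
    """Return the unified diff between two snapshots."""
    return _git(shadow_dir, work_dir, "diff", before, after)
```

Taking one snapshot before and one after each agent step, then diffing the two hashes, is one simple way to attribute writes to a step. The shadow directory should live outside the working directory so that it is not itself tracked.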
Next steps
- Installation — get set up
- Quick Start — run your first experiment
- Session Modes — isolated, chained, and forked behavior
- Resampling & Replay — variance analysis from API-level to full replay