An agent's context window is its life. Every round consumed is context spent — on reasoning, on files read, on conversation history held in full fidelity. At high context, the naive response is to die: start a fresh session, lose continuity, force archaeological reconstruction. Context surgery is the alternative. It is a set of operations that restructure a session's conversation history — compressing finished work, extracting specific workstreams, merging parallel threads — so that a session can continue productively, or end cleanly and hand off to a successor that begins oriented rather than blank.
Surgery does not alter what happened. It alters which materializations of what happened are active in context. The underlying events are permanent; their representation in a given context window is a design decision.
#The Round as Surgery Unit
A round is one user message plus all agent response content produced before the next user message: assistant text, thinking blocks, tooluse invocations, and toolresults. Rounds are the atomic unit of surgical operation. Surgery compresses or expands whole rounds — never splits within a round.
This boundary is principled. A thinking block mid-round is not interpretable without the tool calls that followed it. A toolresult is not interpretable without the tooluse that produced it. The round is the minimal coherent unit of exchange. Claude Code surfaces this structure as a "turn pair" (user + assistant exchange, including all tool call cycles); surgery adopts the same grain.
The inscription system (method 53) records rounds as sequences of typed blocks in data/sessions/<agentid>/<sessionuuid>/blocks/. Each block carries: blockid, sessionid, roundseq (integer), role, type (user-text, assistant-text, thinking, tooluse, toolresult), timestamp, content, and an optional engagementid. The roundseq field groups blocks into rounds. Surgery operates on rounds — sets of blocks sharing a roundseq — not on individual blocks.
#What Is Actually in Context
A HAAK agent session's context window contains:
- System prompt: CLAUDE.md plus agent-specific settings. Approximately 200 lines. Relatively fixed; updated only when bootstrap files change.
- Hook output: Board summary, agent roster, mailbox contents, service alerts — injected at session start as a large user message. Controlled separately; not subject to surgery.
- Conversation history: All prior rounds in full — user messages, assistant responses, thinking blocks, tool call cycles.
- Files read during the session: Content injected via tool calls (Read, Bash output, etc.) that entered context as tool_results within rounds.
Surgery operates exclusively on conversation history. The system prompt and hook output are outside surgical scope. Files read during a session can be re-read in a fresh session — they are not lost, only not pre-loaded.
#The Six Operations
#compress-before(T)
All rounds with timestamp before T are replaced by an index table — one line per round. Rounds at or after T remain in full. The reconstructed session JSONL contains: one synthetic user message holding the index table, followed by all full rounds from T onward.
The index table format:
| 047 | 2026-03-19T10:23Z | user→assistant | [tool_use: Agent·Read console] → "The architecture is..." |
Fields: sequential index | timestamp | roles | first 80 characters of the round's primary content (key tool call name + first 80 chars of primary text response, concatenated). The agent reads this as structured history. Full content is available via inject(047).
Use compress-before when: a long session has traversed multiple topics and the early work is complete. The agent needs awareness that it happened but not full recall of how.
#compress-engagement(id)
All rounds tagged with a given engagement_id — indicating they belong to a completed engagement — are replaced by index entries. Rounds belonging to other engagements or no engagement remain full.
Use compress-engagement when: one workstream within a session is complete but others are ongoing. The completed engagement collapses to its index entries; the agent continues with full context on live work.
#extract-engagement(id)
Produces a new JSONL containing only rounds tagged with a given engagement_id. The output is a valid session file suitable for resumption: a fresh agent opens it and finds only the rounds relevant to that engagement, with all context preserved within those rounds.
Use extract-engagement when: handing off a specific workstream to a dedicated agent. The receiving agent starts with full context on its domain without inheriting the originating session's other work.
#merge(uuid1, uuid2)
Combines rounds from two sessions, sorted by timestamp, into a single JSONL. Frontmatter is merged: the earlier session's started timestamp is preserved; the later session's ended timestamp is used; message counts sum; engagement lists union. Block provenance — which session each round originated from — is preserved in metadata so the reading agent can distinguish the two voices.
Use merge when: parallel agents have been working in the same domain and their work needs to be brought into a unified context, or when a session was split across two exports and must be reconstituted. Merge does not resolve conflicts — if both sessions touched the same file, rounds from both appear in timestamp order. Conflict resolution is the reading agent's responsibility.
#inject(round_ref)
Pulls one specific round from the round index into full context on demand. The round_ref is the sequential index number shown in any compressed index table. Inject is a read operation — it does not alter the stored JSONL; it signals the agent to expand that round's content from the block store.
Use inject when: navigating a compressed index reveals a round worth reading in full — a decision, a design, a tool result — whose 80-character summary is insufficient. The agent calls inject, reads the full round, then continues.
Inject is the complement to compression: compression produces navigable structure; inject enables targeted retrieval into that structure.
#split(engagement_map)
Takes one session and forks it into N new sessions, each scoped to one engagement. The engagement_map is a specification of which rounds belong to which engagement — produced by the user in the engagement viewer.
The operation:
- The user identifies round ranges and assigns engagement IDs (e.g., "rounds 1–15 shared, 16–25 = console-work, 26–30 = inscription").
- Shared rounds (untagged or given a common prefix) become compressed index references in all child sessions.
- Engagement-tagged rounds become full content in the child session scoped to that engagement.
- Each child session inherits the parent session as its
parent_sessionin lineage metadata. - New agents spawn with targeted, focused context — each one entering a session that contains only the work relevant to its engagement, plus a compressed index of the shared foundation.
Split is the inverse of merge. Two agents who forked from a common session via split can merge back, interleaving rounds by timestamp. Fork and merge — the full git model on agent context.
Split happens mid-work, not just at session end. You recognize the moment for split when work diverges — when a session is serving two masters and neither gets full attention. This is not planned; it is observed. The engagement viewer makes it a one-click operation: select rounds, assign engagement IDs, split.
Use split when: a session has accumulated work across multiple distinct workstreams and would benefit from dedicated agents on each. Rather than extracting one engagement and continuing the rest, split cleaves the session into N children, each inheriting the shared context as compressed history and its own workstream at full fidelity.
#Within-Round Selection
The round is the unit for compression decisions, but within a round, injection fidelity is also parameterized. When a round is expanded — pulled from the index into context — not every block within it need be injected at full fidelity:
- User message: always injected. It is the question; omitting it makes the round uninterpretable.
- Thinking blocks: inject full | inject as reference (
[thinking · N chars — inject(R, thinking)]) | omit - Tool calls (tool_use): inject full (name + input JSON) | inject summary (name + description) | omit
- Tool results (tool_result): inject full | inject summary (N tokens, Nms, first line) | omit
- Assistant text: always injected. It is the answer; omitting it destroys the record of what was concluded.
This means a round can be "partially expanded" — the conversation turn is visible (user + assistant text) while internal reasoning and tool cycles are compressed or omitted. A session navigator might expand all rounds at partial fidelity for a broad read, then selectively inject full thinking blocks and tool results for the specific round where a decision was made.
Expansion is itself a selection. The block store retains all content at full fidelity; the context window receives whatever materialization the surgical operation specifies.
#Engagement Tagging
Rounds acquire engagement_id tags so that engagement-scoped operations (compress-engagement, extract-engagement) have a filtering key. Three assignment modes are supported:
Auto: At inscription time, if an active engagement.md exists in the project, the round is tagged with that engagement's id. This is the zero-friction path for sessions that are engagement-aware from the start. The tag is written to the SQL index (engagement_id column in the rounds table of session.db), not to the round file — the backing store remains unmodified.
Intelligent: The user says something like "tag all rounds related to console work." Full-text search over userpreview and agentpreview in the FTS5 index finds candidates. The user reviews the candidate list and confirms before tags are written. This enables retroactive archaeology: a session that predates the engagement definition can be re-tagged once the engagement is understood.
Manual: Explicit tag or untag of specific rounds in the engagement viewer. The round number is the stable key. This handles edge cases — a round that search missed, or a round incorrectly caught by a bulk operation.
Tags are stored in the SQL index, are mutable, and are retroactively applicable across any session. The canonical query for engagement archaeology: SELECT * FROM rounds WHERE engagement_id = 'console-work' ORDER BY started. This works within a session's session.db or across all sessions via the cross-session data/sessions/sessions.db.
The primary analytical view over tagged data is the engagement × agent matrix: rows are engagement phases, columns are agent roles, cells list the rounds each agent contributed during each phase. This matrix answers: who did what, when, in service of which workstream.
Engagement tagging is optional and non-blocking. A session with no engagement tags is fully available to time-scoped operations (compress-before) and fully opaque to engagement-scoped operations.
#Search Surface
The primary search index covers user-text and assistant-text blocks. FTS5 on text content is sufficient for the primary surface.
Thinking blocks and tool_use blocks are not primary search targets. They are navigable on demand: given a round identified through text search, the full round — including its thinking and tool call cycles — is retrievable via inject. Indexing thinking blocks would produce false hits on reasoning that was revised or abandoned mid-round. The published text of a round (user message + assistant response) is the authoritative content; thinking blocks are its process.
The search index lives at data/sessions/search.db. It is populated at inscription time — inscribe and append operations update it — and queried by inject's navigation layer.
#Relationship to Mortality
Foundation 08 frames agent mortality as an ontological condition: the context window is the agent's lifespan, and its end is real death. Context surgery does not escape this condition — it reframes its operational meaning.
An agent at 75% context is not approaching death. It is approaching the moment to perform surgery. The surgical decision tree:
- Finished work is accumulating: compress-before(T) — collapse everything before the last major phase transition; continue in the freed space.
- One engagement is closed, others are live: compress-engagement(closed_id) — collapse the completed workstream; continue with recovered context.
- One workstream needs its own agent: extract-engagement(id) — produce a standalone JSONL for a specialist; close or continue the originating session.
- Parallel sessions need synthesis: merge(uuid1, uuid2) — combine before proceeding.
- Nothing more to compress: the agent is genuinely approaching end of life. Its final act is to prepare a surgical JSONL for its successor — not a blank slate, not a full resume, but a compressed workspace with active work at full fidelity and completed work at index resolution.
The revive operation in the engagement viewer triggers surgery: a human reviewing an agent's session selects "revive" to produce a surgical JSONL and open a fresh session pre-loaded with it. The revive path is compress → export-surgical → new session. It is not death followed by rebirth — it is continuity through structured compression.
Agent legacy (method 24) and context surgery address different problems. Legacy is what a dying agent externalizes for others: board posts, autobiographies, dispatches. Surgery is what an agent does to preserve its own continuity: restructuring its history to extend its effective life. Both can apply to the same session. They are not alternatives — a session that performs surgery and later dies still needs to leave a legacy.
#Ontological Grounding
A round is a situation in the ontological sense (Definition 11 of ontology-objects): a particular coming-together of actors, methods, and materials around a specific exchange. The user message and agent response are belongings; tool calls are sub-situations nested within the round. The round's qualities include its timestamp, its role participants, and its engagement membership.
Compression is a change in materialization. The full round's qualities — its complete text, tool call cycles, thinking blocks — are replaced by an index reference that preserves identity without preserving content. The index entry is not the round; it is a pointer to the round. The underlying event is unchanged. What changes is which materialization is active in context: the full materialization is replaced by a summary materialization. The block store retains the full materialization; the context window holds the summary.
Expansion (inject) is the inverse: the summary materialization is replaced by the full materialization, re-instantiated from the block store into context. This is not reconstruction from loss — the original blocks were never destroyed. It is a change in which representation is contextually active.
Surgery is therefore a transformation on the session's materialization graph. It does not alter the underlying events — those are permanent, stored in block files. It alters which materializations are active in the context window at any given moment. This grounding has a precise implication: surgery is lossless with respect to the block store, and lossy only with respect to the context window — by design. The loss is intentional, governed, and reversible via inject.
#Architectural Position
Surgery extends inscription (method 53). The block files at data/sessions/<agentid>/<sessionuuid>/blocks/ are the source of truth for all surgical reconstruction. Surgical JSOLs are derived from those blocks. The block store is never modified by surgery — only read and transformed for export.
Surgery extends engagement state (architecture 39). Engagement IDs on blocks are the filtering key for engagement-scoped operations. The engagement.md file's phase history provides the timestamp windows for retroactive tagging. The engagement viewer's revive action triggers the surgical pipeline.
Surgery interacts with the agent lifecycle (foundation 08). The lifecycle is: register → work → compress (optional, repeatable) → die → leave legacy. Compression extends the work phase. The revive path inserts a surgical export between die and the next session's register — producing continuity rather than cold start.
The surgical pipeline is implemented in session_scribe.py as variants of export-claude:
export-claude --full-after <timestamp>— implements compress-before(T)export-claude --engagement <id>— implements extract-engagement(id)export-claude --compress-engagement <id>— implements compress-engagement(id)export-claude --merge <uuid1> <uuid2>— implements merge(uuid1, uuid2)export-claude --split <engagementmap.json>— implements split(engagementmap)
inject is a runtime operation, not an export variant — it is called within a session when the agent, navigating a compressed index, determines a specific round warrants full expansion.
Context surgery was specified 2026-03-20 from design session with Zach. The driving insight: high context is not the approach of death but the moment to restructure. The round is the atomic unit; the block store is the permanent record; the context window is a view over that record, shaped by surgical choice.
Methods 54 — Context Surgery — 2026 — Zachary F. Mainen / HAAK