Type: plan
Created: 2026-03-21
Updated: 2026-03-22
Status: Active
Depends on: agent-runner-paper, agent-runner-v2
#What was built (this session)
Comprehensive build from zero to production across a multi-day session. Key commits:
Agent runner core (Rust)
- 753d8196 — ledger module: BLAKE3/JCS content addressing
- 532dfad6 — runtime module: AgentRuntime trait + AnthropicBackend with SSE streaming
- 365d519f — session module: KeyedQueue, SessionStore, EvictionTask, SessionManager
- 7326462c — constitution module: ConstitutionalRuntime, PolicyEngine, mandate/mailbox/board injection
- f7bae306 — server: Axum gateway, ACP wire protocol, OpenClaw-compatible WebSocket
- 9ad89f72 — tool executor: filesystem, mailbox, board, spawn_subagent
- bdf274f1 — integration test suite: 7 scenarios covering wire protocol, constitution, ledger, tools
- 5 modules, 125 tests total
Constitutional governance
- 7326462c — policy engine with trust tiers, mandate injection, mailbox injection
- 33625dee — message assembly: user message correctly passed to Anthropic API
- 24967304 — blocked tools + reasons in system prompt for unknown agents
- 55f2387e — workspace scoping: unknown agents see platform/ only
- a13ec8ce — full tools enabled within workspace scope (workspace IS the security boundary)
- 33adbc83 — governance architecture doc: separation principle, trust tiers, crypto roadmap
Heartbeat + standing agents
- 5eede400 — heartbeat loop: standing agents wake on unread mailbox messages
- 6ae85c91 — standing agent migration status: 26 agents verified with mandates
- 65cd58e0 — preload-standing filter + mandate file loaded into system prompt
OpenClaw fork + Telegram
- d72924c8 — suppress intermediate Done events during tool loop (prevents OpenClaw hang)
- 714c6517 — Signal-to-agent-runner bridge: incoming messages trigger constitutional turns
- haak-acp plugin, Telegram bot @HAAKAIbot live
Console + browser
- 541bdb03 — wire console chat to agent runner: streaming turns, constitutional bind, tool indicators
- 293aa287 — project browser: personnel, projects, papers, engagements, timeline
- 91ae0ecb — ground planner in ontology: entity graph, quality discovery, situation mapping
- 508716f6 — planner dashboard: paper arc, projects, sessions, timeline
- 27425609 — bake_entities.py: filesystem-authoritative entity graph builder (457 entities)
Benchmarks
- f3c2ef69 — overhead measurements: 15 ms local overhead, 601 ns policy gate. Results in infra/benchmarks/
Filesystem reorg
- d1df9bb7 through be4364c6 — platform/ + workspaces/ with symlink bridge, path fixes, index updates
Paper + specs
- 742cd859 — paper outline + next steps roadmap
- f0fa91c9 — formal specification: paper-grade technical reference
- 30-agent-runner-paper.md — NeurIPS Systems Track, May submission target
#Architecture decisions made
- Workspace IS the security model — not tool-level blocking. Unknown agents get full tools but scoped to platform/ only. Known agents operate within their workspace.
- Agents operate under the constitution, don't study it — courts (not agents) interpret policy. Agents see mandates and blocked-tool explanations, never raw policy rules.
- Git tracks content, ledger tracks governance — content lives in the filesystem (markdown, YAML). The ledger (BLAKE3/JCS, SQLite) records governance events: who did what, with what authorization, verified how.
- Platform vs workspaces: two-boundary model — platform/ is shared infrastructure (architecture, ontology, methods). workspaces/<user>/ is private state (projects, data, strategy). The boundary is the security surface.
- One ledger for all users, platform-owned — governance is global, not per-workspace. Every agent action from every workspace feeds the same ledger.
- OpenClaw provides tools, agent runner provides governance — OpenClaw handles MCP servers and built-in tool execution. Our runner gates tool calls through policy before they reach OpenClaw.
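The first decision above (the workspace, not the tool list, is the boundary) comes down to a path-containment check. The following is a minimal sketch under stated assumptions, not the runner's actual code: `Agent`, `scope_root`, `normalize`, and `is_in_scope` are hypothetical names, known agents are assumed to be scoped to workspaces/<user>/ only, and the check is purely lexical, so it does not resolve symlinks (which matters while the symlink bridge exists).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical agent identity: unknown agents have no workspace of their own.
pub enum Agent {
    Unknown,
    Known { user: String },
}

/// Root directory an agent is scoped to (layout assumed from the decisions above).
pub fn scope_root(agent: &Agent) -> PathBuf {
    match agent {
        Agent::Unknown => PathBuf::from("platform"),
        Agent::Known { user } => Path::new("workspaces").join(user),
    }
}

/// Lexically normalize a relative path: drop `.` and resolve `..`.
/// Returns None for absolute paths or when `..` would climb above the start.
pub fn normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::Normal(part) => out.push(part),
            Component::ParentDir => {
                if !out.pop() {
                    return None;
                }
            }
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// True only if the normalized path stays under the agent's scope root.
pub fn is_in_scope(agent: &Agent, requested: &Path) -> bool {
    match normalize(requested) {
        Some(clean) => clean.starts_with(scope_root(agent)),
        None => false,
    }
}
```

Under this sketch an unknown agent can read platform/ontology/index.md with any tool, but a request for platform/../workspaces/<user>/... normalizes to a path outside platform/ and is refused before any tool runs.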
#Next steps (prioritized)
#Immediate (next session)
- Tool execution delegation — modify haak-acp plugin so tool calls round-trip back to OpenClaw for execution. Our runner gates them (policy), OpenClaw executes them (MCP servers, built-ins). This gives Telegram users access to weather, web search, etc. under constitutional governance.
- Symlink bridge migration — update path references in CLAUDE.md, scripts, hooks to use new paths. Remove symlinks one by one. ~300 files, can be done incrementally.
- Telegram bot testing — verify the scoped workspace works end-to-end. Have someone external message the bot.
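The delegation described in the first item (the runner gates, OpenClaw executes) can be sketched as a policy check wrapped around an executor trait. This is an illustration only: `Decision`, `PolicyGate`, `ToolExecutor`, `run_tool`, and `RecordingExecutor` are hypothetical names, a simple blocklist stands in for the real policy engine, and the haak-acp wire format is not shown.

```rust
use std::collections::HashMap;

/// Outcome of the policy check for one tool call.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Block { reason: String },
}

/// Stand-in for the policy engine: blocked tool names mapped to reasons.
pub struct PolicyGate {
    blocked: HashMap<String, String>,
}

impl PolicyGate {
    pub fn new(blocked: &[(&str, &str)]) -> Self {
        let blocked = blocked
            .iter()
            .map(|(tool, reason)| (tool.to_string(), reason.to_string()))
            .collect();
        PolicyGate { blocked }
    }

    pub fn check(&self, tool: &str) -> Decision {
        match self.blocked.get(tool) {
            Some(reason) => Decision::Block { reason: reason.clone() },
            None => Decision::Allow,
        }
    }
}

/// Whatever actually runs the tool; in the plan above this role is OpenClaw's.
pub trait ToolExecutor {
    fn execute(&mut self, tool: &str, args: &str) -> Result<String, String>;
}

/// Gate first, then delegate. A blocked call never reaches the executor.
pub fn run_tool<E: ToolExecutor>(
    gate: &PolicyGate,
    exec: &mut E,
    tool: &str,
    args: &str,
) -> Result<String, String> {
    match gate.check(tool) {
        Decision::Allow => exec.execute(tool, args),
        Decision::Block { reason } => Err(format!("blocked by policy: {}", reason)),
    }
}

/// Test double that records which calls were delegated.
pub struct RecordingExecutor {
    pub calls: Vec<String>,
}

impl ToolExecutor for RecordingExecutor {
    fn execute(&mut self, tool: &str, args: &str) -> Result<String, String> {
        self.calls.push(tool.to_string());
        Ok(format!("{}({})", tool, args))
    }
}
```

Keeping the executor behind a trait means the gate can be tested with a recording double, while the production implementation forwards the call to OpenClaw.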
#Paper track (April)
- Figures — layer stack diagram, policy gate flow, ledger hash chain, Reed timeline. TikZ or Python matplotlib.
- Draft sections — intro (governance gap argument), architecture (constitutional runtime), governance model (pre-API enforcement), evaluation (benchmarks + case study).
- Case study writeup — Reed's 21 tool calls, with timestamps and tool trace from ledger DB.
- Internal review — run reviewer agent on complete draft.
- Venue decision — NeurIPS Systems Track (May deadline) or SoCC (June deadline).
#System hardening
- Per-user workspaces — when a user registers, create workspaces/users/<name>/ with scoped access.
- User-to-user messaging — extend mailbox to registered users (not just agents).
- Crash recovery — session restore after daemon restart. Scan the sessions table for state = 'running', transition those sessions to idle, and notify each agent via mailbox.
- CI pipeline — GitHub Actions: cargo test, cargo clippy, cargo fmt --check on push to main and PRs touching infra/daemons/agent-runner/.
- Attestation proofs — implement the Attestation variant of the Proof enum. Each agent generates an ed25519 keypair on its first session and signs each ledger entry.
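The crash-recovery steps above (scan, transition, notify) can be sketched in memory. The assumptions here are loud ones: `SessionState`, `Session`, `MailboxMessage`, and `recover_sessions` are hypothetical names, a `BTreeMap` stands in for the sessions table, and a `Vec` stands in for the mailbox. A real version would presumably run the scan and update inside one database transaction.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SessionState {
    Idle,
    Running,
}

/// One row of the (assumed) sessions table.
pub struct Session {
    pub agent: String,
    pub state: SessionState,
}

/// Minimal stand-in for a mailbox entry.
pub struct MailboxMessage {
    pub to: String,
    pub body: String,
}

/// On daemon start, any session still marked Running was interrupted.
/// Move it to Idle and queue a mailbox notice for its agent.
/// Returns the ids of the recovered sessions in ascending order.
pub fn recover_sessions(
    sessions: &mut BTreeMap<u64, Session>,
    mailbox: &mut Vec<MailboxMessage>,
) -> Vec<u64> {
    let mut recovered = Vec::new();
    for (id, session) in sessions.iter_mut() {
        if session.state == SessionState::Running {
            session.state = SessionState::Idle;
            mailbox.push(MailboxMessage {
                to: session.agent.clone(),
                body: format!("session {} was interrupted by a daemon restart", id),
            });
            recovered.push(*id);
        }
    }
    recovered
}
```

Because the state is flipped before the function returns, running recovery a second time finds nothing to do and sends no duplicate notices.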
#Ecosystem
- OpenClaw upstream PR — contribute the haak-acp plugin back to the OpenClaw project.
- Public mirror — decide what goes public, create the repo. Platform/ is the candidate; workspaces/ stays private.
- Standing agent full migration — move all 26 from Claude Code subprocesses to runner-only. Each agent: verify mandate parses, test one turn, switch roster entry. Batch in groups of 5.
- Console polish — streaming UX, tool call visualization, policy gate indicators.
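For the standing-agent migration, batching 26 agents in groups of 5 gives six batches: five full batches and a final batch of one. A small sketch of that split follows; `migration_batches` and `STEPS` are hypothetical names, and the agent names used with it are placeholders.

```rust
/// Split a roster into migration batches of at most `size` agents.
/// Panics if `size` is 0, because `chunks` does.
pub fn migration_batches(roster: &[String], size: usize) -> Vec<Vec<String>> {
    roster.chunks(size).map(|batch| batch.to_vec()).collect()
}

/// The per-agent steps listed in the plan, in order (labels only).
pub const STEPS: [&str; 3] = [
    "verify mandate parses",
    "test one turn",
    "switch roster entry",
];
```

With this split, a count of 15 migrated agents corresponds to three completed batches.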
#Timeline
March 2026
Week 4: Items 1-3 (tool delegation, symlink migration, Telegram testing)
April 2026
Week 1: Items 4, 6 (figures, case study extraction)
Week 2: Item 5 (draft sections — intro, architecture)
Week 3: Item 5 cont. (draft sections — governance, evaluation)
Week 4: Items 7-8 (internal review, venue decision)
May 2026
Week 1: Final revision + formatting
Week 2: Submit (NeurIPS Systems Track or SoCC)
Week 3: Items 9-10 (per-user workspaces, user messaging)
Week 4: Items 11-12 (crash recovery, CI pipeline)
June-August 2026
Items 13-17 (attestation, OpenClaw PR, public mirror, migration, console)
Target: Filix v1.0 with cryptographic enforcement by August
#Milestones
| Date | Milestone | Items |
|---|---|---|
| Mar 28 | Tool delegation + symlink migration complete | 1-3 |
| Apr 14 | Figures + case study ready | 4, 6 |
| Apr 28 | All draft sections complete | 5 |
| May 5 | Internal review complete, venue selected | 7-8 |
| May 12 | Paper submitted | — |
| Jun 1 | CI pipeline live, crash recovery shipped | 11-12 |
| Jul 1 | Standing agent migration: 15/26 complete | 16 |
| Aug 1 | Attestation proofs implemented | 13 |
| Aug 15 | Filix v1.0 — all agents on runner, crypto enforcement | 13-17 |
#Related
- agent-runner-paper — paper outline and abstract
- agent-runner-v2 — architecture spec
- agent-runner — formal specification
- filix-v1-plan — Filix v1.0 roadmap
- infra/daemons/agent-runner/README.md — build and run instructions
haak strategy 31 -- agent runner next steps -- zach + claude
Strategy 31 — Agent Runner: Next Steps Roadmap — 2026 — Zachary F. Mainen / HAAK