Strategy 31 — Agent Runner: Next Steps Roadmap

**Type**: `plan____` **Created**: 2026-03-21 **Updated**: 2026-03-22 **Status**: Active **Depends on**: strategy/30-agent-runner-paper.md, patterns/architecture/38-agent-runner-v2.md

# What was built (this session)

Comprehensive build from zero to production across a multi-day session. Key commits:

Agent runner core (Rust)

753d8196 — ledger module: BLAKE3/JCS content addressing
532dfad6 — runtime module: AgentRuntime trait + AnthropicBackend with SSE streaming
365d519f — session module: KeyedQueue, SessionStore, EvictionTask, SessionManager
7326462c — constitution module: ConstitutionalRuntime, PolicyEngine, mandate/mailbox/board injection
f7bae306 — server: Axum gateway, ACP wire protocol, OpenClaw-compatible WebSocket
9ad89f72 — tool executor: filesystem, mailbox, board, spawn_subagent
bdf274f1 — integration test suite: 7 scenarios covering wire protocol, constitution, ledger, tools
5 modules, 125 tests total

Constitutional governance

7326462c — policy engine with trust tiers, mandate injection, mailbox injection
33625dee — message assembly: user message correctly passed to Anthropic API
24967304 — blocked tools + reasons in system prompt for unknown agents
55f2387e — workspace scoping: unknown agents see platform/ only
a13ec8ce — full tools enabled within workspace scope (workspace IS the security boundary)
33adbc83 — governance architecture doc: separation principle, trust tiers, crypto roadmap

Heartbeat + standing agents

5eede400 — heartbeat loop: standing agents wake on unread mailbox messages
6ae85c91 — standing agent migration status: 26 agents verified with mandates
65cd58e0 — preload-standing filter + mandate file loaded into system prompt

OpenClaw fork + Telegram

d72924c8 — suppress intermediate Done events during tool loop (prevents OpenClaw hang)
714c6517 — Signal-to-agent-runner bridge: incoming messages trigger constitutional turns
haak-acp plugin, Telegram bot @HAAKAIbot live

Console + browser

541bdb03 — wire console chat to agent runner: streaming turns, constitutional bind, tool indicators
293aa287 — project browser: personnel, projects, papers, engagements, timeline
91ae0ecb — ground planner in ontology: entity graph, quality discovery, situation mapping
508716f6 — planner dashboard: paper arc, projects, sessions, timeline
27425609 — bake_entities.py: filesystem-authoritative entity graph builder (457 entities)

Benchmarks

f3c2ef69 — overhead measurements: 15ms local overhead, 601ns policy gate. Results in infra/benchmarks/

Filesystem reorg

d1df9bb7 through be4364c6 — platform/ + workspaces/ with symlink bridge, path fixes, index updates

Paper + specs

742cd859 — paper outline + next steps roadmap
f0fa91c9 — formal specification: paper-grade technical reference
30-agent-runner-paper.md — NeurIPS Systems Track, May submission target

# Architecture decisions made

Workspace IS the security model — not tool-level blocking. Unknown agents get full tools but scoped to platform/ only. Known agents operate within their workspace.
Agents operate under the constitution, don't study it — courts (not agents) interpret policy. Agents see mandates and blocked-tool explanations, never raw policy rules.
Git tracks content, ledger tracks governance — content lives in the filesystem (markdown, YAML). The ledger (BLAKE3/JCS, SQLite) records governance events: who did what, with what authorization, verified how.
Platform vs workspaces: two-boundary model — platform/ is shared infrastructure (architecture, ontology, methods). workspaces/<user>/ is private state (projects, data, strategy). The boundary is the security surface.
One ledger for all users, platform-owned — governance is global, not per-workspace. Every agent action from every workspace feeds the same ledger.
OpenClaw provides tools, agent runner provides governance — OpenClaw handles MCP servers and built-in tool execution. Our runner gates tool calls through policy before they reach OpenClaw.
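
The ledger decision above depends on content addressing: the same governance event must always hash to the same identifier, and each entry must commit to its predecessor. The following is a minimal Python sketch of that idea, not the runner's Rust implementation. Several things here are assumptions: Python's standard library has no BLAKE3, so `hashlib.blake2b` stands in for it; `json.dumps` with sorted keys and compact separators only approximates JCS (RFC 8785); and the field names (`agent`, `action`, `prev`) and the `GENESIS` marker are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical marker meaning "no previous entry"


def canonical_bytes(entry: dict) -> bytes:
    # Approximates JCS: sorted keys, no insignificant whitespace, UTF-8.
    # Real JCS (RFC 8785) additionally pins number and string serialization.
    text = json.dumps(entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_address(entry: dict) -> str:
    # Stand-in hash: BLAKE2b with a 32-byte digest (the ledger itself uses BLAKE3).
    return hashlib.blake2b(canonical_bytes(entry), digest_size=32).hexdigest()


def append(ledger: list, event: dict) -> str:
    # Each entry commits to the id of the previous one, forming a hash chain.
    prev = ledger[-1]["id"] if ledger else GENESIS
    body = dict(event, prev=prev)
    entry_id = content_address(body)
    ledger.append({"id": entry_id, "body": body})
    return entry_id


def verify(ledger: list) -> bool:
    # Recompute every id and check every back-link.
    prev = GENESIS
    for record in ledger:
        body = record["body"]
        if body.get("prev") != prev or content_address(body) != record["id"]:
            return False
        prev = record["id"]
    return True
```

Because the hash is taken over the canonical form, key order in the event does not affect the identifier, and editing any recorded entry after the fact makes `verify` fail.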

# Next steps (prioritized)

## Immediate (next session)

1. Tool execution delegation: modify the haak-acp plugin so tool calls round-trip back to OpenClaw for execution. Our runner gates them (policy); OpenClaw executes them (MCP servers, built-ins). This gives Telegram users access to weather, web search, etc. under constitutional governance.

2. Symlink bridge migration: update path references in CLAUDE.md, scripts, and hooks to use the new paths, then remove the symlinks one by one. About 300 files; can be done incrementally.

3. Telegram bot testing: verify that the scoped workspace works end-to-end by having someone external message the bot.
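
The gate-then-delegate split described for tool execution can be sketched as below: the runner decides whether a call is in scope, and only then hands it to an executor. This is an illustrative Python sketch under stated assumptions, not the haak-acp plugin or the runner's policy engine. `gate_tool_call`, `in_scope`, `allowed_root`, the `known_agents` mapping, and the `execute` callback are all hypothetical names. It also simplifies in two ways: known agents are confined to their own workspace only, and the path check is purely lexical (it collapses `..` segments but does not resolve symlinks, which a real gate would need to handle).

```python
import posixpath
from typing import Callable, Optional


def allowed_root(agent: Optional[str], known_agents: dict) -> str:
    # Unknown agents are scoped to platform/; known agents to their workspace.
    return known_agents.get(agent, "platform")


def in_scope(path: str, root: str) -> bool:
    # Lexical check only: rejects absolute paths and normalizes ".." segments.
    if posixpath.isabs(path):
        return False
    norm = posixpath.normpath(path)
    return norm == root or norm.startswith(root + "/")


def gate_tool_call(
    agent: Optional[str],
    tool: str,
    path: str,
    known_agents: dict,
    execute: Callable[[str, str], object],
) -> dict:
    # Policy first: the executor is never reached for an out-of-scope call.
    root = allowed_root(agent, known_agents)
    if not in_scope(path, root):
        return {"allowed": False, "reason": f"{path!r} is outside {root}/"}
    # Delegation: in the plan this would be the round-trip to OpenClaw.
    return {"allowed": True, "result": execute(tool, path)}
```

A usage example with a stand-in executor: an unknown agent can read `platform/ontology/index.md`, but a request for `platform/../workspaces/alice/notes.md` is denied before the executor is called.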

## Paper track (April)

4. Figures: layer stack diagram, policy gate flow, ledger hash chain, Reed timeline. TikZ or Python matplotlib.

5. Draft sections: intro (governance gap argument), architecture (constitutional runtime), governance model (pre-API enforcement), evaluation (benchmarks + case study).

6. Case study writeup: Reed's 21 tool calls, with timestamps and tool trace from the ledger DB.

7. Internal review: run the reviewer agent on the complete draft.

8. Venue decision: NeurIPS Systems Track (May deadline) or SoCC (June deadline).

## System hardening

9. Per-user workspaces: when a user registers, create workspaces/users/<name>/ with scoped access.

10. User-to-user messaging: extend the mailbox to registered users (not just agents).

11. Crash recovery: restore sessions after a daemon restart. Scan the sessions table for state = 'running', transition those sessions to idle, and notify each agent via mailbox.

12. CI pipeline: GitHub Actions running cargo test, cargo clippy, and cargo fmt --check on pushes to main and on PRs touching infra/daemons/agent-runner/.

13. Attestation proofs: implement the Attestation variant of the Proof enum. Each agent generates an ed25519 keypair on its first session and signs each ledger entry.
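
The crash-recovery step above (scan for `state = 'running'`, move to idle, notify by mailbox) can be sketched against an in-memory SQLite database. This is a Python illustration rather than the runner's Rust code, and the schema is an assumption: only the `sessions` table name and the `running`/`idle` states come from the plan, while the column names, the `mailbox` table, the message text, and `make_demo_db` are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema and rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sessions (id INTEGER PRIMARY KEY, agent TEXT, state TEXT);
        CREATE TABLE mailbox (agent TEXT, body TEXT);
        INSERT INTO sessions VALUES (1, 'agent-a', 'running');
        INSERT INTO sessions VALUES (2, 'agent-b', 'idle');
        INSERT INTO sessions VALUES (3, 'agent-c', 'running');
        """
    )
    return conn


def recover_sessions(conn: sqlite3.Connection) -> list:
    """Reset sessions left 'running' by a crash and notify their agents."""
    with conn:  # state changes and notices commit together, or roll back together
        rows = conn.execute(
            "SELECT id, agent FROM sessions WHERE state = 'running' ORDER BY id"
        ).fetchall()
        for session_id, agent in rows:
            conn.execute(
                "UPDATE sessions SET state = 'idle' WHERE id = ?", (session_id,)
            )
            conn.execute(
                "INSERT INTO mailbox (agent, body) VALUES (?, ?)",
                (agent, f"session {session_id} was interrupted by a daemon restart"),
            )
    return [session_id for session_id, _ in rows]
```

Running the update and the notification in one transaction means an agent is never told about a session that was not actually reset, and calling `recover_sessions` a second time is a no-op.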

## Ecosystem

14. OpenClaw upstream PR: contribute the haak-acp plugin back to the OpenClaw project.

15. Public mirror: decide what goes public and create the repo. platform/ is the candidate; workspaces/ stays private.

16. Standing agent full migration: move all 26 agents from Claude Code subprocesses to runner-only. For each agent: verify the mandate parses, test one turn, switch the roster entry. Batch in groups of 5.

17. Console polish: streaming UX, tool call visualization, policy gate indicators.
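
The standing-agent migration procedure (three checks per agent, batches of 5) could be driven by a loop like the following. It is a hedged Python sketch: `batches`, `migrate`, and the step callables are hypothetical placeholders for the real mandate parser, test turn, and roster switch, and stopping after the first batch that contains a failure is an assumed policy, not one stated in the plan.

```python
from typing import Callable, List, Sequence, Tuple


def batches(items: Sequence[str], size: int = 5) -> List[List[str]]:
    # Split the roster into consecutive groups of at most `size` agents.
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def migrate(
    agents: Sequence[str],
    steps: Sequence[Callable[[str], bool]],
    size: int = 5,
) -> Tuple[List[str], List[str]]:
    """Run the steps in order for each agent, batch by batch.

    A failing step skips the remaining steps for that agent, so a roster
    entry is never switched for an agent whose earlier check failed.
    """
    migrated: List[str] = []
    failed: List[str] = []
    for batch in batches(agents, size):
        for agent in batch:
            if all(step(agent) for step in steps):
                migrated.append(agent)
            else:
                failed.append(agent)
        if failed:
            break  # assumed policy: fix this batch before starting the next
    return migrated, failed
```

With 26 agents and a batch size of 5, this yields five full batches and a final batch of one.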

# Timeline

March 2026
  Week 4: Items 1-3 (tool delegation, symlink migration, Telegram testing)

April 2026
  Week 1: Items 4, 6 (figures, case study extraction)
  Week 2: Item 5 (draft sections — intro, architecture)
  Week 3: Item 5 cont. (draft sections — governance, evaluation)
  Week 4: Items 7-8 (internal review, venue decision)

May 2026
  Week 1: Final revision + formatting
  Week 2: Submit (NeurIPS Systems Track or SoCC)
  Week 3: Items 9-10 (per-user workspaces, user messaging)
  Week 4: Items 11-12 (crash recovery, CI pipeline)

June-August 2026
  Items 13-17 (attestation, OpenClaw PR, public mirror, migration, console)
  Target: Filix v1.0 with cryptographic enforcement by August

# Milestones

| Date | Milestone | Items |
|---|---|---|
| Mar 28 | Tool delegation + symlink migration complete | 1-3 |
| Apr 14 | Figures + case study ready | 4, 6 |
| Apr 28 | All draft sections complete | 5 |
| May 5 | Internal review complete, venue selected | 7-8 |
| May 12 | Paper submitted | - |
| Jun 1 | CI pipeline live, crash recovery shipped | 11-12 |
| Jul 1 | Standing agent migration: 15/26 complete | 16 |
| Aug 1 | Attestation proofs implemented | 13 |
| Aug 15 | Filix v1.0: all agents on runner, crypto enforcement | 13-17 |

agent-runner-paper — paper outline and abstract
agent-runner-v2 — architecture spec
agent-runner — formal specification
filix-v1-plan — Filix v1.0 roadmap
infra/daemons/agent-runner/README.md — build and run instructions

haak strategy 31 -- agent runner next steps -- zach + claude

Strategy 31 -- Agent Runner: Next Steps Roadmap -- 2026 -- Zachary F. Mainen / HAAK