Strategy 31 — Agent Runner: Next Steps Roadmap

**Type**: `plan____` **Created**: 2026-03-21 **Updated**: 2026-03-22 **Status**: Active **Depends on**: strategy/30-agent-runner-paper.md, patterns/architecture/38-agent-runner-v2.md

#What was built (this session)

Comprehensive build from zero to production across a multi-day session. Key commits:

Agent runner core (Rust)

  • 753d8196 — ledger module: BLAKE3/JCS content addressing
  • 532dfad6 — runtime module: AgentRuntime trait + AnthropicBackend with SSE streaming
  • 365d519f — session module: KeyedQueue, SessionStore, EvictionTask, SessionManager
  • 7326462c — constitution module: ConstitutionalRuntime, PolicyEngine, mandate/mailbox/board injection
  • f7bae306 — server: Axum gateway, ACP wire protocol, OpenClaw-compatible WebSocket
  • 9ad89f72 — tool executor: filesystem, mailbox, board, spawn_subagent
  • bdf274f1 — integration test suite: 7 scenarios covering wire protocol, constitution, ledger, tools
  • 5 modules, 125 tests total

Constitutional governance

  • 7326462c — policy engine with trust tiers, mandate injection, mailbox injection
  • 33625dee — message assembly: user message correctly passed to Anthropic API
  • 24967304 — blocked tools + reasons in system prompt for unknown agents
  • 55f2387e — workspace scoping: unknown agents see platform/ only
  • a13ec8ce — full tools enabled within workspace scope (workspace IS the security boundary)
  • 33adbc83 — governance architecture doc: separation principle, trust tiers, crypto roadmap

Heartbeat + standing agents

  • 5eede400 — heartbeat loop: standing agents wake on unread mailbox messages
  • 6ae85c91 — standing agent migration status: 26 agents verified with mandates
  • 65cd58e0 — preload-standing filter + mandate file loaded into system prompt

OpenClaw fork + Telegram

  • d72924c8 — suppress intermediate Done events during tool loop (prevents OpenClaw hang)
  • 714c6517 — Signal-to-agent-runner bridge: incoming messages trigger constitutional turns
  • haak-acp plugin, Telegram bot @HAAKAIbot live

Console + browser

  • 541bdb03 — wire console chat to agent runner: streaming turns, constitutional bind, tool indicators
  • 293aa287 — project browser: personnel, projects, papers, engagements, timeline
  • 91ae0ecb — ground planner in ontology: entity graph, quality discovery, situation mapping
  • 508716f6 — planner dashboard: paper arc, projects, sessions, timeline
  • 27425609 — bake_entities.py: filesystem-authoritative entity graph builder (457 entities)

Benchmarks

  • f3c2ef69 — overhead measurements: 15ms local overhead, 601ns policy gate. Results in infra/benchmarks/

Filesystem reorg

  • d1df9bb7 through be4364c6 — platform/ + workspaces/ with symlink bridge, path fixes, index updates

Paper + specs

  • 742cd859 — paper outline + next steps roadmap
  • f0fa91c9 — formal specification: paper-grade technical reference
  • 30-agent-runner-paper.md — NeurIPS Systems Track, May submission target

#Architecture decisions made

  1. Workspace IS the security model — not tool-level blocking. Unknown agents get full tools but scoped to platform/ only. Known agents operate within their workspace.
  2. Agents operate under the constitution, don't study it — courts (not agents) interpret policy. Agents see mandates and blocked-tool explanations, never raw policy rules.
  3. Git tracks content, ledger tracks governance — content lives in the filesystem (markdown, YAML). The ledger (BLAKE3/JCS, SQLite) records governance events: who did what, with what authorization, verified how.
  4. Platform vs workspaces: two-boundary model — platform/ is shared infrastructure (architecture, ontology, methods). workspaces/<user>/ is private state (projects, data, strategy). The boundary is the security surface.
  5. One ledger for all users, platform-owned — governance is global, not per-workspace. Every agent action from every workspace feeds the same ledger.
  6. OpenClaw provides tools, agent runner provides governance — OpenClaw handles MCP servers and built-in tool execution. Our runner gates tool calls through policy before they reach OpenClaw.
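Decisions 1 and 4 can be illustrated with a small path check. This is a minimal sketch, not the runner's actual code: the `Agent` enum, `normalize`, and `in_scope` are hypothetical names, paths are assumed to be relative to the repository root, and the check is purely lexical (it does not resolve symlinks, which a real enforcement layer would have to handle). Whether known agents may also read `platform/` is not stated above, so the sketch keeps them strictly inside `workspaces/<user>/`.

```rust
use std::path::{Component, Path, PathBuf};

/// Who is asking. Both variants are hypothetical names for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Agent {
    Unknown,
    Known { user: String },
}

/// Lexically normalize a relative path. Absolute paths and any `..` that
/// would climb above the repository root are rejected with `None`.
fn normalize(rel: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for part in rel.components() {
        match part {
            Component::Normal(p) => out.push(p),
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` returns false when there is nothing left to remove.
                if !out.pop() {
                    return None;
                }
            }
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}

/// True if `agent` may touch `requested` (relative to the repository root).
/// Unknown agents are scoped to `platform/`; known agents to their workspace.
pub fn in_scope(agent: &Agent, requested: &str) -> bool {
    let path = match normalize(Path::new(requested)) {
        Some(p) => p,
        None => return false,
    };
    match agent {
        Agent::Unknown => path.starts_with("platform"),
        Agent::Known { user } => path.starts_with(Path::new("workspaces").join(user)),
    }
}
```

Because `Path::starts_with` compares whole components, a user named `zach` does not gain access to `workspaces/zachary/`, and a request such as `platform/../workspaces/zach/strategy.md` is normalized before the scope test, so it is rejected for an unknown agent.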

#Next steps (prioritized)

#Immediate (next session)

  1. Tool execution delegation: modify the haak-acp plugin so tool calls round-trip back to OpenClaw for execution. Our runner gates them (policy), OpenClaw executes them (MCP servers, built-ins). This gives Telegram users access to weather, web search, etc. under constitutional governance.
  2. Symlink bridge migration: update path references in CLAUDE.md, scripts, and hooks to use the new paths. Remove symlinks one by one. ~300 files, can be done incrementally.
  3. Telegram bot testing: verify the scoped workspace works end-to-end. Have someone external message the bot.
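The gate-then-delegate flow in the tool execution delegation item can be sketched as follows. Everything here is illustrative: `PolicyGate`, `ToolExecutor`, `gated_call`, and `RecordingExecutor` are hypothetical names, the deny-list is only a placeholder for whatever the real policy engine decides, and the executor trait stands in for the OpenClaw side without modelling the ACP wire protocol.

```rust
use std::collections::HashSet;

/// Outcome of the policy gate for one tool call.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    Deny { reason: String },
}

/// Hypothetical policy gate. A deny-list of tool names is a placeholder
/// for the real policy decision.
pub struct PolicyGate {
    blocked: HashSet<String>,
}

impl PolicyGate {
    pub fn new(blocked: &[&str]) -> Self {
        Self { blocked: blocked.iter().map(|s| s.to_string()).collect() }
    }

    pub fn check(&self, tool: &str) -> Verdict {
        if self.blocked.contains(tool) {
            Verdict::Deny { reason: format!("tool '{}' is blocked by policy", tool) }
        } else {
            Verdict::Allow
        }
    }
}

/// Stand-in for the executing side (OpenClaw in the plan above).
pub trait ToolExecutor {
    fn execute(&mut self, tool: &str, args: &str) -> String;
}

/// Gate first; delegate to the executor only when the verdict is Allow.
pub fn gated_call<E: ToolExecutor>(
    gate: &PolicyGate,
    exec: &mut E,
    tool: &str,
    args: &str,
) -> Result<String, String> {
    match gate.check(tool) {
        Verdict::Allow => Ok(exec.execute(tool, args)),
        Verdict::Deny { reason } => Err(reason),
    }
}

/// Test double that records the name of every tool it is asked to run.
pub struct RecordingExecutor {
    pub calls: Vec<String>,
}

impl ToolExecutor for RecordingExecutor {
    fn execute(&mut self, tool: &str, args: &str) -> String {
        self.calls.push(tool.to_string());
        format!("{}({}) ok", tool, args)
    }
}
```

The property worth checking in any real version is the one the test double makes visible: a denied call never reaches the executor, so `calls` only ever contains tools the gate allowed.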

#Paper track (April)

  4. Figures: layer stack diagram, policy gate flow, ledger hash chain, Reed timeline. TikZ or Python matplotlib.
  5. Draft sections: intro (governance gap argument), architecture (constitutional runtime), governance model (pre-API enforcement), evaluation (benchmarks + case study).
  6. Case study writeup: Reed's 21 tool calls, with timestamps and tool trace from the ledger DB.
  7. Internal review: run the reviewer agent on the complete draft.
  8. Venue decision: NeurIPS Systems Track (May deadline) or SoCC (June deadline).

#System hardening

  9. Per-user workspaces: when a user registers, create workspaces/users/<name>/ with scoped access.
  10. User-to-user messaging: extend the mailbox to registered users (not just agents).
  11. Crash recovery: session restore after daemon restart. Scan the sessions table for state = 'running', transition to idle, notify the agent via mailbox.
  12. CI pipeline: GitHub Actions running cargo test, cargo clippy, and cargo fmt --check on pushes to main and on PRs touching infra/daemons/agent-runner/.
  13. Attestation proofs: implement the Attestation variant of the Proof enum. Agent generates an ed25519 keypair on first session and signs each ledger entry.
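The crash recovery item can be sketched in memory as below. This is an assumption-laden illustration, not the planned implementation: the real version would run against the SQLite sessions table, and `Session`, `SessionState`, `Mailbox`, and `recover_sessions` are hypothetical names chosen for the example.

```rust
/// Session lifecycle states relevant to recovery (a hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SessionState {
    Idle,
    Running,
}

/// In-memory stand-in for one row of the sessions table.
#[derive(Debug, Clone, PartialEq)]
pub struct Session {
    pub id: String,
    pub agent: String,
    pub state: SessionState,
}

/// Hypothetical mailbox: (recipient agent, message body).
pub type Mailbox = Vec<(String, String)>;

/// Reset every session a crash left in `Running` back to `Idle` and queue
/// one notice per affected session. Returns the number of sessions recovered.
pub fn recover_sessions(sessions: &mut [Session], mailbox: &mut Mailbox) -> usize {
    let mut recovered = 0;
    for s in sessions
        .iter_mut()
        .filter(|s| s.state == SessionState::Running)
    {
        s.state = SessionState::Idle;
        mailbox.push((
            s.agent.clone(),
            format!("session {} was interrupted by a daemon restart", s.id),
        ));
        recovered += 1;
    }
    recovered
}
```

Running the scan twice is harmless: the second pass finds no `Running` sessions, returns 0, and sends no further notices, which matters if the daemon restarts again during startup.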

#Ecosystem

  14. OpenClaw upstream PR: contribute the haak-acp plugin back to the OpenClaw project.
  15. Public mirror: decide what goes public, create the repo. platform/ is the candidate; workspaces/ stays private.
  16. Standing agent full migration: move all 26 from Claude Code subprocesses to runner-only. Each agent: verify mandate parses, test one turn, switch roster entry. Batch in groups of 5.
  17. Console polish: streaming UX, tool call visualization, policy gate indicators.
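The per-agent steps and batching in the standing agent migration item could look roughly like this. It is a sketch under stated assumptions: `RosterEntry`, `migrate_one`, and `migrate_all` are hypothetical names, and the two boolean fields stand in for actually parsing a mandate file and running a real test turn.

```rust
/// Batch size taken from the migration item above.
pub const BATCH_SIZE: usize = 5;

/// Hypothetical roster entry for one standing agent.
#[derive(Debug, Clone, PartialEq)]
pub struct RosterEntry {
    pub name: String,
    pub mandate_parses: bool, // stand-in for parsing the mandate file
    pub test_turn_ok: bool,   // stand-in for running one real turn
    pub on_runner: bool,      // true once the roster entry has been switched
}

/// Migrate one agent: both checks must pass before the entry is switched.
pub fn migrate_one(entry: &mut RosterEntry) -> Result<(), String> {
    if !entry.mandate_parses {
        return Err(format!("{}: mandate does not parse", entry.name));
    }
    if !entry.test_turn_ok {
        return Err(format!("{}: test turn failed", entry.name));
    }
    entry.on_runner = true;
    Ok(())
}

/// Walk the roster in batches of BATCH_SIZE and return one list of
/// error messages per batch (empty when the whole batch migrated).
pub fn migrate_all(roster: &mut [RosterEntry]) -> Vec<Vec<String>> {
    roster
        .chunks_mut(BATCH_SIZE)
        .map(|batch| {
            batch
                .iter_mut()
                .filter_map(|e| migrate_one(e).err())
                .collect()
        })
        .collect()
}
```

With 26 agents and a batch size of 5 this yields six batches, the last containing a single agent. An agent that fails either check stays on its current setup and shows up in that batch's error list, so a failed batch can be retried without touching the agents already switched.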

#Timeline

March 2026
  Week 4: Items 1-3 (tool delegation, symlink migration, Telegram testing)

April 2026
  Week 1: Items 4, 6 (figures, case study extraction)
  Week 2: Item 5 (draft sections — intro, architecture)
  Week 3: Item 5 cont. (draft sections — governance, evaluation)
  Week 4: Items 7-8 (internal review, venue decision)

May 2026
  Week 1: Final revision + formatting
  Week 2: Submit (NeurIPS Systems Track or SoCC)
  Week 3: Items 9-10 (per-user workspaces, user messaging)
  Week 4: Items 11-12 (crash recovery, CI pipeline)

June-August 2026
  Items 13-17 (attestation, OpenClaw PR, public mirror, migration, console)
  Target: Filix v1.0 with cryptographic enforcement by August

#Milestones

| Date | Milestone | Items |
|------|-----------|-------|
| Mar 28 | Tool delegation + symlink migration complete | 1-3 |
| Apr 14 | Figures + case study ready | 4, 6 |
| Apr 28 | All draft sections complete | 5 |
| May 5 | Internal review complete, venue selected | 7-8 |
| May 12 | Paper submitted | |
| Jun 1 | CI pipeline live, crash recovery shipped | 11-12 |
| Jul 1 | Standing agent migration: 15/26 complete | 16 |
| Aug 1 | Attestation proofs implemented | 13 |
| Aug 15 | Filix v1.0: all agents on runner, crypto enforcement | 13-17 |

haak strategy 31 -- agent runner next steps -- zach + claude

Strategy 31 -- Agent Runner: Next Steps Roadmap -- 2026 -- Zachary F. Mainen / HAAK