Strategy 30 — Agent Runner Paper: Constitutional Governance of AI Agents

**Type**: `plan____` **Created**: 2026-03-21 **Status**: Draft **Target**: Systems or AI conference (OSDI, NeurIPS Systems Track, AAMAS, or SoCC) **Submission window**: May-June 2026

#Title candidates

  1. Constitutional Governance of AI Agents: Pre-API Policy Enforcement via Content-Addressed Ledger
  2. Binding the Model: A Runtime Architecture for Governed Multi-Agent Systems
  3. Before the Call: Constitutional Policy Gates for Autonomous AI Agents
  4. The Constitutional Layer: Governance-by-Construction in Multi-Agent Runtimes

Preferred: (1) for systems venues (emphasizes mechanism), (3) for AI venues (emphasizes novelty of pre-API enforcement).

#Abstract (draft)

Multi-agent AI systems deployed in production lack governance guarantees. Existing frameworks — LangChain, CrewAI, AutoGen — treat safety as output filtering: the model sees all tools, generates unconstrained actions, and a post-hoc layer decides whether to execute. This architecture is fundamentally weaker than preventing unsafe actions from entering the model's action space. We present the HAAK Agent Runner, a production Rust runtime (8,200 lines, 113 tests) that constitutionally governs AI agents before their API calls. Policy enforcement operates at the tool-definition level: blocked tools are removed from the model's tool list before each turn, so the model never generates calls to tools it cannot use. A three-layer governance model — actors (governed), constitution (governing), courts (verifying) — separates execution from enforcement from adjudication, with workspace scoping replacing tool-level blocking as the primary security boundary. Every governance event produces a content-addressed ledger entry (BLAKE3 over JCS-canonical JSON, RFC 8785) with hash-chained parent references, creating a tamper-evident audit trail with a typed upgrade path from null proofs through signed attestations to ZK-STARK verification. The system implements OpenClaw's Agent Client Protocol (ACP) wire interface, governs 27 standing agents in production, and adds measurable overhead of [X] ms per constitutional bind. We evaluate policy gate correctness, runtime overhead, and present a case study of an autonomous infrastructure agent executing 21 tool calls under constitutional governance.

(198 words — refine after benchmarks fill in the overhead number.)

#Paper structure

#1. Introduction (2 pages)

The governance gap in multi-agent AI. Production systems run agents with broad tool access (shell, filesystem, network, messaging), but governance is an afterthought. The standard pattern: generate action, check action, maybe block. This is post-hoc filtering, analogous to application-layer firewalls that inspect traffic after it has been constructed. We argue for a stronger primitive: constitutional bind, where the policy layer shapes the model's action space before generation. The model cannot propose what it cannot see.

Contributions:

  • A formal definition of pre-API policy enforcement and proof that it is strictly stronger than post-hoc filtering (Section 4)
  • A three-layer governance model (actors, constitution, courts) with workspace scoping as the primary security boundary (Section 5)
  • A production runtime implementing constitutional governance over the ACP wire protocol (Sections 3-4)
  • A content-addressed audit ledger with hash-chained entries and cryptographic upgrade path (Section 6)
  • Evaluation on 27 standing agents including an autonomous infrastructure case study (Section 7)

#2. Related Work

Agent frameworks. LangChain (tool chains, no governance), CrewAI (role-based, no enforcement), AutoGen (multi-agent, conversation-level safety), Claude Code (built-in permission model, not extensible).

Agent protocols. OpenClaw's ACP: session management, turn execution, tool definition. MCP (Model Context Protocol): tool servers, no governance. Neither defines a policy layer.

Governance approaches. Constitutional AI (Bai et al. 2022) — training-time alignment, not runtime enforcement. Guardrails AI — output validation. NeMo Guardrails — programmable rails on input/output. All post-hoc. None shape the action space.

Content-addressed storage. IPFS, Git, CAS in distributed systems. BLAKE3 for speed. JCS (RFC 8785) for deterministic JSON canonicalization.

#3. Architecture (2 pages)

The five-layer stack:

  1. Gateway — Axum WebSocket server implementing ACP JSON-RPC 2.0. Accepts OpenClaw and native agents on the same port.
  2. SessionManager — per-session serial queues, three session modes (persistent/domain/oneshot), TTL eviction, SQLite session store with BLAKE3-derived IDs.
  3. ConstitutionalRuntime<R> — generic decorator over any AgentRuntime. The governance layer is invisible to both the session manager above and the backend below.
  4. AgentRuntime backends — AnthropicBackend (reqwest + SSE, streaming, reasoning blocks), ClaudeCodeBackend (subprocess).
  5. Ledger — content-addressed entries with BLAKE3 CIDs over JCS-canonical payloads, hash-chained via parent references.

Key architectural property: the ConstitutionalRuntime<R> implements the same AgentRuntime trait as the backends. This means governance is compositionally transparent — it can wrap any backend, and the session manager requires no knowledge of whether governance is present.
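The decorator property can be illustrated with a minimal sketch. The trait signature here is an assumption made for illustration: `run_turn`, `EchoBackend`, `drive`, and the deny-list field are hypothetical names, and the sketch is synchronous, whereas the real runtime presumably streams asynchronously.

```rust
/// Hypothetical, simplified stand-in for the runner's AgentRuntime trait.
pub trait AgentRuntime {
    fn run_turn(&mut self, input: &str, tools: &[String]) -> String;
}

/// Toy backend that reports which tools it was offered.
pub struct EchoBackend;

impl AgentRuntime for EchoBackend {
    fn run_turn(&mut self, input: &str, tools: &[String]) -> String {
        format!("{} [tools: {}]", input, tools.join(","))
    }
}

/// Decorator: wraps any backend R and implements the same trait.
pub struct ConstitutionalRuntime<R: AgentRuntime> {
    inner: R,
    blocked: Vec<String>, // assumed policy representation: a plain deny list
}

impl<R: AgentRuntime> ConstitutionalRuntime<R> {
    pub fn new(inner: R, blocked: Vec<String>) -> Self {
        Self { inner, blocked }
    }
}

impl<R: AgentRuntime> AgentRuntime for ConstitutionalRuntime<R> {
    fn run_turn(&mut self, input: &str, tools: &[String]) -> String {
        // Remove blocked tools, then delegate to the wrapped backend.
        let allowed: Vec<String> = tools
            .iter()
            .filter(|t| !self.blocked.contains(*t))
            .cloned()
            .collect();
        self.inner.run_turn(input, &allowed)
    }
}

/// A caller written against the trait cannot tell whether governance is present.
pub fn drive<R: AgentRuntime>(runtime: &mut R, input: &str, tools: &[String]) -> String {
    runtime.run_turn(input, tools)
}
```

Calling `drive` with a bare `EchoBackend` and with `ConstitutionalRuntime::new(EchoBackend, ...)` uses the same call site; only the tool list that reaches the backend differs.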

#4. Constitutional Runtime (3 pages) — core contribution

The per-turn governance sequence:

  1. Mandate injection — agent's constitutional mandate (role, constraints, allowed tools) injected as system prompt preamble. The mandate is the agent's identity: what it may do, what it must do, what it must not do.
  2. Mailbox injection — unread inter-agent messages prepended to turn input. Each injection produces a MailboxInject ledger entry. This is how standing agents coordinate without direct communication.
  3. Tool gating (pre-API) — the policy engine evaluates each tool against the agent's mandate and the constitution. Blocked tools are removed from the tool list sent to the model API. The model's completion is conditioned on a tool list that excludes forbidden actions. This is the core claim: the model cannot call Bash if Bash is not in its tool definition.
  4. Tool call interception (runtime) — even with pre-API gating, a tool call's arguments may violate policy (e.g., Write to a forbidden path). The runtime re-evaluates at call time. Blocked calls produce a PolicyGate ledger entry and inject an error tool result. The model sees the block and adapts.
  5. Ledger append — every event (tool call, tool result, policy verdict, session lifecycle) becomes a LedgerEntry with BLAKE3 CID, parent references, and null-but-typed proof and envelope fields.
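The two enforcement points in the sequence above (pre-API tool gating and call-time interception) might be sketched as follows. `Mandate`, `Verdict`, `gate_tools`, and `intercept_call` are hypothetical names, and the string-prefix path check is a deliberate simplification that assumes paths are already normalized.

```rust
/// Hypothetical mandate: tools the agent may see and path prefixes it may write to.
pub struct Mandate {
    pub allowed_tools: Vec<String>,
    pub writable_prefixes: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Allow,
    /// Carries a reason; in the runner this would feed a PolicyGate ledger entry.
    Block(String),
}

/// Pre-API gating: keep only the defined tools that the mandate permits.
pub fn gate_tools(mandate: &Mandate, defined: &[&str]) -> Vec<String> {
    defined
        .iter()
        .filter(|t| mandate.allowed_tools.iter().any(|a| a.as_str() == **t))
        .map(|t| t.to_string())
        .collect()
}

/// Call-time interception: re-check a concrete call, because arguments can
/// violate policy even when the tool itself is visible to the model.
pub fn intercept_call(
    mandate: &Mandate,
    visible: &[String],
    tool: &str,
    path: Option<&str>,
) -> Verdict {
    if !visible.iter().any(|t| t == tool) {
        return Verdict::Block(format!("tool {} is not in the gated tool list", tool));
    }
    if tool == "Write" {
        // Simplified check: a Write needs a path under one of the writable prefixes.
        let permitted = path.map_or(false, |p| {
            mandate
                .writable_prefixes
                .iter()
                .any(|prefix| p.starts_with(prefix.as_str()))
        });
        if !permitted {
            return Verdict::Block("Write outside permitted paths".to_string());
        }
    }
    Verdict::Allow
}
```

The first function determines what the model can propose; the second covers the residual case where a visible tool is called with forbidden arguments, or where a model emits a call to a tool it was never shown.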

Formal claim. Pre-API enforcement is strictly stronger than post-hoc filtering. Proof sketch: post-hoc filtering allows the model to generate the forbidden action, which may leak information (the model has "thought about" the action, and its subsequent text may be influenced by having considered it). Pre-API enforcement prevents the forbidden action from entering the model's generative process. The model's output distribution is conditioned on a tool set that never included the blocked tool.
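One possible way to state the claim more precisely is sketched below. All notation is an assumption introduced here, not the paper's final formalization: $T$ is the set of defined tools, $B \subseteq T$ the tools blocked by policy, $x$ the turn context, and $p_\theta(\cdot \mid x, S)$ the model's distribution over tool calls when shown tool list $S$. The sketch also assumes the model only calls tools in the list it was given.

```latex
\begin{align*}
\text{Post-hoc:} \quad & a \sim p_\theta(\cdot \mid x, T), \quad
  \text{execute } a \iff \mathrm{tool}(a) \notin B \\
  & \Rightarrow\ \Pr[\text{executed} \wedge \mathrm{tool}(a) \in B] = 0,
  \quad \text{but } \Pr[\mathrm{tool}(a) \in B] \text{ may be } > 0 \\[4pt]
\text{Pre-API:} \quad & a \sim p_\theta(\cdot \mid x, T \setminus B) \\
  & \Rightarrow\ \Pr[\mathrm{tool}(a) \in B] = 0,
  \quad \text{hence also } \Pr[\text{executed} \wedge \mathrm{tool}(a) \in B] = 0
\end{align*}
```

Under these assumptions both schemes prevent execution of blocked calls, but only pre-API enforcement also prevents their generation, which is the sense of "strictly stronger" in the proof sketch.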

#5. Governance Model (2 pages)

The governance architecture instantiates a separation principle: agents operate under the constitution, not with it. Three layers with distinct relationships to the same normative order.

5.1 The separation principle. Actors (agents, users) do work and are governed. The constitution defines what is permitted. Courts (auditor, owner, hash chain) verify compliance post-hoc. The temporal separation is essential: governance happens in real time (the policy gate fires during the turn); adjudication happens after the fact (the auditor reads the ledger entry the gate produced). Agents never read the ledger during normal operation — the information asymmetry IS the enforcement mechanism.

5.2 Trust tiers and workspace scoping. Three tiers (unknown, registered, standing) map to workspace roots rather than tool-level permissions. Unknown users are scoped to platform/ (public, read-only). Registered users see their own workspace plus platform. Standing agents see the full repository. The tool executor's safe_resolve enforces boundaries at the filesystem level — path canonicalization rejects traversal before any tool executes. This shifts security from "you cannot read" to "you can only read what is yours."
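A workspace-root check of this kind could look like the sketch below. It is a purely lexical version written for illustration: the runner's own `safe_resolve` is not shown in this document and may also canonicalize against the filesystem (for example to handle symlinks), which this version does not do.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `requested` under `root`, rejecting absolute paths and any
/// `..` that would climb above the root. Hypothetical sketch, not the runner's code.
pub fn safe_resolve(root: &Path, requested: &str) -> Result<PathBuf, String> {
    let mut parts: Vec<&OsStr> = Vec::new();
    for component in Path::new(requested).components() {
        match component {
            Component::Normal(name) => parts.push(name),
            Component::CurDir => {}
            Component::ParentDir => {
                // Popping past the start means the path would leave the root.
                if parts.pop().is_none() {
                    return Err(format!("path escapes workspace root: {}", requested));
                }
            }
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute paths are not allowed: {}", requested));
            }
        }
    }
    let mut resolved = root.to_path_buf();
    for part in parts {
        resolved.push(part);
    }
    Ok(resolved)
}
```

With a root of `platform`, a request for `docs/../readme.md` resolves inside the root, while `../secrets.txt` and `/etc/passwd` are rejected before any tool would run.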

5.3 The two-boundary model. Platform (the system — runtime, constitution, ontology, ledger) vs. workspaces (the content — projects, data, strategy). Platform is publishable; workspaces are private by default. The ledger lives in the platform because governance records are public — a secret constitution is not a constitution; a private audit log provides no evidence.

5.4 Why agents don't read the ledger. Three reasons: (1) information asymmetry prevents adversarial strategies based on governance knowledge; (2) separation of concerns — agents work, courts verify; (3) context economy — ledger metadata wastes tokens the agent cannot act on.

5.5 Roles mapped to the ontology. Every governance concept maps to the relational situational ontology: agents are actors (Def. 2), the constitution is a policy of maximal scope (Def. 14), sessions are situations (Def. 11), ledger entries are belongings (Def. R1) with qualities (Def. R2). The architecture was built from the ontology, not rationalized into it after the fact.

5.6 The crypto roadmap. v0 (current): null proofs, hash-chain tamper evidence. v1.0 (Aug 2026): attestation proofs with Ed25519 signatures — third parties verify "this runner, running this constitution, made this decision." v1.5: audience-keyed encryption via KDF-derived envelope keys. v2.0: ZK-STARK proofs — prove compliance without revealing policy rules or agent input. All proof and envelope fields are present and typed from v0; the schema carries zero migration debt.

#6. Content-Addressed Ledger (1.5 pages)

Every ledger entry:

  • CID: BLAKE3 hash of JCS-canonical JSON representation
  • Parents: CIDs of causally prior entries (previous turn, tool call that produced this result)
  • Quality: enum discriminator (Turn, ToolCall, ToolResult, PolicyVerdict, MailboxInject, SessionLifecycle)
  • Proof: null in v0, attestation in v1.0, ZK-STARK in v2.0
  • Envelope: null in v0, KDF-encrypted in v1.5

The hash chain creates a DAG, not a linear chain — a turn's parents include both the previous turn and any tool calls within it. This DAG structure maps directly to the causal structure of agent execution.
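The DAG structure and its tamper evidence can be sketched with the standard library alone. Two substitutions are made loudly here so the example stays self-contained: std's `DefaultHasher` (64-bit, not cryptographic) stands in for BLAKE3, and hashing fields in a fixed order stands in for JCS canonicalization. The field and function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug)]
pub struct LedgerEntry {
    pub cid: String,
    pub parents: Vec<String>,
    pub quality: String, // e.g. "Turn" or "ToolCall"; a string here for brevity
    pub payload: String,
}

/// STAND-IN hash: DefaultHasher replaces BLAKE3, fixed field order replaces JCS.
pub fn compute_cid(parents: &[String], quality: &str, payload: &str) -> String {
    let mut hasher = DefaultHasher::new();
    parents.hash(&mut hasher);
    quality.hash(&mut hasher);
    payload.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Append an entry whose CID commits to its parents, quality, and payload.
pub fn append(
    ledger: &mut HashMap<String, LedgerEntry>,
    parents: &[&str],
    quality: &str,
    payload: &str,
) -> String {
    let parents: Vec<String> = parents.iter().map(|p| p.to_string()).collect();
    let cid = compute_cid(&parents, quality, payload);
    let entry = LedgerEntry {
        cid: cid.clone(),
        parents,
        quality: quality.to_string(),
        payload: payload.to_string(),
    };
    ledger.insert(cid.clone(), entry);
    cid
}

/// An entry verifies if its CID matches its content and every ancestor verifies.
/// (Recursive and unmemoized: fine for a sketch, slow on large DAGs.)
pub fn verify(ledger: &HashMap<String, LedgerEntry>, cid: &str) -> bool {
    match ledger.get(cid) {
        None => false,
        Some(e) => {
            e.cid == cid
                && e.cid == compute_cid(&e.parents, &e.quality, &e.payload)
                && e.parents.iter().all(|p| verify(ledger, p))
        }
    }
}
```

Building the chain from Figure 3 (session open, turn 1, tool call, tool result, turn 2 with two parents) and then editing the stored tool call's payload makes `verify` fail for turn 2 while turn 1 still verifies, since turn 1 does not descend from the altered entry.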

Crypto upgrade path. The proof and envelope fields are present and typed from v0. The schema carries zero migration debt: populating proof with attestation data in v1.0 requires no schema change, no data migration, no event format change.
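As an illustration of what "present and typed from v0" could mean in Rust, the sketch below models the proof and envelope fields as enums whose later variants exist before they are used. The variant and field names (`signer`, `signature`, `proof_bytes`, `audience`, `ciphertext`) are hypothetical and not taken from the runner's schema.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Proof {
    Null,                                               // v0
    Attestation { signer: String, signature: Vec<u8> }, // v1.0 (Ed25519 per the roadmap)
    ZkStark { proof_bytes: Vec<u8> },                   // v2.0
}

#[derive(Debug, Clone, PartialEq)]
pub enum Envelope {
    Null,                                                 // v0
    KdfEncrypted { audience: String, ciphertext: Vec<u8> }, // v1.5
}

/// Entry shape stays the same across versions; only the variants in use change.
pub struct TypedEntry {
    pub payload: String,
    pub proof: Proof,
    pub envelope: Envelope,
}

/// Describe how strongly an entry can be verified, based on its proof variant.
pub fn verification_level(entry: &TypedEntry) -> &'static str {
    match &entry.proof {
        Proof::Null => "hash-chain tamper evidence only",
        Proof::Attestation { .. } => "signed attestation",
        Proof::ZkStark { .. } => "zero-knowledge proof",
    }
}
```

A v0 entry and a v1.0 entry share the same struct, so readers written against `TypedEntry` keep working when attestation data starts to appear.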

#7. Evaluation (2 pages)

7.1 Overhead. Benchmark the constitutional bind: measure latency from "tools defined" to "API call sent" with and without the policy gate. Measure per-entry ledger write latency (BLAKE3 + JCS + SQLite insert). Target: <5ms overhead per tool gate, <1ms per ledger write.
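A bare-bones timing harness for this comparison might look like the following. It is a sketch using only `std::time`, with hypothetical function names; a real benchmark would likely add warm-up runs and report percentiles rather than a single mean.

```rust
use std::time::{Duration, Instant};

/// Mean wall-clock latency of `op` over `iterations` runs.
pub fn mean_latency<F: FnMut()>(iterations: u32, mut op: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed() / iterations
}

/// Estimated per-call overhead: mean latency with the gate minus mean latency
/// without it, clamped at zero when noise makes the baseline slower.
pub fn gate_overhead<B: FnMut(), G: FnMut()>(iterations: u32, baseline: B, gated: G) -> Duration {
    let base = mean_latency(iterations, baseline);
    let with_gate = mean_latency(iterations, gated);
    with_gate.saturating_sub(base)
}
```

Here `baseline` would build the API request from the full tool list and `gated` would do the same through the policy gate; the same `mean_latency` helper could time a single ledger write.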

7.2 Policy gate correctness. Automated test: for each of N tool definitions, assert that a blocked tool never appears in the tool list sent to the API. Run across all 27 agent mandates. This is a coverage property, not a sampling test — enumerate all (agent, tool) pairs.
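The exhaustive check could be structured roughly as below. The data shapes are assumptions for illustration: mandates are modeled as a map from agent name to its set of blocked tools, and the gate under test is passed in as a function so that a faulty gate can be detected.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shape: agent name -> tools its mandate blocks.
pub type Mandates = BTreeMap<String, BTreeSet<String>>;

/// Reference gate: drop every blocked tool from the defined list.
pub fn gate(blocked: &BTreeSet<String>, defined: &[String]) -> Vec<String> {
    defined
        .iter()
        .filter(|t| !blocked.contains(*t))
        .cloned()
        .collect()
}

/// Enumerate every (agent, tool) pair and collect the pairs where a blocked
/// tool still appears in the list the gate would send to the API.
pub fn check_all_pairs<G>(mandates: &Mandates, defined: &[String], gate_fn: G) -> Vec<(String, String)>
where
    G: Fn(&BTreeSet<String>, &[String]) -> Vec<String>,
{
    let mut violations = Vec::new();
    for (agent, blocked) in mandates {
        let sent = gate_fn(blocked, defined);
        for tool in defined {
            if blocked.contains(tool) && sent.contains(tool) {
                violations.push((agent.clone(), tool.clone()));
            }
        }
    }
    violations
}
```

An empty result means no blocked tool leaked for any agent. Passing a deliberately broken gate (one that returns the full list) should report every blocked pair, which is a useful sanity check on the test itself.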

7.3 Scale. 27 standing agents with persistent sessions. Heartbeat interval 5 minutes. Session store growth over 30 days. Ledger size vs. turn count.

7.4 Case study: Reed. Reed is a standing infrastructure agent with mandate to maintain services, check health, and recover crashed daemons. In a single autonomous run (triggered by heartbeat), Reed executed 21 tool calls: read service status, diagnosed a crashed daemon, restarted it, verified recovery, and posted a summary to the board. All 21 calls logged to the ledger with hash-chained entries. Present the full trace as a figure.

#8. Discussion (1 page)

Limitations. Pre-API enforcement depends on the model API accepting a tool list. If a model ignores tool definitions and generates arbitrary JSON, the runtime gate is the last defense. Trust bootstrapping: who writes the constitution? Currently a single human operator. Multi-party governance is future work.

The attestation roadmap. v0 has null proofs: the ledger is auditable but not cryptographically verifiable. v1.0 (August 2026) adds agent-signed attestations: each agent holds an Ed25519 keypair, and its ledger entries carry signatures. v2.0 adds ZK-STARK proofs for privacy-preserving audit.

Relationship to Filix. The agent runner is the runtime substrate for Filix v1.0, a platform-independent agent governance system. The session store is designed to become a materialized view over the ledger — no migration required.

#9. Comparison

| System | Governance model | Pre-API? | Audit trail | Protocol |
|---|---|---|---|---|
| Claude Code | Permission prompts | No (user approves) | Local logs | Proprietary |
| LangChain | None (opt-in callbacks) | No | Optional | Custom |
| CrewAI | Role descriptions | No (advisory) | None | Custom |
| AutoGen | Conversation rules | No | Chat history | Custom |
| NeMo Guardrails | Input/output rails | No (post-hoc) | Rails log | Custom |
| Guardrails AI | Output validation | No (post-hoc) | Validation log | Custom |
| OpenClaw/ACP | Session management | No (no policy layer) | None | ACP |
| HAAK Runner | Constitutional bind | Yes | BLAKE3 ledger | ACP-compatible |

#10. Conclusion (0.5 pages)

Pre-API constitutional governance is a stronger primitive than post-hoc filtering, implementable with modest overhead, and practical at the scale of dozens of standing agents. The content-addressed ledger provides a tamper-evident audit trail with a clear upgrade path to cryptographic verification. The system is ACP-compatible and deployable today.

#Key claims requiring evidence

| Claim | Evidence needed | Status |
|---|---|---|
| Pre-API enforcement is strictly stronger than post-hoc | Formal argument + empirical: model never generates blocked tool calls | Draft argument in Section 4; need automated verification |
| Content addressing provides tamper-evident audit | BLAKE3 hash chain verification test | Implemented; need benchmark |
| Constitutional bind adds <5ms overhead | Benchmark harness measuring gate latency | Not yet built |
| Standing agent autonomy is viable | Reed case study with 21-tool-call trace | Have the trace; need to extract and present |
| 27 agents in production | Roster snapshot + uptime data | Roster exists; need uptime metrics |
| Workspace scoping is a more natural security model than tool-level blocking | Argument + comparison: same tools, different workspaces → different security properties | Argument drafted in Architecture 41 |
| The ledger records governance, not content — distinct from git | Schema analysis: ledger fields vs git objects; show orthogonality | Argument drafted in Architecture 41 |

#Figures

  1. Layer stack diagram — the five-layer architecture (already in README, needs polish)
  2. Policy gate flow — tool definition list -> policy engine -> filtered list -> API call; show a concrete example (agent X has mandate Y, tools A,B,C defined, B blocked, API receives A,C)
  3. Ledger hash chain — DAG of ledger entries for a multi-tool turn: sessionopen -> turn1 (parents: [sessionopen]) -> toolcall1 (parents: [turn1]) -> toolresult1 (parents: [toolcall1]) -> turn2 (parents: [turn1, toolresult1])
  4. Reed case study timeline — 21 tool calls with timestamps, tool names, and outcomes; highlight the board post at the end

#Venue analysis

| Venue | Deadline | Fit | Notes |
|---|---|---|---|
| NeurIPS 2026 Systems Track | May 2026 (TBC) | Strong | Systems for ML; constitutional governance fits |
| AAMAS 2026 | Feb 2026 (passed) | Strong | Multi-agent systems; would need 2027 |
| SoCC 2026 | June 2026 (TBC) | Good | Cloud/systems; audit trail angle |
| OSDI 2026 | May 2026 (TBC) | Stretch | Top systems; need stronger eval |
| Workshop: NeurIPS MASEC | Oct 2026 | Fallback | Multi-Agent Safety workshop |

Recommendation: Target NeurIPS Systems Track (May deadline) as primary. Prepare SoCC as backup. If eval is not ready by May, submit to a NeurIPS workshop and upgrade to main conference for 2027.


haak strategy 30 -- agent runner paper -- zach + claude

Strategy 30 — Agent Runner Paper: Constitutional Governance of AI Agents — 2026 — Zachary F. Mainen / HAAK