Every step visible. Every decision traceable. Every lesson captured. Externalization — writing intermediate state to disk rather than processing internally — is not a documentation preference but the enabling condition for intervention, evaluation, auditing, and self-improvement. The Library Theorem provides the formal reason: systems that externalize their reasoning to organized memory reason better than systems that don't.
# The principle
When a workflow runs five steps silently and returns a result, the human sees a black box. When each step writes its output to disk, the human sees a process they can steer. This is the difference between a tool and a collaborator.
Externalization enables four capabilities, each depending on the previous:
- Intervention. A human can pause between any two steps, read the intermediate output, and redirect. This requires that the output exists — it must be written, not merely computed.
- Evaluation. After execution, someone can assess not just the final result but the path taken. Which steps were productive? Where did the process go wrong? This requires that the full trace is preserved.
- Auditing. Across many executions, the traces form an audit trail: who decided what, when, and why. This requires consistent format and machine-readable structure — not just notes in a margin, but state files with proposed/approved diffs, timestamps, and decision records.
- Self-improvement. Accumulated audit trails reveal patterns: where humans always intervene (the system is wrong), where they rubber-stamp (reduce friction), what feedback repeats (extract into defaults). This requires that the patterns are accessible — indexed, searchable, comparable.
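The first capability, intervention between steps, can be sketched as a runner that writes each step's output to disk before the next step starts. This is a minimal illustration under stated assumptions, not HAAK's actual runner: the function `run_externalized`, the `NN_name.json` file naming, and the JSON layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical step signature: receives the previous output, returns its own.
Step = Callable[[str], str]


def run_externalized(steps: list[tuple[str, Step]], workdir: Path, start: str) -> str:
    """Run steps in order, writing each output to disk before the next starts."""
    current = start
    for number, (name, step) in enumerate(steps, start=1):
        out_path = workdir / f"{number:02d}_{name}.json"
        if out_path.exists():
            # Reuse what is on disk, including any human edit, instead of recomputing.
            current = json.loads(out_path.read_text())["output"]
            continue
        current = step(current)
        out_path.write_text(json.dumps({"step": name, "output": current}, indent=2))
    return current


steps = [("outline", str.upper), ("draft", lambda text: text + "!")]

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    first_result = run_externalized(steps, workdir, "plan")

    # Simulate an intervention: edit step 1's output, discard step 2's, resume.
    outline_path = workdir / "01_outline.json"
    outline_path.write_text(json.dumps({"step": "outline", "output": "REVISED PLAN"}))
    (workdir / "02_draft.json").unlink()
    second_result = run_externalized(steps, workdir, "plan")
```

Because the intermediate output exists as a file, redirecting the process means editing that file and resuming. Deleting downstream files to force recomputation is one possible convention here, not something the principle prescribes.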
# Why the Library Theorem makes this non-optional
Without the Library Theorem, externalization is merely good practice — "write things down so you can review them later." With it, externalization is a formal requirement for efficient reasoning.
The theorem shows that a bounded-capacity processor reasoning over external memory achieves O(log N) retrieval when the memory is indexed, vs. Ω(N) when it is not. A system that processes internally — holding everything in its context window without inscribing intermediate state — cannot benefit from this advantage. The reasoning trace never becomes searchable; the intermediate results never become retrievable; the accumulated understanding never becomes indexed.
Externalization is the first step of inscription. You cannot index what has not been written down.
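The gap between indexed and unindexed retrieval can be seen in a toy comparison. This sketch is not the theorem's construction; it only counts comparisons for a linear scan versus a binary search over sorted keys, and the names `linear_lookup` and `indexed_lookup` and the `note-NNNNN` keys are illustrative assumptions.

```python
def linear_lookup(keys: list[str], target: str) -> tuple[int, int]:
    """Scan unindexed memory; return (position, comparisons made)."""
    comparisons = 0
    for position, key in enumerate(keys):
        comparisons += 1
        if key == target:
            return position, comparisons
    return -1, comparisons


def indexed_lookup(sorted_keys: list[str], target: str) -> tuple[int, int]:
    """Binary search over a sorted index; return (position, comparisons made)."""
    low, high, comparisons = 0, len(sorted_keys), 0
    while low < high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_keys[mid] < target:
            low = mid + 1
        else:
            high = mid
    found = low < len(sorted_keys) and sorted_keys[low] == target
    return (low if found else -1), comparisons


# Zero-padded keys are generated in sorted order, so they can serve as the index.
keys = [f"note-{i:05d}" for i in range(1024)]
target = keys[-1]

linear_position, linear_cost = linear_lookup(keys, target)
indexed_position, indexed_cost = indexed_lookup(keys, target)
print(f"linear scan: {linear_cost} comparisons; sorted index: {indexed_cost} comparisons")
```

The binary search is only possible because the keys were written down and kept in order. A trace that was never written down has no index to search.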
# Auditability
Auditability is externalization applied to decisions. Every AI-assisted workflow decision must be traceable:
- Who proposed what. The agent's recommendation, in its own words.
- What the human accepted or changed. The diff between proposed and approved.
- Why. The human's reasoning, captured at the moment of decision — not reconstructed later.
The mechanism is state files: YAML documents that capture proposed outputs, approved outputs, intervention notes, feedback injected into subsequent steps, and timestamps. They are git-tracked (version history is automatic) and interface-agnostic (the same file works for the CLI, Streamlit, and any future interface).
State files are not logs. Logs record what happened; state files record what was decided. The distinction matters because decisions have structure — a proposal, a response, a rationale — that flat logging flattens.
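The structure of a decision record can be sketched as follows. The field names mirror the contents listed above, but the exact schema is an assumption, as is the class name `DecisionRecord`. The state files described here are YAML; this sketch builds the record as a plain Python object and leaves serialization out so that it needs only the standard library.

```python
import difflib
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """One decision: what was proposed, what was approved, and why (hypothetical schema)."""
    step: str
    proposed: str
    approved: str
    intervention_note: str
    feedback_for_next_steps: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def diff(self) -> list[str]:
        """Unified diff between the proposed and the approved output."""
        return list(
            difflib.unified_diff(
                self.proposed.splitlines(),
                self.approved.splitlines(),
                fromfile="proposed",
                tofile="approved",
                lineterm="",
            )
        )


record = DecisionRecord(
    step="summarize",
    proposed="Use three sections.",
    approved="Use two sections.",
    intervention_note="The third section duplicated the introduction.",
    feedback_for_next_steps=["Keep the summary to two sections."],
)

state = asdict(record)   # plain dict, ready for whichever serializer is in use
changes = record.diff()  # the proposed/approved diff as a list of lines
```

Unlike a flat log line, the record keeps the proposal, the response, and the rationale as separate fields, so each can be queried or compared across executions.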
# Reflexive improvement
Any agent, after any non-trivial task, asks: did I learn something generalizable? If yes, the lesson is proposed for codification in the appropriate location:
| Lesson type | Codification target |
|---|---|
| Execution pattern | SKILL.md (the skill's documentation) |
| Process pattern | Method definition (patterns/methods/) |
| Constraint | Policy (patterns/policies/) |
| Conceptual position | Foundation (foundations/) |
| Agent behavior | CLAUDE.md or agent definition |
The Detect → Evaluate → Locate → Propose → Codify loop is structural, not optional. Methods with evaluation phases include reflexive checkpoints. The question "what did we learn?" is not an afterthought but a step.
This applies to all agent types: an editor discovering a review anti-pattern proposes a method update; an architect finding a missing convention proposes a policy update; a reviewer noticing a recurring blind spot proposes a persona refinement. The system improves itself by writing down what it learns — which is, again, inscription.
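The Locate and Propose steps of the loop can be sketched as a lookup against the table above. The target strings come from that table; everything else is an assumption for illustration, including the names `LessonType`, `Proposal`, and `propose_codification`, and the reduction of Detect and Evaluate to a single boolean supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LessonType(Enum):
    EXECUTION_PATTERN = "Execution pattern"
    PROCESS_PATTERN = "Process pattern"
    CONSTRAINT = "Constraint"
    CONCEPTUAL_POSITION = "Conceptual position"
    AGENT_BEHAVIOR = "Agent behavior"


# Locate: lesson type -> codification target, as listed in the table.
CODIFICATION_TARGET = {
    LessonType.EXECUTION_PATTERN: "SKILL.md",
    LessonType.PROCESS_PATTERN: "patterns/methods/",
    LessonType.CONSTRAINT: "patterns/policies/",
    LessonType.CONCEPTUAL_POSITION: "foundations/",
    LessonType.AGENT_BEHAVIOR: "CLAUDE.md or agent definition",
}


@dataclass
class Proposal:
    lesson: str
    target: str
    status: str = "proposed"  # codification happens only after review


def propose_codification(
    lesson: str, lesson_type: LessonType, generalizable: bool
) -> Optional[Proposal]:
    """Return a proposal for a generalizable lesson, or None if there is nothing to codify."""
    if not generalizable:
        return None
    return Proposal(lesson=lesson, target=CODIFICATION_TARGET[lesson_type])


proposal = propose_codification(
    "Reviews that skip the index miss orphaned files.",
    LessonType.PROCESS_PATTERN,
    generalizable=True,
)
```

The function stops at a proposal on purpose: the final Codify step is left to whoever reviews it.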
# The medium is the message: why markdown doesn't accumulate debt
Most software systems accumulate technical debt — dependencies rot, schemas need migration, APIs break between versions, processes that once worked stop working when the infrastructure they depend on changes. HAAK doesn't. The system has been running continuously since its creation, revised constantly, never taken down. This is not luck or discipline. It is a consequence of the medium.
The externalized state is English in markdown files, versioned in git. This choice has three consequences that conventional software stacks don't share:
No brittle dependencies. There is no database that kills the system if it stops. No process that can't be halted. No framework version to upgrade. No API contract between components. The files are the system. If every tool disappeared tomorrow, the files would still be readable and navigable by a human with a text editor.
No technical debt from the medium itself. Code accumulates debt because abstractions leak, interfaces change, and yesterday's clean architecture becomes today's legacy constraint. English doesn't have this problem. A foundation written in February is as readable in December. The only "debt" would be if the ideas became wrong — but that's revision, not migration. You update the argument; you don't port it to a new runtime.
Continuous adaptive operation. The system learns while running. The human revises foundations, the agent updates indices, methods sharpen through practice, conventions emerge from repetition and get codified. There is no deploy cycle, no staging environment, no rollback procedure. The system is always in its current state, and every change is immediately live — because "deploying" a markdown file means writing it.
This is what makes the system forkable (Architecture 02). Because the medium is plain text with conventions, not code with dependencies, anyone can copy the structure and apply it to a different domain. The value isn't in the specific content — it's in the organizational machinery: the index hierarchy, the constitution, the method definitions, the type system. All of it transfers because none of it is coupled to a technology stack.
The system is currently learning from its human operator: every correction, every "that file doesn't belong there," every "we should codify this" is a training signal that gets inscribed into the structure. The trajectory is toward more autonomous learning: the reflexive improvement loop (Constitution §3) running with less human intervention as patterns stabilize. But the mechanism is the same at every point on that trajectory: observe, propose, write it down, let it be reviewed.
The one deliberate exception: the constitution. A constitution IS a form of debt — it constrains future action, and changing it has consequences that ripple through policies, methods, and conventions downstream. But it is explicit, visible, revisable debt. The difference between a constitution and technical debt is that technical debt is hidden and discovered by accident, while a constitution is declared and examined on purpose. You know exactly what it costs you, and you chose it.
This is a fundamentally different relationship to maintenance than conventional software. HAAK doesn't fight entropy by freezing interfaces; it absorbs change by keeping everything legible and revisable. The cost of change is the cost of editing English, which is as low as it gets — except where the constitution says otherwise, and there you pay the cost knowingly.
# What is NOT externalized
Not everything needs externalization. The principle applies to workflow steps — discrete, meaningful units of work that a human might want to inspect or redirect. It does not apply to:
- Internal computation within a single step (how the model generates a response)
- Routine file operations (creating a directory, moving a file)
- System bookkeeping (updating an index, running an audit)
The test: if a human would want to pause here, read the output, and potentially redirect, then the step should externalize its output. If not, it runs silently.
# Historical development
This foundation consolidates three v1 documents that were written as separate principles before their connection was understood:
- v1/09_auditability (Feb 22): Every workflow decision traceable via state files
- v1/11_workflow-state-evaluation (Feb 22): Externalizing steps enables everything else
- v1/13_reflexive-improvement (Feb 22): System learns from its own operation
All three became constitutional requirements (§1 Externalization, §3 Reflexive improvement, §5 Auditability). This foundation provides the why behind those requirements: externalization is the enabling condition because it creates the indexed external memory that the Library Theorem says makes reasoning efficient.
# Constitutional implications
This foundation directly grounds three constitutional requirements:
- Externalization (Constitution §1): "Everything visible, recorded, reviewable." The principle stated here.
- Auditability (Constitution §5): "Every workflow decision is traceable." The decision-focused application.
- Reflexive improvement (Constitution §3): "The system learns from its own operation." The self-improvement application.
The constitution states these as requirements; this foundation explains why they are non-negotiable.
haak · foundation · 2026-02-24 · zach + claude
Foundations 05 — Externalization — 2026 — Zachary F. Mainen / HAAK