Indexed retrieval from external memory costs O(log N) per access; sequential retrieval costs Ω(N). This exponential separation compounds with reasoning depth: O(T log N) vs. Θ(T²) over T steps. The result is architectural — it holds for any bounded-capacity processor with a readable/writable external store.
# The formal result
The Library Theorem adapts the I/O complexity framework of Aggarwal & Vitter (1988) to transformer-based reasoning systems. The core model:
- An agent with a fixed-size context window (the "page" in I/O complexity terms)
- An external store of N pages, each containing bounded information
- Read and write operations that transfer one page between the store and the context window per step
- A data processing inequality bounding how much information each step can extract
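The model above can be sketched in a few lines. This is an illustrative Python rendering, not the paper's formal definition; the class and field names (`ExternalStore`, `Agent`, `io_steps`) are our own, and the unit cost per page transfer is the assumption being modeled.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the I/O-complexity model: a bounded context window,
# an external store of pages, and unit-cost page transfers. Names are ours,
# not the paper's.

@dataclass
class ExternalStore:
    pages: list  # N pages, each holding a bounded amount of information

@dataclass
class Agent:
    context_size: int                       # C: context capacity, in pages
    context: list = field(default_factory=list)
    io_steps: int = 0                       # one read or write = one step

    def read(self, store: ExternalStore, i: int) -> None:
        """Transfer one page from the store into the context window."""
        self.io_steps += 1
        self.context.append(store.pages[i])
        if len(self.context) > self.context_size:
            self.context.pop(0)             # bounded capacity: evict oldest

    def write(self, store: ExternalStore, page) -> None:
        """Transfer one page from the context window back to the store."""
        self.io_steps += 1
        store.pages.append(page)
```

Every theorem below counts these unit-cost transfers; the data processing inequality caps what each `read` can contribute.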
# Key results
Theorem 0 (Capacity bound). Each read step extracts at most C log₂|Σ| bits from the store, where C is the context window size in tokens and Σ is the token alphabet.
Theorem 1 (Sequential lower bound). Without indexing structure, retrieving a target item from N pages costs Ω(N) steps — essentially scanning.
Theorem 2 (Indexed upper bound). With a B-tree index of branching factor b = ⌊C/η⌋ (where η is the overhead per pointer), retrieval costs O(log_b N) steps.
Theorem 3 (The Library Theorem). The separation ratio between indexed and sequential retrieval is Ω(b^h / h) where h is tree height — exponential in the depth of the index hierarchy.
Theorem 4 (Reasoning cost). Over T reasoning steps involving both reads and writes to external memory, unindexed reasoning costs Θ(T²) while indexed reasoning costs O(T log_b(N₀ + T)) — the advantage compounds with reasoning depth.
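A toy cost model makes the separation in Theorems 1, 2, and 4 concrete. The formulas below are simplified stand-ins (unit costs, a store that grows by one page per reasoning step), not the theorems' exact constants.

```python
# Toy cost model for Theorems 1, 2, and 4 (simplified assumptions, not the
# paper's exact constants).

def sequential_cost(n_pages: int) -> int:
    """Theorem 1: without structure, retrieval scans ~N pages."""
    return n_pages

def indexed_cost(n_pages: int, b: int) -> int:
    """Theorem 2: descend a b-ary index; cost is the tree height ~log_b N."""
    h = 1
    while b ** h < n_pages:
        h += 1
    return h

def reasoning_cost(t_steps: int, n0: int, b: int, indexed: bool) -> int:
    """Theorem 4: T retrievals over a store that grows with the trace."""
    total = 0
    for t in range(1, t_steps + 1):
        n = n0 + t  # the store accumulates one page per step
        total += indexed_cost(n, b) if indexed else sequential_cost(n)
    return total
```

With b = 64 and N₀ = 1000, a 100-step trace costs roughly 200 page reads indexed versus over 100,000 sequential, and the gap widens as T grows.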
Theorems 5–6 (in development). Recursive indexing: the Library Theorem applies to its own meta-index (a flat global index reintroduces Ω(N) at the organizational level). Separation requirement: mixing index pages with data pages degrades retrieval toward the sequential Ω(N) bound.
# The key insight: reasoning is search
The theorem rests on one conceptual move: identifying reasoning with search.
Li & Wang (2025) showed that a transformer's output at each step depends entirely on its context window — the WINDOW = SPACE result. This means finding the correct answer reduces to finding the correct input tokens in the external store. Reasoning is retrieval from organized external memory. Indexing that memory determines retrieval cost, therefore indexing determines reasoning cost.
This is not a metaphor. It is a formal identity within the I/O complexity model. A transformer that reasons over an external store is computationally equivalent to a processor navigating a data structure. The B-tree analysis applies with full rigor.
# What it does and does not say
It says: Architectural organization of external memory provides a formal advantage that cannot be overcome by scaling model size, context length, or compute — the Ω(N) lower bound for sequential access is unconditional.
It does not say: Current LLMs actually exploit this advantage. The theorem characterizes the capacity of the model class (what the best algorithm in the class can achieve), not the performance of trained models. There is likely a large gap between theoretical capacity and empirical performance — mapping that gap is the experimental program's purpose.
It does not say: Larger context windows eliminate the need for external memory. The advantage is a ratio (N / log N) that grows with store size. For any fixed context window, sufficiently large stores make indexing essential.
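The growth of that ratio is easy to see numerically. The snippet below uses a base-2 logarithm as the stand-in for the indexed cost; any fixed branching factor only changes the constant, not the trend.

```python
import math

# The indexed-vs-sequential advantage ratio ~N / log N for growing stores.
# Base-2 log is an illustrative choice; the trend is the point.

def advantage_ratio(n: int) -> float:
    return n / math.log2(n)

for n in (10**3, 10**6, 10**9):
    print(f"N={n:>13,}: sequential/indexed ~ {advantage_ratio(n):,.0f}x")
```

At N = 10³ the ratio is around a hundred; at N = 10⁹ it is in the tens of millions, which is why no fixed context window can substitute for indexing at scale.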
# Empirical program
Seven experiments designed to validate and extend the theorem:
| # | Test | Prediction | Status |
|---|---|---|---|
| 1 | Structured vs. unstructured scratchpad (P3 core) | Structured outperforms at matched token budget | Critical path |
| 2 | Reasoning depth curves (vary T at fixed N) | T² vs. T log N separation | Designed |
| 3 | Degradation (corrupt index progressively) | Performance collapses toward O(N) | Designed |
| 4 | RAG-for-reasoning (index self-generated reasoning) | Compounding gains from self-indexing | Designed |
| 5 | Constitutional AI as IAR (flat vs. indexed constitution) | Indexed constitution improves alignment | Designed |
| 6 | Scale invariance (same experiments across model sizes) | Smaller models have higher hallucination floor | Designed |
| 7 | AI peer review as applied demo | Indexed review process outperforms sequential | Designed |
Publication targets: NeurIPS 2026 (Theorems 0–4 + experiments 1–3), Nature (full program), perspective piece (conceptual narrative).
# Connection to HAAK
HAAK's file system is a concrete instantiation of the indexed external memory the theorem describes:
| Theorem concept | HAAK instantiation |
|---|---|
| External store | The file system (patterns/, projects/, foundations/) |
| Index | index.md files, frontmatter cross-references, naming conventions |
| B-tree branching | Directory hierarchy — each folder branches to children |
| Read/write operations | Agent reads files for context, writes outputs to disk |
| Degradation | Stale indexes, broken links, orphaned documents |
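The mapping above can be sketched as two retrieval strategies over a file tree. This is a hypothetical illustration (the layout and routing are ours, not HAAK's actual mechanism): indexed lookup descends the hierarchy one read per level, while the unindexed baseline touches every file.

```python
from pathlib import Path

# Hypothetical sketch of the HAAK mapping: directory hierarchy as B-tree.
# The route would come from reading each level's index.md; here it is given.

def indexed_lookup(root: Path, route: list[str]) -> Path:
    """Follow the index one hop per level: O(depth) page reads."""
    node = root
    for child in route:      # each hop = one index read
        node = node / child
    return node

def flat_scan(root: Path, name: str):
    """No index: touch every file until the target turns up, O(N) reads."""
    for p in sorted(root.rglob("*")):
        if p.name == name:
            return p
    return None
```

Degradation in the table's last row corresponds to a `route` that no longer matches the tree, forcing the agent back toward `flat_scan`.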
This is not coincidental. HAAK was designed to instantiate the theorem, and the theorem was formalized by observing what HAAK was already doing. The reflexive loop — system as both instance and test — is deliberate.
# Historical development
- Feb 20 2026: Handwritten "Thinking Needs Writing" proposal names inscription-augmented reasoning
- Feb 21 2026: Library Theorem v1 drafted; 7 HAAK review personas assess it
- Feb 22 2026: Reframed as agent theory (v2); "reasoning is search" insight crystallizes
- Feb 23 2026: v3 with formal I/O complexity framework and Theorems 0–4
- Feb 24 2026: Model-persona comparison matrix (31 reviews across 4 models × 7 personas)
# Constitutional implications
The Library Theorem grounds the constitutional requirement of externalization (§1) with formal precision: it is not merely good practice to write things down; it is the enabling condition for O(log N) reasoning. Systems that process internally — without inscribing intermediate state — are provably limited to O(N) retrieval over their accumulated reasoning.
haak · foundation · 2026-02-24 · zach + claude
Foundations 02 — The Library Theorem — 2026 — Zachary F. Mainen / HAAK