The Library Theorem

Indexed retrieval from external memory costs O(log N) per access; sequential retrieval costs Ω(N). This exponential separation compounds with reasoning depth: O(T log N) vs. Θ(T²) over T steps. The result is architectural — it holds for any bounded-capacity processor with a readable/writable external store.

# The formal result

The Library Theorem adapts the I/O complexity framework of Aggarwal & Vitter (1988) to transformer-based reasoning systems. The core model:

  • An agent with a fixed-size context window (the "page" in I/O complexity terms)
  • An external store of N pages, each containing bounded information
  • Read and write operations that transfer one page between the store and the context window per step
  • A data processing inequality bounding how much information each step can extract
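The model above can be sketched in a few lines. This is a minimal illustration under our own simplifying assumptions (names and numbers are ours, not from the formal development): it counts page transfers between store and context window, not computation inside the window.

```python
import math

# Minimal sketch of the I/O model: an external store of N pages;
# each step transfers one page into the bounded context window.
# We count page transfers only, not work done inside the window.

def sequential_retrieval_steps(n_pages: int) -> int:
    """Without an index, finding a target page means scanning:
    the worst case reads every page, i.e. Omega(N) transfers."""
    return n_pages

def indexed_retrieval_steps(n_pages: int, branching: int) -> int:
    """With a B-tree index of branching factor b, each read narrows
    the search by a factor of b, i.e. ~log_b(N) transfers."""
    if n_pages <= 1:
        return 1
    return math.ceil(math.log(n_pages, branching))

# A store of one million pages with branching factor 64:
N, b = 1_000_000, 64
print(sequential_retrieval_steps(N))   # 1000000
print(indexed_retrieval_steps(N, b))   # 4, since 64**4 > 10**6
```

The gap between the two printed numbers is the separation the theorems below make precise.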

# Key results

Theorem 0 (Capacity bound). Each read step extracts at most C log₂|Σ| bits from the store, where C is context window size and Σ is the token alphabet.

Theorem 1 (Sequential lower bound). Without indexing structure, retrieving a target item from N pages costs Ω(N) steps — essentially scanning.

Theorem 2 (Indexed upper bound). With a B-tree index of branching factor b = ⌊C/η⌋ (where η is the overhead per pointer), retrieval costs O(log_b N) steps.

Theorem 3 (The Library Theorem). The separation ratio between indexed and sequential retrieval is Ω(b^h / h) where h is tree height — exponential in the depth of the index hierarchy.

Theorem 4 (Reasoning cost). Over T reasoning steps involving both reads and writes to external memory, unindexed reasoning costs Θ(T²) while indexed reasoning costs O(T log_b(N₀ + T)) — the advantage compounds with reasoning depth.
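The compounding claim of Theorem 4 can be illustrated with a toy cost model. This is a sketch under our own simplifying assumptions (function names are ours): unindexed reasoning rescans the accumulated store at each step, while indexed reasoning descends a b-ary index over it.

```python
import math

# Toy cost model for Theorem 4. At step t the store holds N0 + t
# pages (N0 initial pages plus one written page per step so far).

def unindexed_cost(T: int, N0: int = 0) -> int:
    """Step t scans all N0 + t pages accumulated so far: Theta(T^2)."""
    return sum(N0 + t for t in range(1, T + 1))

def indexed_cost(T: int, N0: int, b: int) -> int:
    """Step t descends a b-ary index over N0 + t pages: O(T log_b(N0 + T))."""
    return sum(math.ceil(math.log(max(N0 + t, 2), b)) for t in range(1, T + 1))

T, N0, b = 1000, 100, 64
print(unindexed_cost(T, N0))      # 600500 page transfers
print(indexed_cost(T, N0, b))     # 2000 page transfers
```

Doubling T roughly quadruples the unindexed total but only doubles the indexed one, which is the compounding advantage in the theorem.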

Theorems 5–6 (in development). Recursive indexing: the Library Theorem applies to its own meta-index — a flat global index reintroduces the Ω(N) bound at the organizational level. Separation requirement: mixing index pages with data pages degrades retrieval back toward the sequential Ω(N) cost.

The theorem rests on one conceptual move: identifying reasoning with search.

Li & Wang (2025) showed that a transformer's output at each step depends entirely on its context window — the WINDOW = SPACE result. This means finding the correct answer reduces to finding the correct input tokens in the external store. Reasoning is retrieval from organized external memory. Indexing that memory determines retrieval cost, therefore indexing determines reasoning cost.

This is not a metaphor. It is a formal identity within the I/O complexity model. A transformer that reasons over an external store is computationally equivalent to a processor navigating a data structure. The B-tree analysis applies with full rigor.

# What it does and does not say

It says: Architectural organization of external memory provides a formal advantage that cannot be overcome by scaling model size, context length, or compute — the Ω(N) lower bound for sequential access is unconditional.

It does not say: Current LLMs actually exploit this advantage. The theorem characterizes the capacity of the model class (what the best algorithm in the class can achieve), not the performance of trained models. There is likely a large gap between theoretical capacity and empirical performance — mapping that gap is the experimental program's purpose.

It does not say: Larger context windows eliminate the need for external memory. The advantage is a ratio (N / log N) that grows with store size. For any fixed context window, sufficiently large stores make indexing essential.
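As a quick arithmetic check of that claim (our own numbers, not a figure from the source), the ratio N / log₂ N grows without bound as the store grows:

```python
import math

# The indexed-vs-sequential advantage scales as N / log2(N):
# for any fixed context window it grows without bound in store size.
for N in (10**3, 10**6, 10**9):
    print(f"N = {N:>10,}  advantage ~ {N / math.log2(N):,.0f}")
```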

# Empirical program

Seven experiments designed to validate and extend the theorem:

| # | Test | Prediction | Status |
|---|------|------------|--------|
| 1 | Structured vs. unstructured scratchpad (P3 core) | Structured outperforms at matched token budget | Critical path |
| 2 | Reasoning depth curves (vary T at fixed N) | T² vs. T log N separation | Designed |
| 3 | Degradation (corrupt index progressively) | Performance collapses toward O(N) | Designed |
| 4 | RAG-for-reasoning (index self-generated reasoning) | Compounding gains from self-indexing | Designed |
| 5 | Constitutional AI as IAR (flat vs. indexed constitution) | Indexed constitution improves alignment | Designed |
| 6 | Scale invariance (same experiments across model sizes) | Smaller models have higher hallucination floor | Designed |
| 7 | AI peer review as applied demo | Indexed review process outperforms sequential | Designed |

Publication targets: NeurIPS 2026 (Theorems 0–4 + experiments 1–3), Nature (full program), perspective piece (conceptual narrative).

# Connection to HAAK

HAAK's file system is a concrete instantiation of the indexed external memory the theorem describes:

| Theorem concept | HAAK instantiation |
|-----------------|--------------------|
| External store | The file system (patterns/, projects/, foundations/) |
| Index | index.md files, frontmatter cross-references, naming conventions |
| B-tree branching | Directory hierarchy — each folder branches to children |
| Read/write operations | Agent reads files for context, writes outputs to disk |
| Degradation | Stale indexes, broken links, orphaned documents |

This is not coincidental. HAAK was designed to instantiate the theorem, and the theorem was formalized by observing what HAAK was already doing. The reflexive loop — system as both instance and test — is deliberate.

# Historical development

  • Feb 20 2026: Handwritten "Thinking Needs Writing" proposal names inscription-augmented reasoning
  • Feb 21 2026: Library Theorem v1 drafted; 7 HAAK review personas assess it
  • Feb 22 2026: Reframed as agent theory (v2); "reasoning is search" insight crystallizes
  • Feb 23 2026: v3 with formal I/O complexity framework and Theorems 0–4
  • Feb 24 2026: Model-persona comparison matrix (31 reviews across 4 models × 7 personas)

# Constitutional implications

The Library Theorem grounds the constitutional requirement of externalization (§1) with formal precision: it is not merely good practice to write things down; it is the enabling condition for O(log N) reasoning. Systems that process internally — without inscribing intermediate state — are provably limited to O(N) retrieval over their accumulated reasoning.


haak · foundation · 2026-02-24 · zach + claude

Foundations 02 — The Library Theorem — 2026 — Zachary F. Mainen / HAAK