Encapsulation

#Problem

HAAK uses hierarchical indexing to give bounded reasoners (human and AI) efficient access to a growing store of documents. The Library Theorem proves this gives an exponential advantage over flat access: O(log N) vs Ω(N).

But the advantage has a precondition: the index must be well-formed. Specifically, index entries must be cleanly separated from data. When they mix — an index node that's half navigation and half content, an agent context that blends its own reasoning with another agent's output, a reviewer whose priors are entangled with the paper's claims — the hierarchy degrades toward flat. Not metaphorically. The agent scanning a corrupted index node pays O(|node|) to find the B relevant entries among irrelevant content. The effective branching factor drops. Enough mixing and navigation cost approaches Ω(N): the sequential bound.

This is the central fragility of any indexed system. The exponential advantage is not robust to contamination. Encapsulation — keeping distinct information sources in distinct containers — is the mechanism that preserves it.

#Theoretical basis

#The Library Theorem says hierarchy wins

Theorems 1–3 (Inscription paper): a bounded reasoner searching N items in external memory pays Ω(N) sequentially, O(log N) with a hierarchical index. The separation is exponential in depth.

#The separation assumes clean indices

Each index node summarizes B children in ≤ W tokens (working memory bound). The agent reads the node, identifies which child to descend into, and proceeds. This works when the node contains only navigational information — B entries, each pointing to a child.

When the node also contains non-navigational content (discussion, rationale, data), the agent must scan more tokens to find the B entries. If non-navigational content is c tokens per node, effective scan cost per level is O(B + c) instead of O(B). When c >> B, the index node is functionally a data document that happens to contain some pointers: the agent is doing a sequential scan within what should be an O(B) navigation step.

#Mixing is the mechanism of degradation

Drift resistance (architecture 04) identifies the observable problem: organizational structure degrades over time. Encapsulation identifies the mechanism: degradation happens when distinct information types are mixed within a single container. Every drift event is an encapsulation violation:

| Drift event | What's mixed | Effect |
|---|---|---|
| Misfiled document | Wrong data under an index entry | Pointer leads to irrelevant content; agent must backtrack |
| Stale index.md | Outdated pointers + current children | Agent follows dead links; effective branching factor drops |
| Content in index file | Navigation + prose/discussion | Agent scans non-navigational tokens at every level |
| Agent context leakage | Own reasoning + other agent's trace | Can't distinguish what it computed from what it was told |
| Reviewer prior entanglement | Beliefs before reading + paper's claims | Can't evaluate paper independently of prior position |

In each case, two information sources that should be separate are combined in one container. The container becomes harder to use for its intended purpose because the consumer must disambiguate.
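The stale index.md case can be detected mechanically by comparing what an index claims with what its directory holds. The following is a minimal sketch, not HAAK's implementation: it assumes the entry names listed in index.md have already been extracted (the parsing step depends on a file format this document does not specify), and the function and key names are hypothetical.

```python
from typing import Dict, Set


def index_drift(listed: Set[str], actual: Set[str]) -> Dict[str, Set[str]]:
    """Compare the names an index lists with the names actually present.

    listed: child names extracted from index.md (extraction is assumed).
    actual: child names found in the directory, excluding index.md itself.
    """
    return {
        # Pointers with no target: the agent would follow a dead link.
        "dead_pointers": listed - actual,
        # Children the index never mentions: invisible to navigation.
        "unlisted_children": actual - listed,
    }
```

For example, `index_drift({"a.md", "old.md"}, {"a.md", "new.md"})` reports `old.md` as a dead pointer and `new.md` as an unlisted child. A misfiled document is not caught by this check, since the pointer resolves but leads to the wrong content.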

#Recursive application (Theorem 5)

The Library Theorem applies to its own infrastructure: the index is itself data that gets searched. A flat meta-index (like a centralized file-paths.md) reintroduces Ω(N) at the organizational level. Self-describing hierarchical nodes (index.md per directory) achieve O(log N). Encapsulation is what makes the recursion clean — each node describes only its own children, so no inter-node coordination is needed.

#The principle

Information that serves different functions must live in different containers. Specifically:

  1. Index vs. data: Navigation information (what's here, where to go) must not share a container with content (discussion, analysis, prose). An index node that also contains content is a degraded index node.
  2. Agent vs. agent: Each agent's reasoning trace must be identifiable as its own. Mixing agent A's context with agent B's output makes A's working memory a flat tape of mixed provenance: it can't efficiently navigate its own reasoning.
  3. Before vs. after: Information known before an event (reviewer's priors, agent's initial state) must be captured separately from information acquired after (paper's claims, agent's findings). Temporal mixing destroys the ability to attribute insights to their source.
  4. Type vs. type: Documents of different types serve different retrieval patterns. Mixing types in one container (a document that's half index and half opinion, or a directory that holds both foundations and features) degrades retrieval for both.

#Corollary: no unnecessary structure

The principle cuts both ways. Encapsulation mandates separation where functions differ, but it also prohibits separation where they don't. A directory created "just in case," a method that wraps one function in three layers, a configuration file for a single setting — these are containers without a function to separate. They add navigation cost (more nodes to traverse) without adding discrimination (the separation doesn't help the agent route). The test: does this container help an agent skip something irrelevant? If not, the container is noise.

#Implementation in HAAK

#index.md: pure navigation

Each index.md is an index node in the filesystem hierarchy. It must contain:

  • A one-line description of what this directory is
  • A table or list of children (files and subdirectories) with brief descriptions
  • Status indicators where relevant
  • Links to related areas

It must NOT contain:

  • Extended discussion or rationale (put in a separate document)
  • Design decisions or history (put in architecture docs, notes, or decision records)
  • Content that would be useful independent of navigation

Test: If you removed all non-navigational content from an index.md and the file still served its purpose, the removed content was violating encapsulation. If removing content makes the file useless for navigation, that content belongs.

Why this matters: An agent navigating the tree reads index.md at every level. Each non-navigational token it processes at level k is multiplied across all future navigations through that node. O(B + c) per level × D levels = O(D(B + c)) total. Keeping c ≈ 0 keeps navigation at the theoretical bound O(D × B) = O(B log_B N).
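The bound above can be checked numerically. The following is a minimal sketch, not part of HAAK: the function names are hypothetical, each index entry and each non-navigational token is counted as one unit of scan cost, and the tree is assumed to be perfectly balanced.

```python
def depth(n_items: int, branching: int) -> int:
    """Smallest D >= 1 such that branching ** D >= n_items."""
    d = 1
    while branching ** d < n_items:
        d += 1
    return d


def navigation_cost(n_items: int, branching: int, noise_tokens: int = 0) -> int:
    """Units scanned on one root-to-leaf descent: D * (B + c)."""
    return depth(n_items, branching) * (branching + noise_tokens)


def flat_cost(n_items: int) -> int:
    """Worst-case units scanned when there is no index at all."""
    return n_items


if __name__ == "__main__":
    n, b = 1_000_000, 10
    print("clean index:       ", navigation_cost(n, b))
    print("c = 500 per node:  ", navigation_cost(n, b, noise_tokens=500))
    print("flat scan:         ", flat_cost(n))
```

For N = 1,000,000 and B = 10 this gives 60 units with clean nodes, 3,060 units when every node carries c = 500 extra tokens, and 1,000,000 units for a flat scan.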

#Agent isolation

Agent architecture principle #2 (architecture 06) already mandates: "Information must not leak between agents, reviewers, or scopes via model context. All sharing happens through explicit, stored documents."

The theoretical grounding: an agent's working memory IS its index of the current task. It holds pointers to what it's read, what's relevant, what to do next. If another agent's reasoning trace is mixed into this working memory, the agent is navigating a corrupted index — some pointers are its own (valid), some are from another agent (irrelevant noise). The effective branching factor in working memory drops.

Implementation is already in place:

  • Each agent instance starts with a clean context (no shared state)
  • Agents communicate through files, not shared memory
  • The reviewer agent is explicitly forbidden from accessing other reviews
  • The editor reads reviews as distinct documents, not as a merged stream

#Pre-review position statements

When a reviewer reads a paper, their prior beliefs about the problem and the paper's claims enter the same context window. After reading, the reviewer cannot cleanly separate "what I already thought" from "what the paper argued." This is a temporal encapsulation violation.

The position statement protocol (--position flag in /review, Step 1b) fixes this:

  1. Reviewer receives only a one-line topic description (no title, abstract, or content)
  2. Reviewer writes 1–3 paragraphs: what they know, what's open, what would change their mind
  3. Position statement is saved as {NN}{lastname}position.md
  4. Only then does the reviewer see and evaluate the paper

The editorial synthesis gets two cleanly separated documents per reviewer: what they thought before, and what they thought after. Divergence between position and review is signal — it means the paper moved the reviewer. Agreement is also signal — it means the reviewer's assessment is grounded in prior expertise, not paper influence.
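The ordering constraint in the four steps above can be sketched as a small state machine. This is a hedged illustration only: `ReviewSession`, `ProtocolError`, and the method names are hypothetical, persistence to files is omitted, and the withholding of the paper is enforced by convention rather than by access control.

```python
from dataclasses import dataclass
from typing import Optional


class ProtocolError(RuntimeError):
    """Raised when a step is attempted out of order."""


@dataclass
class ReviewSession:
    topic: str                      # step 1: the one-line topic description
    paper_text: str                 # withheld until a position is recorded
    position: Optional[str] = None  # stored separately from the review
    review: Optional[str] = None

    def record_position(self, text: str) -> None:
        # Steps 2 and 3: the position is written once, before the paper is seen.
        if self.position is not None:
            raise ProtocolError("position statement already recorded")
        if not text.strip():
            raise ProtocolError("position statement is empty")
        self.position = text

    def reveal_paper(self) -> str:
        # Step 4: the paper is only available after the position exists.
        if self.position is None:
            raise ProtocolError("record the position statement before reading the paper")
        return self.paper_text

    def record_review(self, text: str) -> None:
        if self.position is None:
            raise ProtocolError("record the position statement before reviewing")
        self.review = text
```

Keeping `position` and `review` as separate fields mirrors the two separate documents the editorial synthesis receives.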

#Directory structure

The hierarchy-rationale (architecture 01) establishes that types map to directories. The encapsulation principle adds: types map to directories BECAUSE different types serve different retrieval patterns, and mixing them in one container degrades retrieval for all of them.

A directory containing foundation.md, feature-spec.md, and review.md is a mixed-type container. An agent looking for the foundation must scan past the feature spec and the review. This is the same mechanism as content in an index file: irrelevant tokens in the scan path.

Rules:

  • One type per directory (foundations in foundations/, features in features/, etc.)
  • Mixed-type directories are a code smell — they signal a missing subdirectory
  • When a directory accumulates items of a new type, create a subdirectory
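The rules above lend themselves to a simple check. The sketch below is an assumption-laden illustration: this document does not say how a file's type is determined, so `type_from_stem` (which reads the type from the part of the file name before the first hyphen) is a hypothetical stand-in that can be swapped for any other classifier.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Set


def type_from_stem(path: PurePosixPath) -> str:
    """Hypothetical convention: 'feature-spec.md' has type 'feature'."""
    return path.stem.split("-")[0]


def mixed_type_directories(
    paths: Iterable[str],
    type_of: Callable[[PurePosixPath], str] = type_from_stem,
) -> Dict[str, Set[str]]:
    """Return each directory that holds documents of more than one type."""
    types_by_dir: Dict[str, Set[str]] = defaultdict(set)
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name == "index.md":
            continue  # the index node is navigation, not a typed document
        types_by_dir[str(path.parent)].add(type_of(path))
    return {d: t for d, t in types_by_dir.items() if len(t) > 1}
```

Given `arch/foundation.md`, `arch/feature-spec.md`, and `arch/review.md`, the function flags `arch` with three types, which under the rules above signals a missing subdirectory.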

#Document boundaries

Within a document: keep distinct sections about distinct topics. But the deeper rule is about document boundaries themselves:

  • A document that covers two unrelated topics should be two documents
  • A summary that compresses two heterogeneous sources into one narrative loses decomposability
  • The editorial synthesis writes per-reviewer summaries before a cross-reviewer synthesis — not one merged document

This connects to compression theory (Thread 2, Learning): the best compression of structured data preserves its structure. Compressing two things that shouldn't be together increases description length for any particular query. Separate documents maintain retrievability; merged documents create porridge.

#Relationship to other architecture patterns

| Pattern | Relationship to encapsulation |
|---|---|
| 01 Hierarchy rationale | Hierarchy IS the index structure. Encapsulation is what keeps it working. |
| 02 Forkability | System/data split is encapsulation at the repository level. |
| 04 Drift resistance | Drift is encapsulation violation. Every drift event mixes information that should be separate. Drift resistance is encapsulation maintenance. |
| 05 Inscription architecture | The inscription paper proves why encapsulation matters (Library Theorem). |
| 06 Agent architecture | Principle #2 (encapsulation between agents) is a specific instance. |
| 07 Macaria architecture | Team coordination requires encapsulation between labs/projects. |

#Relationship to foundations

This pattern may also belong in foundations/ as a conceptual position — the claim that information separation is not just engineering hygiene but a theoretically necessary condition for maintaining hierarchical retrieval advantage. The architecture document (here) focuses on implementation. A foundation would focus on the argument: why this is true, what theory predicts, what the alternative looks like.

#Violations to watch for

Concrete signs that encapsulation is breaking down:

  1. index.md growing prose: If an index file exceeds ~30 lines, it probably contains non-navigational content. Extract to a separate document.
  2. Agent spawned with another agent's output in its prompt: Unless the agent's explicit job is to synthesize that output (like the editor reading reviews), this is context leakage.
  3. Review that extensively quotes the paper without distinct evaluation: The reviewer may have lost the boundary between paper claims and their own assessment.
  4. Directory with mixed document types: A directory containing both index-like and content-like files. Reorganize.
  5. A document that serves two audiences: If both navigating agents and reading humans need the same file for different purposes, it's probably mixing navigation with content. Split it.
  6. Summaries that merge sources without attribution: A synthesis document that doesn't clearly delineate which input contributed which point. The reader can't decompose it back to sources.
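The first sign in the list is the easiest to automate. A minimal sketch, assuming index files are named index.md and treating the ~30-line figure as a configurable rule of thumb; the function name and the threshold constant are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

MAX_INDEX_LINES = 30  # rule-of-thumb threshold from the list above


def oversized_indexes(root: Path, max_lines: int = MAX_INDEX_LINES) -> List[Tuple[Path, int]]:
    """Return (path, line_count) for every index.md under root that is too long."""
    flagged = []
    for index_file in sorted(root.rglob("index.md")):
        n_lines = len(index_file.read_text(encoding="utf-8").splitlines())
        if n_lines > max_lines:
            flagged.append((index_file, n_lines))
    return flagged
```

A flagged file is a candidate for review, not proof of a violation: a directory with many children can legitimately need a long index.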

#Degradation model (conjecture, needs formalization)

If a fraction f of index nodes at each level have encapsulation violations (non-navigational content of size c), the effective navigation cost is:

O(D × (B + f·c))

When f·c << B, the system is near the clean bound. When f·c >> B, the system approaches:

O(D × c) ≈ O(log(N) × c)

which for large c approaches the flat bound. The transition is gradual — there's no sharp threshold, just progressive degradation as more nodes are contaminated.
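A worked reading of the conjectured formula may help calibrate the two regimes. The derivation below takes the model at face value, treats an index entry and a content token as the same unit of scan cost (an assumption), and uses illustrative values N = 10^6, B = 10, f = 0.2, c = 500 that are not drawn from HAAK.

```latex
\begin{align*}
D &= \log_B N = \log_{10} 10^{6} = 6 \\
\text{clean cost} &= D \cdot B = 6 \cdot 10 = 60 \\
\text{contaminated cost} &= D\,(B + f c) = 6\,(10 + 0.2 \cdot 500) = 660 \\
\frac{\text{contaminated cost}}{\text{clean cost}} &= 1 + \frac{f c}{B} = 1 + \frac{100}{10} = 11 \\
D\,(B + f c) = N \;&\Longleftrightarrow\; f c = \frac{N}{\log_B N} - B \approx 1.67 \times 10^{5}
\end{align*}
```

Under these assumptions the slowdown relative to the clean bound is 1 + f·c/B, so contamination is felt long before the flat bound is reached: f·c = B already doubles the cost, while matching the flat bound itself requires f·c on the order of N / log_B N.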

This suggests a measurable health metric: for each index.md, compute the ratio of navigational tokens to total tokens. System-wide average of this ratio is the encapsulation score. Drift resistance (architecture 04) needs this metric — it's the "degradation metric" identified as a gap there.
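The metric could be prototyped along the following lines. This is a sketch under stated assumptions rather than a definition: whitespace-separated words stand in for tokens, and a line counts as navigational if it is a heading, a list item, or a table row. Both choices are heuristics introduced here, and the function names are hypothetical.

```python
import re
from typing import Iterable

# Assumed heuristic: headings, bullet or numbered list items, and table rows
# are navigational; everything else is treated as content.
_NAV_LINE = re.compile(r"^\s*(#|[-*•]\s|\d+\.\s|\|)")


def navigational_ratio(index_text: str) -> float:
    """Fraction of words in one index file that sit on navigational lines."""
    nav = total = 0
    for line in index_text.splitlines():
        n_words = len(line.split())
        total += n_words
        if _NAV_LINE.match(line):
            nav += n_words
    return nav / total if total else 1.0  # an empty index has nothing to scan past


def encapsulation_score(index_texts: Iterable[str]) -> float:
    """System-wide average of the per-file navigational ratio."""
    ratios = [navigational_ratio(text) for text in index_texts]
    return sum(ratios) / len(ratios) if ratios else 1.0
```

A score near 1.0 means index files are almost pure navigation; a falling score over time would be the measurable form of drift.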


haak · created 2026-02-22 · zach + claude

Architecture 08 — Encapsulation — 2026 — Zachary F. Mainen / HAAK