Knowledge Organization

#Overview

HAAK stores knowledge in a monorepo. The file system is real but not canonical for identity. Two synchronized projections — index.md files and entities.db — serve discovery. This document specifies the directory structure, index schema, discovery interface, sync protocol, and roles that together keep the system navigable across agents and sessions.

#1. Ontological Grounding

A document is an entity. The file on disk is a belonging — the location where content lives. The path is a quality of that entity, not the entity itself. Moving a file does not change the entity; it changes a quality.

The filesystem is a domain projection of the entity graph. It is canonical for content (the bits live here) but not for identity (the entity is defined in the graph). Two projections expose the same underlying graph:

index.md files: filesystem-local, human and agent navigable, updated at write time.
entities.db documents table: queryable, attribute-filterable across the whole system, updated by Reed's sync daemon (§6).

Neither projection generates the other. Both project the same entity graph. Inconsistency between them is drift; the sync protocol (§6) bounds it.

Indexing is semantic, not content-based. Index entries carry path, description, domain, concern, and status — never document content. The file is fetched only when needed. This is the Library Theorem applied to HAAK itself: hierarchical indexed retrieval is O(log N); content scanning is O(N). The index preserves the advantage.

#2. Canonical Directory Structure

The HAAK monorepo has exactly 10 visible top-level directories:

data/           — knowledge graph hub (entities.db, papers.db, media/, bibliography/)
filix/          — successor system (first-class, not a project)
foundations/    — why (FLAT, numbered)
infra/          — operational substrate (console/, daemons/, scripts/)
ontology/       — what (FLAT, numbered, includes proofs/)
patterns/       — how (architecture/, methods/, policies/, styles/)
personas/       — who (FLAT, slug files)
projects/       — where, organized by domain
strategy/       — when (FLAT, numbered, forward-looking)
web/            — public sites (boundary: separate git repos)

projects/ domains:

projects/
  personal/     — private life (excluded from public release)
  art/          — Latent States → latentstates.org
  lab/          — neuroscience, mice → mainenlab.org
  inscription/  — AI + philosophy + ontology → zmainen.org/papers
  power/        — AI governance, policy, Teachout
  ops/          — running HAAK, transitioning to Filix

web/ sites:

web/
  zmainen/      — zmainen.org (permanent: the person)
  latentstates/ — latentstates.org (permanent: art practice)
  mainenlab/    — mainenlab.org (permanent: the lab)
  haak/         — haak.world (transitional: current system)
  filix/        — filix.world (future: successor system)

Key structural rules:

data/ is the knowledge graph hub. It must stay at the root and never be scattered into projects or infra. It is the one place databases live.
filix/ is architecturally first-class — the successor system being built inside the current one. It is not a project; it does not belong under projects/.
styles/ lives at patterns/styles/, not at the root.
proofs/ lives at ontology/proofs/, not at the root.
web/ boundary: each site is its own git repository. The web/ directory contains symlinks or submodules pointing to those repos. No web content is tracked directly in the HAAK monorepo.
infra/ contains only operational substrate: console/, daemons/, scripts/. No knowledge content belongs here.
Any legacy mirror of foundations/ is removed. The only canonical location is foundations/ at the root.

Depth limits:

strategy/, foundations/, ontology/, and personas/ are FLAT. No subdirectories, with the single exception of ontology/proofs/. Files are NN-slug.md in strategy/, foundations/, and ontology/, and slug.md in personas/.

patterns/ has four fixed subdirectories: architecture/, methods/, policies/, styles/. Each is FLAT internally. No further nesting.

projects/<domain>/<name>/ allows up to two levels below the project root. A third level is permitted only for experiment data directories. Never deeper.

web/<site>/ allows one level of subdirectories within each site repo.

data/ is flat for databases. One level for subdirectories (data/media/, data/bibliography/).

infra/ has three fixed subdirectories: console/, daemons/, scripts/. Each may have internal structure as needed by the tooling.
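
The depth limits above can be read as a small rule table. The following is a minimal sketch under stated assumptions, not HAAK tooling: the function name `check_depth` and the `experiment_data` flag are hypothetical, paths are taken to be repo-relative POSIX paths ending in a file name, and filix/ and web/ are left unchecked because their limits live in their own trees.

```python
from pathlib import PurePosixPath

FLAT_ROOTS = {"strategy", "foundations", "personas"}
PATTERNS_SUBDIRS = {"architecture", "methods", "policies", "styles"}
INFRA_SUBDIRS = {"console", "daemons", "scripts"}


def check_depth(path: str, experiment_data: bool = False) -> bool:
    """Return True if a repo-relative file path respects the depth limits."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    # Directories that sit between the top-level directory and the file itself.
    top, dirs = parts[0], parts[1:-1]
    if top in FLAT_ROOTS:
        return len(dirs) == 0
    if top == "ontology":
        # Flat, except for the proofs/ subdirectory.
        return dirs in ((), ("proofs",))
    if top == "patterns":
        return len(dirs) == 0 or (len(dirs) == 1 and dirs[0] in PATTERNS_SUBDIRS)
    if top == "data":
        return len(dirs) <= 1
    if top == "infra":
        # Internal structure below the three fixed subdirectories is unconstrained.
        return len(dirs) == 0 or dirs[0] in INFRA_SUBDIRS
    if top == "projects":
        # dirs = (domain, name, level 1, level 2[, experiment-data level 3])
        return len(dirs) <= (5 if experiment_data else 4)
    return True  # filix/ and web/ are not constrained by this sketch
```

A check like this could run as a pre-commit step on the reorganization branch; whether a third project level really holds experiment data is a human judgment, which is why it is passed in as a flag rather than inferred.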

Naming conventions:

Strategy, methods, architecture, foundations, and ontology docs use NN-slug.md — two-digit number, hyphenated slug. Project and domain directories use slug/ with no number. Within projects, phase or type prefixes (01-, 02-) are allowed. The [+]/[-] bracket notation is PROHIBITED in directory names going forward. Existing [+] directories get renamed during reorganization; the notation moves into the index.md entry as a status tag.

No spaces in directory or file names. Hyphens only.
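
The naming conventions can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the restriction to lowercase letters and digits is an assumption, since the rules above only require hyphens, no spaces, and no bracket notation.

```python
import re

# Assumption: slugs are lowercase alphanumerics joined by single hyphens.
_SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
NUMBERED_DOC = re.compile(r"\d{2}-" + _SLUG + r"\.md")
DIR_NAME = re.compile(_SLUG)


def is_numbered_doc(filename: str) -> bool:
    """True for NN-slug.md: two digits, a hyphen, a hyphenated slug."""
    return NUMBERED_DOC.fullmatch(filename) is not None


def is_valid_dir_name(name: str) -> bool:
    """True for slug or NN-slug directory names; rejects spaces and [+]/[-]."""
    return DIR_NAME.fullmatch(name) is not None
```

Because the slug pattern admits digits, phase prefixes such as 01- inside projects pass `is_valid_dir_name` without a separate rule.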

#2a. Public/Private Boundary

HAAK will eventually publish its methods and architecture. This section defines what releases publicly and what stays private.

Public on release:

foundations/        — the intellectual case for the system
ontology/           — the relational model (including proofs/)
patterns/           — methods, architecture, policies, styles
personas/           — researcher profiles (sanitized: no private contact details)
filix/              — the successor system
infra/              — scripts, daemons, console (operational substrate)
web/                — all five public sites
projects/art/       — Latent States practice
projects/lab/       — neuroscience work → mainenlab.org
projects/inscription/ — AI + philosophy papers
projects/power/     — governance and policy work

Private always:

data/               — knowledge graph, databases, personal records
projects/personal/  — private life

Selective (sanitized before release):

strategy/           — forward-looking plans; redact sensitive actors and timelines
projects/ops/       — HAAK operational history; redact internal tooling details

The release boundary is enforced at the git level: public repos contain only the public subtree. The HAAK monorepo is private; release is a one-way export, not a fork.
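
One way to make the boundary testable before an export is a prefix lookup. This is a sketch under assumptions: the function name `release_class` is hypothetical, and defaulting unlisted paths to private is a fail-closed choice made here, not something the section specifies.

```python
PRIVATE = ("data/", "projects/personal/")
SELECTIVE = ("strategy/", "projects/ops/")
PUBLIC = (
    "foundations/", "ontology/", "patterns/", "personas/", "filix/",
    "infra/", "web/", "projects/art/", "projects/lab/",
    "projects/inscription/", "projects/power/",
)


def release_class(path: str) -> str:
    """Classify a repo-relative path as 'private', 'selective', or 'public'."""
    # Private and selective prefixes are checked first so they always win.
    for label, prefixes in (("private", PRIVATE),
                            ("selective", SELECTIVE),
                            ("public", PUBLIC)):
        if path.startswith(prefixes):  # str.startswith accepts a tuple
            return label
    # Assumption: anything not listed in this section stays private.
    return "private"
```

Note that personas/ is classed as public here even though its contents are sanitized on release; the lookup only answers which subtree a path belongs to, not whether a given file has been sanitized.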

#3. Index.md Schema

Every directory must have an index.md. The schema is fixed:

# <Directory Name>

<One paragraph: what this directory is for, who reads it, what it contains.>

## Contents

| Path | Description | Domain | Concern | Status |
|:-----|:------------|:-------|:--------|:-------|
| filename.md | One-line description | strategy | publication | active |

Controlled vocabulary:

Domain (pick one): projects · patterns · strategy · foundations · ontology · web · data · infrastructure

Concern (pick one or two): publication · architecture · governance · infrastructure · release · method · ontology · identity · coordination · research

Status (pick one): active · draft · superseded · archived

No freeform tags. A new tag requires Kavi's approval before use. Agents must not invent vocabulary — the controlled set exists so queries over entities.db return consistent results.
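
To illustrate how the fixed schema and controlled vocabulary make entries machine-checkable, here is a minimal sketch of a parser and validator. The function names are hypothetical. It assumes that two concerns are written comma-separated in one cell (the schema does not say how) and that descriptions contain no pipe characters.

```python
DOMAINS = {"projects", "patterns", "strategy", "foundations",
           "ontology", "web", "data", "infrastructure"}
CONCERNS = {"publication", "architecture", "governance", "infrastructure",
            "release", "method", "ontology", "identity", "coordination",
            "research"}
STATUSES = {"active", "draft", "superseded", "archived"}
COLUMNS = ("path", "description", "domain", "concern", "status")


def parse_contents(index_text: str) -> list[dict[str, str]]:
    """Extract the rows of the Contents table from an index.md string."""
    rows, in_contents = [], False
    for raw in index_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            in_contents = line == "## Contents"
            continue
        if not in_contents or not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip malformed rows, the header row, and the |:----| separator row.
        if len(cells) != 5 or cells[0] == "Path" or set(cells[0]) <= set(":-"):
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def vocabulary_errors(row: dict[str, str]) -> list[str]:
    """Return a list of controlled-vocabulary violations for one row."""
    errors = []
    if row["domain"] not in DOMAINS:
        errors.append(f"unknown domain: {row['domain']}")
    concerns = [c.strip() for c in row["concern"].split(",")]
    if not 1 <= len(concerns) <= 2 or any(c not in CONCERNS for c in concerns):
        errors.append(f"invalid concern: {row['concern']}")
    if row["status"] not in STATUSES:
        errors.append(f"unknown status: {row['status']}")
    return errors


SAMPLE = """# strategy

Forward-looking plans.

## Contents

| Path | Description | Domain | Concern | Status |
|:-----|:------------|:-------|:--------|:-------|
| filename.md | One-line description | strategy | publication | active |
| notes.md | Hypothetical bad row | strategy | brainstorm | wip |
"""

for row in parse_contents(SAMPLE):
    print(row["path"], vocabulary_errors(row))
```

A validator of this shape rejects invented tags at write time, which is what keeps queries over entities.db consistent.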

#4. Discovery Interface

The cold-start discovery path for any agent:

1. Read CLAUDE.md, always first.
2. Read the top-level index.md to see what exists at the root.
3. Query entities.db by domain and concern to find candidate documents.
4. Read the specific file, fetching only what you need.

The entities.db documents table schema:

id, path, description, domain, concern, status, owner_role, created_at, updated_at

Example query — active strategy documents in the publication concern:

sqlite3 data/entities.db \
  "SELECT path, description FROM documents
   WHERE domain='strategy' AND concern='publication' AND status='active'"

The query interface is raw sqlite3. No wrapper is needed. The schema is intentionally minimal so any agent or subagent can issue queries without learning a library.
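
An agent running in Python can issue the same query through the standard-library sqlite3 module. The sketch below builds an in-memory stand-in for entities.db so that it runs anywhere. The column types, the UNIQUE constraint on path, and the two sample rows are assumptions for illustration; only the column names come from the schema above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id          INTEGER PRIMARY KEY,
    path        TEXT UNIQUE NOT NULL,
    description TEXT,
    domain      TEXT,
    concern     TEXT,
    status      TEXT,
    owner_role  TEXT,
    created_at  TEXT,
    updated_at  TEXT
)
"""


def find_documents(conn, domain, concern, status="active"):
    """Return (path, description) pairs matching the given attributes."""
    return conn.execute(
        "SELECT path, description FROM documents "
        "WHERE domain = ? AND concern = ? AND status = ? "
        "ORDER BY path",
        (domain, concern, status),
    ).fetchall()


# In real use this would be sqlite3.connect("data/entities.db").
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO documents (path, description, domain, concern, status) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        # Hypothetical rows, not real HAAK documents.
        ("strategy/01-example.md", "Hypothetical active entry",
         "strategy", "publication", "active"),
        ("strategy/02-example.md", "Hypothetical draft entry",
         "strategy", "publication", "draft"),
    ],
)
print(find_documents(conn, "strategy", "publication"))
```

Parameter binding with `?` placeholders avoids quoting problems when a query value comes from another document. An equality test on concern only matches rows that carry a single concern; how two-concern rows are stored is not specified here, so a production query may need a different predicate.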

Agents can also recover their own role context, because entities.db stores agent roles as entities. On cold start, an agent queries its role to get the mandate document path, the peer agents in the same domain, and the open tasks tagged to the role. This is more reliable than reading the board and hoping prior sessions left good traces.

#5. Cold-Start Wiring

Two places in the constitution embed the indexing protocol:

CLAUDE.md session start protocol (after step 1):

> Before creating any file, read patterns/methods/51-indexing-protocol.md. It governs index.md schema, controlled vocabulary, and discovery queries.

Constitutional preamble for all subagent task prompts:

> When you create a file: add an entry to the parent index.md using schema | path | description | domain | concern | status |. Domain: projects/patterns/strategy/foundations/ontology/web/data/infrastructure. Concern: publication/architecture/governance/infrastructure/release/method/ontology/identity/coordination/research. Status: active/draft/superseded/archived.

These two insertions close the gap between specification and practice. Without them, the indexing protocol is advisory; with them, it is the path of least resistance.

#6. Sync Protocol

index.md → entities.db (Reed's daemon):

Runs on every file-change event (FSEvents on Mac, inotify on Exoscale) and on a 15-minute cron. On each run it reads all index.md files, parses each Contents table, and upserts the rows into the documents table in entities.db. The updated_at field on each row reflects the last sync, not the last human edit. entities.db is therefore eventually consistent, and the acceptable latency is one sync cycle. After each upsert, entities.db is rsynced to the replica site.
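
The upsert step might look like the following sketch. It is not Reed's daemon: the function name, the column types, and the choice of path as the conflict key are assumptions, and the `ON CONFLICT` clause needs SQLite 3.24 or later. Rows are dictionaries with the five index.md columns.

```python
import sqlite3
from datetime import datetime, timezone

DOCUMENTS_SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL, description TEXT,
    domain TEXT, concern TEXT, status TEXT, owner_role TEXT,
    created_at TEXT, updated_at TEXT
)
"""

UPSERT = """
INSERT INTO documents
    (path, description, domain, concern, status, created_at, updated_at)
VALUES (:path, :description, :domain, :concern, :status, :now, :now)
ON CONFLICT(path) DO UPDATE SET
    description = excluded.description,
    domain      = excluded.domain,
    concern     = excluded.concern,
    status      = excluded.status,
    updated_at  = excluded.updated_at
"""


def upsert_rows(conn, rows, now=None):
    """Insert new index rows and refresh existing ones in one transaction."""
    # updated_at records the sync time, not the last human edit.
    now = now or datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, [{**row, "now": now} for row in rows])
```

Because the conflict branch leaves created_at untouched, a row keeps its first-seen timestamp across syncs while updated_at advances on every run.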

Veda's weekly audit:

Diffs index.md contents against actual filesystem state. Three finding types:

Missing — file exists on disk, not in any index. Auto-repair: generate stub entry tagged [unreviewed].
Orphan — index entry exists, file is gone. Auto-repair: remove entry, log removal.
Stale — file mtime significantly newer than index entry's updated_at. Flagged for review; semantic judgment required.

Veda posts the audit report to the board. If drift exceeds 10 missing entries, or any orphan chain extends more than 2 hops: board alert and Kavi notification.
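
The three finding types reduce to set operations over the indexed paths and the paths on disk. The sketch below is an illustration, not Veda's audit: the function names are hypothetical, timestamps are taken as epoch seconds, and the one-day staleness threshold is an assumption because "significantly newer" is not quantified above. Orphan chains are not modelled.

```python
def classify_drift(indexed, on_disk, stale_after=86400.0):
    """Compare index entries with the filesystem.

    indexed: dict of path -> updated_at (epoch seconds) from the index
    on_disk: dict of path -> mtime (epoch seconds) from the filesystem
    """
    missing = sorted(on_disk.keys() - indexed.keys())   # on disk, not indexed
    orphan = sorted(indexed.keys() - on_disk.keys())    # indexed, file gone
    stale = sorted(
        p for p in indexed.keys() & on_disk.keys()
        if on_disk[p] - indexed[p] > stale_after
    )
    return {"missing": missing, "orphan": orphan, "stale": stale}


def needs_alert(report, max_missing=10):
    """True when drift exceeds the 10-missing-entries threshold."""
    return len(report["missing"]) > max_missing
```

Missing and orphan findings can be repaired automatically from this report; stale findings are only candidates, since deciding whether a description still fits the file requires reading it.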

#7. Reorganization Process

The one-time filesystem cleanup follows this sequence in order. No step may be skipped.

1. Tag current HEAD: reorg-baseline-YYYY-MM-DD (permanent revert point).
2. Create branch: reorg/filesystem-cleanup.
3. Define target structure (the directory map in §2).
4. Scan all internal references (links and paths in documents) and catalogue them before moving anything.
5. Move files with git mv so that history follows the file.
6. Flatten strategy subdirectories: number contents and move to strategy root.
7. Rename [+] directories: move notation to index.md status field.
8. Update all index.md files at affected levels.
9. Verify all catalogued internal references still resolve.
10. Merge to main: single commit, message reorg: filesystem cleanup per architecture/37.
11. Rebuild entities.db from a fresh index.md scan.

No file moves on main, ever. All restructuring happens on a branch.
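
The catalogue-then-verify steps in the sequence above can be sketched as two small functions. This assumes internal references are inline markdown links of the form `[text](relative/path.md)`; bare paths in prose and links with titles are not handled, and both function names are hypothetical.

```python
import re
from posixpath import dirname, join, normpath

# Captures the target of an inline markdown link, dropping any #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(?:#[^)]*)?\)")


def catalogue_links(doc_path, text):
    """Return repo-relative targets of the relative links found in text."""
    targets = []
    for target in LINK.findall(text):
        if "://" in target or target.startswith("mailto:"):
            continue  # external link, outside the scope of the reorg
        # Resolve the link relative to the directory of the linking document.
        targets.append(normpath(join(dirname(doc_path), target)))
    return targets


def broken_links(catalogue, existing_paths):
    """List (source, target) pairs whose target is not in existing_paths.

    catalogue: dict of source path -> list of repo-relative targets
    existing_paths: set of repo-relative paths present after the move
    """
    return sorted(
        (src, tgt)
        for src, targets in catalogue.items()
        for tgt in targets
        if tgt not in existing_paths
    )
```

Running `catalogue_links` before any move and `broken_links` after the moves gives the list of references that still need updating before the merge.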

#8. External Repo Policy

Three tiers of relationship:

HAAK-governed — repos owned by haak-world or zmainen that implement HAAK infrastructure or public presence. Full structure enforced: index.md at every level, CLAUDE.md, controlled vocabulary in frontmatter. Examples: haak-world/haak, haak-world/inscription-review, zmainen/zmainen.org.

Legacy / external-adjacent — repos with external dependencies (arXiv links, collaborator clones, GitHub Pages URLs). Do not restructure internally. Register in entities.db. Add a HAAK.md at the repo root containing: entity ID, canonical path from HAAK, concern, owner role. Examples: zmainen/library-theorem, mainenlab repos.

Fully external — repos HAAK does not own. Read-only relationship. Tracked in data/repos.db only. No entities.db registration. Examples: IBL repos, collaborator repos.

New repo template: haak-world/repo-template — a GitHub template repository that instantiates the canonical structure: index.md at root and in each subdirectory, CLAUDE.md stub, .gitignore, HAAK.md registration stub. Creating a new HAAK-governed repo from the template starts it clean.

#9. Mac ↔ Exoscale Sync

| Layer | Primary | Replica | Transport |
|:------|:--------|:--------|:----------|
| Document filesystem | Mac (development) | Exoscale (running) | Git: both sites clone the same repo |
| entities.db | Mac | Exoscale | rsync after each sync-daemon upsert (§6) |
| Operational DBs (gateway, health, gmail, repos) | Exoscale | Mac (read replica) | rsync daily |
| Large media | Exoscale S3 (filix-media) | Mac on demand | rclone pull |

Git is the filesystem sync transport. Exoscale pulls on merge to main via webhook or cron. Mac pushes. This is the only safe bidirectional sync — git handles conflicts. Operational databases flow in one direction: Exoscale is the system of record, Mac holds read replicas. Media stays in S3; local copies are pulled on demand, never pushed back.

#10. Role Assignments

| Responsibility | Owner |
|:---------------|:------|
| Sync daemon (index.md → entities.db) | Reed (steward) |
| Weekly audit + drift correction | Veda (archivist) |
| Vocabulary governance (new tags) | Kavi (chief of staff) |
| Reorganization execution | Veda + Reed jointly |
| New repo template maintenance | Reed |
| Cold-start protocol (CLAUDE.md) | Lina (architect) |
| External repo registration | Nexo (liaison) |

Architecture 37 — Knowledge Organization — 2026 — Zachary F. Mainen / HAAK