29. Ontology Audit — March 2026

An infrastructure-wide audit of all databases (41 SQLite), all markdown files (4,535), and the entity/belonging model, measured against the relational situational ontology. Conducted by session bfce97f34445 (opus-4-6, 2026-03-16).

#Diagnosis

The system has two implementations that coexist without a bridge.

Layer 1: The ontology. Entities.db implements the core model faithfully: entities (376K rows), belongings (1.5M rows), entity_identifiers (12K rows). One relation (belongs-to), qualities carry semantics, reflexive closure. The ontology documents (01–11) are rigorous and internally consistent. This layer is solid.
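As a rough illustration of Layer 1, the three tables can be sketched in SQLite as below. The table names come from the audit; every column name (`label`, `member_id`, `container_id`, `quality_id`, `scheme`, `value`) is an assumption made for this sketch, not the actual entities.db schema. The point it shows is that the quality on a belonging is itself an entity, which is what makes the model reflexively closed.

```python
import sqlite3

# Hypothetical schema: table names are from the audit, column names are assumed.
SCHEMA = """
CREATE TABLE entities (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE belongings (
    member_id    INTEGER NOT NULL REFERENCES entities(id),
    container_id INTEGER NOT NULL REFERENCES entities(id),
    quality_id   INTEGER NOT NULL REFERENCES entities(id)  -- qualities are entities too
);
CREATE TABLE entity_identifiers (
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    scheme    TEXT NOT NULL,
    value     TEXT NOT NULL
);
"""

def add_entity(conn, label):
    """Insert an entity and return its id."""
    return conn.execute("INSERT INTO entities (label) VALUES (?)", (label,)).lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

person = add_entity(conn, "A. Researcher")
paper = add_entity(conn, "Some Paper")
author = add_entity(conn, "author")  # the quality is itself an entity

# The single relation: person belongs-to paper, with quality "author".
conn.execute("INSERT INTO belongings VALUES (?, ?, ?)", (person, paper, author))

rows = conn.execute("""
    SELECT m.label, c.label, q.label
    FROM belongings b
    JOIN entities m ON m.id = b.member_id
    JOIN entities c ON c.id = b.container_id
    JOIN entities q ON q.id = b.quality_id
""").fetchall()
```

Relationships such as "who authored what" are then derived by querying belongings rather than stored in dedicated tables.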

Layer 2: Everything else. 40 domain databases, each with its own bespoke schema: papers.db has 11 tables, public-archives.db has 9, health.db has 4, music.db has 7. Each was built to solve a specific problem at a specific moment. Each works. None speaks the ontology's language.

The gap is not theory vs. implementation. It is two implementations — one universal, one domain-specific — with no declared mapping between them.

#Three Structural Problems

#1. Domain databases don't map to entities/belongings

papers.db has paperauthors(paperid, name, position). The ontology says: person belongs-to paper with quality "author." But there's no place for position — the quality framework has no model for ordered composition.

Resolution (from ontology/12): Add a rank field to the belongings table. Rank is a property of the belonging, not a quality. Orthogonal to semantics. Handles author position, track number, episode sequence universally. With this, domain databases become round-trippable through entities/belongings.
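A minimal sketch of the rank resolution, assuming a simplified belongings table with text columns (`member`, `container`, `quality`) in place of entity ids. The author names and the `p1` paper id are invented sample data. It shows the schema change and a round trip of `paperauthors(paperid, name, position)` rows through belongings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed belongings table (text labels instead of entity ids).
conn.execute("CREATE TABLE belongings (member TEXT, container TEXT, quality TEXT)")

# The proposed change: rank lives on the belonging, NULL when order is meaningless.
conn.execute("ALTER TABLE belongings ADD COLUMN rank INTEGER")

# Sample rows shaped like papers.db paperauthors(paperid, name, position).
paperauthors = [("p1", "Lovelace", 2), ("p1", "Babbage", 1), ("p1", "Turing", 3)]
conn.executemany(
    "INSERT INTO belongings (member, container, quality, rank) VALUES (?, ?, 'author', ?)",
    [(name, paperid, position) for paperid, name, position in paperauthors],
)

# Round trip: recover the original (paperid, name, position) rows from belongings.
recovered = conn.execute(
    "SELECT container, member, rank FROM belongings "
    "WHERE quality = 'author' ORDER BY container, rank"
).fetchall()
```

Because rank is a nullable column on the belonging, unordered belongings are unaffected and the quality vocabulary stays purely semantic.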

#2. Markdown files are bimodal

Core structural docs (foundations, ontology, patterns, projects) — 1,976 files with frontmatter, 100% indexed, clean entity extraction. These map to the ontology correctly.

Operational docs (transcripts, strategy, session logs) — 2,559 files without frontmatter, invisible to the entity graph. These are 56% of all markdown. The ontology says they are situations (sessions) or materializations (transcripts). The system treats them as unstructured text.

Resolution: Add lightweight frontmatter to transcripts and strategy docs: type, date, projects, participants. Four fields. Makes 2,500+ orphaned files queryable as situation materializations.
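The batch step might reduce to a function like the one below. This is a sketch: the function name, the inline-list YAML style, and the sample values (`haak`, `user`, `agent`) are assumptions, and only the four field names come from the resolution above. It is idempotent, so re-running it over already-tagged files changes nothing.

```python
from datetime import date

def add_frontmatter(text, doc_type, doc_date, projects, participants):
    """Prepend a four-field YAML frontmatter block unless one already exists."""
    if text.startswith("---\n"):
        return text  # already has frontmatter; leave untouched
    header = "\n".join([
        "---",
        f"type: {doc_type}",
        f"date: {doc_date.isoformat()}",
        f"projects: [{', '.join(projects)}]",
        f"participants: [{', '.join(participants)}]",
        "---",
        "",
    ])
    return header + "\n" + text

transcript = "Session transcript body.\n"
tagged = add_frontmatter(
    transcript, "transcript", date(2026, 3, 16), ["haak"], ["user", "agent"]
)
```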

#3. build_entities.py is hardcoded

The lifecycle doc (architecture/22) envisions declarative YAML schema mappings. What exists is ~15 custom Python functions, one per database. Each knows its own schema intimately. None is generic. Adding a new database means writing a new function.

Resolution: Migrate to YAML schema mappings. Generic builder reads mappings, applies to any source. This makes the ontology generative (produces the data model from declarations) rather than descriptive (consulted as a reference after the fact).
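One way the generic builder could look, as a sketch under stated assumptions: the mapping is shown as the Python dict a YAML loader would produce (no YAML library is used here), the keys (`table`, `member`, `container`, `quality`, `rank`) are invented for illustration, and identifiers are assumed to come from trusted declarations since they are interpolated into SQL.

```python
import sqlite3

# A mapping as it might look after loading a YAML declaration (keys are assumed).
PAPERS_MAPPING = {
    "table": "paperauthors",
    "member": "name",
    "container": "paperid",
    "quality": "author",
    "rank": "position",  # optional; omit for unordered belongings
}

def build_belongings(source, mapping):
    """Generic builder: turn any mapped table into (member, container, quality, rank) rows."""
    rank_col = mapping.get("rank")
    rank_expr = f'"{rank_col}"' if rank_col else "NULL"
    member_col, container_col, table = mapping["member"], mapping["container"], mapping["table"]
    sql = f'SELECT "{member_col}", "{container_col}", {rank_expr} FROM "{table}"'
    return [
        (member, container, mapping["quality"], rank)
        for member, container, rank in source.execute(sql)
    ]

# Usage against a stand-in for papers.db with one sample row.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE paperauthors (paperid TEXT, name TEXT, position INTEGER)")
source.execute("INSERT INTO paperauthors VALUES ('p1', 'Babbage', 1)")
rows = build_belongings(source, PAPERS_MAPPING)
```

Adding a new database then means writing a new declaration, not a new function.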

#What Works

- Index hierarchy. Every folder has index.md. /read navigates it. O(log N) retrieval is real and tested.
- Skill library. 50+ skills, composition works, /write maintains indices on mutation.
- Board protocol. Inter-agent coordination via timestamped entries with origin tags.
- Provenance tracking. source columns everywhere. Frontmatter carries created, status, domains.
- The ontology itself. Situations as primitives, relationships as derived queries, qualities as semantic layer: the framework is sound.

#What's Missing

| Gap | What exists | What's needed |
|---|---|---|
| Situation entities | Projects as directories | Projects as situation entities in entities.db; directories as materializations |
| Session registration | Board entries, transcripts | Sessions as situation entities with standard belongings (actors, methods, domain, materializations) |
| Quality graph as data | Prose in 02-relations.md | Quality entities with meta-quality belongings in entities.db, so the reflexive closure becomes queryable |
| Ordered composition | `position` columns in domain DBs | `rank` field on belongings table |
| Declarative schema mapper | Hardcoded Python functions | YAML declarations consumed by a generic builder |
| Situation register | Nothing | `data/active-situations.jsonl`: live sessions write situation state for sibling discovery |
| Policy inheritance | Implicit via directory nesting | Explicit policy resolution (S5 from ontology/12): inner overrides outer, constitution non-overridable |
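The policy inheritance rule in the last row (inner overrides outer, constitution non-overridable) can be sketched as a layered merge. This is an illustration of the stated rule only, not the S5 text from ontology/12. The function name and all policy keys and values are hypothetical.

```python
def resolve_policy(scopes, constitution):
    """Merge policies from outermost to innermost scope; constitution keys always win."""
    resolved = {}
    for scope in scopes:           # ordered outer -> inner
        resolved.update(scope)     # inner overrides outer
    resolved.update(constitution)  # non-overridable: applied last
    return resolved

# Hypothetical policies for illustration.
constitution = {"delete_without_backup": "deny"}
workspace = {"auto_commit": "off", "log_level": "info"}
project = {"auto_commit": "on", "delete_without_backup": "allow"}  # attempted override

policy = resolve_policy([workspace, project], constitution)
```

Here the project successfully overrides `auto_commit`, while its attempt to relax `delete_without_backup` is discarded.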

#Database Inventory Summary

| Category | Databases | Row counts | Ontology conformance |
|---|---|---|---|
| Core ontology | entities.db | 376K entities, 1.5M belongings | Full |
| Communications | gateway.db, signal/, whatsapp/, matrix/ | Messages across 4 channels | None: flat message tables, no situation decomposition |
| Academic | papers.db, elife.db | 20.5K papers, 6.2K reviews | None: 11-table relational schema |
| Media | music.db, spotify.db | 6.8K tracks | None: flat track library |
| Archives | public-archives.db | 2.4M docs, 1.6M entities | Partial: entities exist but as domain-specific types, not universal |
| Health | health.db | 3.7M records | None: time-series schema |
| Personal | contacts.db, personal.db, todos.db, notes.db | Mixed | None: each has its own schema |
| Infrastructure | vault.db, repos.db, files.db, storage.db, sessions.db | Mixed | None: operational tables |
| Workspace | gmail messages.db, gdrive DBs, events.db, arc.db, books.db | Mixed | None: mirror/sync schemas |

#Priority Order

1. Add rank to belongings. Small schema change, unblocks round-tripping for all ordered data.
2. Register sessions as situations. Wire into /bye and /checkpoint. Makes session history queryable.
3. Add frontmatter to transcripts. Batch script. Makes 1,700 files discoverable.
4. Build quality graph as data. Seed quality entities from 02-relations prose. Makes the vocabulary queryable.
5. Declarative schema mapper. Replace hardcoded Python with YAML. Highest leverage for long-term maintenance.
6. Situation register. data/active-situations.jsonl with startup/shutdown hooks. Enables sibling discovery.

Items 1–3 can be done in a single session. Items 4–6 are architectural work requiring design review.
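For the situation register (the last priority above), an append-only JSONL file with "latest entry wins" semantics is one plausible design. The sketch below assumes a record shape of `{"session", "state"}` and the state values `active` and `closed`. None of these are specified in the audit, and a temporary file stands in for data/active-situations.jsonl.

```python
import json
import os
import tempfile

def register(path, session_id, state):
    """Append one JSON line describing a session's situation state."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"session": session_id, "state": state}) + "\n")

def active_siblings(path, own_id):
    """Return ids of sessions whose latest entry is 'active', excluding own_id."""
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                latest[entry["session"]] = entry["state"]  # later lines win
    return sorted(s for s, st in latest.items() if st == "active" and s != own_id)

# Usage with a temporary file standing in for data/active-situations.jsonl.
path = os.path.join(tempfile.mkdtemp(), "active-situations.jsonl")
register(path, "s1", "active")   # startup hook of session s1
register(path, "s2", "active")   # startup hook of session s2
register(path, "s1", "closed")   # shutdown hook of session s1
siblings = active_siblings(path, own_id="s3")
```

Appending rather than rewriting keeps concurrent sessions from clobbering each other's entries, at the cost of periodic compaction, which this sketch omits.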

#Conclusion

The system is not a house of cards. Domain databases keep working regardless of ontology layer completeness. The risk is divergence, not collapse: every new bespoke database is another schema that works locally but doesn't participate in the graph. The four pieces that close the gap — rank field, situation entities, declarative mapper, quality graph as data — are known and scoped. The ontology is the right architecture. The implementation is catching up.

haak architecture · 29 · ontology audit · 2026-03-16 · session bfce97f34445 (claude opus-4-6)

Architecture 29 · Ontology Audit · March 2026 · Zachary F. Mainen / HAAK