29. Ontology Audit — March 2026

An infrastructure-wide audit of all databases (41 SQLite), all markdown files (4,535), and the entity/belonging model, measured against the relational situational ontology. Conducted by session bfce97f34445 (opus-4-6, 2026-03-16).


#Diagnosis

The system has two implementations that coexist without a bridge.

Layer 1: The ontology. entities.db implements the core model faithfully: entities (376K rows), belongings (1.5M rows), entity_identifiers (12K rows). It has one relation (belongs-to), qualities that carry the semantics, and reflexive closure. The ontology documents (01–11) are rigorous and internally consistent. This layer is solid.

Layer 2: Everything else. 40 domain databases, each with its own bespoke schema: papers.db has 11 tables, public-archives.db has 9, health.db has 4, music.db has 7. Each was built to solve a specific problem at a specific moment. Each works. None speaks the ontology's language.

The gap is not theory vs. implementation. It is two implementations — one universal, one domain-specific — with no declared mapping between them.


#Three Structural Problems

#1. Domain databases don't map to entities/belongings

papers.db has paperauthors(paperid, name, position). The ontology says: person belongs-to paper with quality "author." But there's no place for position — the quality framework has no model for ordered composition.

Resolution (from ontology/12): Add a rank field to the belongings table. Rank is a property of the belonging, not a quality. Orthogonal to semantics. Handles author position, track number, episode sequence universally. With this, domain databases become round-trippable through entities/belongings.
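
A minimal sketch of this resolution, assuming a simplified belongings table. The column names member, container, and quality are hypothetical, since the actual entities.db schema is not reproduced in this audit. It adds a nullable rank column and shows author position surviving a round trip.

```python
import sqlite3

# Assumed, simplified schema; the real belongings table may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE belongings (member TEXT, container TEXT, quality TEXT)")

# The proposed change: a nullable rank, independent of quality.
conn.execute("ALTER TABLE belongings ADD COLUMN rank INTEGER")

# Illustrative rows shaped like paperauthors(paperid, name, position).
paperauthors = [("paper:1", "B. Author", 2), ("paper:1", "A. Author", 1)]
conn.executemany(
    "INSERT INTO belongings (member, container, quality, rank) "
    "VALUES (?, ?, 'author', ?)",
    [(name, paperid, position) for paperid, name, position in paperauthors],
)

# Round trip: recover the domain rows, ordered by rank.
round_trip = conn.execute(
    "SELECT container, member, rank FROM belongings "
    "WHERE quality = 'author' ORDER BY container, rank"
).fetchall()
print(round_trip)
```

Because rank lives on the belonging rather than in the quality, the same column could carry track numbers or episode sequence without new vocabulary.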

#2. Markdown files are bimodal

Core structural docs (foundations, ontology, patterns, projects) — 1,976 files with frontmatter, 100% indexed, clean entity extraction. These map to the ontology correctly.

Operational docs (transcripts, strategy, session logs) — 2,559 files without frontmatter, invisible to the entity graph. These are 56% of all markdown. The ontology says they are situations (sessions) or materializations (transcripts). The system treats them as unstructured text.

Resolution: Add lightweight frontmatter to transcripts and strategy docs: type, date, projects, participants. Four fields. Makes 2,500+ orphaned files queryable as situation materializations.
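
One possible shape for the batch step, as a sketch only. The function name add_frontmatter and the example values are hypothetical, and the list syntax for projects and participants is an assumption about how the existing frontmatter is written.

```python
from datetime import date


def add_frontmatter(text, doc_type, doc_date, projects, participants):
    """Prepend the four-field frontmatter unless the file already has one."""
    if text.startswith("---\n"):
        return text  # already has frontmatter; leave untouched
    lines = [
        "---",
        f"type: {doc_type}",
        f"date: {doc_date.isoformat()}",
        "projects: [" + ", ".join(projects) + "]",
        "participants: [" + ", ".join(participants) + "]",
        "---",
        "",  # produces the newline after the closing delimiter
    ]
    return "\n".join(lines) + text


# Hypothetical values, for illustration only.
example = add_frontmatter(
    "Transcript body\n",
    "transcript",
    date(2026, 3, 16),
    ["example-project"],
    ["alice", "agent"],
)
print(example)
```

The early return makes the function safe to re-run over the same directory, which matters for a batch script touching thousands of files.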

#3. build_entities.py is hardcoded

The lifecycle doc (architecture/22) envisions declarative YAML schema mappings. What exists is ~15 custom Python functions, one per database. Each knows its own schema intimately. None is generic. Adding a new database means writing a new function.

Resolution: Migrate to YAML schema mappings. Generic builder reads mappings, applies to any source. This makes the ontology generative (produces the data model from declarations) rather than descriptive (consulted as a reference after the fact).
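
A minimal sketch of what a generic builder could look like. The mapping keys (table, member, container, quality, rank) are hypothetical, and the mapping is written as a Python dict to keep the example dependency-free; in the envisioned design it would be loaded from a YAML file.

```python
import sqlite3

# Hypothetical declaration for one source table.
MAPPING = {
    "table": "paperauthors",
    "member": "name",        # column holding the entity that belongs
    "container": "paperid",  # column holding the entity belonged to
    "quality": "author",     # constant quality applied to every row
    "rank": "position",      # optional column giving order
}


def build_belongings(conn, mapping):
    """Yield (member, container, quality, rank) for one declared source table."""
    rank_col = mapping.get("rank")
    cols = [mapping["member"], mapping["container"]]
    if rank_col:
        cols.append(rank_col)
    # Identifiers come from a trusted mapping file, not from user input.
    query = f"SELECT {', '.join(cols)} FROM {mapping['table']}"
    for row in conn.execute(query):
        rank = row[2] if rank_col else None
        yield (str(row[0]), str(row[1]), mapping["quality"], rank)


# Illustrative source database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE paperauthors (paperid INTEGER, name TEXT, position INTEGER)"
)
source.executemany(
    "INSERT INTO paperauthors VALUES (?, ?, ?)",
    [(7, "A. Author", 1), (7, "B. Author", 2)],
)
belongings = list(build_belongings(source, MAPPING))
print(belongings)
```

Under this shape, adding a new database means adding a declaration rather than a function; build_belongings itself does not change.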


#What Works

  • Index hierarchy. Every folder has index.md. /read navigates it. O(log N) retrieval is real and tested.
  • Skill library. 50+ skills, composition works, /write maintains indices on mutation.
  • Board protocol. Inter-agent coordination via timestamped entries with origin tags.
  • Provenance tracking. source columns everywhere. Frontmatter carries created, status, domains.
  • The ontology itself. Situations as primitives, relationships as derived queries, qualities as semantic layer — the framework is sound.

#What's Missing

| Gap | What exists | What's needed |
| --- | --- | --- |
| Situation entities | Projects as directories | Projects as situation entities in entities.db; directories as materializations |
| Session registration | Board entries, transcripts | Sessions as situation entities with standard belongings (actors, methods, domain, materializations) |
| Quality graph as data | Prose in 02-relations.md | Quality entities with meta-quality belongings in entities.db (the reflexive closure made queryable) |
| Ordered composition | position columns in domain DBs | rank field on belongings table |
| Declarative schema mapper | Hardcoded Python functions | YAML declarations consumed by a generic builder |
| Situation register | Nothing | data/active-situations.jsonl: live sessions write situation state for sibling discovery |
| Policy inheritance | Implicit via directory nesting | Explicit policy resolution (S5 from ontology/12): inner overrides outer, constitution non-overridable |
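
The policy inheritance rule (inner overrides outer, constitution non-overridable) can be sketched as a simple merge. The function name resolve_policy and the policy keys are hypothetical; the actual S5 rule in ontology/12 may carry more structure than flat dictionaries.

```python
def resolve_policy(constitution, scopes):
    """Merge policy dicts given outermost first.

    Inner scopes override outer ones, except for keys the constitution sets.
    """
    resolved = {}
    for scope in scopes:           # outermost first, so inner scopes win
        resolved.update(scope)
    resolved.update(constitution)  # applied last: cannot be overridden
    return resolved


# Hypothetical policy keys, for illustration only.
policy = resolve_policy(
    constitution={"delete_raw_data": False},
    scopes=[
        {"auto_commit": False, "delete_raw_data": True},  # outer directory
        {"auto_commit": True},                            # inner project
    ],
)
print(policy)
```

Here the inner project's auto_commit wins over the outer directory's, while the outer attempt to set delete_raw_data is discarded in favor of the constitution.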

#Database Inventory Summary

| Category | Databases | Row counts | Ontology conformance |
| --- | --- | --- | --- |
| Core ontology | entities.db | 376K entities, 1.5M belongings | Full |
| Communications | gateway.db, signal/, whatsapp/, matrix/ | Messages across 4 channels | None: flat message tables, no situation decomposition |
| Academic | papers.db, elife.db | 20.5K papers, 6.2K reviews | None: 11-table relational schema |
| Media | music.db, spotify.db | 6.8K tracks | None: flat track library |
| Archives | public-archives.db | 2.4M docs, 1.6M entities | Partial: entities exist but as domain-specific types, not universal |
| Health | health.db | 3.7M records | None: time-series schema |
| Personal | contacts.db, personal.db, todos.db, notes.db | Mixed | None: each has its own schema |
| Infrastructure | vault.db, repos.db, files.db, storage.db, sessions.db | Mixed | None: operational tables |
| Workspace | gmail messages.db, gdrive DBs, events.db, arc.db, books.db | Mixed | None: mirror/sync schemas |

#Priority Order

  1. Add rank to belongings. Small schema change, unblocks round-tripping for all ordered data.
  2. Register sessions as situations. Wire into /bye and /checkpoint. Makes session history queryable.
  3. Add frontmatter to transcripts. Batch script. Makes 1,700 files discoverable.
  4. Build quality graph as data. Seed quality entities from 02-relations prose. Makes the vocabulary queryable.
  5. Declarative schema mapper. Replace hardcoded Python with YAML. Highest leverage for long-term maintenance.
  6. Situation register. data/active-situations.jsonl with startup/shutdown hooks. Enables sibling discovery.

Items 1–3 can be done in a single session. Items 4–6 are architectural work requiring design review.
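
For item 6, a minimal sketch of an append-only register, assuming one JSON object per line with hypothetical fields session, state, and at. The function names and the "active"/"closed" state values are illustrative; the demo writes to a temporary file rather than data/active-situations.jsonl.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def register(path, session_id, state):
    """Append one situation-state record as a JSON line."""
    record = {
        "session": session_id,
        "state": state,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def active_sessions(path):
    """Replay the log; the last record per session decides whether it is live."""
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                latest[rec["session"]] = rec["state"]
    return sorted(s for s, state in latest.items() if state == "active")


demo_path = os.path.join(tempfile.mkdtemp(), "active-situations.jsonl")
register(demo_path, "session-a", "active")  # startup hook
register(demo_path, "session-b", "active")  # startup hook of a sibling
register(demo_path, "session-a", "closed")  # shutdown hook
siblings = active_sessions(demo_path)
print(siblings)
```

Appending rather than rewriting keeps the startup and shutdown hooks simple, at the cost of needing occasional compaction and some handling for sessions that exit without running their shutdown hook.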


#Conclusion

The system is not a house of cards. Domain databases keep working regardless of ontology layer completeness. The risk is divergence, not collapse: every new bespoke database is another schema that works locally but doesn't participate in the graph. The four pieces that close the gap — rank field, situation entities, declarative mapper, quality graph as data — are known and scoped. The ontology is the right architecture. The implementation is catching up.


haak architecture · 29 · ontology audit · 2026-03-16 · session bfce97f34445 (claude opus-4-6)

Architecture 29 — 29. Ontology Audit — March 2026 — 2026 — Zachary F. Mainen / HAAK