| Section | What it covers |
|---|---|
| Ontology | What an entity is, what a belonging is |
| Locality | Assert direct belongings only |
| Schema | Two tables, nothing else |
| Sources and schema mappings | How external data becomes entities |
| Lifecycle | Birth and death as belonging |
| Queries | SQL patterns |
| Build | How the index is derived |
# Ontology
An entity is a thing with an identity. It has an id, a type, and a name. That is all it is. Everything you know about it — its affiliations, its location, its genre, its authors — is a belonging.
A belonging is a directed arrow from an entity to something it belongs to, with optional temporal bounds. The direction is always from the entity to what it belongs to. Every belonging is temporal: it has a start, and it can end. A belonging without dates is one whose dates are unknown or irrelevant, not one that is atemporal by nature.
track → person (belongs to its artist)
track → Psytrance (belongs to a genre)
person → mainenlab (belongs to a lab)
lab → programme (belongs to a programme)
paper → doi:10.1038/x (belongs to its DOI)
paper → person (belongs to its author)
paper → journal (belongs to its journal)
doi → CrossRef (belongs to its registrar)
file → filesystem (belongs to its container)
project → active (belongs to a lifecycle stage)
cell → body (belongs to an organism)
memory → person (belongs to a mind)
branch → parent-branch (belongs to its parent)
This is the only mechanism. Every relationship — hierarchy, affiliation, classification, lifecycle, provenance, authorship, location — is a belonging. There are no relation types, no role labels, no special columns. Just: this entity belongs to that target, starting when, ending when, according to whom.
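As a minimal sketch of the two concepts above, the following Python dataclasses mirror the fields named in this section. The class names, the optional defaults, and the display name "Davide Crombie" are assumptions made for illustration; the document itself only specifies the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str    # "type:slug", e.g. "person:davide-crombie"
    type: str  # the entity class
    name: str  # display name


@dataclass(frozen=True)
class Belonging:
    entity_id: str                    # the thing that belongs
    target: str                       # what it belongs to
    start_date: Optional[str] = None  # None = unknown
    end_date: Optional[str] = None    # None = ongoing or unknown
    source: Optional[str] = None      # who asserted this


# Display name is assumed; the belongings come from the Examples section.
davide = Entity("person:davide-crombie", "person", "Davide Crombie")
facts = [
    Belonging(davide.id, "mainenlab", "2020", None, "frontmatter"),
    Belonging(davide.id, "phd-student", "2020", None, "frontmatter"),
]

# Affiliation and role are both just belongings; nothing on the arrow says which is which.
targets = [b.target for b in facts if b.entity_id == davide.id]
```

Note that Belonging carries no relation-type field: the only things on the arrow are its endpoints, its dates, and its source.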
# Locality
Assert only direct belongings. Derive everything else by traversal.
A track belongs to an artist. The artist belongs to a record label. Do not also tag the track with the label — that relationship is a query, not an assertion. A person belongs to mainenlab. Mainenlab belongs to the Neuroscience Programme. Do not tag the person with the programme.
The builder does not propagate belongings. Queries traverse them.
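The locality rule can be illustrated with a small in-memory traversal. This is a sketch, not the builder's code: the function name derived_belongings, the dict representation, and the track, artist, and label ids are all hypothetical.

```python
from collections import deque


def derived_belongings(entity_id, direct):
    """Follow direct belongings breadth-first and return every target reached."""
    reached = []
    queue = deque(direct.get(entity_id, []))
    while queue:
        target = queue.popleft()
        if target in reached or target == entity_id:
            continue  # already visited, or a cycle back to the start
        reached.append(target)
        queue.extend(direct.get(target, []))
    return reached


# Only direct belongings are asserted (ids are hypothetical).
direct = {
    "track:abc123": ["person:some-artist"],
    "person:some-artist": ["label:some-label"],
}

# The track's label is never stored on the track; it falls out of the walk.
via_traversal = derived_belongings("track:abc123", direct)
```

If the artist later moves to a different label, only one assertion changes, and every track's derived label follows from the next traversal.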
# Schema
Two tables in data/entities.db:
entities — things that exist.
| Column | Purpose |
|---|---|
| id | type:slug (e.g., person:davide-crombie, doi:10.1038/x) |
| type | the entity class |
| name | display name |
belongings — every relation.
| Column | Purpose |
|---|---|
| entity_id | FK to entities — the thing that belongs |
| target | what it belongs to (an entity id, a name, a path) |
| start_date | when the belonging began (NULL = unknown) |
| end_date | when it ended (NULL = ongoing or unknown) |
| source | who asserted this (the scanner, the user, the API) |
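The two tables above might be created as follows. This is a sketch under assumptions: the column types, the NOT NULL constraints, the indexes, and the display name "Mainen Lab" are not specified in this document, and an in-memory database stands in for data/entities.db.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,  -- type:slug
    type TEXT NOT NULL,     -- the entity class
    name TEXT NOT NULL      -- display name
);
CREATE TABLE belongings (
    entity_id  TEXT NOT NULL REFERENCES entities(id),  -- the thing that belongs
    target     TEXT NOT NULL,  -- entity id, name, or path (deliberately not a FK)
    start_date TEXT,           -- NULL = unknown
    end_date   TEXT,           -- NULL = ongoing or unknown
    source     TEXT            -- who asserted this
);
-- Assumed indexes: one per traversal direction.
CREATE INDEX idx_belongings_entity ON belongings(entity_id);
CREATE INDEX idx_belongings_target ON belongings(target);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO entities VALUES (?, ?, ?)",
             ("org:mainen", "org", "Mainen Lab"))
conn.execute("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)",
             ("org:mainen", "org:neuroscience", None, None, "frontmatter"))

columns = [row[1] for row in conn.execute("PRAGMA table_info(belongings)")]
```

The target column is left unconstrained because a target may be a bare name or a path rather than an entity id.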
# Examples
entity: person:davide-crombie
→ mainenlab (2020, NULL, frontmatter)
→ phd-student (2020, NULL, frontmatter)
→ project:inscription (2026, NULL, frontmatter)
entity: org:mainen
→ org:neuroscience (NULL, NULL, frontmatter)
entity: paper:f040022b
→ doi:10.1038/nrn1933 (NULL, NULL, papers.db)
→ person:Z. Mainen (NULL, NULL, papers.db)
→ journal:Nature Reviews (2006, NULL, papers.db)
→ published (2006, NULL, papers.db)
→ papers.db (NULL, NULL, scan)
entity: track:abc123
→ person:Some Artist (NULL, NULL, music.db)
→ Psytrance (NULL, NULL, music.db)
→ music.db (NULL, NULL, scan)
No role_type column. The target's identity tells you the nature of the relationship. A belonging to a person is authorship or artistry. A belonging to a genre name is classification. A belonging to "active" is lifecycle. The semantics are in the entities, not in labels on the arrows.
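One way to read the nature of a relationship off the target, rather than off a label, is to join each belonging back to the entities table. The sketch below assumes the two-table schema from this document; the fallback label 'name' for targets that are not registered entities, and the track's display name, are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE belongings (entity_id TEXT, target TEXT,
                         start_date TEXT, end_date TEXT, source TEXT);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("track:abc123", "track", "Some Track"),          # display name assumed
    ("person:Some Artist", "person", "Some Artist"),
])
conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)", [
    ("track:abc123", "person:Some Artist", None, None, "music.db"),
    ("track:abc123", "Psytrance", None, None, "music.db"),
    ("track:abc123", "music.db", None, None, "scan"),
])

# The target's own type, when it is a known entity, describes the relationship.
rows = conn.execute("""
    SELECT b.target, COALESCE(t.type, 'name') AS target_kind
    FROM belongings b
    LEFT JOIN entities t ON t.id = b.target
    WHERE b.entity_id = ?
    ORDER BY b.rowid
""", ("track:abc123",)).fetchall()
```

Here the belonging to a person reads as artistry and the belonging to a bare genre name reads as classification, without either arrow carrying a role.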
# Sources and schema mappings
A source is an entity that emits other entities and belongings. music.db is a source. papers.db is a source. A directory of markdown files is a source. Discogs is a source. iNaturalist is a source. Sources are not special — they're entities like anything else, and other entities belong to them.
A schema mapping is an operator (Definition 2) that transforms a source into entities and belongings. It is a method (Definition 3) — a repeatable procedure. Given a source, the mapping knows:
- What in this source counts as an entity
- What the entity's type and name are
- What counts as a belonging
- What the belonging's target and dates are
Currently, schema mappings are Python functions in build_entities.py (one per source). The target architecture expresses mappings as declarative data (YAML) applied by a generic builder, so adding a new source means writing a mapping, not writing code.
# Example: schema mapping for papers.db
source: data/papers.db
format: sqlite
entity:
  table: papers
  id: "paper:{id}"
  type: paper
  name: "{title}"
belongings:
  - column: doi
    target: "doi:{value}"
    ensure_target: {type: doi}
  - column: authors
    parse: json_array
    target: "person:{value}"
    ensure_target: {type: person}
  - column: journal
    target: "journal:{value}"
    ensure_target: {type: journal}
  - column: year
    as: start_date
    target_from: article_type
    map: {posted-content: preprint, journal-article: published}
Schema mappings are themselves entities. They belong to the source they map and to the ontology they map into. A change to a mapping is an ontological decision — it goes through the PR process like any other change.
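To make the idea of a generic builder concrete, here is a hedged sketch of how a declarative mapping could be applied to one source row. It is not the code in build_entities.py: the function apply_mapping, the dict form of the mapping, and the sample row are hypothetical, and the ensure_target and year/start_date rules from the papers.db example are left out for brevity.

```python
import json

# Hypothetical in-memory form of a declarative mapping (subset of the papers.db example).
MAPPING = {
    "source": "papers.db",
    "entity": {"id": "paper:{id}", "type": "paper", "name": "{title}"},
    "belongings": [
        {"column": "doi", "target": "doi:{value}"},
        {"column": "authors", "parse": "json_array", "target": "person:{value}"},
        {"column": "journal", "target": "journal:{value}"},
    ],
}


def apply_mapping(mapping, row):
    """Turn one source row (a dict) into an entity tuple and belonging tuples."""
    ent = mapping["entity"]
    entity = (ent["id"].format(**row), ent["type"], ent["name"].format(**row))
    belongings = []
    for rule in mapping["belongings"]:
        raw = row.get(rule["column"])
        if raw is None:
            continue  # nothing to assert for this column
        values = json.loads(raw) if rule.get("parse") == "json_array" else [raw]
        for value in values:
            target = rule["target"].format(value=value)
            # (entity_id, target, start_date, end_date, source)
            belongings.append((entity[0], target, None, None, mapping["source"]))
    return entity, belongings


# Sample row; the title is a placeholder.
row = {"id": "f040022b", "title": "Example title", "doi": "10.1038/nrn1933",
       "authors": '["Z. Mainen"]', "journal": "Nature Reviews"}
entity, belongings = apply_mapping(MAPPING, row)
```

Because the builder only interprets the mapping, supporting another source in this sketch would mean supplying another dict, with no new traversal or insertion logic.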
No privileged source of truth. The system does not prescribe where truth lives. If two sources disagree about an entity's belongings, both assertions coexist. Conflict resolution is a policy question, resolved through the same PR mechanism as any other conflict (see 24-branch-visibility.md).
# Lifecycle
Lifecycle is a belonging with dates. An entity belongs to a lifecycle stage, and that belonging has a start_date and possibly an end_date. Birth is the start of a belonging. Death is the end. Joining is the start. Leaving is the end.
Common lifecycle stages (these are target names, not special values):
| Context | Stages |
|---|---|
| Methods, foundations | draft → active → superseded |
| Skills | active → deprecated |
| Projects | active → paused → completed → archived |
| Organizations | active → inactive → dissolved |
| Branches | active → merged |
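Since a lifecycle stage is just a dated belonging, asking which stage an entity was in on a given day reduces to a date-range filter. The sketch below assumes dates are stored as ISO-8601 text (so string comparison orders them correctly) and treats an unknown start as already begun; the function stage_at, the entity project:example, and its dates are hypothetical.

```python
import sqlite3

PROJECT_STAGES = ("active", "paused", "completed", "archived")


def stage_at(conn, entity_id, date, stages=PROJECT_STAGES):
    """Return the lifecycle stage(s) an entity belonged to on `date`."""
    marks = ",".join("?" for _ in stages)
    sql = f"""
        SELECT target FROM belongings
        WHERE entity_id = ?
          AND target IN ({marks})
          AND (start_date IS NULL OR start_date <= ?)
          AND (end_date IS NULL OR end_date > ?)
        ORDER BY start_date
    """
    rows = conn.execute(sql, (entity_id, *stages, date, date)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE belongings (entity_id TEXT, target TEXT, "
             "start_date TEXT, end_date TEXT, source TEXT)")
conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)", [
    ("project:example", "active", "2024", "2025", "frontmatter"),
    ("project:example", "paused", "2025", None, "frontmatter"),
])

early = stage_at(conn, "project:example", "2024-06-01")
late = stage_at(conn, "project:example", "2025-06-01")
```

The list of stage names is passed in explicitly because stages are ordinary target names; nothing in the schema marks them as special.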
# Queries
What does entity X belong to?
SELECT target, start_date, end_date, source FROM belongings
WHERE entity_id = 'person:davide-crombie'
What belongs to entity Y?
SELECT e.id, e.type, e.name FROM entities e
JOIN belongings b ON e.id = b.entity_id
WHERE b.target = 'org:mainen'
Walk the chain upward from a person:
-- Davide → mainenlab → neuroscience → champalimaud-research
-- Each step: SELECT target FROM belongings WHERE entity_id = ?
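The step-by-step walk can also be written as a single recursive query. The sketch below uses SQLite's WITH RECURSIVE through Python's sqlite3 module; the org: ids for the chain and the depth limit of 10 (a guard against cycles) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE belongings (entity_id TEXT, target TEXT, "
             "start_date TEXT, end_date TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO belongings VALUES (?, ?, NULL, NULL, 'frontmatter')",
    [
        ("person:davide-crombie", "org:mainen"),
        ("org:mainen", "org:neuroscience"),
        ("org:neuroscience", "org:champalimaud-research"),
    ],
)

WALK_UP = """
WITH RECURSIVE chain(target, depth) AS (
    -- Step 1: the direct belongings of the starting entity.
    SELECT target, 1 FROM belongings WHERE entity_id = :start
    UNION
    -- Step n: whatever the previous step's targets belong to.
    SELECT b.target, c.depth + 1
    FROM belongings b
    JOIN chain c ON b.entity_id = c.target
    WHERE c.depth < :max_depth
)
SELECT target, MIN(depth) AS hops
FROM chain
GROUP BY target
ORDER BY hops
"""

rows = conn.execute(
    WALK_UP, {"start": "person:davide-crombie", "max_depth": 10}
).fetchall()
```

Each row pairs an ancestor with the number of hops needed to reach it, so direct belongings (hops = 1) stay distinguishable from derived ones.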
Everything that belongs to "active":
SELECT e.type, e.name FROM entities e
JOIN belongings b ON e.id = b.entity_id
WHERE b.target = 'active' AND b.end_date IS NULL
# Build
python3 scripts/build_entities.py [--verbose] rebuilds data/entities.db from scratch. The DB is disposable — delete it, scan again, same result. The builder applies schema mappings (currently hardcoded as functions) to each source.
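The "disposable" property can be sketched as a rebuild that deletes the database and re-applies every source function. This is an illustration only, not the contents of scripts/build_entities.py: scan_frontmatter, SOURCES, build, and dump are hypothetical names, and the single emitted entity is sample data.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE belongings (entity_id TEXT, target TEXT,
                         start_date TEXT, end_date TEXT, source TEXT);
"""


def scan_frontmatter():
    """Hypothetical stand-in for one per-source mapping function."""
    entity = ("org:mainen", "org", "Mainen Lab")
    belongings = [("org:mainen", "org:neuroscience", None, None, "frontmatter")]
    yield entity, belongings


SOURCES = [scan_frontmatter]


def build(db_path):
    """Rebuild the index from scratch: no state survives from a previous run."""
    if os.path.exists(db_path):
        os.remove(db_path)
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    for scan in SOURCES:
        for entity, belongings in scan():
            conn.execute("INSERT OR IGNORE INTO entities VALUES (?, ?, ?)", entity)
            conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)", belongings)
    conn.commit()
    conn.close()


def dump(db_path):
    """Read back all belongings in a stable order, for comparing builds."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT * FROM belongings ORDER BY entity_id, target").fetchall()
    conn.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "entities.db")
    build(path)
    first = dump(path)
    build(path)  # delete, scan again
    second = dump(path)
```

Comparing the two dumps is one simple way to check that a rebuild depends only on the sources and not on the previous database.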
# Portability
Two tables, one relation. The entity/belonging model is independent of storage backend, source format, or domain. It indexes SQLite, frontmatter, and external APIs equally. This is what Filix inherits from HAAK.
Architecture 22 — Entities and Belongings — 2026 — Zachary F. Mainen / HAAK