Entities and Belongings

| Section | What it covers |
|:--------|:---------------|
| Ontology | What an entity is, what a belonging is |
| Locality | Assert direct belongings only |
| Schema | Two tables, nothing else |
| Sources and mappings | How external data becomes entities |
| Lifecycle | Birth and death as belonging |
| Queries | SQL patterns |
| Build | How the index is derived |

# Ontology

An entity is a thing with an identity. It has an id, a type, and a name. That is all it is. Everything you know about it — its affiliations, its location, its genre, its authors — is a belonging.

A belonging is a directed arrow from an entity to something it belongs to, with optional temporal bounds. The direction never varies: the arrow points from the entity to what it belongs to. Every belonging is temporal; it has a start, and it can end. A belonging without dates is one whose dates are unknown or irrelevant, not one that is atemporal by nature.

```
track      → person         (belongs to its artist)
track      → Psytrance      (belongs to a genre)
person     → mainenlab      (belongs to a lab)
lab        → programme      (belongs to a programme)
paper      → doi:10.1038/x  (belongs to its DOI)
paper      → person         (belongs to its author)
paper      → journal        (belongs to its journal)
doi        → CrossRef       (belongs to its registrar)
file       → filesystem     (belongs to its container)
project    → active         (belongs to a lifecycle stage)
cell       → body           (belongs to an organism)
memory     → person         (belongs to a mind)
branch     → parent-branch  (belongs to its parent)
```

This is the only mechanism. Every relationship — hierarchy, affiliation, classification, lifecycle, provenance, authorship, location — is a belonging. There are no relation types, no role labels, no special columns. Just: this entity belongs to that target, starting when, ending when, according to whom.
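As a minimal sketch, the two record shapes can be written as Python dataclasses. The class names are hypothetical (they are not taken from `build_entities.py`), and the fields mirror the columns listed under Schema. The point is what is missing: there is no field for a relation type or role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str    # "type:slug", e.g. "person:davide-crombie"
    type: str  # the entity class, e.g. "person"
    name: str  # display name


@dataclass(frozen=True)
class Belonging:
    entity_id: str                    # the thing that belongs
    target: str                       # what it belongs to: an entity id, a name, or a path
    start_date: Optional[str] = None  # None = unknown
    end_date: Optional[str] = None    # None = ongoing or unknown
    source: Optional[str] = None      # who asserted it


# Every arrow in the list above becomes one Belonging; no role label is stored.
davide = Entity("person:davide-crombie", "person", "Davide Crombie")  # hypothetical display name
lab = Belonging("person:davide-crombie", "mainenlab", "2020", None, "frontmatter")
```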

# Locality

Assert only direct belongings. Derive everything else by traversal.

A track belongs to an artist. The artist belongs to a record label. Do not also tag the track with the label — that relationship is a query, not an assertion. A person belongs to mainenlab. Mainenlab belongs to the Neuroscience Programme. Do not tag the person with the programme.

The builder does not propagate belongings. Queries traverse them.
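A sketch of what "queries traverse them" means, assuming belongings are loaded into memory as `(entity_id, target)` pairs. The ids are hypothetical stand-ins for the track/artist/label and person/lab/programme chains described above. The label is never asserted on the track; it is reached by following arrows upward.

```python
from collections import deque

# Only direct belongings are asserted (hypothetical ids).
asserted = [
    ("track:abc123", "person:some-artist"),      # track → its artist
    ("person:some-artist", "label:some-label"),  # artist → record label
    ("person:davide-crombie", "mainenlab"),      # person → lab
    ("mainenlab", "Neuroscience Programme"),     # lab → programme
]


def reachable(entity_id, edges):
    """Return every target reachable from entity_id by following belongings upward."""
    parents = {}
    for child, target in edges:
        parents.setdefault(child, []).append(target)
    seen, queue = set(), deque([entity_id])
    while queue:
        current = queue.popleft()
        for target in parents.get(current, []):
            if target not in seen:  # skip visited targets (this also stops cycles)
                seen.add(target)
                queue.append(target)
    return seen
```

`reachable("track:abc123", asserted)` returns both the artist and the label, although only the artist was asserted on the track.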

# Schema

Two tables in `data/entities.db`:

`entities`: things that exist.

| Column | Purpose |
|:-------|:--------|
| `id` | `type:slug` (e.g., `person:davide-crombie`, `doi:10.1038/x`) |
| `type` | the entity class |
| `name` | display name |

`belongings`: every relation.

| Column | Purpose |
|:-------|:--------|
| `entity_id` | foreign key to `entities` (the thing that belongs) |
| `target` | what it belongs to (an entity id, a name, a path) |
| `start_date` | when the belonging began (NULL = unknown) |
| `end_date` | when it ended (NULL = ongoing or unknown) |
| `source` | who asserted this (the scanner, the user, the API) |
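One way to write the two tables as SQLite DDL, as a sketch. The column types, the primary key, the foreign key, and the two indexes are assumptions; the tables above name only the columns and their purpose. In this sketch `target` has no foreign key, because it may be an entity id, a bare name, or a path.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entities (
    id   TEXT PRIMARY KEY,  -- type:slug
    type TEXT NOT NULL,     -- the entity class
    name TEXT NOT NULL      -- display name
);
CREATE TABLE belongings (
    entity_id  TEXT NOT NULL REFERENCES entities(id),  -- enforced only with PRAGMA foreign_keys = ON
    target     TEXT NOT NULL,  -- entity id, name, or path (no foreign key)
    start_date TEXT,           -- NULL = unknown
    end_date   TEXT,           -- NULL = ongoing or unknown
    source     TEXT            -- who asserted this
);
CREATE INDEX belongings_by_entity ON belongings(entity_id);  -- "what does X belong to?"
CREATE INDEX belongings_by_target ON belongings(target);     -- "what belongs to Y?"
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO entities VALUES (?, ?, ?)",
             ("person:davide-crombie", "person", "Davide Crombie"))  # hypothetical display name
conn.execute("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)",
             ("person:davide-crombie", "mainenlab", "2020", None, "frontmatter"))
```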

# Examples

```
entity: person:davide-crombie
  → mainenlab             (2020, NULL, frontmatter)
  → phd-student           (2020, NULL, frontmatter)
  → project:inscription   (2026, NULL, frontmatter)

entity: org:mainen
  → org:neuroscience       (NULL, NULL, frontmatter)

entity: paper:f040022b
  → doi:10.1038/nrn1933   (NULL, NULL, papers.db)
  → person:Z. Mainen      (NULL, NULL, papers.db)
  → journal:Nature Reviews (2006, NULL, papers.db)
  → published             (2006, NULL, papers.db)
  → papers.db             (NULL, NULL, scan)

entity: track:abc123
  → person:Some Artist    (NULL, NULL, music.db)
  → Psytrance             (NULL, NULL, music.db)
  → music.db              (NULL, NULL, scan)
```

No role_type column. The target's identity tells you the nature of the relationship. A belonging to a person is authorship or artistry. A belonging to a genre name is classification. A belonging to "active" is lifecycle. The semantics are in the entities, not in labels on the arrows.
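Because the nature of a belonging lives in its target, a query can recover it by looking up the target's type. A sketch, assuming SQLite with the two tables described under Schema, target entities already present in `entities` (as the mapping's `ensure_target` would arrange), and any target that is not an entity id reported as a bare name. The rows are the `paper:f040022b` example above; the paper's display name is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE belongings (entity_id TEXT, target TEXT,
                         start_date TEXT, end_date TEXT, source TEXT);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
    ("paper:f040022b", "paper", "(title omitted)"),  # placeholder name
    ("doi:10.1038/nrn1933", "doi", "10.1038/nrn1933"),
    ("person:Z. Mainen", "person", "Z. Mainen"),
    ("journal:Nature Reviews", "journal", "Nature Reviews"),
])
conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)", [
    ("paper:f040022b", "doi:10.1038/nrn1933", None, None, "papers.db"),
    ("paper:f040022b", "person:Z. Mainen", None, None, "papers.db"),
    ("paper:f040022b", "journal:Nature Reviews", "2006", None, "papers.db"),
    ("paper:f040022b", "published", "2006", None, "papers.db"),
])

# The kind of relationship is read off the target's type at query time, never stored.
rows = conn.execute("""
    SELECT b.target, COALESCE(t.type, '(name)') AS target_type
    FROM belongings b
    LEFT JOIN entities t ON t.id = b.target
    WHERE b.entity_id = ?
    ORDER BY b.target
""", ("paper:f040022b",)).fetchall()
```

Reading `person` as authorship or `(name)` targets such as `published` as a lifecycle stage is then a presentation choice, made where the query result is consumed.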

# Sources and schema mappings

A source is an entity that emits other entities and belongings. music.db is a source. papers.db is a source. A directory of markdown files is a source. Discogs is a source. iNaturalist is a source. Sources are not special — they're entities like anything else, and other entities belong to them.

A schema mapping is an operator (Definition 2) that transforms a source into entities and belongings. It is a method (Definition 3) — a repeatable procedure. Given a source, the mapping knows:

  • What in this source counts as an entity
  • What the entity's type and name are
  • What counts as a belonging
  • What the belonging's target and dates are

Currently, schema mappings are Python functions in `build_entities.py` (one per source). The target architecture: mappings as declarative data (YAML), applied by a generic builder. Adding a new source means writing a mapping, not writing code.

```yaml
# Example: schema mapping for papers.db
source: data/papers.db
format: sqlite
entity:
  table: papers
  id: "paper:{id}"
  type: paper
  name: "{title}"
belongings:
  - column: doi
    target: "doi:{value}"
    ensure_target: {type: doi}
  - column: authors
    parse: json_array
    target: "person:{value}"
    ensure_target: {type: person}
  - column: journal
    target: "journal:{value}"
    ensure_target: {type: journal}
  - column: year
    as: start_date
    target_from: article_type
    map: {posted-content: preprint, journal-article: published}
```
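A minimal sketch of the generic builder this target architecture implies: a function that interprets a mapping shaped like the YAML above and applies it to one source row. Assumptions: the mapping is already loaded as a Python dict (the shape a YAML loader would produce; loading is not shown), reading rows out of the `papers` table is left out, and the `target_from`/`map` entry is read as "the mapped `article_type` is the target and `year` is its `start_date`", which matches the `published (2006, ...)` example but is our interpretation. `apply_mapping` and the sample row are hypothetical.

```python
import json
import os

# The papers.db mapping above, as a plain dict.
PAPERS_MAPPING = {
    "source": "data/papers.db",
    "format": "sqlite",
    "entity": {"table": "papers", "id": "paper:{id}", "type": "paper", "name": "{title}"},
    "belongings": [
        {"column": "doi", "target": "doi:{value}", "ensure_target": {"type": "doi"}},
        {"column": "authors", "parse": "json_array", "target": "person:{value}",
         "ensure_target": {"type": "person"}},
        {"column": "journal", "target": "journal:{value}", "ensure_target": {"type": "journal"}},
        {"column": "year", "as": "start_date", "target_from": "article_type",
         "map": {"posted-content": "preprint", "journal-article": "published"}},
    ],
}


def apply_mapping(mapping, row):
    """Turn one source row into (entities, belongings) according to a mapping dict."""
    source_name = os.path.basename(mapping["source"])  # e.g. "papers.db"
    spec = mapping["entity"]
    entity_id = spec["id"].format(**row)
    entities = {entity_id: (spec["type"], spec["name"].format(**row))}
    # As in the examples, the mapped entity also belongs to its source.
    belongings = [(entity_id, source_name, None, None, "scan")]
    for rule in mapping["belongings"]:
        value = row.get(rule["column"])
        if value is None:
            continue  # nothing asserted for this column
        if "target_from" in rule:
            # Target comes from another column via `map`; this column fills the date named by `as`.
            target = rule["map"].get(row.get(rule["target_from"]))
            if target is None:
                continue
            dates = {"start_date": None, "end_date": None}
            dates[rule["as"]] = str(value)
            belongings.append((entity_id, target, dates["start_date"],
                               dates["end_date"], source_name))
            continue
        values = json.loads(value) if rule.get("parse") == "json_array" else [value]
        for v in values:
            target = rule["target"].format(value=v)
            if "ensure_target" in rule:  # create the target entity if it does not exist yet
                entities.setdefault(target, (rule["ensure_target"]["type"], str(v)))
            belongings.append((entity_id, target, None, None, source_name))
    return entities, belongings


row = {"id": "f040022b", "title": "(title omitted)", "doi": "10.1038/nrn1933",
       "authors": '["Z. Mainen"]', "journal": "Nature Reviews",
       "year": 2006, "article_type": "journal-article"}  # hypothetical row
entities, belongings = apply_mapping(PAPERS_MAPPING, row)
```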

Schema mappings are themselves entities. They belong to the source they map and to the ontology they map into. A change to a mapping is an ontological decision — it goes through the PR process like any other change.

No privileged source of truth. The system does not prescribe where truth lives. If two sources disagree about an entity's belongings, both assertions coexist. Conflict resolution is a policy question, resolved through the same PR mechanism as any other conflict (see 24-branch-visibility.md).
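Coexistence needs nothing special if `belongings` carries no uniqueness constraint (an assumption of this sketch): two sources can assert different dates for the same arrow and both rows stay. A sketch of surfacing such disagreements for review, assuming SQLite and taking "disagreement" to mean differing `start_date` values for the same `(entity_id, target)` pair. The second source `hr-export` and its date are hypothetical. The query only reports; it does not pick a winner, which stays a policy decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE belongings (entity_id TEXT, target TEXT,
                start_date TEXT, end_date TEXT, source TEXT)""")
conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)", [
    ("person:davide-crombie", "mainenlab", "2020", None, "frontmatter"),
    ("person:davide-crombie", "mainenlab", "2019", None, "hr-export"),  # hypothetical source
    ("org:mainen", "org:neuroscience", None, None, "frontmatter"),
])

# Both assertions are kept; this lists only arrows whose sources disagree on start_date.
conflicts = conn.execute("""
    SELECT entity_id, target, group_concat(source || '=' || start_date, '; ')
    FROM belongings
    WHERE start_date IS NOT NULL
    GROUP BY entity_id, target
    HAVING COUNT(DISTINCT start_date) > 1
""").fetchall()
```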

# Lifecycle

Lifecycle is a belonging with dates. An entity belongs to a lifecycle stage, and that belonging has a `start_date` and possibly an `end_date`. Birth is the start of a belonging. Death is the end. Joining is the start. Leaving is the end.

Common lifecycle stages (these are target names, not special values):

| Context | Stages |
|:--------|:-------|
| Methods, foundations | draft → active → superseded |
| Skills | active → deprecated |
| Projects | active → paused → completed → archived |
| Organizations | active → inactive → dissolved |
| Branches | active → merged |
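A sketch of a stage transition and an as-of lookup, assuming SQLite, ISO-8601 date strings (so string comparison orders them correctly), a half-open reading where `end_date` is exclusive, and a caller that knows which targets count as stages for the entity's context. The helpers `transition` and `stage_at` and the id `project:example` are hypothetical. Moving a project from active to paused ends one belonging and starts another; nothing is overwritten.

```python
import sqlite3

PROJECT_STAGES = ("active", "paused", "completed", "archived")

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE belongings (entity_id TEXT, target TEXT,
                start_date TEXT, end_date TEXT, source TEXT)""")


def transition(conn, entity_id, new_stage, date, stages, source="user"):
    """End the open stage belonging (if any) and start a new one on the same date."""
    marks = ",".join("?" * len(stages))  # "?,?,?,?"
    conn.execute(
        f"UPDATE belongings SET end_date = ? WHERE entity_id = ? "
        f"AND target IN ({marks}) AND end_date IS NULL",
        (date, entity_id, *stages),
    )
    conn.execute("INSERT INTO belongings VALUES (?, ?, ?, NULL, ?)",
                 (entity_id, new_stage, date, source))


def stage_at(conn, entity_id, date, stages):
    """Return the stage the entity belonged to on `date` (end_date exclusive), or None."""
    marks = ",".join("?" * len(stages))
    row = conn.execute(
        f"SELECT target FROM belongings WHERE entity_id = ? AND target IN ({marks}) "
        "AND (start_date IS NULL OR start_date <= ?) "
        "AND (end_date IS NULL OR end_date > ?) "
        "ORDER BY start_date DESC LIMIT 1",
        (entity_id, *stages, date, date),
    ).fetchone()
    return row[0] if row else None


transition(conn, "project:example", "active", "2026-01-10", PROJECT_STAGES)
transition(conn, "project:example", "paused", "2026-03-01", PROJECT_STAGES)
```

Here `stage_at(conn, "project:example", "2026-02-01", PROJECT_STAGES)` gives `"active"`, and from `2026-03-01` on it gives `"paused"`; both rows remain in the table.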

# Queries

What does entity X belong to?

```sql
SELECT target, start_date, end_date, source FROM belongings
WHERE entity_id = 'person:davide-crombie';
```

What belongs to entity Y?

```sql
SELECT e.id, e.type, e.name FROM entities e
JOIN belongings b ON e.id = b.entity_id
WHERE b.target = 'org:mainen';
```

Walk the chain upward from a person:

```sql
-- Davide → mainenlab → neuroscience → champalimaud-research
-- Each step: SELECT target FROM belongings WHERE entity_id = ?
```
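The step-by-step walk can be collapsed into one recursive query. A sketch using SQLite's `WITH RECURSIVE`, run from Python; the `depth` column and the 10-hop cap are assumptions added so a cyclic graph cannot loop forever, and the rows simply encode the chain named in the comment above (bare names as written there, not real data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE belongings (entity_id TEXT, target TEXT,
                start_date TEXT, end_date TEXT, source TEXT)""")
conn.executemany("INSERT INTO belongings (entity_id, target) VALUES (?, ?)", [
    ("person:davide-crombie", "mainenlab"),
    ("mainenlab", "neuroscience"),
    ("neuroscience", "champalimaud-research"),
])

ANCESTORS = """
WITH RECURSIVE chain(node, depth) AS (
    SELECT target, 1 FROM belongings WHERE entity_id = ?
    UNION
    SELECT b.target, c.depth + 1
    FROM belongings b JOIN chain c ON b.entity_id = c.node
    WHERE c.depth < 10  -- hop cap: guards against cycles
)
SELECT node, MIN(depth) AS hops FROM chain GROUP BY node ORDER BY hops
"""

chain = conn.execute(ANCESTORS, ("person:davide-crombie",)).fetchall()
```

Each row is an ancestor with its distance in hops, so "which programme is Davide under?" becomes a filter on this result rather than a stored belonging.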

Everything that belongs to "active":

```sql
SELECT e.type, e.name FROM entities e
JOIN belongings b ON e.id = b.entity_id
WHERE b.target = 'active' AND b.end_date IS NULL;
```

# Build

`python3 scripts/build_entities.py [--verbose]` rebuilds `data/entities.db` from scratch. The DB is disposable: delete it, scan again, and you get the same result. The builder applies schema mappings (currently hardcoded as functions) to each source.
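The "delete it, scan again, same result" property can be sketched as a rebuild that always starts from an empty file and inserts rows in a deterministic order. This is not the actual `build_entities.py`; `rebuild`, `dump`, the mapping-callable interface, and `music_mapping` are hypothetical, and the temporary directory stands in for `data/`.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE entities (id TEXT PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE belongings (entity_id TEXT, target TEXT,
                         start_date TEXT, end_date TEXT, source TEXT);
"""


def rebuild(db_path, mappings):
    """Discard the old index and regenerate it from sources via mapping callables."""
    if os.path.exists(db_path):
        os.remove(db_path)  # disposable: never patched in place
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        entities, belongings = {}, []
        for mapping in mappings:  # each returns ({id: (type, name)}, [belonging tuples])
            ents, bels = mapping()
            entities.update(ents)
            belongings.extend(bels)
        conn.executemany("INSERT INTO entities VALUES (?, ?, ?)",
                         sorted((i, t, n) for i, (t, n) in entities.items()))
        conn.executemany("INSERT INTO belongings VALUES (?, ?, ?, ?, ?)",
                         sorted(belongings, key=repr))  # repr key: tuples may hold None
        conn.commit()
    finally:
        conn.close()


def dump(db_path):
    """Read back all belongings in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT * FROM belongings ORDER BY rowid").fetchall()
    finally:
        conn.close()


def music_mapping():  # hypothetical stand-in for one per-source function
    return ({"track:abc123": ("track", "abc123")},
            [("track:abc123", "Psytrance", None, None, "music.db"),
             ("track:abc123", "music.db", None, None, "scan")])


path = os.path.join(tempfile.mkdtemp(), "entities.db")
rebuild(path, [music_mapping])
first = dump(path)
rebuild(path, [music_mapping])
second = dump(path)
```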

# Portability

Two tables, one relation. The entity/belonging model is independent of storage backend, source format, or domain. It indexes SQLite, frontmatter, and external APIs equally. This is what Filix inherits from HAAK.

Architecture 22 — Entities and Belongings — 2026 — Zachary F. Mainen / HAAK