Ontology/09 establishes that situations are primitive and relationships are derived. This plan specifies how to build the mapping engines that turn document corpora into situation graphs, starting with the Epstein public archives and generalizing to any source.
# Architecture
A mapping engine is a schema mapping (22-lifecycle.md) that produces situation entities and participation belongings. Each engine is:
- A YAML mapping declaration: what counts as a situation, who the participants are, what the evidence is
- A builder function: applies the mapping to a source and emits entities and belongings into `data/entities.db`
- A materialized view: optional SQL views that derive relationship summaries from the situation graph
The engine is a method (Definition 3). Adding a new corpus means writing a mapping, not writing code. The builder is generic; the mappings are specific.
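As an illustration of the third component, the sketch below derives pairwise contact counts from participation belongings using a plain SQL view. The `belongings(member_id, container_id, role)` layout and the placeholder IDs are assumptions for illustration, not the actual `entities.db` schema. SQLite only offers ordinary views, so a truly materialized version would have to be a table that the builder refreshes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical belongings table; the real schema may differ.
    CREATE TABLE belongings (
        member_id    TEXT NOT NULL,  -- e.g. person:a, method:email
        container_id TEXT NOT NULL,  -- e.g. situation:s1
        role         TEXT NOT NULL   -- 'participant', 'method', 'evidence'
    );
    -- Relationship summary derived from situations, never stored directly.
    CREATE VIEW contacts_derived AS
    SELECT a.member_id AS person_a,
           b.member_id AS person_b,
           COUNT(*)    AS shared_situations
    FROM belongings a
    JOIN belongings b
      ON a.container_id = b.container_id
     AND a.member_id < b.member_id      -- count each unordered pair once
    WHERE a.role = 'participant' AND b.role = 'participant'
    GROUP BY a.member_id, b.member_id;
""")

# Placeholder data: two situations, three people.
conn.executemany(
    "INSERT INTO belongings VALUES (?, ?, ?)",
    [
        ("person:a", "situation:s1", "participant"),
        ("person:b", "situation:s1", "participant"),
        ("person:a", "situation:s2", "participant"),
        ("person:b", "situation:s2", "participant"),
        ("person:c", "situation:s2", "participant"),
        ("method:email", "situation:s2", "method"),
    ],
)

summary = conn.execute(
    "SELECT person_a, person_b, shared_situations "
    "FROM contacts_derived ORDER BY person_a, person_b"
).fetchall()
```

Because the pair counts are computed from situations on demand, adding or retracting a single situation changes the summary without any separate bookkeeping.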
# Current state
public-archives.db holds project-specific tables:
| Table | Rows | Status |
|---|---|---|
| documents | 2,356,681 | Phase 1 complete — metadata indexed, 0 pulled |
| entities | ~1,600,000 | Mixed — KG entities (606) + ICIJ (839K) + WikiLeaks (8.7K) |
| contacts | 2,412 | Aggregate summaries — ontologically wrong, to be replaced |
| document_entities | 0 | Unpopulated — needs Phase 2 extraction |
| cross_source_matches | 25,900 | Auto-matched, 0 human-verified |
entities.db holds the unified entity/belonging model (from build_entities.py). The two databases are currently disjoint.
# Phase 0: Unify the models
Before building mapping engines, bridge public-archives.db into entities.db.
- [ ] Write schema mapping for `public-archives.db` → `entities.db`
  - KG persons → `person:` entities
  - ICIJ entities → `org:`, `person:`, `address:` entities
  - Documents → `document:` entities (519K DOJ + 251K cables + 297 DDoSecrets)
  - Sources → `source:` entities
- [ ] Migrate cross-source matches → shared belongings (two entity IDs belong to the same canonical person entity)
- [ ] Keep `public-archives.db` as the raw ingest store; `entities.db` becomes the unified graph
Depends on: nothing — can start immediately.
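One possible way to migrate match pairs into shared belongings is to cluster them with union-find, so that every ID reachable through a chain of matches lands under one canonical entity. This is a sketch under assumptions: the `person:canonical-N` ID scheme and the `same_as` role are hypothetical, and taking the transitive closure of matches is a design choice that the plan does not specify.

```python
def canonical_groups(matches):
    """Cluster matched entity IDs with union-find; returns {root: [members]}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root to the smaller for a deterministic result.
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return groups


def shared_belongings(matches):
    """Emit (member_id, canonical_id, role) rows; the ID scheme is hypothetical."""
    rows = []
    groups = canonical_groups(matches)
    for n, root in enumerate(sorted(groups), start=1):
        canonical = f"person:canonical-{n}"
        rows.extend((m, canonical, "same_as") for m in sorted(groups[root]))
    return rows


# Placeholder match pairs, not real data.
demo_matches = [("kg:1", "icij:9"), ("icij:9", "wl:4"), ("kg:2", "icij:7")]
demo_rows = shared_belongings(demo_matches)
```

Since all 25,900 matches are currently unverified, a single false match would merge two clusters under this scheme, so a confidence threshold or a review step before merging may be worth considering.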
# Phase 1: Epstein situation engine
The first mapping engine. Decomposes the 2,412 aggregate contacts into individual situation entities.
## 1a: Decompose knowledge graph contacts
The knowledge graph has aggregate edges with counts. For edges that carry individual evidence (document IDs in the `evidence_doc_ids` JSON column):
- [ ] For each contact row with evidence docs: create one `situation:` entity per evidence doc
- [ ] Participants belong to the situation; method (email/flight/payment) belongs to it; evidence doc belongs to it
- [ ] For contacts without individual evidence: create a single aggregate situation entity flagged `granularity:aggregate`
- [ ] Materialized view: `contacts_derived` replaces the contacts table
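The decomposition steps above can be sketched as a single function over one contact row. The snake_case column names follow the Phase 4 mapping sketch, while the `situation:contact-…` and `document:…` ID formats are hypothetical placeholders rather than the project's actual conventions.

```python
import json


def decompose_contact(row):
    """Turn one aggregate contact row into situation entities and belongings."""
    participants = [row["sender_entity_id"], row["receiver_entity_id"]]
    method = f"method:{row['contact_type']}"
    doc_ids = json.loads(row["evidence_doc_ids"] or "[]")

    entities, belongings = [], []

    def add_situation(sid, granularity, doc_id=None):
        entities.append({"id": sid, "granularity": granularity})
        belongings.extend((p, sid, "participant") for p in participants)
        belongings.append((method, sid, "method"))
        if doc_id is not None:
            belongings.append((f"document:{doc_id}", sid, "evidence"))

    if doc_ids:
        # One situation per evidence document.
        for n, doc_id in enumerate(doc_ids, start=1):
            add_situation(f"situation:contact-{row['id']}-{n:03d}",
                          "individual", doc_id)
    else:
        # No individual evidence: keep one situation, flagged as aggregate.
        add_situation(f"situation:contact-{row['id']}", "aggregate")
    return entities, belongings
```

A row with two evidence documents yields two situations with four belongings each; a row without evidence yields one aggregate situation with three belongings.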
## 1b: Flight log decomposition
The knowledge graph holds 1,449 flight records. Each row represents a shared flight, and the flight manifest is the evidence document.
- [ ] Each flight → `situation:flight-YYYY-MM-DD-NNN` entity
- [ ] All passengers belong to it (not just pairs: a flight with 4 passengers generates one situation, not 6 pairs)
- [ ] Method: `method:flight`
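To make the "one situation, not 6 pairs" point concrete, here is a minimal sketch that stores a flight as a single n-ary situation and only derives the pairs when asked. The date, passenger IDs, and manifest ID are placeholders, and the tuple layout `(member_id, situation_id, role)` is an assumption.

```python
from itertools import combinations


def flight_situation(date, seq, passengers, manifest_id):
    """Build one situation for a whole flight; returns (situation_id, belongings)."""
    sid = f"situation:flight-{date}-{seq:03d}"
    belongings = [(p, sid, "participant") for p in passengers]
    belongings.append(("method:flight", sid, "method"))
    belongings.append((manifest_id, sid, "evidence"))
    return sid, belongings


def derived_pairs(passengers):
    """Pairs are a derived view over the situation, not stored entities."""
    return list(combinations(sorted(passengers), 2))


# Placeholder passengers and date, not real data.
demo_passengers = ["person:a", "person:b", "person:c", "person:d"]
demo_sid, demo_belongings = flight_situation(
    "2000-01-01", 1, demo_passengers, "document:manifest-1"
)
demo_pairs = derived_pairs(demo_passengers)
```

Four passengers produce one situation with six belongings (four participants, one method, one evidence document), while the six pairwise relationships remain computable on demand.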
## 1c: Email decomposition (requires S3 pull)
The 325 email edges (directed) represent aggregate counts. Decomposing them into individual situations requires pulling the actual emails from S3.
- [ ] Pull email documents from `exoscale:filix-epstein-files` for known email contacts
- [ ] OCR/parse each email: extract From, To, CC, Date, Subject
- [ ] Each email → `situation:email-YYYY-MM-DD-NNN` entity
- [ ] Sender and all recipients belong to it
- [ ] Method: `method:email`
Depends on: S3 uploads complete (done as of 2026-03-10)
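For emails that survive as well-formed RFC 5322 text, the header extraction could be done with Python's standard `email` package, as in the sketch below. It assumes a parseable `Date` header is present, which OCR output may not guarantee; the sample message and the output dictionary layout are placeholders.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def email_to_situation(raw, seq):
    """Parse one raw email into a situation record (hypothetical layout)."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"])  # raises if Date is missing or invalid
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        "id": f"situation:email-{sent:%Y-%m-%d}-{seq:03d}",
        "subject": msg["Subject"],
        "sender": [addr.lower() for _, addr in senders],
        "recipients": [addr.lower() for _, addr in recipients],
        "method": "method:email",
    }


# Placeholder message, not taken from the archive.
SAMPLE = "\n".join([
    "From: Alice Example <alice@example.org>",
    "To: bob@example.org, Carol <carol@example.org>",
    "Cc: dave@example.org",
    "Date: Mon, 03 Jan 2000 10:00:00 +0000",
    "Subject: placeholder",
    "",
    "body text",
])
demo_email = email_to_situation(SAMPLE, 1)
```

Resolving the extracted addresses to `person:` entities is a separate step and is not covered here.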
# Phase 2: WikiLeaks cable engine
Each cable is a situation. The metadata is already in public-archives.db.
- [ ] YAML mapping: cable → situation, sender embassy → participant, recipient → participant
- [ ] Classification level → belonging to a policy domain entity
- [ ] Reference cables → cross-references between situations
- [ ] 251,287 cables → 251,287 situation entities
Depends on: Phase 0 (cables already indexed, just need entity migration)
# Phase 3: ICIJ offshore engine
Corporate relationships are standing situations with extended temporal bounds.
- [ ] Officer→entity relationships → situation entities with officer and company as participants
- [ ] Incorporation dates → temporal bounds on the situation
- [ ] Jurisdiction → domain belonging
- [ ] Intermediary relationships → situations linking intermediary to entity
- [ ] 839K entities, potentially millions of situations
Depends on: Phase 0
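A standing situation differs from an event mainly in carrying an interval rather than a point in time. The sketch below shows one way to represent that, with open-ended bounds allowed; the class name, field names, and IDs are hypothetical and not part of the existing model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StandingSituation:
    """An officer-company relationship with (possibly open) temporal bounds."""
    id: str
    officer: str
    company: str
    start: Optional[date] = None  # e.g. incorporation date; None if unknown
    end: Optional[date] = None    # None means no known end

    def active_on(self, day: date) -> bool:
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


# Placeholder values for illustration only.
demo_standing = StandingSituation(
    id="situation:icij-officer-1",
    officer="person:p",
    company="org:o",
    start=date(2001, 1, 1),
)
```

Treating an unknown bound as open keeps sparsely dated records queryable, at the cost of overstating how long a relationship may have lasted.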
# Phase 4: Generic engine + YAML mappings
Extract the common pattern from Phases 1–3 into a generic builder.
- [ ] Define YAML mapping schema (extends 22-lifecycle.md format):
  ```yaml
  source: data/public-archives.db
  situation_mapping:
    table: contacts
    id_template: "situation:{source}-{row_id}"
    method_column: contact_type
    participants:
      - column: sender_entity_id
        resolve_to: entities
      - column: receiver_entity_id
        resolve_to: entities
    evidence:
      - column: evidence_doc_ids
        parse: json_array
    temporal:
      start: date_first
      end: date_last
  ```
- [ ] Generic builder reads YAML, emits entities + belongings
- [ ] Migrate Phases 1–3 engines to YAML mappings
- [ ] Document mapping format in 22-lifecycle.md
Depends on: Phases 1–3 (need concrete experience before abstracting)
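A minimal sketch of what the generic builder might look like, applied to the contacts mapping above. To stay within the standard library, the mapping is written as a Python dict mirroring the YAML (in practice it could be loaded with PyYAML's `yaml.safe_load`). The snake_case key names, the `document:` prefix, and the returned tuple layout are assumptions, and entity resolution via `resolve_to` is left out.

```python
import json
import sqlite3

# Python mirror of the YAML mapping; key names in snake_case are assumed.
MAPPING = {
    "situation_mapping": {
        "table": "contacts",
        "id_template": "situation:{source}-{row_id}",
        "method_column": "contact_type",
        "participants": [{"column": "sender_entity_id"},
                         {"column": "receiver_entity_id"}],
        "evidence": [{"column": "evidence_doc_ids", "parse": "json_array"}],
        "temporal": {"start": "date_first", "end": "date_last"},
    }
}


def build(conn, mapping, source):
    """Apply a mapping to one table; return (entities, belongings) lists."""
    m = mapping["situation_mapping"]
    conn.row_factory = sqlite3.Row
    entities, belongings = [], []
    # The table name comes from a trusted mapping file, not from user input.
    for row in conn.execute(f"SELECT rowid AS row_id, * FROM {m['table']}"):
        sid = m["id_template"].format(source=source, row_id=row["row_id"])
        entities.append({"id": sid,
                         "start": row[m["temporal"]["start"]],
                         "end": row[m["temporal"]["end"]]})
        for p in m["participants"]:
            belongings.append((row[p["column"]], sid, "participant"))
        belongings.append((f"method:{row[m['method_column']]}", sid, "method"))
        for ev in m["evidence"]:
            raw = row[ev["column"]]
            parse_json = ev.get("parse") == "json_array"
            docs = json.loads(raw) if raw and parse_json else []
            belongings.extend((f"document:{d}", sid, "evidence") for d in docs)
    return entities, belongings
```

The builder contains no table-specific logic, so a new corpus would need only a new mapping, which is the property the architecture section asks for.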
# Phase 5: Inference engine
Graph-based inference over the situation graph. Separate from mapping — runs after situations exist.
- [ ] Redaction resolver: for each `[REDACTED]` in a document, score candidate entities by co-participation graph constraints
- [ ] Gap detector: identify temporal gaps in expected co-participation patterns
- [ ] Cross-source linker: propose identity matches based on graph topology (beyond the current name-matching in cross_source_matches)
- [ ] All inferences stored with `confidence < 1.0` and explicit evidential chains
- [ ] Inference review UI: present candidates for human verification
Depends on: Phase 1 (need enough situations for graph structure to be informative)
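As one simple illustration of scoring by co-participation, the sketch below ranks candidates for a redacted slot by how often they appear in other situations alongside the document's known participants. This is a stand-in heuristic, not the intended resolver: the scoring rule and the `+ 1` smoothing term (which keeps every confidence strictly below 1.0) are assumptions, and the IDs are placeholders.

```python
from collections import Counter


def score_candidates(known_participants, situations, candidates):
    """Rank candidates by co-participation with the known participants.

    situations maps situation_id -> iterable of participant IDs.
    Returns [(candidate, confidence)] sorted by descending confidence.
    """
    known = set(known_participants)
    scores = Counter({c: 0 for c in candidates})
    for members in situations.values():
        members = set(members)
        overlap = len(members & known)
        if overlap == 0:
            continue
        for c in candidates:
            if c in members and c not in known:
                scores[c] += overlap
    # Additive smoothing keeps every confidence strictly below 1.0.
    total = sum(scores.values()) + 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(c, s / total) for c, s in ranked]


# Placeholder graph, not real data.
demo_situations = {
    "situation:s1": ["person:a", "person:b", "person:x"],
    "situation:s2": ["person:a", "person:x"],
    "situation:s3": ["person:b", "person:y"],
    "situation:s4": ["person:y", "person:z"],
}
demo_ranking = score_candidates(
    ["person:a", "person:b"], demo_situations,
    ["person:x", "person:y", "person:z"],
)
```

A production resolver would also need to record which situations contributed to each score, since the plan requires explicit evidential chains for every inference.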
# Phase 6: Navigation
Browsable situation graph.
- [ ] Extend entity browser (scripts/entity_browser.py) with situation navigation
- [ ] Person view: all situations, co-participants ranked by frequency, temporal timeline
- [ ] Situation view: all participants, method, evidence doc, date
- [ ] Graph view: 2D layout of person nodes, weighted by shared situations
- [ ] Search: find situations by participant, method, date range, or keyword in evidence
Depends on: Phase 4 (need the full graph to navigate)
# What this replaces
The contacts table in public-archives.db becomes a materialized view. The cross_source_matches table becomes shared belongings (two entity IDs belonging to the same canonical person). The document_entities table merges into standard belongings (document belongs to the entities it mentions).
The project-specific schema (public-archives.db) persists as the raw ingest store — it holds the original data as received from each source. The unified entities.db holds the ontologically correct representation: situations and belongings. The raw store is the archive. The entity store is the graph.
# Timeline
| Phase | Scope | Blocking? |
|---|---|---|
| 0 | Unify models | Yes — everything depends on this |
| 1a | KG contact decomposition | No — can proceed with available data |
| 1b | Flight decomposition | No — parallel with 1a |
| 1c | Email decomposition | Blocked on S3 pull + OCR pipeline |
| 2 | WikiLeaks cables | No — parallel with Phase 1 |
| 3 | ICIJ offshore | No — parallel with Phase 1 |
| 4 | Generic engine | Blocked on Phases 1–3 (need examples first) |
| 5 | Inference | Blocked on Phase 1 (need graph density) |
| 6 | Navigation | Blocked on Phase 4 |
Phase 0 first, then Phases 1–3 in parallel, then 4, then 5–6.
strategy · 2026-03-10 · zach + claude
Strategy 22 — Situation Mapping Engines — 2026 — Zachary F. Mainen / HAAK