Situation Mapping Engines

Ontology/09 establishes that situations are primitive and relationships are derived. This plan specifies how to build the mapping engines that turn document corpora into situation graphs, starting with the Epstein public archives and generalizing to any source.

#Architecture

A mapping engine is a schema mapping (22-lifecycle.md) that produces situation entities and participation belongings. Each engine has three parts:

  1. A YAML mapping declaration: what counts as a situation, who the participants are, what the evidence is
  2. A builder function: applies the mapping to a source and emits entities and belongings into data/entities.db
  3. Materialized views (optional): SQL views that derive relationship summaries from the situation graph

The engine is a method (Definition 3). Adding a new corpus means writing a mapping, not writing code. The builder is generic; the mappings are specific.

#Current state

public-archives.db holds project-specific tables:

| Table | Rows | Status |
|---|---|---|
| documents | 2,356,681 | Phase 1 complete: metadata indexed, 0 pulled |
| entities | ~1,600,000 | Mixed: KG entities (606) + ICIJ (839K) + WikiLeaks (8.7K) |
| contacts | 2,412 | Aggregate summaries; ontologically wrong, to be replaced |
| document_entities | 0 | Unpopulated; needs Phase 2 extraction |
| crosssourcematches | 25,900 | Auto-matched, 0 human-verified |

entities.db holds the unified entity/belonging model (from build_entities.py). The two databases are currently disjoint.

#Phase 0: Unify the models

Before building mapping engines, bridge public-archives.db into entities.db.

  • [ ] Write schema mapping for public-archives.db → entities.db
      • KG persons → person: entities
      • ICIJ entities → org:, person:, address: entities
      • Documents → document: entities (519K DOJ + 251K cables + 297 DDoSecrets)
      • Sources → source: entities
  • [ ] Migrate cross-source matches → shared belongings (two entity IDs belong to the same canonical person entity)
  • [ ] Keep public-archives.db as the raw ingest store; entities.db becomes the unified graph

Depends on: nothing — can start immediately.
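
The migration of cross-source matches into shared belongings can be sketched as a grouping problem: pairs of matched IDs are merged into clusters, and every member of a cluster belongs to one canonical entity. The following is a minimal sketch, not the project's implementation. The `canonical:` prefix, the choice of the lexicographically smallest member as the canonical ID, and the example IDs are all assumptions made for illustration.

```python
def canonical_belongings(matches):
    """Group matched entity IDs and emit (member, canonical) belongings.

    `matches` is an iterable of (entity_id_a, entity_id_b) pairs.
    The canonical ID convention used here is hypothetical.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller ID becomes the root, so each cluster's root
            # is its lexicographically smallest member.
            parent[max(ra, rb)] = min(ra, rb)

    return [(member, "canonical:" + find(member)) for member in sorted(parent)]


# Hypothetical IDs: three records from different sources, matched pairwise.
example = canonical_belongings([
    ("person:kg-1", "person:icij-9"),
    ("person:icij-9", "person:wl-3"),
])
```

Because matches are auto-generated and not yet human-verified, a real migration would likely carry a confidence or verification flag on each belonging; that is omitted here.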

#Phase 1: Epstein situation engine

The first mapping engine. Decomposes the 2,412 aggregate contacts into individual situation entities.

#1a: Decompose knowledge graph contacts

The knowledge graph has aggregate edges with counts. For edges that carry individual evidence (document IDs in evidencedocids JSON):

  • [ ] For each contact row with evidence docs: create one situation: entity per evidence doc
  • [ ] Participants belong to the situation; method (email/flight/payment) belongs to it; evidence doc belongs to it
  • [ ] For contacts without individual evidence: create a single aggregate situation entity flagged granularity:aggregate
  • [ ] Materialized view: contacts_derived replaces the contacts table
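
The decomposition rules above can be sketched as a pure function over one contact row. This is an illustration under stated assumptions: the row keys (`id`, `sender`, `receiver`, `contact_type`, `evidence`) and the situation ID format are hypothetical and do not reflect the actual contacts schema.

```python
import json


def decompose_contact(row):
    """Turn one aggregate contact row into situation entities and belongings.

    Returns (entities, belongings), where each belonging is a
    (member_id, situation_id) pair. All key names are assumptions.
    """
    participants = [row["sender"], row["receiver"]]
    method = "method:" + row["contact_type"]
    doc_ids = json.loads(row["evidence"] or "[]")
    entities, belongings = [], []

    if doc_ids:
        # One situation per evidence document.
        for doc_id in doc_ids:
            sid = f"situation:contact-{row['id']}-{doc_id}"
            entities.append({"id": sid, "granularity": "individual"})
            for member in participants + [method, f"document:{doc_id}"]:
                belongings.append((member, sid))
    else:
        # No individual evidence: keep a single flagged aggregate situation.
        sid = f"situation:contact-{row['id']}"
        entities.append({"id": sid, "granularity": "aggregate"})
        for member in participants + [method]:
            belongings.append((member, sid))
    return entities, belongings
```

A row with two evidence documents yields two situations with four belongings each; a row with no evidence yields one aggregate situation with three belongings.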

#1b: Flight log decomposition

The knowledge graph holds 1,449 flight records. Each row represents a shared flight, and the flight manifest is the evidence document.

  • [ ] Each flight → situation:flight-YYYY-MM-DD-NNN entity
  • [ ] All passengers belong to it (not just pairs — a flight with 4 passengers generates one situation, not 6 pairs)
  • [ ] Method: method:flight
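
The difference between one situation per flight and one edge per passenger pair can be shown in a few lines. This is a sketch with placeholder data: the date, sequence number, and passenger IDs are invented for illustration, and the function name is hypothetical.

```python
from itertools import combinations


def flight_situation(date, seq, passengers):
    """Build one situation for a flight; every passenger belongs to it."""
    sid = f"situation:flight-{date}-{seq:03d}"
    belongings = [(p, sid) for p in passengers]
    belongings.append(("method:flight", sid))
    return sid, belongings


# Placeholder manifest with four passengers.
passengers = ["person:a", "person:b", "person:c", "person:d"]
sid, belongings = flight_situation("2000-01-01", 1, passengers)

# The pairwise representation would need one edge per passenger pair.
pair_count = len(list(combinations(passengers, 2)))
```

Here the situation model stores one entity and five belongings (four passengers plus the method), whereas the pairwise model would store six edges and lose the fact that all four were on the same flight.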

#1c: Email decomposition (requires S3 pull)

The 325 email edges (directed) represent aggregate counts. Decomposing them into individual situations requires pulling the actual emails from S3.

  • [ ] Pull email documents from exoscale:filix-epstein-files for known email contacts
  • [ ] OCR/parse each email: extract From, To, CC, Date, Subject
  • [ ] Each email → situation:email-YYYY-MM-DD-NNN entity
  • [ ] Sender and all recipients belong to it
  • [ ] Method: method:email

Depends on: S3 uploads complete (done as of 2026-03-10)
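
Once an email has been recovered as text, header extraction can be sketched with Python's standard `email` package. This assumes the OCR output is close enough to RFC 5322 format to parse, which may not hold for scanned documents. The function name and the sample message are hypothetical, and participants are left as raw addresses; resolving them to person: entities is a separate step not shown here.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def email_situation(raw, seq):
    """Map one raw email to a situation entity plus belongings."""
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    date = parsedate_to_datetime(msg["Date"]).date().isoformat()
    sid = f"situation:email-{date}-{seq:03d}"

    # Lowercase addresses and drop duplicates while keeping order.
    members = [addr.lower() for _, addr in sender + recipients if addr]
    belongings = [(m, sid) for m in dict.fromkeys(members)]
    belongings.append(("method:email", sid))
    return {"id": sid, "subject": msg["Subject"]}, belongings


# Invented sample message for illustration only.
RAW = (
    "From: A <a@example.org>\n"
    "To: B <b@example.org>, c@example.org\n"
    "Cc: D <d@example.org>\n"
    "Date: Mon, 03 Jan 2000 10:00:00 +0000\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)
entity, belongings = email_situation(RAW, 1)
```

The sender and all three recipients belong to the same situation, matching the rule that an email is one situation regardless of how many people it reaches.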

#Phase 2: WikiLeaks cable engine

Each cable is a situation. The metadata is already in public-archives.db.

  • [ ] YAML mapping: cable → situation, sender embassy → participant, recipient → participant
  • [ ] Classification level → belonging to a policy domain entity
  • [ ] Reference cables → cross-references between situations
  • [ ] 251,287 cables → 251,287 situation entities

Depends on: Phase 0 (cables already indexed, just need entity migration)

#Phase 3: ICIJ offshore engine

Corporate relationships are standing situations with extended temporal bounds.

  • [ ] Officer→entity relationships → situation entities with officer and company as participants
  • [ ] Incorporation dates → temporal bounds on the situation
  • [ ] Jurisdiction → domain belonging
  • [ ] Intermediary relationships → situations linking intermediary to entity
  • [ ] 839K entities, potentially millions of situations

Depends on: Phase 0

#Phase 4: Generic engine + YAML mappings

Extract the common pattern from Phases 1–3 into a generic builder.

  • [ ] Define YAML mapping schema (extends 22-lifecycle.md format):

```yaml
source: data/public-archives.db
situationmapping:
  table: contacts
  idtemplate: "situation:{source}-{rowid}"
  methodcolumn: contact_type
  participants:
    - column: senderentityid
      resolve_to: entities
    - column: receiverentityid
      resolve_to: entities
  evidence:
    - column: evidencedocids
      parse: jsonarray
  temporal:
    start: datefirst
    end: date_last
```

  • [ ] Generic builder reads YAML, emits entities + belongings
  • [ ] Migrate Phases 1–3 engines to YAML mappings
  • [ ] Document mapping format in 22-lifecycle.md

Depends on: Phases 1–3 (need concrete experience before abstracting)
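
A generic builder driven by such a mapping might look roughly like the sketch below. To stay self-contained it uses a Python dict that mirrors the draft YAML keys rather than loading a file; in practice the dict would come from a YAML parser. The function name, the `source_label` argument (the value substituted for `{source}` in the ID template), and the assumed nesting of the mapping are all assumptions, since the mapping schema is not yet defined.

```python
import json

# Python equivalent of the draft mapping; key names are copied from it.
MAPPING = {
    "source": "data/public-archives.db",
    "situationmapping": {
        "table": "contacts",
        "idtemplate": "situation:{source}-{rowid}",
        "methodcolumn": "contact_type",
        "participants": [
            {"column": "senderentityid", "resolve_to": "entities"},
            {"column": "receiverentityid", "resolve_to": "entities"},
        ],
        "evidence": [{"column": "evidencedocids", "parse": "jsonarray"}],
        "temporal": {"start": "datefirst", "end": "date_last"},
    },
}


def build_situations(mapping, rows, source_label):
    """Apply a mapping to row dicts; return (entities, belongings)."""
    spec = mapping["situationmapping"]
    entities, belongings = [], []
    for row in rows:
        sid = spec["idtemplate"].format(source=source_label, rowid=row["rowid"])
        entities.append({
            "id": sid,
            "start": row.get(spec["temporal"]["start"]),
            "end": row.get(spec["temporal"]["end"]),
        })
        belongings.append(("method:" + row[spec["methodcolumn"]], sid))
        for part in spec["participants"]:
            belongings.append((row[part["column"]], sid))
        for ev in spec["evidence"]:
            raw = row.get(ev["column"])
            if raw is None:
                continue
            # "jsonarray" columns hold a JSON list of document IDs.
            doc_ids = json.loads(raw) if ev.get("parse") == "jsonarray" else [raw]
            belongings.extend((f"document:{d}", sid) for d in doc_ids)
    return entities, belongings
```

Nothing in `build_situations` refers to contacts specifically, which is the point of the abstraction: a cable or ICIJ mapping would change only the dict, not the function.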

#Phase 5: Inference engine

Graph-based inference over the situation graph. Separate from mapping — runs after situations exist.

  • [ ] Redaction resolver: for each [REDACTED] in a document, score candidate entities by co-participation graph constraints
  • [ ] Gap detector: identify temporal gaps in expected co-participation patterns
  • [ ] Cross-source linker: propose identity matches based on graph topology (beyond the current name-matching in crosssourcematches)
  • [ ] All inferences stored with confidence < 1.0 and explicit evidential chains
  • [ ] Inference review UI: present candidates for human verification

Depends on: Phase 1 (need enough situations for graph structure to be informative)
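
To make the redaction resolver concrete, here is one very simple scoring heuristic: a candidate scores higher the more often it co-participates with the people already known to be in the redacted document's situation. This is an illustrative stand-in, not the plan's chosen method. The function name, the data shapes, and the add-one smoothing (which reserves some probability for "none of the candidates" and keeps every score below 1.0) are assumptions.

```python
from collections import Counter


def score_candidates(known_participants, situations, candidates):
    """Score candidates for a redacted slot by co-participation.

    `situations` maps situation_id -> set of participant IDs.
    Returns {candidate: score}, with every score strictly below 1.0.
    """
    known = set(known_participants)
    pool = set(candidates)
    counts = Counter()
    for members in situations.values():
        overlap = len(members & known)
        if overlap:
            # Credit each candidate present, weighted by how many
            # known participants share this situation.
            for cand in members & pool:
                counts[cand] += overlap
    total = sum(counts.values())
    # Add-one smoothing with one extra slot for "someone else".
    denom = total + len(candidates) + 1
    return {c: (counts[c] + 1) / denom for c in candidates}


# Invented example graph.
situations = {
    "situation:1": {"person:a", "person:x"},
    "situation:2": {"person:a", "person:b", "person:x"},
    "situation:3": {"person:b", "person:y"},
}
scores = score_candidates(["person:a", "person:b"], situations, ["person:x", "person:y"])
```

Each score would be stored alongside the situation IDs that contributed to it, so the evidential chain behind an inference remains inspectable during human review.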

#Phase 6: Navigation

Browsable situation graph.

  • [ ] Extend entity browser (scripts/entity_browser.py) with situation navigation
  • [ ] Person view: all situations, co-participants ranked by frequency, temporal timeline
  • [ ] Situation view: all participants, method, evidence doc, date
  • [ ] Graph view: 2D layout of person nodes, weighted by shared situations
  • [ ] Search: find situations by participant, method, date range, or keyword in evidence

Depends on: Phase 4 (need the full graph to navigate)
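
The person view reduces to a small aggregation over belongings. The sketch below assumes belongings are available as (member_id, situation_id) pairs and that person entities carry a `person:` prefix; the function name and return shape are hypothetical, and the entity browser's real data access layer is not shown.

```python
from collections import Counter, defaultdict


def person_view(person, belongings):
    """List a person's situations and rank their co-participants."""
    by_situation = defaultdict(set)
    for member, sid in belongings:
        if member.startswith("person:"):
            by_situation[sid].add(member)

    mine = sorted(sid for sid, members in by_situation.items() if person in members)
    co = Counter(m for sid in mine for m in by_situation[sid] if m != person)
    # Most frequent first; ties broken by ID for stable output.
    ranked = sorted(co.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"situations": mine, "co_participants": ranked}


# Invented example data.
belongings = [
    ("person:a", "situation:1"), ("person:b", "situation:1"),
    ("method:email", "situation:1"),
    ("person:a", "situation:2"), ("person:b", "situation:2"),
    ("person:c", "situation:2"),
]
view = person_view("person:a", belongings)
```

The same per-pair counts could serve as edge weights for the graph view.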

#What this replaces

The contacts table in public-archives.db becomes a materialized view. The crosssourcematches table becomes shared belongings (two entity IDs belonging to the same canonical person). The document_entities table merges into standard belongings (document belongs to the entities it mentions).

The project-specific schema (public-archives.db) persists as the raw ingest store — it holds the original data as received from each source. The unified entities.db holds the ontologically correct representation: situations and belongings. The raw store is the archive. The entity store is the graph.
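
As an illustration of deriving contacts from the graph, the sketch below defines a contacts_derived view over a belongings table using Python's built-in `sqlite3` module. The table and column names (`belongings`, `member_id`, `situation_id`) are assumptions, not the actual entities.db schema. SQLite has no native materialized views, so a plain view stands in here; a materialized version would need a table refreshed by the builder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE belongings (member_id TEXT, situation_id TEXT);

-- One row per pair of persons, counting the situations they share.
CREATE VIEW contacts_derived AS
SELECT a.member_id AS person_a,
       b.member_id AS person_b,
       COUNT(*)    AS shared_situations
FROM belongings a
JOIN belongings b
  ON a.situation_id = b.situation_id
 AND a.member_id < b.member_id
WHERE a.member_id LIKE 'person:%'
  AND b.member_id LIKE 'person:%'
GROUP BY a.member_id, b.member_id;
""")

# Invented example data.
conn.executemany(
    "INSERT INTO belongings VALUES (?, ?)",
    [
        ("person:a", "situation:1"), ("person:b", "situation:1"),
        ("method:email", "situation:1"),
        ("person:a", "situation:2"), ("person:b", "situation:2"),
        ("person:c", "situation:2"),
    ],
)
rows = conn.execute(
    "SELECT * FROM contacts_derived "
    "ORDER BY shared_situations DESC, person_a, person_b"
).fetchall()
```

The aggregate counts that the old contacts table stored directly now fall out of the situation graph, and each count can be traced back to the situations behind it.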

#Timeline

| Phase | Scope | Blocking? |
|---|---|---|
| 0 | Unify models | Yes: everything depends on this |
| 1a | KG contact decomposition | No: can proceed with available data |
| 1b | Flight decomposition | No: parallel with 1a |
| 1c | Email decomposition | Blocked on S3 pull + OCR pipeline |
| 2 | WikiLeaks cables | No: parallel with Phase 1 |
| 3 | ICIJ offshore | No: parallel with Phase 1 |
| 4 | Generic engine | Blocked on Phases 1–3 (need examples first) |
| 5 | Inference | Blocked on Phase 1 (need graph density) |
| 6 | Navigation | Blocked on Phase 4 |

Phase 0 first, then Phases 1–3 in parallel, then 4, then 5–6.


strategy · 2026-03-10 · zach + claude

Strategy 22 — Situation Mapping Engines — 2026 — Zachary F. Mainen / HAAK