Multiple agents work concurrently on the same machine. Multiple filix instances may share projects across machines. Media lives on Exoscale S3. This document specifies how data flows between these layers.
#The problem
HAAK has 7 SQLite databases, each serving a domain:
| DB | Entities | Writes from |
|---|---|---|
| entities.db | 11,000+ | build_entities.py (derived index) |
| papers.db | 1,983 | /papers skill, lit-search |
| music.db | 4,947 | ingest_music.py |
| contacts.db | 3,792 | Gmail sync, manual |
| transcripts.db | 74 | ingest_zoom, ingest_podcasts, etc. |
| storage.db | objects | sync, triage |
| files.db | local files | files_watcher daemon |
Three concurrency pressures:
1. Multiple local agents — several Claude sessions running simultaneously, all reading and potentially writing to the same DBs.
2. Cross-machine sharing — two filix instances (e.g., laptop + server) working on a shared project need to see each other's entities.
3. Media access — agents on any machine need to reference files stored in Antilia (Exoscale S3).
SQLite handles (1) acceptably with WAL mode — multiple readers, single writer, no corruption. But it cannot handle (2) at all. And (3) is already solved by S3 URLs.
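A minimal sketch of how a local agent session might open one of the data/*.db files under this regime; the helper name, path, and timeout values are illustrative rather than taken from the existing scripts.

```python
import sqlite3

def open_local_db(path: str) -> sqlite3.Connection:
    """Open a local data/*.db file for an agent session.

    WAL mode lets many agent sessions read while one writes; busy_timeout
    makes a blocked writer wait briefly instead of failing immediately.
    """
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s for a write lock
    return conn

# e.g. count entities without blocking other sessions
conn = open_local_db("data/entities.db")
n = conn.execute("SELECT count(*) FROM entities").fetchone()[0]
```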
#Architecture: SQLite + Postgres
| Layer | Engine | Location | Role |
|---|---|---|---|
| Local | SQLite (WAL) | Each machine, data/*.db | Fast reads for agents, offline-capable |
| Upstream | Postgres | Exoscale (or Railway) | Shared truth, concurrent writes, cross-instance sync |
| Media | Exoscale S3 | Antilia buckets (filix-*) | Binary storage, referenced by URL |
#Why not Postgres-only?
Latency. An agent scanning 11,000 entities during a session needs sub-millisecond reads. SQLite delivers that. Postgres over the network adds 10-50 ms per query. Over a session with thousands of queries, that adds up to minutes of waiting.
#Why not Redis?
Redis is a cache, not a database. You still need durable storage behind it. Postgres gives durability, SQL queries, JSONB for flexible tags, concurrent writes with proper isolation, and we already run it on Railway.
#Why not SQLite replication (Litestream, LiteFS)?
These replicate a single SQLite file to S3 or a primary node. They solve backup, not multi-writer. Two machines writing to the same replicated SQLite still need conflict resolution that these tools don't provide. Postgres does.
#Data flow
Agent (local)
↓ reads from
SQLite (data/*.db)
↓ writes to
SQLite (data/*.db) ← fast, local
↓ sync daemon pushes to
Postgres (upstream) ← shared truth
↓ other instances pull from
SQLite (other machine's data/*.db)
#Sync protocol
The sync daemon runs on each machine (launchd on macOS, systemd on Linux).
Push: After any local write, the daemon pushes changed rows to Postgres. Each table has a synced_at timestamp. Push sends rows where synced_at is NULL or newer than the last push.
Pull: Periodically (or on demand), the daemon pulls rows from Postgres that are newer than local. Inserts or updates local SQLite.
Conflict resolution: Last-write-wins, keyed on (entity_id, role, role_type) for events. Events are mostly append-only (new tags, new lifecycle stages), so conflicts are rare. When they occur, the most recent synced_at wins. Deletions are soft — status='ended' with an end_date, never hard delete.
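A sketch of the push half of this protocol, assuming the Phase 1 migration has already added synced_at to the local tables and that the upstream entities table matches the Postgres schema below; the driver choice, DSN handling, last-push tracking, and machine name are illustrative, not the daemon's actual code.

```python
import sqlite3
import psycopg2  # assumed Postgres driver; any client with upsert support works

MACHINE = "laptop"  # illustrative value for synced_from

def push_entities(local_path: str, pg_dsn: str, last_push: str | None) -> None:
    """Push locally changed entity rows to the upstream Postgres.

    Selects rows whose synced_at is NULL (never pushed) or newer than the
    last recorded push, upserts them keyed on id, then stamps them locally.
    """
    lite = sqlite3.connect(local_path)
    rows = lite.execute(
        "SELECT id, type, name FROM entities "
        "WHERE synced_at IS NULL OR synced_at > ?",
        (last_push or "",),
    ).fetchall()

    pg = psycopg2.connect(pg_dsn)
    with pg, pg.cursor() as cur:
        for id_, type_, name in rows:
            cur.execute(
                "INSERT INTO entities (id, type, name, synced_from) "
                "VALUES (%s, %s, %s, %s) "
                "ON CONFLICT (id) DO UPDATE SET "
                "  type = EXCLUDED.type, name = EXCLUDED.name, "
                "  synced_at = now(), synced_from = EXCLUDED.synced_from",
                (id_, type_, name, MACHINE),
            )

    lite.execute(
        "UPDATE entities SET synced_at = datetime('now') "
        "WHERE synced_at IS NULL OR synced_at > ?",
        (last_push or "",),
    )
    lite.commit()
```

Pull mirrors this in reverse: select upstream rows with synced_at newer than the local maximum and upsert them into SQLite; for events the natural upsert key is (entity_id, role, role_type), which is where the last-write-wins rule applies.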
#What syncs
Not everything needs to sync. Local-only DBs (files.db — tracks filesystem state for one machine) stay local. Shared knowledge syncs.
| DB | Syncs? | Why |
|---|---|---|
| entities.db | Yes | Core shared ontology |
| papers.db | Yes | Shared research library |
| contacts.db | Yes | Shared contacts |
| transcripts.db | Yes | Shared transcript library |
| music.db | Optional | Personal unless sharing DJ sets |
| storage.db | Yes | Antilia object inventory |
| files.db | No | Machine-specific filesystem state |
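One way the daemon could encode this table is a small policy manifest consulted before touching a database; the structure and helper below are hypothetical.

```python
# Hypothetical per-DB sync policy, mirroring the table above.
SYNC_POLICY = {
    "entities.db":    "sync",      # core shared ontology
    "papers.db":      "sync",      # shared research library
    "contacts.db":    "sync",      # shared contacts
    "transcripts.db": "sync",      # shared transcript library
    "storage.db":     "sync",      # Antilia object inventory
    "music.db":       "optional",  # personal unless sharing DJ sets
    "files.db":       "local",     # machine-specific filesystem state
}

def should_sync(db_name: str, share_music: bool = False) -> bool:
    policy = SYNC_POLICY.get(db_name, "local")
    return policy == "sync" or (policy == "optional" and share_music)
```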
#Postgres schema
Mirrors the SQLite schema exactly. Same two tables for entities.db:
CREATE TABLE entities (
id TEXT PRIMARY KEY,
type TEXT NOT NULL,
name TEXT NOT NULL,
synced_at TIMESTAMPTZ DEFAULT now(),
synced_from TEXT -- which machine pushed this
);
CREATE TABLE events (
entity_id TEXT NOT NULL REFERENCES entities(id),
role TEXT NOT NULL,
role_type TEXT NOT NULL,
status TEXT DEFAULT 'active',
start_date TEXT,
end_date TEXT,
source TEXT,
synced_at TIMESTAMPTZ DEFAULT now(),
synced_from TEXT
);
CREATE INDEX idx_events_entity ON events(entity_id);
CREATE INDEX idx_events_role ON events(role);
CREATE INDEX idx_events_role_type ON events(role_type);
CREATE INDEX idx_events_synced ON events(synced_at);
Domain DBs (papers, contacts, transcripts) get the same treatment: add synced_at and synced_from columns, mirror to Postgres.
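A sketch of that Phase 1 migration for one domain table, assuming a plain ALTER TABLE suffices; the table names passed in are assumptions about the domain schemas.

```python
import sqlite3

def add_sync_columns(db_path: str, table: str) -> None:
    """Add synced_at / synced_from to a domain table if they are missing."""
    conn = sqlite3.connect(db_path)
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if "synced_at" not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN synced_at TEXT")
    if "synced_from" not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN synced_from TEXT")
    conn.commit()

# e.g. add_sync_columns("data/papers.db", "papers")  # table name assumed
```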
#Antilia and shared projects
When two filix instances share a project, they share:
- The project directory — git handles this (shared repo or branch)
- Entity references — entities tagged with that project sync through Postgres
- Media files — stored in Antilia (S3), referenced by URL in entities.db
The source column on events tracks provenance: which instance asserted what. If instance A tags a paper with domain 5-HT and instance B tags it with mouse, both events coexist — no conflict, both true.
#Media sharing via Antilia
Antilia is already shared — S3 is accessible from any machine with credentials. The storage.db objects table tracks what's in each bucket. When a filix instance needs a file:
- Check entities.db for the entity's source='storage.db' event
- Look up the object key in storage.db
- Fetch from S3 via rclone or direct HTTPS
No need to replicate media between machines. S3 is the single copy. Local caches (for offline work) are optional and managed per-machine.
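A sketch of that lookup path using rclone, as suggested above; the rclone remote name and the column layout of storage.db's objects table (bucket, key, entity_id) are assumptions for illustration.

```python
import sqlite3
import subprocess
from pathlib import Path

RCLONE_REMOTE = "antilia"  # assumed rclone remote for the Exoscale S3 endpoint

def fetch_media(entity_id: str, dest_dir: str = "cache") -> Path:
    """Resolve an entity's media object and pull it from Antilia via rclone."""
    ent = sqlite3.connect("data/entities.db")
    has_storage = ent.execute(
        "SELECT 1 FROM events WHERE entity_id = ? AND source = 'storage.db'",
        (entity_id,),
    ).fetchone()
    if has_storage is None:
        raise LookupError(f"no storage.db event for {entity_id}")

    store = sqlite3.connect("data/storage.db")
    bucket, key = store.execute(
        "SELECT bucket, key FROM objects WHERE entity_id = ?",  # columns assumed
        (entity_id,),
    ).fetchone()

    dest = Path(dest_dir) / Path(key).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["rclone", "copyto", f"{RCLONE_REMOTE}:{bucket}/{key}", str(dest)],
        check=True,
    )
    return dest
```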
#Implementation path
| Phase | What | Depends on |
|---|---|---|
| 0 (now) | SQLite with WAL mode, build_entities.py | Done |
| 1 | Add synced_at columns to domain DBs | Schema migration |
| 2 | Provision Postgres on Exoscale (or Railway) | Account setup |
| 3 | Sync daemon — push/pull for entities.db | Phase 1 + 2 |
| 4 | Extend sync to papers.db, contacts.db, transcripts.db | Phase 3 |
| 5 | Second filix instance (e.g., H100 server) joins | Phase 3 |
Phase 0 is operational. Phases 1-2 are prerequisites. Phase 3 is the core work. Phases 4-5 are incremental.
#Exoscale as home
Exoscale hosts both tiers of persistent infrastructure:
- Antilia (S3) — media storage, five filix-* buckets
- Postgres — shared entity database (Exoscale DBaaS or self-hosted on micro instance)
Both in the same zone (de-muc-1 or ch-gva-2), minimizing latency between the DB and S3. Swiss/German jurisdiction for data sovereignty.
The H100 (AWS EC2) is compute, not storage. It pulls from Exoscale, processes, pushes results back. Antilia and Postgres are the durable layer; compute instances are ephemeral.
Architecture 23 — Data Infrastructure — 2026 — Zachary F. Mainen / HAAK