Skip to content

Diff: substrate/nightly-datagraph-rebuild

From 1c02ec1 to 1c02ec1

+0 / −0 lines
BeforeAfter
--- ---
schema: foundry-doc-v1 schema: foundry-doc-v1
title: "Nightly Datagraph rebuild" title: "Nightly Datagraph rebuild"
slug: nightly-datagraph-rebuild slug: nightly-datagraph-rebuild
category: substrate category: substrate
type: concept type: concept
content_type: topic content_type: topic
status: stub status: stub
short_description: "The scheduled process that reconstructs the platform's knowledge graph from canonical flat-file sources each night, producing a fresh queryable substrate from deterministic inputs without AI involvement." short_description: "The scheduled process that reconstructs the platform's knowledge graph from canonical flat-file sources each night, producing a fresh queryable substrate from deterministic inputs without AI involvement."
bcsc_class: public-disclosure-safe bcsc_class: public-disclosure-safe
last_edited: 2026-05-18 last_edited: 2026-05-18
editor: pointsav-engineering editor: pointsav-engineering
paired_with: nightly-datagraph-rebuild.es.md paired_with: nightly-datagraph-rebuild.es.md
--- ---
The nightly datagraph rebuild is the scheduled pipeline that reconstructs the platform's full knowledge graph from its canonical flat-file sources. Every graph-queryable relationship — entity links, [[service-extraction|extraction outputs]], [[worm-ledger-architecture|ledger entries]], and [[location-intelligence-substrate|location intelligence indexes]] — is derived from the same deterministic inputs each cycle. The result is a fresh, stable snapshot available to all query consumers at the start of each operating day. The nightly datagraph rebuild is the scheduled pipeline that reconstructs the platform's full knowledge graph from its canonical flat-file sources. Every graph-queryable relationship — entity links, [[service-extraction|extraction outputs]], [[worm-ledger-architecture|ledger entries]], and [[location-intelligence-substrate|location intelligence indexes]] — is derived from the same deterministic inputs each cycle. The result is a fresh, stable snapshot available to all query consumers at the start of each operating day.
## Key Takeaways ## Key Takeaways
- The graph is rebuilt from flat-file sources nightly, not maintained by continuous mutation. Consumers read a stable snapshot, not a live partially-constructed graph. - The graph is rebuilt from flat-file sources nightly, not maintained by continuous mutation. Consumers read a stable snapshot, not a live partially-constructed graph.
- Every rebuild cycle can be replicated from archived flat files. The same inputs produce the same graph — no AI inference, no probabilistic classification. - Every rebuild cycle can be replicated from archived flat files. The same inputs produce the same graph — no AI inference, no probabilistic classification.
- The rebuild pattern enforces [SYS-ADR-07](governance/architecture-decisions) compliance: structured entity data is produced by deterministic rules, not AI model outputs. - The rebuild pattern enforces [SYS-ADR-07](governance/architecture-decisions) compliance: structured entity data is produced by deterministic rules, not AI model outputs.
- Each cycle compounds the prior cycle. Newly committed records extend the graph; no record is removed. The [[compounding-substrate]] mechanism means the graph grows monotonically accurate over time. - Each cycle compounds the prior cycle. Newly committed records extend the graph; no record is removed. The [[compounding-substrate]] mechanism means the graph grows monotonically accurate over time.
## Purpose ## Purpose
The rebuild pattern ensures that the queryable substrate reflects the committed state of the canonical record, not accumulated in-memory drift. Any single run can be replicated from the archived flat files. The rebuild pattern ensures that the queryable substrate reflects the committed state of the canonical record, not accumulated in-memory drift. Any single run can be replicated from the archived flat files.
The pipeline runs without AI inference. Relationships are computed by deterministic extraction rules and schema-driven joins, not by probabilistic classification. This is the SYS-ADR-07 enforcement boundary: structured graph data is a computed product of deterministic rules applied to verified records, not a model-generated artefact. The pipeline runs without AI inference. Relationships are computed by deterministic extraction rules and schema-driven joins, not by probabilistic classification. This is the SYS-ADR-07 enforcement boundary: structured graph data is a computed product of deterministic rules applied to verified records, not a model-generated artefact.
## Pipeline stages ## Pipeline stages
The rebuild pipeline follows a fixed sequence: The rebuild pipeline follows a fixed sequence:
1. **Ledger snapshot** — reads the current committed state of all [[worm-ledger-design|WORM ledger]] segments. The ledger is append-only; the snapshot is the complete history as of the scheduled start time. 1. **Ledger snapshot** — reads the current committed state of all [[worm-ledger-design|WORM ledger]] segments. The ledger is append-only; the snapshot is the complete history as of the scheduled start time.
2. **Extraction pass** — [[service-extraction|service-extraction]] runs its deterministic entity-recognition rules against the snapshot, producing entity records for persons, organisations, assets, and events. 2. **Extraction pass** — [[service-extraction|service-extraction]] runs its deterministic entity-recognition rules against the snapshot, producing entity records for persons, organisations, assets, and events.
3. **Schema-driven joins** — entity records are joined against the canonical taxonomy and location intelligence indexes using explicit foreign-key relationships. No fuzzy matching at this stage. 3. **Schema-driven joins** — entity records are joined against the canonical taxonomy and location intelligence indexes using explicit foreign-key relationships. No fuzzy matching at this stage.
4. **Graph construction** — joined records are assembled into the queryable graph substrate consumed by [[service-content|service-content]] and the [[doorman-protocol|Doorman inference layer]]. 4. **Graph construction** — joined records are assembled into the queryable graph substrate consumed by [[service-content|service-content]] and the [[doorman-protocol|Doorman inference layer]].
5. **Swap** — the completed graph replaces the prior snapshot atomically. Query consumers switch to the new version at the next request after the swap. 5. **Swap** — the completed graph replaces the prior snapshot atomically. Query consumers switch to the new version at the next request after the swap.
## Position in the substrate stack ## Position in the substrate stack
The nightly rebuild sits between the [[worm-ledger-design|WORM ledger]] (which accumulates append-only writes during the day) and the query-serving tier (which reads the most recently completed graph). Consumers of the [[knowledge-graph-grounded-apprenticeship|knowledge graph]] always read a stable snapshot, not a partially-constructed graph. The nightly rebuild sits between the [[worm-ledger-design|WORM ledger]] (which accumulates append-only writes during the day) and the query-serving tier (which reads the most recently completed graph). Consumers of the [[knowledge-graph-grounded-apprenticeship|knowledge graph]] always read a stable snapshot, not a partially-constructed graph.
The [[compounding-substrate]] mechanism means each rebuild cycle inherits the full prior graph, then adds newly committed records on top. Accuracy compounds over time: an entity that appeared in three ledger records two years ago and twelve ledger records last month has a richer graph node than a newly registered entity — without any manual curation step. The [[compounding-substrate]] mechanism means each rebuild cycle inherits the full prior graph, then adds newly committed records on top. Accuracy compounds over time: an entity that appeared in three ledger records two years ago and twelve ledger records last month has a richer graph node than a newly registered entity — without any manual curation step.
## See also ## See also
- [[compounding-substrate]] — the mechanism by which each rebuild cycle compounds prior knowledge - [[compounding-substrate]] — the mechanism by which each rebuild cycle compounds prior knowledge
- [[worm-ledger-design]] — the append-only ledger that feeds the rebuild pipeline - [[worm-ledger-design]] — the append-only ledger that feeds the rebuild pipeline
- [[service-extraction]] — the extraction service that produces entity records consumed by the rebuild - [[service-extraction]] — the extraction service that produces entity records consumed by the rebuild
- [[service-content]] — the query-serving service that reads the completed graph - [[service-content]] — the query-serving service that reads the completed graph
- [[doorman-protocol]] — the inference-layer client that queries the graph for entity context - [[doorman-protocol]] — the inference-layer client that queries the graph for entity context