Skip to content

Diff: services/service-slm-graph-store-migration

From 21d8df6 to 21d8df6

+0 / −0 lines
BeforeAfter
--- ---
schema: foundry-doc-v1 schema: foundry-doc-v1
title: "service-slm graph store migration" title: "service-slm graph store migration"
slug: service-slm-graph-store-migration slug: service-slm-graph-store-migration
category: services category: services
type: concept type: concept
quality: pre-build quality: pre-build
status: pre-build status: pre-build
audience: vendor-public audience: vendor-public
bcsc_class: current-fact bcsc_class: current-fact
language_protocol: PROSE-TOPIC language_protocol: PROSE-TOPIC
last_edited: 2026-05-25 last_edited: 2026-05-25
editor: pointsav-engineering editor: pointsav-engineering
paired_with: service-slm-graph-store-migration.es.md paired_with: service-slm-graph-store-migration.es.md
short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection." short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection."
cites: [] cites: []
--- ---
The [[service-slm]] graph store is a live property graph of named business entities extracted nightly from an operator's data corpus — the entity layer that [[service-content]] uses to inject structured business context into every inference request without sending proprietary data to an external model. The graph is stored in LadybugDB and rebuilt on a nightly schedule by the DataGraph rebuild script, which runs as Phase 1 of the Elastic Compute nightly window before the model-training phase claims the GPU. The [[service-slm]] graph store is a live property graph of named business entities extracted nightly from an operator's data corpus — the entity layer that [[service-content]] uses to inject structured business context into every inference request without sending proprietary data to an external model. The graph is stored in LadybugDB and rebuilt on a nightly schedule by the DataGraph rebuild script, which runs as Phase 1 of the Elastic Compute nightly window before the model-training phase claims the GPU.
Each night, the DataGraph rebuild script processes the operator data Each night, the DataGraph rebuild script processes the operator data
corpus and writes extracted named entities to a property graph stored in corpus and writes extracted named entities to a property graph stored in
LadybugDB. This property graph — the deployment DataGraph — is the entity layer LadybugDB. This property graph — the deployment DataGraph — is the entity layer
that service-content uses to inject structured business context into inference that service-content uses to inject structured business context into inference
requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before
the training phase claims the GPU. The deployment DataGraph is live, with an the training phase claims the GPU. The deployment DataGraph is live, with an
11 MB LadybugDB file currently active at service-content. 11 MB LadybugDB file currently active at service-content.
## What the DataGraph contains ## What the DataGraph contains
The deployment DataGraph is a property graph of named business entities The deployment DataGraph is a property graph of named business entities
extracted from the operator deployment's data corpus. The graph holds five extracted from the operator deployment's data corpus. The graph holds five
entity classifications: Person (staff, contacts, counterparties), Company entity classifications: Person (staff, contacts, counterparties), Company
(vendors, customers, partner organisations), Project (active and historical (vendors, customers, partner organisations), Project (active and historical
engagements), Account (financial accounts and ledger references), and engagements), Account (financial accounts and ledger references), and
Location (offices, sites, and operational addresses). These entities are Location (offices, sites, and operational addresses). These entities are
extracted from three document streams: meeting transcript markdown files extracted from three document streams: meeting transcript markdown files
from the minutebook asset directory, research and background YAML and markdown from the minutebook asset directory, research and background YAML and markdown
files from the service-agents directory, and contact source JSON records from files from the service-agents directory, and contact source JSON records from
the [[service-people]] directory. the [[service-people]] directory.
## What the nightly rebuild does ## What the nightly rebuild does
For each unprocessed document, the rebuild script calls For each unprocessed document, the rebuild script calls
`POST :9080/v1/chat/completions` through the [[doorman-protocol|Doorman]] endpoint, passing the `POST :9080/v1/chat/completions` through the [[doorman-protocol|Doorman]] endpoint, passing the
document text with a JSON Schema grammar constraint. The language model — document text with a JSON Schema grammar constraint. The language model —
OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON
array of entity objects. Each object carries the entity name, classification, array of entity objects. Each object carries the entity name, classification,
confidence score, and optional role, location, and contact vectors. The script confidence score, and optional role, location, and contact vectors. The script
then calls `POST :9081/v1/graph/mutate` on service-content to write those then calls `POST :9081/v1/graph/mutate` on service-content to write those
entities into LadybugDB. The health probe at the end of the cycle queries entities into LadybugDB. The health probe at the end of the cycle queries
service-content for the current entity count and writes a summary JSON file service-content for the current entity count and writes a summary JSON file
at `$DEPLOYMENT_ROOT/data/datagraph-health.json`. at `$DEPLOYMENT_ROOT/data/datagraph-health.json`.
The script processes three document batches each run: the full minutebook The script processes three document batches each run: the full minutebook
asset tree, the full service-agents tree, and the 50 most recent unprocessed asset tree, the full service-agents tree, and the 50 most recent unprocessed
service-people JSON files. A randomised inter-document delay (0.3 to 1.5 service-people JSON files. A randomised inter-document delay (0.3 to 1.5
seconds) prevents the Doorman from receiving a burst of requests that could seconds) prevents the Doorman from receiving a burst of requests that could
interfere with the training phase startup. interfere with the training phase startup.
## The routing parity principle ## The routing parity principle
The DataGraph rebuild script calls only the same two REST API The DataGraph rebuild script calls only the same two REST API
endpoints that any operator or community member running service-slm and endpoints that any operator or community member running service-slm and
service-content would call from their own automation: service-content would call from their own automation:
- `POST :9080/v1/chat/completions` — entity extraction through Doorman - `POST :9080/v1/chat/completions` — entity extraction through Doorman
- `POST :9081/v1/graph/mutate` — entity write through service-content - `POST :9081/v1/graph/mutate` — entity write through service-content
There is no file-watcher shortcut, no internal gRPC bypass, and no direct There is no file-watcher shortcut, no internal gRPC bypass, and no direct
database write. This is a deliberate design decision. If the rebuild script database write. This is a deliberate design decision. If the rebuild script
fails, the failure indicates a real defect in service-slm or service-content fails, the failure indicates a real defect in service-slm or service-content
that would also affect any operator or customer running the same API surface. that would also affect any operator or customer running the same API surface.
The nightly rebuild functions as a full-stack integration test that runs The nightly rebuild functions as a full-stack integration test that runs
against production services on production data every night. Failures are against production services on production data every night. Failures are
explicit and immediately actionable rather than hidden in an internal path explicit and immediately actionable rather than hidden in an internal path
that real callers would never exercise. that real callers would never exercise.
## Idempotency ## Idempotency
The script tracks processed documents using a local [[worm-ledger-design|ledger]] at The script tracks processed documents using a local [[worm-ledger-design|ledger]] at
`$DEPLOYMENT_ROOT/data/datagraph-processed.txt`. Each document is identified by `$DEPLOYMENT_ROOT/data/datagraph-processed.txt`. Each document is identified by
a hash of its file content, prefixed with a source tag (`mk-` for minutebook, a hash of its file content, prefixed with a source tag (`mk-` for minutebook,
`ag-` for service-agents, `sp-` for service-people). Before processing any `ag-` for service-agents, `sp-` for service-people). Before processing any
document, the script checks whether its identifier appears in the ledger. If document, the script checks whether its identifier appears in the ledger. If
it does, the document is skipped. After a successful `graph/mutate` call, the it does, the document is skipped. After a successful `graph/mutate` call, the
identifier is appended to the ledger. This mechanism ensures that documents identifier is appended to the ledger. This mechanism ensures that documents
are not re-processed across multiple nightly runs, even if the same content are not re-processed across multiple nightly runs, even if the same content
is present in the source directories. is present in the source directories.
The ledger is append-only and not pruned automatically. If service-content The ledger is append-only and not pruned automatically. If service-content
is restarted and the graph is rebuilt from scratch, the ledger can be cleared is restarted and the graph is rebuilt from scratch, the ledger can be cleared
to force a full re-extraction on the next nightly run. to force a full re-extraction on the next nightly run.
## Graph context injection ## Graph context injection
The deployment DataGraph is not a static reference store. service-content The deployment DataGraph is not a static reference store. service-content
queries it before each inference request. When the Doorman receives a queries it before each inference request. When the Doorman receives a
completion request from an operator or application, service-content retrieves completion request from an operator or application, service-content retrieves
entities relevant to the request context — based on module ID, entity entities relevant to the request context — based on module ID, entity
classification, and confidence thresholds — and injects them into the system classification, and confidence thresholds — and injects them into the system
message as a structured entity context block. The language model receives message as a structured entity context block. The language model receives
structured business context (who the relevant people are, what projects are structured business context (who the relevant people are, what projects are
active, which companies are counterparties) without requiring that structured active, which companies are counterparties) without requiring that structured
data to cross the external model boundary. The graph stays within the data to cross the external model boundary. The graph stays within the
deployment boundary; only the injected prose context leaves it. deployment boundary; only the injected prose context leaves it.
## Current status and gate criterion ## Current status and gate criterion
The deployment DataGraph is live. Three consecutive nightly runs reporting The deployment DataGraph is live. Three consecutive nightly runs reporting
HEALTHY status — defined as a non-negative entity count delta and a successful HEALTHY status — defined as a non-negative entity count delta and a successful
round trip on both the extraction and mutation endpoints — are the intended round trip on both the extraction and mutation endpoints — are the intended
criterion before the DataGraph pattern is extended to larger operational criterion before the DataGraph pattern is extended to larger operational
contexts. That gate has not yet been met; the rebuild pipeline is in its contexts. That gate has not yet been met; the rebuild pipeline is in its
initial operational period. initial operational period.
## See also ## See also
- [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training) - [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training)
- [[service-slm]] — the service that orchestrates the full nightly pipeline - [[service-slm]] — the service that orchestrates the full nightly pipeline