Diff: services/service-slm-graph-store-migration
From 1911223 to 1911223
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "service-slm graph store migration" | title: "service-slm graph store migration" |
| slug: service-slm-graph-store-migration | slug: service-slm-graph-store-migration |
| category: services | category: services |
| type: concept | type: concept |
| quality: pre-build | quality: pre-build |
| status: pre-build | status: pre-build |
| audience: vendor-public | audience: vendor-public |
| bcsc_class: current-fact | bcsc_class: current-fact |
| language_protocol: PROSE-TOPIC | language_protocol: PROSE-TOPIC |
| last_edited: 2026-05-24 | last_edited: 2026-05-24 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| paired_with: service-slm-graph-store-migration.es.md | paired_with: service-slm-graph-store-migration.es.md |
| short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection." | short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection." |
| cites: [] | cites: [] |
| --- | --- |
| Each night, `jennifer-datagraph-rebuild.sh` processes the operator data | Each night, `jennifer-datagraph-rebuild.sh` processes the operator data |
| corpus and writes extracted named entities to a property graph stored in | corpus and writes extracted named entities to a property graph stored in |
| LadybugDB. This property graph — the deployment DataGraph — is the entity layer | LadybugDB. This property graph — the deployment DataGraph — is the entity layer |
| that service-content uses to inject structured business context into inference | that service-content uses to inject structured business context into inference |
| requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before | requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before |
| the training phase claims the GPU. The deployment DataGraph is live, with an | the training phase claims the GPU. The deployment DataGraph is live, with an |
| 11 MB LadybugDB file currently active at service-content. | 11 MB LadybugDB file currently active at service-content. |
| ## What the DataGraph contains | ## What the DataGraph contains |
| The deployment DataGraph is a property graph of named business entities | The deployment DataGraph is a property graph of named business entities |
| extracted from the operator deployment's data corpus. The graph holds five | extracted from the operator deployment's data corpus. The graph holds five |
| entity classifications: Person (staff, contacts, counterparties), Company | entity classifications: Person (staff, contacts, counterparties), Company |
| (vendors, customers, partner organisations), Project (active and historical | (vendors, customers, partner organisations), Project (active and historical |
| engagements), Account (financial accounts and ledger references), and | engagements), Account (financial accounts and ledger references), and |
| Location (offices, sites, and operational addresses). These entities are | Location (offices, sites, and operational addresses). These entities are |
| extracted from three document streams: meeting transcript markdown files | extracted from three document streams: meeting transcript markdown files |
| from the minutebook asset directory, research and background YAML and markdown | from the minutebook asset directory, research and background YAML and markdown |
| files from the service-agents directory, and contact source JSON records from | files from the service-agents directory, and contact source JSON records from |
| the service-people directory. | the service-people directory. |
| ## What the nightly rebuild does | ## What the nightly rebuild does |
| For each unprocessed document, the rebuild script calls | For each unprocessed document, the rebuild script calls |
| `POST :9080/v1/chat/completions` through the Doorman endpoint, passing the | `POST :9080/v1/chat/completions` through the Doorman endpoint, passing the |
| document text with a JSON Schema grammar constraint. The language model — | document text with a JSON Schema grammar constraint. The language model — |
| OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON | OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON |
| array of entity objects. Each object carries the entity name, classification, | array of entity objects. Each object carries the entity name, classification, |
| confidence score, and optional role, location, and contact vectors. The script | confidence score, and optional role, location, and contact vectors. The script |
| then calls `POST :9081/v1/graph/mutate` on service-content to write those | then calls `POST :9081/v1/graph/mutate` on service-content to write those |
| entities into LadybugDB. The health probe at the end of the cycle queries | entities into LadybugDB. The health probe at the end of the cycle queries |
| service-content for the current entity count and writes a summary JSON file | service-content for the current entity count and writes a summary JSON file |
| at `$FOUNDRY_ROOT/data/datagraph-health.json`. | at `$FOUNDRY_ROOT/data/datagraph-health.json`. |
| The script processes three document batches each run: the full minutebook | The script processes three document batches each run: the full minutebook |
| asset tree, the full service-agents tree, and the 50 most recent unprocessed | asset tree, the full service-agents tree, and the 50 most recent unprocessed |
| service-people JSON files. A randomised inter-document delay (0.3 to 1.5 | service-people JSON files. A randomised inter-document delay (0.3 to 1.5 |
| seconds) prevents the Doorman from receiving a burst of requests that could | seconds) prevents the Doorman from receiving a burst of requests that could |
| interfere with the training phase startup. | interfere with the training phase startup. |
| ## The routing parity principle | ## The routing parity principle |
| The `jennifer-datagraph-rebuild.sh` script calls only the same two REST API | The `jennifer-datagraph-rebuild.sh` script calls only the same two REST API |
| endpoints that any operator or community member running service-slm and | endpoints that any operator or community member running service-slm and |
| service-content would call from their own automation: | service-content would call from their own automation: |
| - `POST :9080/v1/chat/completions` — entity extraction through Doorman | - `POST :9080/v1/chat/completions` — entity extraction through Doorman |
| - `POST :9081/v1/graph/mutate` — entity write through service-content | - `POST :9081/v1/graph/mutate` — entity write through service-content |
| There is no file-watcher shortcut, no internal gRPC bypass, and no direct | There is no file-watcher shortcut, no internal gRPC bypass, and no direct |
| database write. This is a deliberate design decision. If the rebuild script | database write. This is a deliberate design decision. If the rebuild script |
| fails, the failure indicates a real defect in service-slm or service-content | fails, the failure indicates a real defect in service-slm or service-content |
| that would also affect any operator or customer running the same API surface. | that would also affect any operator or customer running the same API surface. |
| The nightly rebuild functions as a full-stack integration test that runs | The nightly rebuild functions as a full-stack integration test that runs |
| against production services on production data every night. Failures are | against production services on production data every night. Failures are |
| explicit and immediately actionable rather than hidden in an internal path | explicit and immediately actionable rather than hidden in an internal path |
| that real callers would never exercise. | that real callers would never exercise. |
| ## Idempotency | ## Idempotency |
| The script tracks processed documents using a local ledger at | The script tracks processed documents using a local ledger at |
| `$FOUNDRY_ROOT/data/datagraph-processed.txt`. Each document is identified by | `$FOUNDRY_ROOT/data/datagraph-processed.txt`. Each document is identified by |
| a hash of its file content, prefixed with a source tag (`mk-` for minutebook, | a hash of its file content, prefixed with a source tag (`mk-` for minutebook, |
| `ag-` for service-agents, `sp-` for service-people). Before processing any | `ag-` for service-agents, `sp-` for service-people). Before processing any |
| document, the script checks whether its identifier appears in the ledger. If | document, the script checks whether its identifier appears in the ledger. If |
| it does, the document is skipped. After a successful `graph/mutate` call, the | it does, the document is skipped. After a successful `graph/mutate` call, the |
| identifier is appended to the ledger. Because the identifier depends only on | identifier is appended to the ledger. Because the identifier depends only on |
| file content, unchanged documents are skipped on later nightly runs, while an | file content, unchanged documents are skipped on later nightly runs, while an |
| edited document hashes to a new identifier and is extracted again. | edited document hashes to a new identifier and is extracted again. |
| The ledger is append-only and not pruned automatically. If service-content | The ledger is append-only and not pruned automatically. If service-content |
| is restarted and the graph is rebuilt from scratch, the ledger can be cleared | is restarted and the graph is rebuilt from scratch, the ledger can be cleared |
| to force a full re-extraction on the next nightly run. | to force a full re-extraction on the next nightly run. |
| ## Graph context injection | ## Graph context injection |
| The deployment DataGraph is not a static reference store. service-content | The deployment DataGraph is not a static reference store. service-content |
| queries it before each inference request. When the Doorman receives a | queries it before each inference request. When the Doorman receives a |
| completion request from an operator or application, service-content retrieves | completion request from an operator or application, service-content retrieves |
| entities relevant to the request context — based on module ID, entity | entities relevant to the request context — based on module ID, entity |
| classification, and confidence thresholds — and injects them into the system | classification, and confidence thresholds — and injects them into the system |
| message as a structured entity context block. The language model receives | message as a structured entity context block. The language model receives |
| structured business context (who the relevant people are, what projects are | structured business context (who the relevant people are, what projects are |
| active, which companies are counterparties) without requiring that structured | active, which companies are counterparties) without requiring that structured |
| data to cross the external model boundary. The graph stays within the | data to cross the external model boundary. The graph stays within the |
| deployment boundary; only the injected prose context leaves it. | deployment boundary; only the injected prose context leaves it. |
| ## Current status and gate criterion | ## Current status and gate criterion |
| The deployment DataGraph is live. Three consecutive nightly runs reporting | The deployment DataGraph is live. Three consecutive nightly runs reporting |
| HEALTHY status — defined as a non-negative entity count delta and a successful | HEALTHY status — defined as a non-negative entity count delta and a successful |
| round trip on both the extraction and mutation endpoints — are the intended | round trip on both the extraction and mutation endpoints — are the intended |
| criterion before the DataGraph pattern is extended to larger operational | criterion before the DataGraph pattern is extended to larger operational |
| contexts. That gate has not yet been met; the rebuild pipeline is in its | contexts. That gate has not yet been met; the rebuild pipeline is in its |
| initial operational period. | initial operational period. |
| ## See also | ## See also |
| - [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training) | - [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training) |
| - [[service-slm]] — the service that orchestrates the full nightly pipeline | - [[service-slm]] — the service that orchestrates the full nightly pipeline |
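
The ledger check described under Idempotency can be sketched as follows. This is a minimal illustration, not the contents of `jennifer-datagraph-rebuild.sh`: the use of SHA-256, the one-identifier-per-line ledger layout, and the names `document_id`, `load_ledger`, `process_once`, and `write_entities` are all assumptions made for the example. Only the source tags and the rule of appending after a successful write come from the article.

```python
import hashlib
from pathlib import Path

# Source tags named in the Idempotency section.
SOURCE_TAGS = {"minutebook": "mk-", "service-agents": "ag-", "service-people": "sp-"}


def document_id(source: str, content: bytes) -> str:
    """Source tag plus a content hash.

    SHA-256 is an assumption; the article only says "a hash of its file content".
    """
    return SOURCE_TAGS[source] + hashlib.sha256(content).hexdigest()


def load_ledger(ledger_path: Path) -> set[str]:
    """Read previously processed identifiers, assuming one per line."""
    if not ledger_path.exists():
        return set()
    lines = ledger_path.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def process_once(source: str, content: bytes, ledger_path: Path, write_entities) -> bool:
    """Process a document unless its identifier is already in the ledger.

    write_entities is a hypothetical callable standing in for the extraction
    and graph/mutate round trip; it returns True on success.
    Returns True only if the document was processed in this call.
    """
    doc_id = document_id(source, content)
    if doc_id in load_ledger(ledger_path):
        return False  # already processed: skip
    if not write_entities(content):
        return False  # failed write: ledger untouched, so a later run can retry
    with ledger_path.open("a") as ledger:
        ledger.write(doc_id + "\n")  # append only after a successful write
    return True
```

Because the identifier is appended only after the write succeeds, a document whose write fails stays out of the ledger and would be picked up again on the next run under this sketch. Deleting the ledger file makes every document eligible again, which matches the full re-extraction path the article describes.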