Skip to content

service-slm graph store migration

Topic

From the PointSav Documentation

service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection.

Updated 2026-05-25 · HistoryEspañol
vendor-public

The service-slm graph store is a live property graph of named business entities extracted nightly from an operator's data corpus — the entity layer that service-content uses to inject structured business context into every inference request without sending proprietary data to an external model. The graph is stored in LadybugDB and rebuilt on a nightly schedule by the DataGraph rebuild script, which runs as Phase 1 of the Elastic Compute nightly window before the model-training phase claims the GPU.

Each night, the DataGraph rebuild script processes the operator data corpus and writes extracted named entities to a property graph stored in LadybugDB. This property graph — the deployment DataGraph — is the entity layer that service-content uses to inject structured business context into inference requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before the training phase claims the GPU. The deployment DataGraph is live, with an 11 MB LadybugDB file currently active at service-content.

[edit]What the DataGraph contains

The deployment DataGraph is a property graph of named business entities extracted from the operator deployment's data corpus. The graph holds five entity classifications: Person (staff, contacts, counterparties), Company (vendors, customers, partner organisations), Project (active and historical engagements), Account (financial accounts and ledger references), and Location (offices, sites, and operational addresses). These entities are extracted from three document streams: meeting transcript markdown files from the minutebook asset directory, research and background YAML and markdown files from the service-agents directory, and contact source JSON records from the service-people directory.

[edit]What the nightly rebuild does

For each unprocessed document, the rebuild script calls POST :9080/v1/chat/completions through the Doorman endpoint, passing the document text with a JSON Schema grammar constraint. The language model — OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON array of entity objects. Each object carries the entity name, classification, confidence score, and optional role, location, and contact vectors. The script then calls POST :9081/v1/graph/mutate on service-content to write those entities into LadybugDB. The health probe at the end of the cycle queries service-content for the current entity count and writes a summary JSON file at $DEPLOYMENT_ROOT/data/datagraph-health.json.

The script processes three document batches each run: the full minutebook asset tree, the full service-agents tree, and the 50 most recent unprocessed service-people JSON files. A randomised inter-document delay (0.3 to 1.5 seconds) prevents the Doorman from receiving a burst of requests that could interfere with the training phase startup.

[edit]The routing parity principle

The DataGraph rebuild script calls only the same two REST API endpoints that any operator or community member running service-slm and service-content would call from their own automation:

  • POST :9080/v1/chat/completions — entity extraction through Doorman
  • POST :9081/v1/graph/mutate — entity write through service-content

There is no file-watcher shortcut, no internal gRPC bypass, and no direct database write. This is a deliberate design decision. If the rebuild script fails, the failure indicates a real defect in service-slm or service-content that would also affect any operator or customer running the same API surface. The nightly rebuild functions as a full-stack integration test that runs against production services on production data every night. Failures are explicit and immediately actionable rather than hidden in an internal path that real callers would never exercise.

[edit]Idempotency

The script tracks processed documents using a local ledger at $DEPLOYMENT_ROOT/data/datagraph-processed.txt. Each document is identified by a hash of its file content, prefixed with a source tag (mk- for minutebook, ag- for service-agents, sp- for service-people). Before processing any document, the script checks whether its identifier appears in the ledger. If it does, the document is skipped. After a successful graph/mutate call, the identifier is appended to the ledger. This mechanism ensures that documents are not re-processed across multiple nightly runs, even if the same content is present in the source directories.

The ledger is append-only and not pruned automatically. If service-content is restarted and the graph is rebuilt from scratch, the ledger can be cleared to force a full re-extraction on the next nightly run.

[edit]Graph context injection

The deployment DataGraph is not a static reference store. service-content queries it before each inference request. When the Doorman receives a completion request from an operator or application, service-content retrieves entities relevant to the request context — based on module ID, entity classification, and confidence thresholds — and injects them into the system message as a structured entity context block. The language model receives structured business context (who the relevant people are, what projects are active, which companies are counterparties) without requiring that structured data to cross the external model boundary. The graph stays within the deployment boundary; only the injected prose context leaves it.

[edit]Current status and gate criterion

The deployment DataGraph is live. Three consecutive nightly runs reporting HEALTHY status — defined as a non-negative entity count delta and a successful round trip on both the extraction and mutation endpoints — are the intended criterion before the DataGraph pattern is extended to larger operational contexts. That gate has not yet been met; the rebuild pipeline is in its initial operational period.

[edit]See also

Category:Services
Last edited:
Edit this page · View source