Diff: services/service-slm-graph-store-migration

From 1911223 to 1911223

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "service-slm graph store migration"	title: "service-slm graph store migration"
slug: service-slm-graph-store-migration	slug: service-slm-graph-store-migration
category: services	category: services
type: concept	type: concept
quality: pre-build	quality: pre-build
status: pre-build	status: pre-build
audience: vendor-public	audience: vendor-public
bcsc_class: current-fact	bcsc_class: current-fact
language_protocol: PROSE-TOPIC	language_protocol: PROSE-TOPIC
last_edited: 2026-05-24	last_edited: 2026-05-24
editor: pointsav-engineering	editor: pointsav-engineering
paired_with: service-slm-graph-store-migration.es.md	paired_with: service-slm-graph-store-migration.es.md
short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection."	short_description: "service-slm migrated its graph store from LadybugDB to SQLite for fleet nodes and integrates a nightly DataGraph rebuild that processes the operator data corpus through the Doorman into the property graph used for inference context injection."
cites: []	cites: []
---	---

Each night, `jennifer-datagraph-rebuild.sh` processes the operator data	Each night, `jennifer-datagraph-rebuild.sh` processes the operator data
corpus and writes extracted named entities to a property graph stored in	corpus and writes extracted named entities to a property graph stored in
LadybugDB. This property graph — the deployment DataGraph — is the entity layer	LadybugDB. This property graph — the deployment DataGraph — is the entity layer
that service-content uses to inject structured business context into inference	that service-content uses to inject structured business context into inference
requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before	requests. The rebuild runs as Phase 1 of the Elastic Compute #1 nightly window, before
the training phase claims the GPU. The deployment DataGraph is live, with an	the training phase claims the GPU. The deployment DataGraph is live, with an
11 MB LadybugDB file currently active at service-content.	11 MB LadybugDB file currently active at service-content.

## What the DataGraph contains	## What the DataGraph contains

The deployment DataGraph is a property graph of named business entities	The deployment DataGraph is a property graph of named business entities
extracted from the operator deployment's data corpus. The graph holds five	extracted from the operator deployment's data corpus. The graph holds five
entity classifications: Person (staff, contacts, counterparties), Company	entity classifications: Person (staff, contacts, counterparties), Company
(vendors, customers, partner organisations), Project (active and historical	(vendors, customers, partner organisations), Project (active and historical
engagements), Account (financial accounts and ledger references), and	engagements), Account (financial accounts and ledger references), and
Location (offices, sites, and operational addresses). These entities are	Location (offices, sites, and operational addresses). These entities are
extracted from three document streams: meeting transcript markdown files	extracted from three document streams: meeting transcript markdown files
from the minutebook asset directory, research and background YAML and markdown	from the minutebook asset directory, research and background YAML and markdown
files from the service-agents directory, and contact source JSON records from	files from the service-agents directory, and contact source JSON records from
the service-people directory.	the service-people directory.

## What the nightly rebuild does	## What the nightly rebuild does

For each unprocessed document, the rebuild script calls	For each unprocessed document, the rebuild script calls
`POST :9080/v1/chat/completions` through the Doorman endpoint, passing the	`POST :9080/v1/chat/completions` through the Doorman endpoint, passing the
document text with a JSON Schema grammar constraint. The language model —	document text with a JSON Schema grammar constraint. The language model —
OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON	OLMo 3 32B Think running on Elastic Compute #1 via vLLM — returns a structured JSON
array of entity objects. Each object carries the entity name, classification,	array of entity objects. Each object carries the entity name, classification,
confidence score, and optional role, location, and contact vectors. The script	confidence score, and optional role, location, and contact vectors. The script
then calls `POST :9081/v1/graph/mutate` on service-content to write those	then calls `POST :9081/v1/graph/mutate` on service-content to write those
entities into LadybugDB. The health probe at the end of the cycle queries	entities into LadybugDB. The health probe at the end of the cycle queries
service-content for the current entity count and writes a summary JSON file	service-content for the current entity count and writes a summary JSON file
at `$FOUNDRY_ROOT/data/datagraph-health.json`.	at `$FOUNDRY_ROOT/data/datagraph-health.json`.

The script processes three document batches each run: the full minutebook	The script processes three document batches each run: the full minutebook
asset tree, the full service-agents tree, and the 50 most recent unprocessed	asset tree, the full service-agents tree, and the 50 most recent unprocessed
service-people JSON files. A randomised inter-document delay (0.3 to 1.5	service-people JSON files. A randomised inter-document delay (0.3 to 1.5
seconds) prevents the Doorman from receiving a burst of requests that could	seconds) prevents the Doorman from receiving a burst of requests that could
interfere with the training phase startup.	interfere with the training phase startup.

## The routing parity principle	## The routing parity principle

The `jennifer-datagraph-rebuild.sh` script calls only the same two REST API	The `jennifer-datagraph-rebuild.sh` script calls only the same two REST API
endpoints that any operator or community member running service-slm and	endpoints that any operator or community member running service-slm and
service-content would call from their own automation:	service-content would call from their own automation:

- `POST :9080/v1/chat/completions` — entity extraction through Doorman	- `POST :9080/v1/chat/completions` — entity extraction through Doorman
- `POST :9081/v1/graph/mutate` — entity write through service-content	- `POST :9081/v1/graph/mutate` — entity write through service-content

There is no file-watcher shortcut, no internal gRPC bypass, and no direct	There is no file-watcher shortcut, no internal gRPC bypass, and no direct
database write. This is a deliberate design decision. If the rebuild script	database write. This is a deliberate design decision. If the rebuild script
fails, the failure indicates a real defect in service-slm or service-content	fails, the failure indicates a real defect in service-slm or service-content
that would also affect any operator or customer running the same API surface.	that would also affect any operator or customer running the same API surface.
The nightly rebuild functions as a full-stack integration test that runs	The nightly rebuild functions as a full-stack integration test that runs
against production services on production data every night. Failures are	against production services on production data every night. Failures are
explicit and immediately actionable rather than hidden in an internal path	explicit and immediately actionable rather than hidden in an internal path
that real callers would never exercise.	that real callers would never exercise.

## Idempotency	## Idempotency

The script tracks processed documents using a local ledger at	The script tracks processed documents using a local ledger at
`$FOUNDRY_ROOT/data/datagraph-processed.txt`. Each document is identified by	`$FOUNDRY_ROOT/data/datagraph-processed.txt`. Each document is identified by
a hash of its file content, prefixed with a source tag (`mk-` for minutebook,	a hash of its file content, prefixed with a source tag (`mk-` for minutebook,
`ag-` for service-agents, `sp-` for service-people). Before processing any	`ag-` for service-agents, `sp-` for service-people). Before processing any
document, the script checks whether its identifier appears in the ledger. If	document, the script checks whether its identifier appears in the ledger. If
it does, the document is skipped. After a successful `graph/mutate` call, the	it does, the document is skipped. After a successful `graph/mutate` call, the
identifier is appended to the ledger. This mechanism ensures that documents	identifier is appended to the ledger. This mechanism ensures that documents
are not re-processed across multiple nightly runs, even if the same content	are not re-processed across multiple nightly runs, even if the same content
is present in the source directories.	is present in the source directories.

The ledger is append-only and not pruned automatically. If service-content	The ledger is append-only and not pruned automatically. If service-content
is restarted and the graph is rebuilt from scratch, the ledger can be cleared	is restarted and the graph is rebuilt from scratch, the ledger can be cleared
to force a full re-extraction on the next nightly run.	to force a full re-extraction on the next nightly run.

## Graph context injection	## Graph context injection

The deployment DataGraph is not a static reference store. service-content	The deployment DataGraph is not a static reference store. service-content
queries it before each inference request. When the Doorman receives a	queries it before each inference request. When the Doorman receives a
completion request from an operator or application, service-content retrieves	completion request from an operator or application, service-content retrieves
entities relevant to the request context — based on module ID, entity	entities relevant to the request context — based on module ID, entity
classification, and confidence thresholds — and injects them into the system	classification, and confidence thresholds — and injects them into the system
message as a structured entity context block. The language model receives	message as a structured entity context block. The language model receives
structured business context (who the relevant people are, what projects are	structured business context (who the relevant people are, what projects are
active, which companies are counterparties) without requiring that structured	active, which companies are counterparties) without requiring that structured
data to cross the external model boundary. The graph stays within the	data to cross the external model boundary. The graph stays within the
deployment boundary; only the injected prose context leaves it.	deployment boundary; only the injected prose context leaves it.

## Current status and gate criterion	## Current status and gate criterion

The deployment DataGraph is live. Three consecutive nightly runs reporting	The deployment DataGraph is live. Three consecutive nightly runs reporting
HEALTHY status — defined as a non-negative entity count delta and a successful	HEALTHY status — defined as a non-negative entity count delta and a successful
round trip on both the extraction and mutation endpoints — are the intended	round trip on both the extraction and mutation endpoints — are the intended
criterion before the DataGraph pattern is extended to larger operational	criterion before the DataGraph pattern is extended to larger operational
contexts. That gate has not yet been met; the rebuild pipeline is in its	contexts. That gate has not yet been met; the rebuild pipeline is in its
initial operational period.	initial operational period.

## See also	## See also

- [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training)	- [[elastic-compute-lora-training-pipeline]] — Phase 2 of the same nightly window (LoRA adapter training)
- [[service-slm]] — the service that orchestrates the full nightly pipeline	- [[service-slm]] — the service that orchestrates the full nightly pipeline