Diff: substrate/knowledge-graph-grounded-apprenticeship.es
From 3f1e0da to 3f1e0da
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "Knowledge-graph-grounded apprenticeship" | title: "Knowledge-graph-grounded apprenticeship" |
| slug: knowledge-graph-grounded-apprenticeship | slug: knowledge-graph-grounded-apprenticeship |
| category: substrate | category: substrate |
| type: topic | type: topic |
| quality: complete | quality: complete |
| short_description: "The Doorman consults the per-tenant knowledge graph before every inference request, producing training tuples where the graph and the model adapter co-evolve." | short_description: "The Doorman consults the per-tenant knowledge graph before every inference request, producing training tuples where the graph and the model adapter co-evolve." |
| status: active | status: active |
| bcsc_class: public-disclosure-safe | bcsc_class: public-disclosure-safe |
| last_edited: 2026-05-15 | last_edited: 2026-05-15 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| cites: [] | cites: [] |
| references: | references: |
| - id: 1 | - id: 1 |
| text: "Edge, D. et al. 'From Local to Global: A Graph RAG Approach to Query-Focused Summarization.' arXiv:2404.16130, 2024." | text: "Edge, D. et al. 'From Local to Global: A Graph RAG Approach to Query-Focused Summarization.' arXiv:2404.16130, 2024." |
| url: "https://arxiv.org/abs/2404.16130" | url: "https://arxiv.org/abs/2404.16130" |
| paired_with: knowledge-graph-grounded-apprenticeship.es.md | paired_with: knowledge-graph-grounded-apprenticeship.es.md |
| --- | --- |
| **Knowledge-Graph-Grounded Apprenticeship** is the pattern by which the [[compounding-doorman|Doorman]] ([[service-slm]]) consults the per-tenant knowledge graph in [[service-content]] before routing every substantive inference request. The grounding context — a subgraph of entities and relationships relevant to the query — is supplied to the model alongside the request. The resulting training tuple carries both the graph context and the model's response, which means the knowledge graph and the per-tenant adapter improve together over time. | **Knowledge-Graph-Grounded Apprenticeship** is the pattern by which the [[compounding-doorman|Doorman]] ([[service-slm]]) consults the per-tenant knowledge graph in [[service-content]] before routing every substantive inference request. The grounding context — a subgraph of entities and relationships relevant to the query — is supplied to the model alongside the request. The resulting training tuple carries both the graph context and the model's response, which means the knowledge graph and the per-tenant adapter improve together over time. |
| This pattern extends the [[apprenticeship-substrate]] with a graph-grounding layer. | This pattern extends the [[apprenticeship-substrate]] with a graph-grounding layer. |
| ## Pre-inference grounding | ## Pre-inference grounding |
| Before the [[compounding-doorman|Doorman]] dispatches a request to any compute tier, it calls [[service-content]]'s graph query tool to assemble a two-hop subgraph around the query terms. The subgraph is rendered as a structured prefix to the model's system prompt, presenting the relevant entities, their relationships, and their domain and theme classifications. | Before the [[compounding-doorman|Doorman]] dispatches a request to any compute tier, it calls [[service-content]]'s graph query tool to assemble a two-hop subgraph around the query terms. The subgraph is rendered as a structured prefix to the model's system prompt, presenting the relevant entities, their relationships, and their domain and theme classifications. |
| The model therefore receives not only the user's query but also the factual context that the knowledge graph already holds about the parties and topics involved. The grounded entity identifiers are recorded in the [[worm-ledger-architecture|audit ledger]] for subsequent citation verification. | The model therefore receives not only the user's query but also the factual context that the knowledge graph already holds about the parties and topics involved. The grounded entity identifiers are recorded in the [[worm-ledger-architecture|audit ledger]] for subsequent citation verification. |
| When a query has no relevant graph context — for example, a generic system administration question — the graph query returns an empty subgraph and the request proceeds without grounding. These ungrounded tuples are valid training data; the model learns that some questions do not require graph context. | When a query has no relevant graph context — for example, a generic system administration question — the graph query returns an empty subgraph and the request proceeds without grounding. These ungrounded tuples are valid training data; the model learns that some questions do not require graph context. |
| ## Post-inference graph mutation | ## Post-inference graph mutation |
| When the model's response includes structured outputs and the senior verdict accepts the response, the [[compounding-doorman|Doorman]] may propose graph mutations derived from the response — new entities, new relationships, or updated properties. It calls [[service-content]]'s graph mutation tool; `service-content` applies the changes atomically, per tenant, and the [[worm-ledger-architecture|audit ledger]] records the mutation event. | When the model's response includes structured outputs and the senior verdict accepts the response, the [[compounding-doorman|Doorman]] may propose graph mutations derived from the response — new entities, new relationships, or updated properties. It calls [[service-content]]'s graph mutation tool; `service-content` applies the changes atomically, per tenant, and the [[worm-ledger-architecture|audit ledger]] records the mutation event. |
| The loop closes: entities discovered during one inference interaction become grounding context for the next. The knowledge graph grows through use. | The loop closes: entities discovered during one inference interaction become grounding context for the next. The knowledge graph grows through use. |
| ## Training tuple shape | ## Training tuple shape |
| The [[apprenticeship-substrate|apprenticeship corpus]] tuple gains a graph context field alongside the brief, the model's attempt, and the verdict. Direct preference optimisation training treats verdict-signed tuples with populated graph context as higher-weight examples than ungrounded tuples. Supervised fine-tuning over unsigned tuples uses graph context as an additional input signal. | The [[apprenticeship-substrate|apprenticeship corpus]] tuple gains a graph context field alongside the brief, the model's attempt, and the verdict. Direct preference optimisation training treats verdict-signed tuples with populated graph context as higher-weight examples than ungrounded tuples. Supervised fine-tuning over unsigned tuples uses graph context as an additional input signal. |
| Because graph context is per-tenant — isolated by module identifier — the Woodfine [[adapter-composition|adapter]] trains on Woodfine graph context and the PointSav adapter trains on PointSav graph context. There is no cross-tenant leakage at training time. | Because graph context is per-tenant — isolated by module identifier — the Woodfine [[adapter-composition|adapter]] trains on Woodfine graph context and the PointSav adapter trains on PointSav graph context. There is no cross-tenant leakage at training time. |
| ## Graph-coherence quality metrics | ## Graph-coherence quality metrics |
| A model response can be evaluated against the knowledge graph on three dimensions: | A model response can be evaluated against the knowledge graph on three dimensions: |
| **Citation rate** — the fraction of named entities in the response that exist in the graph. A high citation rate indicates the model is staying within known facts. | **Citation rate** — the fraction of named entities in the response that exist in the graph. A high citation rate indicates the model is staying within known facts. |
| **Relationship accuracy** — the fraction of stated relationships that match edges in the graph. Inaccurate relationships signal model drift from the grounded record. | **Relationship accuracy** — the fraction of stated relationships that match edges in the graph. Inaccurate relationships signal model drift from the grounded record. |
| **Hallucination rate** — the fraction of named entities in the response that are not present in the graph. Hallucination is the primary failure mode; responses whose hallucination rate exceeds a threshold are candidates for refinement or rejection. [^1] | **Hallucination rate** — the fraction of named entities in the response that are not present in the graph. Hallucination is the primary failure mode; responses whose hallucination rate exceeds a threshold are candidates for refinement or rejection. [^1] |
| These metrics feed the verdict process. A response with a high hallucination rate is rejected; one with a low citation rate is a candidate for refinement before acceptance. | These metrics feed the verdict process. A response with a high hallucination rate is rejected; one with a low citation rate is a candidate for refinement before acceptance. |
| ## Dependency on single-boundary discipline | ## Dependency on single-boundary discipline |
| Knowledge-graph-grounded apprenticeship depends on the [[single-boundary-compute-discipline]]. If inference can bypass the Doorman, it bypasses graph grounding. An ungrounded inference call produces no graph context field in the training tuple, no citation rate measurement, and no proposed graph mutation. The two claims compose as a structural dependency: without single-boundary enforcement, graph-grounded apprenticeship cannot be guaranteed. | Knowledge-graph-grounded apprenticeship depends on the [[single-boundary-compute-discipline]]. If inference can bypass the Doorman, it bypasses graph grounding. An ungrounded inference call produces no graph context field in the training tuple, no citation rate measurement, and no proposed graph mutation. The two claims compose as a structural dependency: without single-boundary enforcement, graph-grounded apprenticeship cannot be guaranteed. |
| ## See also | ## See also |
| - [[single-boundary-compute-discipline]] — structural prerequisite; grounding happens at the Doorman boundary | - [[single-boundary-compute-discipline]] — structural prerequisite; grounding happens at the Doorman boundary |
| - [[seed-taxonomy-as-smb-bootstrap]] — the per-tenant taxonomy that seeds the knowledge graph used for grounding | - [[seed-taxonomy-as-smb-bootstrap]] — the per-tenant taxonomy that seeds the knowledge graph used for grounding |
| - [[mcp-substrate-protocol]] — the MCP tools (`graph_query`, `graph_mutate`) through which the Doorman interacts with `service-content` | - [[mcp-substrate-protocol]] — the MCP tools (`graph_query`, `graph_mutate`) through which the Doorman interacts with `service-content` |
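
The three graph-coherence metrics defined in the document above can be sketched in a few lines. This is a minimal illustration, not the Doorman's actual scoring code: the `graph_coherence` function, the `CoherenceMetrics` type, the representation of relationships as `(subject, relation, object)` tuples, the entity identifiers, and the handling of empty responses are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoherenceMetrics:
    """Hypothetical container for the three metrics."""
    citation_rate: float
    relationship_accuracy: float
    hallucination_rate: float


def graph_coherence(response_entities, response_edges, graph_entities, graph_edges):
    """Score one response against one tenant's graph.

    Entities are plain identifiers; edges are (subject, relation, object)
    tuples. Both representations are assumptions for this sketch.
    """
    entities = set(response_entities)
    edges = set(response_edges)
    known = entities & set(graph_entities)
    matched = edges & set(graph_edges)

    # Assumption: a response that names no entities or states no
    # relationships counts as fully coherent rather than as an error.
    citation = len(known) / len(entities) if entities else 1.0
    accuracy = len(matched) / len(edges) if edges else 1.0
    hallucination = len(entities - known) / len(entities) if entities else 0.0
    return CoherenceMetrics(citation, accuracy, hallucination)


# Hypothetical identifiers, used only for illustration.
metrics = graph_coherence(
    response_entities=["ent:a", "ent:b", "ent:c", "ent:x"],
    response_edges=[("ent:a", "supplies", "ent:b"), ("ent:x", "owns", "ent:c")],
    graph_entities=["ent:a", "ent:b", "ent:c"],
    graph_edges=[("ent:a", "supplies", "ent:b")],
)
print(metrics)
```

Under the definitions as written, citation rate and hallucination rate partition the same set of named entities, so they sum to 1 whenever the response names at least one entity. In the example, three of four entities are in the graph (citation rate 0.75, hallucination rate 0.25) and one of two stated relationships matches an edge (relationship accuracy 0.5).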