Diff: substrate/language-protocol-substrate.es
From a16a78a to a16a78a
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "The language-protocol substrate" | title: "The language-protocol substrate" |
| slug: language-protocol-substrate | slug: language-protocol-substrate |
| category: substrate | category: substrate |
| type: topic | type: topic |
| quality: complete | quality: complete |
| short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest." | short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest." |
| status: active | status: active |
| bcsc_class: public-disclosure-safe | bcsc_class: public-disclosure-safe |
| last_edited: 2026-05-15 | last_edited: 2026-05-15 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| cites: | cites: |
| - ni-51-102 | - ni-51-102 |
| - osc-sn-51-721 | - osc-sn-51-721 |
| paired_with: language-protocol-substrate.es.md | paired_with: language-protocol-substrate.es.md |
| --- | --- |
| Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest. | Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest. |
| The substrate provides a four-family adapter taxonomy, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`), one of four services on the editorial-write path: `service-content` (knowledge graph), `service-slm` (Doorman and inference), `service-disclosure` (schema and templates), and `service-proofreader` (HTTP write-assistant). Any single service can be replaced by a customer-owned equivalent while the rest hold. | The substrate provides a four-family adapter taxonomy, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`), one of four services on the editorial-write path: `service-content` (knowledge graph), `service-slm` (Doorman and inference), `service-disclosure` (schema and templates), and `service-proofreader` (HTTP write-assistant). Any single service can be replaced by a customer-owned equivalent while the rest hold. |
| Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE, COMMS, LEGAL, or TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters: composing five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature, so the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure. | Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE, COMMS, LEGAL, or TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters: composing five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature, so the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure. |
| For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant ledger before it exits the customer's network. The customer can fork any adapter, inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today. | For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant ledger before it exits the customer's network. The customer can fork any adapter, inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today. |
| ## Overview | ## Overview |
| The substrate provides four artefacts: | The substrate provides four artefacts: |
| 1. **A 4-family adapter taxonomy.** PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family. | 1. **A 4-family adapter taxonomy.** PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family. |
| 2. **A genre-template registry.** Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding. | 2. **A genre-template registry.** Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding. |
| 3. **A frontmatter validator.** Returns every per-genre rule violation in one pass rather than first-fail. | 3. **A frontmatter validator.** Returns every per-genre rule violation in one pass rather than first-fail. |
| 4. **A banned-vocabulary list.** Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing. | 4. **A banned-vocabulary list.** Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing. |
| These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action. | These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action. |
| ## Ring and Role | ## Ring and Role |
| The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the Doorman) and Ring 2 — Knowledge and Processing (schema validation and template management via `service-content` and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture. | The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the Doorman) and Ring 2 — Knowledge and Processing (schema validation and template management via `service-content` and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture. |
| ## Architecture | ## Architecture |
| ### The four families | ### The four families |
| \| Family \| Generation responsibility \| Templates \| | \| Family \| Generation responsibility \| Templates \| |
| \|---\|---\|---\| | \|---\|---\|---\| |
| \| **PROSE** \| Long-form English prose \| README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG \| | \| **PROSE** \| Long-form English prose \| README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG \| |
| \| **COMMS** \| Short-form interpersonal \| email, chat, ticket comment, meeting notes \| | \| **COMMS** \| Short-form interpersonal \| email, chat, ticket comment, meeting notes \| |
| \| **LEGAL** \| Volume-gated formal \| contract, CLA, policy, terms (default-routes to Tier C) \| | \| **LEGAL** \| Volume-gated formal \| contract, CLA, policy, terms (default-routes to Tier C) \| |
| \| **TRANSLATE** \| Meta-protocol \| Operates over the other families; not a separate generation track \| | \| **TRANSLATE** \| Meta-protocol \| Operates over the other families; not a separate generation track \| |
| Three adapters compose at request time: | Three adapters compose at request time: |
| ``` | ``` |
| composed_weights = | composed_weights = |
| base_model | base_model |
| ⊕ tenant_adapter[<tenant_id>] // brand voice | ⊕ tenant_adapter[<tenant_id>] // brand voice |
| ⊕ protocol_adapter[PROSE \| COMMS \| LEGAL \| TRANSLATE] | ⊕ protocol_adapter[PROSE \| COMMS \| LEGAL \| TRANSLATE] |
| ``` | ``` |
| Composing five or more adapters per request crosses into multi-task interference, per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three. | Composing five or more adapters per request crosses into multi-task interference, per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three. |
| Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems. | Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems. |
| ### The four-service split | ### The four-service split |
| The editorial-write path runs through four services. Each owns one shape: | The editorial-write path runs through four services. Each owns one shape: |
| \| Service \| Shape \| Owner cluster \| | \| Service \| Shape \| Owner cluster \| |
| \|---\|---\|---\| | \|---\|---\|---\| |
| \| `service-content` \| Data — taxonomy ledger and knowledge graph \| project-slm \| | \| `service-content` \| Data — taxonomy ledger and knowledge graph \| project-slm \| |
| \| `service-slm` \| Inference — Doorman, tier routing, audit ledger \| project-slm \| | \| `service-slm` \| Inference — Doorman, tier routing, audit ledger \| project-slm \| |
| \| `service-disclosure` \| Schema — types, validators, CFG, templates \| project-language \| | \| `service-disclosure` \| Schema — types, validators, CFG, templates \| project-language \| |
| \| `service-proofreader` \| Operational — request-shaped HTTP write-assistant \| project-proofreader \| | \| `service-proofreader` \| Operational — request-shaped HTTP write-assistant \| project-proofreader \| |
| A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold. | A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold. |
| The platform's contribution to this pattern is the per-tenant audit ledger that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced. | The platform's contribution to this pattern is the per-tenant audit ledger that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced. |
| ### Multi-tenant via moduleId namespacing | ### Multi-tenant via moduleId namespacing |
| Each platform deployment runs one `service-content` instance, with `moduleId` partitioning tenants inside it. Per-tenant isolated deployment is the escalation path: when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there. | Each platform deployment runs one `service-content` instance, with `moduleId` partitioning tenants inside it. Per-tenant isolated deployment is the escalation path: when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there. |
| This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted. | This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted. |
| ## Configuration | ## Configuration |
| Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants. | Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants. |
| Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure. | Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure. |
| ## See also | ## See also |
| - [[style-guide-topic]] | - [[style-guide-topic]] |
| - [[customer-hostability]] | - [[customer-hostability]] |
| - [[anti-homogenization-discipline]] | - [[anti-homogenization-discipline]] |
| - [[apprenticeship-substrate]] | - [[apprenticeship-substrate]] |
| - [[citation-substrate]] | - [[citation-substrate]] |
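
The Overview describes a frontmatter validator that returns every rule violation in one pass rather than stopping at the first failure. The sketch below illustrates that collect-all pattern only. It is written in Python for brevity and is not the `service-disclosure` crate's API: the `Violation` type, the `validate_frontmatter` function, the required-field list, and the slug pattern are all hypothetical, chosen to mirror the frontmatter fields visible in this document.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Violation:
    """One rule violation (hypothetical shape, not the crate's type)."""
    field: str
    message: str


# Assumed rule set: these field names appear in the frontmatter above,
# but which of them are actually required is an assumption.
REQUIRED_FIELDS = ("schema", "title", "slug", "category", "type", "status")
SLUG_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")  # assumed slug convention


def validate_frontmatter(fm: Dict[str, str]) -> List[Violation]:
    """Check every rule and return all violations, never stopping early."""
    violations: List[Violation] = []

    # Rule 1: required fields must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not fm.get(name):
            violations.append(Violation(name, "required field is missing or empty"))

    # Rule 2: the schema identifier must match the one this sketch supports.
    schema = fm.get("schema")
    if schema and schema != "foundry-doc-v1":
        violations.append(Violation("schema", f"unsupported schema {schema!r}"))

    # Rule 3: the slug must be lowercase words joined by hyphens.
    slug = fm.get("slug")
    if slug and not SLUG_PATTERN.fullmatch(slug):
        violations.append(Violation("slug", "slug must be lowercase-hyphenated"))

    return violations


if __name__ == "__main__":
    draft = {"schema": "foundry-doc-v0", "title": "", "slug": "Bad Slug",
             "category": "substrate", "type": "topic"}
    for v in validate_frontmatter(draft):
        print(f"{v.field}: {v.message}")
```

Accumulating violations in a list, instead of raising on the first one, lets an author fix every problem in a single edit cycle. This is the behaviour the document attributes to the validator.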