Skip to content

Diff: substrate/language-protocol-substrate.es

From bf9d2e1 to bf9d2e1

+0 / −0 lines
BeforeAfter
--- ---
schema: foundry-doc-v1 schema: foundry-doc-v1
title: "The language-protocol substrate" title: "The language-protocol substrate"
slug: language-protocol-substrate slug: language-protocol-substrate
category: substrate category: substrate
type: topic type: topic
quality: complete quality: complete
short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest." short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest."
status: active status: active
bcsc_class: public-disclosure-safe bcsc_class: public-disclosure-safe
last_edited: 2026-05-15 last_edited: 2026-05-15
editor: pointsav-engineering editor: pointsav-engineering
cites: cites:
- ni-51-102 - ni-51-102
- osc-sn-51-721 - osc-sn-51-721
paired_with: language-protocol-substrate.es.md paired_with: language-protocol-substrate.es.md
--- ---
Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest. Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest.
The substrate provides four artefact families, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`) split across four services — [[service-content]] (knowledge graph), [[service-slm]] ([[compounding-doorman|Doorman]] and inference), `service-disclosure` (schema and templates), `service-proofreader` (HTTP write-assistant) — where any single service can be replaced by a [[customer-hostability|customer-owned equivalent]] while the rest hold. The substrate provides four artefact families, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`) split across four services — [[service-content]] (knowledge graph), [[service-slm]] ([[compounding-doorman|Doorman]] and inference), `service-disclosure` (schema and templates), `service-proofreader` (HTTP write-assistant) — where any single service can be replaced by a [[customer-hostability|customer-owned equivalent]] while the rest hold.
Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE | COMMS | LEGAL | TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters — five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature; the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure. Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE | COMMS | LEGAL | TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters — five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature; the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure.
For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant [[worm-ledger-architecture|ledger]] before it exits the customer's network. The customer can fork any [[adapter-composition|adapter]], inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today. For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant [[worm-ledger-architecture|ledger]] before it exits the customer's network. The customer can fork any [[adapter-composition|adapter]], inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today.
## Overview ## Overview
The substrate provides four artefacts: The substrate provides four artefacts:
1. **A 4-family adapter taxonomy.** PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family. 1. **A 4-family adapter taxonomy.** PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family.
2. **A genre-template registry.** Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding. 2. **A genre-template registry.** Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding.
3. **A frontmatter validator.** Returns every per-genre rule violation in one pass rather than first-fail. 3. **A frontmatter validator.** Returns every per-genre rule violation in one pass rather than first-fail.
4. **A banned-vocabulary list.** Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing. 4. **A banned-vocabulary list.** Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing.
These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action. These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action.
## Ring and Role ## Ring and Role
The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the [[compounding-doorman|Doorman]]) and Ring 2 — Knowledge and Processing (schema validation and template management via [[service-content]] and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture. The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the [[compounding-doorman|Doorman]]) and Ring 2 — Knowledge and Processing (schema validation and template management via [[service-content]] and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture.
## Architecture ## Architecture
### The four families ### The four families
| Family | Generation responsibility | Templates | | Family | Generation responsibility | Templates |
|---|---|---| |---|---|---|
| **PROSE** | Long-form English prose | README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG | | **PROSE** | Long-form English prose | README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG |
| **COMMS** | Short-form interpersonal | email, chat, ticket comment, meeting notes | | **COMMS** | Short-form interpersonal | email, chat, ticket comment, meeting notes |
| **LEGAL** | Volume-gated formal | contract, CLA, policy, terms (default-routes to Tier C) | | **LEGAL** | Volume-gated formal | contract, CLA, policy, terms (default-routes to Tier C) |
| **TRANSLATE** | Meta-protocol | Operates over the other families; not a separate generation track | | **TRANSLATE** | Meta-protocol | Operates over the other families; not a separate generation track |
Three adapters compose at request time: Three adapters compose at request time:
``` ```
composed_weights = composed_weights =
base_model base_model
⊕ tenant_adapter[<tenant_id>] // brand voice ⊕ tenant_adapter[<tenant_id>] // brand voice
⊕ protocol_adapter[PROSE | COMMS | LEGAL | TRANSLATE] ⊕ protocol_adapter[PROSE | COMMS | LEGAL | TRANSLATE]
``` ```
Five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three. Five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three.
Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems. Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems.
### The four-service split ### The four-service split
The editorial-write path runs through four services. Each owns one shape: The editorial-write path runs through four services. Each owns one shape:
| Service | Shape | Owner cluster | | Service | Shape | Owner cluster |
|---|---|---| |---|---|---|
| `service-content` | Data — taxonomy ledger and knowledge graph | project-slm | | `service-content` | Data — taxonomy ledger and knowledge graph | project-slm |
| `service-slm` | Inference — Doorman, tier routing, audit ledger | project-slm | | `service-slm` | Inference — Doorman, tier routing, audit ledger | project-slm |
| `service-disclosure` | Schema — types, validators, CFG, templates | project-language | | `service-disclosure` | Schema — types, validators, CFG, templates | project-language |
| `service-proofreader` | Operational — request-shaped HTTP write-assistant | project-proofreader | | `service-proofreader` | Operational — request-shaped HTTP write-assistant | project-proofreader |
A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold. A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold.
The platform's contribution to this pattern is the per-tenant [[worm-ledger-architecture|audit ledger]] that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced. The platform's contribution to this pattern is the per-tenant [[worm-ledger-architecture|audit ledger]] that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced.
### Multi-tenant via moduleId namespacing ### Multi-tenant via moduleId namespacing
One `service-content` instance per platform deployment, with `moduleId` partitioning tenants inside. Per-tenant isolated deployment is the escalation path — when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there. One `service-content` instance per platform deployment, with `moduleId` partitioning tenants inside. Per-tenant isolated deployment is the escalation path — when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there.
This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted. This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted.
## Configuration ## Configuration
Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants. Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants.
Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure. Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure.
## See also ## See also
- [[style-guide-topic]] - [[style-guide-topic]]
- [[customer-hostability]] - [[customer-hostability]]
- [[anti-homogenization-discipline]] - [[anti-homogenization-discipline]]
- [[apprenticeship-substrate]] - [[apprenticeship-substrate]]
- [[citation-substrate]] - [[citation-substrate]]