Diff: substrate/language-protocol-substrate.es

From 9fe89f6 to 9fe89f6

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "The language-protocol substrate"	title: "The language-protocol substrate"
slug: language-protocol-substrate	slug: language-protocol-substrate
category: substrate	category: substrate
type: topic	type: topic
quality: complete	quality: complete
short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest."	short_description: "The editorial infrastructure that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding — four adapter families, eighteen genre templates, a frontmatter validator, and a four-service split that lets a customer replace any single component without touching the rest."
status: active	status: active
bcsc_class: public-disclosure-safe	bcsc_class: public-disclosure-safe
last_edited: 2026-05-15	last_edited: 2026-05-15
editor: pointsav-engineering	editor: pointsav-engineering
cites:	cites:
- ni-51-102	- ni-51-102
- osc-sn-51-721	- osc-sn-51-721
paired_with: language-protocol-substrate.es.md	paired_with: language-protocol-substrate.es.md
---	---

Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest.	Every editorial action that passes through the PointSav platform — document generation, schema validation, training-tuple capture — is shaped by a substrate that encodes register, brand voice, document sub-type, and target audience as reusable prompt scaffolding rather than ad-hoc instruction. The result is editorial work that is audited, per-tenant, and replaceable at any layer without rebuilding the rest.

The substrate provides four artefact families, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`) split across four services — [[service-content]] (knowledge graph), [[service-slm]] ([[compounding-doorman\|Doorman]] and inference), `service-disclosure` (schema and templates), `service-proofreader` (HTTP write-assistant) — where any single service can be replaced by a [[customer-hostability\|customer-owned equivalent]] while the rest hold.	The substrate provides four artefact families, eighteen genre templates, a frontmatter validator that returns all schema violations in a single pass, and a banned-vocabulary list of eight cross-genre prohibited terms. These ship as a Rust crate (`service-disclosure`) split across four services — [[service-content]] (knowledge graph), [[service-slm]] ([[compounding-doorman\|Doorman]] and inference), `service-disclosure` (schema and templates), `service-proofreader` (HTTP write-assistant) — where any single service can be replaced by a [[customer-hostability\|customer-owned equivalent]] while the rest hold.

Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE \| COMMS \| LEGAL \| TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters — five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature; the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure.	Three adapters compose at request time: base model, tenant adapter (brand voice), and protocol adapter (PROSE \| COMMS \| LEGAL \| TRANSLATE). Register, brand voice, and target audience live as prompt scaffolding rather than additional adapters — five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature; the platform stays at three. Every editorial action produces a verdict-signed training tuple through the [[apprenticeship-substrate]] pipeline, feeding continued pretraining on the customer's adapter without the customer's text leaving their infrastructure.

For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant [[worm-ledger-architecture\|ledger]] before it exits the customer's network. The customer can fork any [[adapter-composition\|adapter]], inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today.	For regulated buyers, the four-service split matters because every editorial action is audited in the per-tenant [[worm-ledger-architecture\|ledger]] before it exits the customer's network. The customer can fork any [[adapter-composition\|adapter]], inspect the prompt scaffolding, and verify that their brand voice is not pooled with another tenant's training data. Per `[ni-51-102]` and `[osc-sn-51-721]`, the training pipeline is described in planned terms; the substrate architecture is operational today.

## Overview	## Overview

The substrate provides four artefacts:	The substrate provides four artefacts:

1. A 4-family adapter taxonomy. PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family.	1. A 4-family adapter taxonomy. PROSE for long-form English, COMMS for short-form interpersonal, LEGAL for volume-gated formal documents, TRANSLATE as a meta-protocol layered on top of any other family.
2. A genre-template registry. Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding.	2. A genre-template registry. Eighteen templates, each carrying its required sections, register parameters, bilingual-pair convention, frontmatter schema, and prompt scaffolding.
3. A frontmatter validator. Returns every per-genre rule violation in one pass rather than first-fail.	3. A frontmatter validator. Returns every per-genre rule violation in one pass rather than first-fail.
4. A banned-vocabulary list. Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing.	4. A banned-vocabulary list. Eight cross-genre prohibited terms that survive in marketing prose and have no place in precise writing.

These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action.	These four artefacts ship as a Rust crate (`service-disclosure`) that any platform component can consume. The Doorman composes the templates into prompts at request time; the per-tenant write-assistant validates inbound and outbound text against the schema; the apprenticeship pipeline produces verdict-signed training tuples on every editorial action.

## Ring and Role	## Ring and Role

The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the [[compounding-doorman\|Doorman]]) and Ring 2 — Knowledge and Processing (schema validation and template management via [[service-content]] and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture.	The Language-Protocol Substrate spans Ring 3 — Optional Intelligence (inference via the [[compounding-doorman\|Doorman]]) and Ring 2 — Knowledge and Processing (schema validation and template management via [[service-content]] and `service-disclosure`). It has no Ring 1 component: editorial work begins after boundary ingest completes. The substrate is activated on every editorial action that passes through the Doorman, whether that action is a document generation request, a validation pass, or a training-tuple capture.

## Architecture	## Architecture

### The four families	### The four families

\| Family \| Generation responsibility \| Templates \|	\| Family \| Generation responsibility \| Templates \|
\|---\|---\|---\|	\|---\|---\|---\|
\| PROSE \| Long-form English prose \| README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG \|	\| PROSE \| Long-form English prose \| README (workspace / repo / project), TOPIC, GUIDE, MEMO, ARCHITECTURE, INVENTORY, license-explainer, CHANGELOG \|
\| COMMS \| Short-form interpersonal \| email, chat, ticket comment, meeting notes \|	\| COMMS \| Short-form interpersonal \| email, chat, ticket comment, meeting notes \|
\| LEGAL \| Volume-gated formal \| contract, CLA, policy, terms (default-routes to Tier C) \|	\| LEGAL \| Volume-gated formal \| contract, CLA, policy, terms (default-routes to Tier C) \|
\| TRANSLATE \| Meta-protocol \| Operates over the other families; not a separate generation track \|	\| TRANSLATE \| Meta-protocol \| Operates over the other families; not a separate generation track \|

Three adapters compose at request time:	Three adapters compose at request time:

```	```
composed_weights =	composed_weights =
base_model	base_model
⊕ tenant_adapter[<tenant_id>] // brand voice	⊕ tenant_adapter[<tenant_id>] // brand voice
⊕ protocol_adapter[PROSE \| COMMS \| LEGAL \| TRANSLATE]	⊕ protocol_adapter[PROSE \| COMMS \| LEGAL \| TRANSLATE]
```	```

Five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three.	Five or more adapters per request crosses into multi-task interference per the 2025 LoRA literature (LoRAX, S-LoRA, TC-LoRA, LoRI). The platform stays at three.

Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems.	Register, brand voice, document sub-type, and target audience live as prompt scaffolding rather than additional adapters. Fewer adapters, richer scaffolding, retrieval grounding, decode-time constraints — this is the 2026 industry consensus for production editorial systems.

### The four-service split	### The four-service split

The editorial-write path runs through four services. Each owns one shape:	The editorial-write path runs through four services. Each owns one shape:

\| Service \| Shape \| Owner cluster \|	\| Service \| Shape \| Owner cluster \|
\|---\|---\|---\|	\|---\|---\|---\|
\| `service-content` \| Data — taxonomy ledger and knowledge graph \| project-slm \|	\| `service-content` \| Data — taxonomy ledger and knowledge graph \| project-slm \|
\| `service-slm` \| Inference — Doorman, tier routing, audit ledger \| project-slm \|	\| `service-slm` \| Inference — Doorman, tier routing, audit ledger \| project-slm \|
\| `service-disclosure` \| Schema — types, validators, CFG, templates \| project-language \|	\| `service-disclosure` \| Schema — types, validators, CFG, templates \| project-language \|
\| `service-proofreader` \| Operational — request-shaped HTTP write-assistant \| project-proofreader \|	\| `service-proofreader` \| Operational — request-shaped HTTP write-assistant \| project-proofreader \|

A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold.	A customer can replace any one without touching the rest. Replace `service-slm` with a customer-owned GPU host while keeping `service-content` and `service-disclosure`. Replace `service-content` with a customer's existing knowledge graph while keeping `service-slm`. The contract between services is the only thing that needs to hold.

The platform's contribution to this pattern is the per-tenant [[worm-ledger-architecture\|audit ledger]] that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced.	The platform's contribution to this pattern is the per-tenant [[worm-ledger-architecture\|audit ledger]] that makes each substitution composable across regulatory contexts — a replacement service produces the same audit trail the original produced.

### Multi-tenant via moduleId namespacing	### Multi-tenant via moduleId namespacing

One `service-content` instance per platform deployment, with `moduleId` partitioning tenants inside. Per-tenant isolated deployment is the escalation path — when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there.	One `service-content` instance per platform deployment, with `moduleId` partitioning tenants inside. Per-tenant isolated deployment is the escalation path — when a customer needs key-management-per-tenant or stronger isolation, they spin up their own platform instance in their own infrastructure and get their own `service-content` there.

This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted.	This is the meaning of "tenant escalation happens at the deployment boundary, not the service-naming boundary." The service stays multi-tenant; the deployment topology grows isolation when warranted.

## Configuration	## Configuration

Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants.	Eight editorial task-types are defined in the platform's editorial cluster manifest: `prose-edit`, `comms-edit`, `frontmatter-normalize`, `citation-insert`, `register-tighten`, `cross-link-verify`, `schema-validate`, `template-author`. Each generates verdict-signed training tuples through the [[apprenticeship-substrate]] pipeline. The tuples feed continued pretraining on the customer's adapter when corpus volume warrants.

Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure.	Per `[ni-51-102]` continuous-disclosure language and in accordance with the forward-looking information principles of `[osc-sn-51-721]`, the substrate's training pipeline is described in planned terms. The shape is in place; the operational throughput is what matures over time. The pipeline target: every editorial action a customer deployment performs is one tuple of training data for the customer's adapter. The customer's voice deepens over time without their text leaving their infrastructure.

## See also	## See also

- [[style-guide-topic]]	- [[style-guide-topic]]
- [[customer-hostability]]	- [[customer-hostability]]
- [[anti-homogenization-discipline]]	- [[anti-homogenization-discipline]]
- [[apprenticeship-substrate]]	- [[apprenticeship-substrate]]
- [[citation-substrate]]	- [[citation-substrate]]