Skip to content

Diff: substrate/apprenticeship-substrate

From d9d2bc1 to d9d2bc1

+0 / −0 lines
BeforeAfter
--- ---
schema: foundry-doc-v1 schema: foundry-doc-v1
title: "The apprenticeship substrate" title: "The apprenticeship substrate"
slug: apprenticeship-substrate slug: apprenticeship-substrate
category: substrate category: substrate
type: topic type: topic
quality: complete quality: complete
short_description: "The platform mechanism that routes code-shaped and editorial work first through a local Small Language Model, captures signed senior verdicts on every attempt, and uses the resulting preference pairs as continued-pretraining signal — compounding toward graduated task types that require no senior authoring." short_description: "The platform mechanism that routes code-shaped and editorial work first through a local Small Language Model, captures signed senior verdicts on every attempt, and uses the resulting preference pairs as continued-pretraining signal — compounding toward graduated task types that require no senior authoring."
status: active status: active
bcsc_class: public-disclosure-safe bcsc_class: public-disclosure-safe
last_edited: 2026-05-15 last_edited: 2026-05-15
editor: pointsav-engineering editor: pointsav-engineering
cites: cites:
- ni-51-102 - ni-51-102
paired_with: apprenticeship-substrate.es.md paired_with: apprenticeship-substrate.es.md
--- ---
The PointSav platform routes every code-shaped and editorial task through a local small language model before a senior reviewer sees it. The human operator shifts from primary author to verifier — and the signed disagreement between apprentice attempt and senior verdict produces preference-pair training data that compounds in value over time. The PointSav platform routes every code-shaped and editorial task through a local small language model before a senior reviewer sees it. The human operator shifts from primary author to verifier — and the signed disagreement between apprentice attempt and senior verdict produces preference-pair training data that compounds in value over time.
Three routing stages are tracked per task type: `review` (every apprentice diff reviewed before commit), `spot-check` (apprentice commits; 1-in-N sampled), and `autonomous` (apprentice commits; monthly batch audit). Promotion thresholds are quantified: 50 accepted verdicts at 0.85 acceptance rate to graduate from `review` to `spot-check`; 100 verdicts at 0.95 acceptance rate with zero post-commit reverts to reach `autonomous`. Three routing stages are tracked per task type: `review` (every apprentice diff reviewed before commit), `spot-check` (apprentice commits; 1-in-N sampled), and `autonomous` (apprentice commits; monthly batch audit). Promotion thresholds are quantified: 50 accepted verdicts at 0.85 acceptance rate to graduate from `review` to `spot-check`; 100 verdicts at 0.95 acceptance rate with zero post-commit reverts to reach `autonomous`.
The routing compounds. Each signed verdict is a training tuple; graduated task types eliminate senior-author tokens on that class of work permanently; shadow routing generates additional training data from every other code-shaped commit across the fleet, without a verdict or signing step. The first registered task type is `version-bump-manifest` — deterministic, verifiable, low-judgment. It graduates first; the next type registers. The routing compounds. Each signed verdict is a training tuple; graduated task types eliminate senior-author tokens on that class of work permanently; shadow routing generates additional training data from every other code-shaped commit across the fleet, without a verdict or signing step. The first registered task type is `version-bump-manifest` — deterministic, verifiable, low-judgment. It graduates first; the next type registers.
Four conditions make this work, and all four are structural properties of a customer-owned deployment: a per-customer governance charter, per-customer signing identities, per-customer task-type granularity in the promotion ledger, and per-customer continued pretraining. Cloud-managed AI platforms structurally lack all four — training on customer interaction data requires pooling it, which eliminates the per-customer isolation guarantee. Per `[ni-51-102]` continuous-disclosure language, the trajectory toward token elimination across graduated task types is forward-looking; the routing structure is operational today. Four conditions make this work, and all four are structural properties of a customer-owned deployment: a per-customer governance charter, per-customer signing identities, per-customer task-type granularity in the promotion ledger, and per-customer continued pretraining. Cloud-managed AI platforms structurally lack all four — training on customer interaction data requires pooling it, which eliminates the per-customer isolation guarantee. Per `[ni-51-102]` continuous-disclosure language, the trajectory toward token elimination across graduated task types is forward-looking; the routing structure is operational today.
## Overview ## Overview
Captured observation trains a model on what the senior wrote. Captured interaction — apprentice attempt plus signed senior verdict — trains an order of magnitude more efficiently per tuple. This is the central finding of the RLHF, DPO, and RLAIF literature from 2024–2026: signed preference data is the most valuable training input. Captured observation trains a model on what the senior wrote. Captured interaction — apprentice attempt plus signed senior verdict — trains an order of magnitude more efficiently per tuple. This is the central finding of the RLHF, DPO, and RLAIF literature from 2024–2026: signed preference data is the most valuable training input.
This routing pattern produces those interaction tuples on real production work, not synthetic benchmarks. Every session exercises the apprentice; every signed verdict is a training tuple; every graduated task-type eliminates external AI tokens monotonically. This routing pattern produces those interaction tuples on real production work, not synthetic benchmarks. Every session exercises the apprentice; every signed verdict is a training tuple; every graduated task-type eliminates external AI tokens monotonically.
## Ring and Role ## Ring and Role
The Apprenticeship Substrate spans Ring 3 — Optional Intelligence and the training-corpus infrastructure. `service-slm` (the Doorman) is the Ring 3 service that executes apprentice routing. The promotion ledger and corpus capture scripts live within the customer's deployment infrastructure. The substrate is active whenever a session issues a brief rather than authoring directly. The Apprenticeship Substrate spans Ring 3 — Optional Intelligence and the training-corpus infrastructure. `service-slm` (the Doorman) is the Ring 3 service that executes apprentice routing. The promotion ledger and corpus capture scripts live within the customer's deployment infrastructure. The substrate is active whenever a session issues a brief rather than authoring directly.
## Architecture ## Architecture
### The three stages ### The three stages
Routing operates per task-type. Promotion is automatic on threshold crossing; demotion is automatic on any post-commit revert traced to an apprentice diff. Routing operates per task-type. Promotion is automatic on threshold crossing; demotion is automatic on any post-commit revert traced to an apprentice diff.
| Stage | Routing | Senior review | | Stage | Routing | Senior review |
|---|---|---| |---|---|---|
| `review` | Apprentice attempts; senior reviews every diff before commit | Every diff | | `review` | Apprentice attempts; senior reviews every diff before commit | Every diff |
| `spot-check` | Apprentice commits; senior reviews 1-in-N sampled and auto-flagged anomalies | Sampled and flagged | | `spot-check` | Apprentice commits; senior reviews 1-in-N sampled and auto-flagged anomalies | Sampled and flagged |
| `autonomous` | Apprentice commits autonomously; monthly batch audit | Batch audit | | `autonomous` | Apprentice commits autonomously; monthly batch audit | Batch audit |
Initial promotion thresholds: Initial promotion thresholds:
- `review → spot-check`: at least 50 verdicts AND accept-rate at least 0.85 over the rolling 50. - `review → spot-check`: at least 50 verdicts AND accept-rate at least 0.85 over the rolling 50.
- `spot-check → autonomous`: at least 100 verdicts AND accept-rate at least 0.95 over the rolling 100 AND zero post-commit reverts traced to apprentice diffs. - `spot-check → autonomous`: at least 100 verdicts AND accept-rate at least 0.95 over the rolling 100 AND zero post-commit reverts traced to apprentice diffs.
Demotion: a single revert traced to an apprentice diff drops the task-type one stage. Recorded as a signed event in the ledger. New task-types start at `review`. Demotion: a single revert traced to an apprentice diff drops the task-type one stage. Recorded as a signed event in the ledger. New task-types start at `review`.
### The brief, the attempt, the verdict ### The brief, the attempt, the verdict
A senior who would author a diff issues a **brief** instead. The brief states what is being done, the invariants the diff must preserve, the constraints cited, and the acceptance test the apprentice should make pass. A senior who would author a diff issues a **brief** instead. The brief states what is being done, the invariants the diff must preserve, the constraints cited, and the acceptance test the apprentice should make pass.
The apprentice responds with an **attempt**: chain-of-thought reasoning citing the brief invariants, a self-confidence value calibrated against its prior ledger record on this task-type, and a unified diff. If self-confidence falls below 0.5, the apprentice escalates without diff — surfacing "this task-type is harder than I can handle today" rather than producing a low-confidence diff that wastes senior review. The apprentice responds with an **attempt**: chain-of-thought reasoning citing the brief invariants, a self-confidence value calibrated against its prior ledger record on this task-type, and a unified diff. If self-confidence falls below 0.5, the apprentice escalates without diff — surfacing "this task-type is harder than I can handle today" rather than producing a low-confidence diff that wastes senior review.
The senior reads the attempt and signs a **verdict**: `accept`, `refine`, `reject`, or `defer-tier-c`. Verdicts on `refine` and `reject` carry one-sentence notes — these are the highest-signal training data the corpus produces. The signature uses `ssh-keygen -Y sign` with a namespace tag (`apprenticeship-verdict-v1`) that binds the signature to this protocol; a commit-signing signature cannot be repurposed as a verdict signature. The senior reads the attempt and signs a **verdict**: `accept`, `refine`, `reject`, or `defer-tier-c`. Verdicts on `refine` and `reject` carry one-sentence notes — these are the highest-signal training data the corpus produces. The signature uses `ssh-keygen -Y sign` with a namespace tag (`apprenticeship-verdict-v1`) that binds the signature to this protocol; a commit-signing signature cannot be repurposed as a verdict signature.
### The promotion ledger ### The promotion ledger
A single plain-text file tracks every task-type's stage and the event log that drives promotion and demotion. Every event line carries an embedded SSH signature block; the writer (the Doorman) appends only after verifying the senior's signature on the verdict batch. Single-writer concurrency via `flock(2)`; acceptable latency at the expected verdict rate of tens per day. A single plain-text file tracks every task-type's stage and the event log that drives promotion and demotion. Every event line carries an embedded SSH signature block; the writer (the Doorman) appends only after verifying the senior's signature on the verdict batch. Single-writer concurrency via `flock(2)`; acceptable latency at the expected verdict rate of tens per day.
Event types: `task-type-add`, `verdict-batch`, `promotion`, `demotion`, `verdict-supersession`, `task-type-retire`. The schema is closed; new event types require ledger discipline because promotion threshold computations depend on them. Event types: `task-type-add`, `verdict-batch`, `promotion`, `demotion`, `verdict-supersession`, `task-type-retire`. The schema is closed; new event types require ledger discipline because promotion threshold computations depend on them.
### Production routing vs shadow routing ### Production routing vs shadow routing
Two paths run in parallel. Two paths run in parallel.
**Production routing** runs on graduated task-types. The senior issues a brief before authoring the diff; the apprentice's attempt is the candidate diff; on `accept`, the apprentice's diff lands in the commit. This eliminates senior authoring tokens on graduated task-types. **Production routing** runs on graduated task-types. The senior issues a brief before authoring the diff; the apprentice's attempt is the candidate diff; on `accept`, the apprentice's diff lands in the commit. This eliminates senior authoring tokens on graduated task-types.
**Shadow routing** runs on every other code-shaped commit across every active cluster. After the diff is authored the existing way, the session fires a brief to the apprentice; the apprentice produces what it would have done; the (brief, attempt, actual-diff) triple is captured to the corpus as a training tuple. No verdict; no signing. The apprentice is exercised continuously; the corpus grows on every cluster's work. **Shadow routing** runs on every other code-shaped commit across every active cluster. After the diff is authored the existing way, the session fires a brief to the apprentice; the apprentice produces what it would have done; the (brief, attempt, actual-diff) triple is captured to the corpus as a training tuple. No verdict; no signing. The apprentice is exercised continuously; the corpus grows on every cluster's work.
Production routing eliminates senior tokens on graduated types. Shadow routing generates the training data that graduates the next type. The two paths compound. Production routing eliminates senior tokens on graduated types. Shadow routing generates the training data that graduates the next type. The two paths compound.
### Capture pipeline ### Capture pipeline
The apprenticeship corpus is a fourth corpus alongside the constitutional, engineering, and tenant-runtime corpora. Per-tenant partitioning lives at the directory level: The apprenticeship corpus is a fourth corpus alongside the constitutional, engineering, and tenant-runtime corpora. Per-tenant partitioning lives at the directory level:
``` ```
data/training-corpus/apprenticeship/<task-type>/<tenant>/<ulid>.jsonl data/training-corpus/apprenticeship/<task-type>/<tenant>/<ulid>.jsonl
``` ```
One file per (brief, attempt, verdict) triple. Tenant-private records never leave the tenant's infrastructure. One file per (brief, attempt, verdict) triple. Tenant-private records never leave the tenant's infrastructure.
A `refine` or `reject` verdict additionally produces a Direct Preference Optimisation triple: (rejected attempt, corrected diff, constraint-violation tag). DPO triples feed adapter training on the apprentice's policy. A `refine` or `reject` verdict additionally produces a Direct Preference Optimisation triple: (rejected attempt, corrected diff, constraint-violation tag). DPO triples feed adapter training on the apprentice's policy.
## Configuration ## Configuration
The first registered task-type is `version-bump-manifest`. Every platform MINOR and PATCH bump touches `MANIFEST.md` and `CHANGELOG.md`. Well-shaped, no architectural judgment required, easily verifiable. The apprentice graduates this type first; senior tokens drop on this class of work; the next task-type registers. The first registered task-type is `version-bump-manifest`. Every platform MINOR and PATCH bump touches `MANIFEST.md` and `CHANGELOG.md`. Well-shaped, no architectural judgment required, easily verifiable. The apprentice graduates this type first; senior tokens drop on this class of work; the next task-type registers.
The end state is a continuum — code-shaped work the apprentice handles autonomously, code-shaped work the apprentice handles with spot-check, code-shaped work that still requires senior review. The continuum shifts as the corpus matures. The end state is a continuum — code-shaped work the apprentice handles autonomously, code-shaped work the apprentice handles with spot-check, code-shaped work that still requires senior review. The continuum shifts as the corpus matures.
Per `[ni-51-102]` continuous-disclosure language, the trajectory toward token-elimination across graduated task-types is forward-looking. The shape is in place; the operational throughput matures as the corpus grows and task-types graduate. Per `[ni-51-102]` continuous-disclosure language, the trajectory toward token-elimination across graduated task-types is forward-looking. The shape is in place; the operational throughput matures as the corpus grows and task-types graduate.
## See also ## See also
- [[compounding-substrate]] - [[compounding-substrate]]
- [[contributor-model]] - [[contributor-model]]
- [[language-protocol-substrate]] - [[language-protocol-substrate]]
- [[trajectory-substrate]] - [[trajectory-substrate]]
- [[customer-hostability]] - [[customer-hostability]]