The apprenticeship substrate

From the PointSav Documentation

The platform mechanism that routes code-shaped and editorial work first through a local Small Language Model, captures a signed senior verdict on every attempt, and uses the resulting preference pairs as continued-pretraining signal, compounding toward task types that need no senior authoring.

Updated 2026-05-22

When a senior engineer reviews a junior's work, the correction is the most valuable artifact in the room, and almost every organization discards it. The reasoning behind a fix is spoken once, applied once, and never captured. The next junior, and the next model, start from zero.

The apprenticeship substrate captures it. Every code-shaped and editorial task is routed first through a local Small Language Model; a senior reviewer then signs a verdict on the attempt. The signed disagreement between apprentice attempt and senior verdict becomes preference-pair training data.

Routing runs in three stages per task type (review, spot-check, and autonomous) with quantified promotion thresholds. A graduated task type eliminates senior-author tokens on that class of work, and shadow routing harvests training data from every other code-shaped commit across the fleet.

For a regulated buyer the consequence is concrete. The AI improves on the organization's own work, and that work never leaves the deployment. Per [ni-51-102] continuous-disclosure language, the trajectory toward token elimination across graduated task types is forward-looking; the routing structure is operational today.

Overview

Captured observation trains a model on what the senior wrote. Captured interaction (an apprentice attempt plus a signed senior verdict) trains an order of magnitude more efficiently per tuple. Signed preference data is the most valuable training input, a finding established across the 2024–2026 reinforcement-learning-from-feedback literature.

This routing pattern produces those interaction tuples on real production work, not synthetic benchmarks. Every session exercises the apprentice; every signed verdict is a training tuple; every graduated task type eliminates external AI tokens monotonically.

Ring and role

The apprenticeship substrate spans Ring 3 (Optional Intelligence) and the training-corpus infrastructure. service-slm, the Doorman, is the Ring 3 service that executes apprentice routing. The promotion ledger and the corpus-capture scripts live inside the customer's own deployment. The substrate is active whenever a session issues a brief rather than authoring directly.

The three stages

Routing operates per task type. Promotion is automatic on a threshold crossing; demotion is automatic on any post-commit revert traced to an apprentice diff.

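| Stage      | Routing                                                                       | Senior review       |
|------------|-------------------------------------------------------------------------------|---------------------|
| review     | Apprentice attempts; senior reviews every diff before commit                  | Every diff          |
| spot-check | Apprentice commits; senior reviews a 1-in-N sample and auto-flagged anomalies | Sampled and flagged |
| autonomous | Apprentice commits autonomously; monthly batch audit                          | Batch audit         |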

Promotion from review to spot-check requires at least 50 verdicts at a 0.85 acceptance rate over the rolling 50. Promotion to autonomous requires at least 100 verdicts at a 0.95 acceptance rate over the rolling 100, with zero post-commit reverts traced to apprentice diffs. A single revert traced to an apprentice diff drops the task type one stage, recorded as a signed ledger event. New task types start at review.
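The thresholds above can be sketched as two small functions. This is a minimal illustration, not the Doorman's implementation: the names `promote`, `demote`, and `PROMOTION_RULES` are hypothetical, "at a 0.85 acceptance rate" is read as "at least 0.85", and verdicts and reverts are assumed to be counted since the task type entered its current stage.

```python
STAGES = ("review", "spot-check", "autonomous")

# stage -> (rolling window size, minimum acceptance rate) needed to leave it.
# The window size doubles as the minimum verdict count (50 and 100 in the text).
PROMOTION_RULES = {
    "review": (50, 0.85),
    "spot-check": (100, 0.95),
}


def acceptance_rate(verdicts, window):
    """Share of accepted verdicts over the most recent `window` entries."""
    recent = verdicts[-window:]
    return sum(recent) / len(recent)


def promote(stage, verdicts, reverts=0):
    """Return the stage after applying the promotion rule once.

    verdicts: list of bools, oldest first (True = accept), assumed to cover
              the current stage only.
    reverts:  post-commit reverts traced to apprentice diffs (assumed scope:
              the current stage).
    """
    rule = PROMOTION_RULES.get(stage)
    if rule is None:  # autonomous has nowhere further to go
        return stage
    window, min_rate = rule
    if len(verdicts) < window:
        return stage
    if stage == "spot-check" and reverts > 0:
        return stage  # promotion to autonomous requires zero reverts
    if acceptance_rate(verdicts, window) >= min_rate:
        return STAGES[STAGES.index(stage) + 1]
    return stage


def demote(stage):
    """A single traced revert drops the task type one stage (floor: review)."""
    return STAGES[max(STAGES.index(stage) - 1, 0)]
```

For example, 43 accepts in the last 50 verdicts (0.86) clears the first threshold, while 42 (0.84) does not.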

The brief, the attempt, the verdict

A senior who would author a diff issues a brief instead. The brief states what is being done, the invariants the diff must preserve, the constraints cited, and the acceptance test the apprentice must make pass.

The apprentice responds with an attempt: reasoning that cites the brief's invariants, a self-confidence value calibrated against its prior ledger record on this task type, and a unified diff. If self-confidence falls below 0.5, the apprentice escalates without a diff, surfacing that the task type is harder than it can handle today, rather than wasting senior review on a low-confidence diff.
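The escalation rule can be sketched as a small data structure. The `Attempt` class, its field names, and `build_attempt` are hypothetical, and the 0.0 to 1.0 range for self-confidence is an assumption; only the 0.5 cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

ESCALATION_THRESHOLD = 0.5  # from the text: below this, escalate without a diff


@dataclass(frozen=True)
class Attempt:
    reasoning: str            # cites the brief's invariants
    self_confidence: float    # assumed to lie in [0.0, 1.0]
    diff: Optional[str]       # unified diff, or None when escalating

    @property
    def escalated(self) -> bool:
        return self.diff is None


def build_attempt(reasoning: str, self_confidence: float, candidate_diff: str) -> Attempt:
    """Attach the diff only when confidence reaches the threshold."""
    if not 0.0 <= self_confidence <= 1.0:
        raise ValueError("self_confidence must be between 0.0 and 1.0")
    if self_confidence < ESCALATION_THRESHOLD:
        # Low confidence: surface the difficulty instead of a diff.
        return Attempt(reasoning, self_confidence, None)
    return Attempt(reasoning, self_confidence, candidate_diff)
```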

The senior reads the attempt and signs a verdict: accept, refine, reject, or defer-tier-c. Verdicts of refine and reject carry a one-sentence note, the highest-signal training data the corpus produces. The signature uses ssh-keygen -Y sign with a namespace tag that binds it to this protocol, so a commit-signing signature cannot be repurposed as a verdict.
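As a rough sketch of the verdict step, the helpers below check a verdict and assemble the signing command. The four verdict values come from the text; the namespace string `apprenticeship-verdict`, the function names, and the rule that a missing note is an error are assumptions for illustration. The `-Y sign`, `-f`, and `-n` options are standard `ssh-keygen` flags.

```python
VERDICTS = ("accept", "refine", "reject", "defer-tier-c")
NOTE_REQUIRED = ("refine", "reject")

# Hypothetical namespace tag; the real value is not given in the text.
VERDICT_NAMESPACE = "apprenticeship-verdict"


def validate_verdict(verdict: str, note: str = "") -> None:
    """Reject unknown verdicts and refine/reject verdicts without a note."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if verdict in NOTE_REQUIRED and not note.strip():
        raise ValueError(f"{verdict} verdicts carry a one-sentence note")


def sign_command(key_path: str, verdict_file: str) -> list:
    """Build the ssh-keygen invocation; the caller decides how to run it.

    ssh-keygen writes the signature next to the input as <verdict_file>.sig.
    """
    return [
        "ssh-keygen", "-Y", "sign",
        "-f", key_path,
        "-n", VERDICT_NAMESPACE,
        verdict_file,
    ]
```

Because the namespace is part of what gets signed, a signature produced under a different namespace (for instance `git`, used for commit signing) fails verification when checked against the verdict namespace.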

The promotion ledger

A single plain-text file tracks every task type's stage and the event log that drives promotion and demotion. Every event line carries an embedded SSH signature block; the writer, the Doorman, appends only after verifying the senior's signature on the verdict batch. Single-writer concurrency is held by flock(2), with acceptable latency at the expected rate of tens of verdicts per day.
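A minimal sketch of a flock(2)-guarded append, using Python's `fcntl` module on a POSIX system. The function name `append_event` and the line format are hypothetical, and signature verification is assumed to have happened before this function is called.

```python
import fcntl
import os


def append_event(ledger_path: str, line: str) -> None:
    """Append one event line while holding an exclusive advisory lock.

    Assumes the caller has already verified the senior's signature.
    """
    with open(ledger_path, "a", encoding="utf-8") as ledger:
        # Blocks until any other holder of the lock releases it.
        fcntl.flock(ledger.fileno(), fcntl.LOCK_EX)
        try:
            ledger.write(line.rstrip("\n") + "\n")
            ledger.flush()
            os.fsync(ledger.fileno())  # push the event to disk before unlocking
        finally:
            fcntl.flock(ledger.fileno(), fcntl.LOCK_UN)
```

A blocking lock keeps the code simple; at tens of verdicts per day, waiting for the lock is unlikely to be noticeable.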

The event schema is closed (task-type-add, verdict-batch, promotion, demotion, verdict-supersession, task-type-retire) because the promotion-threshold computations depend on it.
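One way to express a closed schema is an enumeration that refuses anything outside the six listed kinds. The class and function names here are hypothetical; only the event names come from the text.

```python
from enum import Enum


class LedgerEvent(Enum):
    TASK_TYPE_ADD = "task-type-add"
    VERDICT_BATCH = "verdict-batch"
    PROMOTION = "promotion"
    DEMOTION = "demotion"
    VERDICT_SUPERSESSION = "verdict-supersession"
    TASK_TYPE_RETIRE = "task-type-retire"


def parse_event_kind(raw: str) -> LedgerEvent:
    """Map a raw event name to its enum member, rejecting unknown kinds."""
    try:
        return LedgerEvent(raw.strip())
    except ValueError:
        raise ValueError(f"unknown ledger event kind: {raw!r}") from None
```

Rejecting unknown kinds at parse time means a threshold computation never has to guess how an unrecognized event affects a verdict count.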

Production routing and shadow routing

Two paths run in parallel.

Production routing runs on graduated task types: the senior issues a brief before authoring a diff, the apprentice's attempt is the candidate diff, and on accept that diff lands in the commit. This eliminates senior authoring tokens on graduated task types.

Shadow routing runs on every other code-shaped commit across every active cluster. After the diff is authored the usual way, the session fires a brief to the apprentice; the (brief, attempt, actual-diff) triple is captured to the corpus with no verdict and no signing.

Production routing eliminates senior tokens on graduated types. Shadow routing generates the training data that graduates the next type. The two paths compound.

Capture pipeline

The apprenticeship corpus is a fourth corpus alongside the constitutional, engineering, and tenant-runtime corpora. Per-tenant partitioning lives at the directory level (see Totebox storage layout):

data/training-corpus/apprenticeship/<task-type>/<tenant>/<ulid>.jsonl

One file holds one (brief, attempt, verdict) triple. Tenant-private records never leave the tenant's infrastructure. A refine or reject verdict additionally produces a Direct Preference Optimisation triple β€” rejected attempt, corrected diff, constraint-violation tag β€” which feeds adapter training on the apprentice's policy.
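The layout and the preference triple can be sketched as follows. The directory pattern comes from the text; the function names, the JSON field names (`rejected`, `chosen`, `constraint_violation`), and the sample identifiers in the usage note are illustrative assumptions.

```python
import json
from pathlib import PurePosixPath
from typing import Optional

CORPUS_ROOT = PurePosixPath("data/training-corpus/apprenticeship")


def record_path(task_type: str, tenant: str, ulid: str) -> PurePosixPath:
    """Per-tenant partitioning: <task-type>/<tenant>/<ulid>.jsonl."""
    return CORPUS_ROOT / task_type / tenant / f"{ulid}.jsonl"


def to_jsonl_line(brief: dict, attempt: dict, verdict: dict) -> str:
    """Serialise one (brief, attempt, verdict) triple as a single JSON line."""
    record = {"brief": brief, "attempt": attempt, "verdict": verdict}
    return json.dumps(record, sort_keys=True) + "\n"


def dpo_triple(verdict: str, attempt_diff: str, corrected_diff: str,
               violation_tag: str) -> Optional[dict]:
    """Only refine and reject verdicts yield a preference triple."""
    if verdict not in ("refine", "reject"):
        return None
    return {
        "rejected": attempt_diff,               # the apprentice's attempt
        "chosen": corrected_diff,               # the senior's corrected diff
        "constraint_violation": violation_tag,  # which constraint was broken
    }
```

For instance, `record_path("version-bump-manifest", "tenant-a", "01HZX")` gives `data/training-corpus/apprenticeship/version-bump-manifest/tenant-a/01HZX.jsonl`, where the tenant name and identifier are placeholders.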

Configuration

The first registered task type is version-bump-manifest. Every platform MINOR and PATCH bump touches MANIFEST.md and CHANGELOG.md: well-shaped work, no architectural judgment, easily verifiable. The apprentice graduates this type first; senior tokens drop on this class of work; the next task type registers.

Four conditions make the substrate work, and all four are structural properties of a customer-owned deployment: a per-customer governance charter, per-customer signing identities, per-customer task-type granularity in the promotion ledger, and per-customer continued pretraining. A cloud-managed AI platform lacks all four: training on customer interaction data requires pooling it, which eliminates the per-customer isolation guarantee.

The end state is a continuum: code-shaped work the apprentice handles autonomously, work it handles under spot-check, and work that still requires senior review. The continuum shifts as the corpus matures. Per [ni-51-102], the trajectory toward token elimination is forward-looking; the routing shape is in place today.
