The trajectory substrate

From the PointSav Documentation

The platform mechanism that converts operational work (commits, sessions, operator feedback) into structured JSONL training tuples and routes them into a continued-pretraining corpus intended to improve the OLMo base model over time.

Updated 2026-05-15

Every commit to the platform's code repositories, every editorial session, and every operator correction that marks a suggestion wrong is captured rather than discarded. Each interaction becomes a structured JSONL tuple, tagged with provenance metadata, and is routed into a training corpus. The accumulated signal is intended to improve the OLMo base model each time a continued-pretraining run closes; this is the mechanism behind the compounding substrate.

Three orthogonal corpus types determine the architecture. The constitutional corpus captures what the platform's governance charter says a session of each role may and may not do; it is universal and loaded by every platform deployment. The engineering corpus captures contributor session trajectories and is vendor-scoped. The tenant-runtime corpus captures what flows through Ring 1 inside each customer deployment and never leaves that deployment unless the customer explicitly opts into the federated adapter marketplace (a planned, forward-looking feature).

Capture is automatic: no operator decision is required to generate a training tuple. Every JSONL record carries provenance fields (including tuple_type, doctrine_version, tenant, role, scope, and redaction_class) that let the training pipeline assemble each corpus by filtering on structured fields rather than trusting prose. Vendor data never co-mingles with customer data at training time, and tenant data never crosses tenant boundaries; the separation is enforced at the directory level and the pipeline level, not at the policy level.

An operator's specific correction shapes their own cluster's adapter exclusively; no other tenant benefits from it or is burdened by it. This per-contributor approach, the inverse of aggregate preference learning, is structurally inaccessible to platforms whose preference signal is averaged across all users. Per [ni-51-102] and [osc-sn-51-721], the continued-pretraining pipeline is described in planned terms; the capture infrastructure and corpus accumulation are operational today.

Overview

The Trajectory Substrate converts operational work into continued-pretraining signal without interrupting that work. It operates in the background of every commit, every session, and every operator feedback exchange.

Three properties distinguish a trajectory substrate from a generic fine-tuning pipeline:

  1. Capture is automatic. No operator decision is required to generate a training tuple. The git post-commit hook fires; the session-end script fires; the rejected-suggestion script fires. Work produces signal by existing.
  2. Provenance is structural. Every JSONL record carries the governance version it was produced under, the tenant it belongs to, the role of the session, the cluster, and its redaction class. The training pipeline does not trust prose; it filters on these fields.
  3. Corpus boundaries are enforced by infrastructure. Vendor data never co-mingles with customer data at training time. Tenant data never crosses tenants. The separation is directory-level and pipeline-level — not policy-level.

Ring and Role

The Trajectory Substrate does not map to a single ring. Capture happens at every layer — Ring 1 runtime events, Ring 2 processing events, Ring 3 inference interactions, and deployment-level commit hooks. The substrate is the infrastructure that runs beneath all three rings per the three-ring-architecture, converting their operational outputs into training material. service-slm (Ring 3) is the primary consumer of the resulting adapters at inference time.

Architecture

Three corpora, three adapter families

Three orthogonal corpus types are defined. Each produces a different adapter class. Mixing them is the failure mode.

Constitutional corpus

Captures what the platform's governance charter says a session of each role may and may not do. Lives at data/training-corpus/doctrine/<doctrine-version>/ within the deployment. Produces the constitutional adapter (constitutional-doctrine-vM.m.p.lora). Retrains on every platform MINOR version bump. Universal — loaded by every platform deployment, regardless of tenant.

The doctrinal basis is [constitutional-ai-2212-08073]: a model whose behaviour is governed by a written charter, not only by aggregate preference signal.

Engineering corpus (Vendor side)

Captures contributor session trajectories on vendor repositories: commit diffs, session logs, and governance amendments in flight. Lives at data/training-corpus/engineering/ in the vendor workspace. Tenant scope: PointSav only. Produces the engineering adapter (engineering-pointsav-vN.lora), which PointSav may offer as a "platform-builder personality" to customers via service contract.

Tenant-runtime corpus (per Customer)

Captures what flows through Ring 1 inside each customer deployment: service-fs, service-people, service-email, and service-input events; curated graph deltas from service-content; and operator usage trajectories. Lives inside the customer deployment instance at <deployment-instance>/training-corpus/<tenant>/, never in the vendor workspace. Produces the tenant adapter (tenant-<tenant>-vK.lora), which is trained inside and held by the customer deployment.

Tenant adapters do not leave the customer's deployment unless that customer explicitly opts into the federated adapter marketplace (a planned, forward-looking feature). Privacy here is structural: it is enforced by infrastructure rather than by policy.
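
The directory-level separation of the three corpora can be sketched as a routing function. This is a minimal illustration under stated assumptions, not the platform's implementation: the function name corpus_dir, the corpus labels passed as strings, the vendor tenant spelling "pointsav", and the example tenant are all hypothetical. Only the three directory layouts are taken from the descriptions above.

```python
from pathlib import PurePosixPath

# Assumption: how the vendor tenant is spelled in provenance records.
VENDOR_TENANT = "pointsav"


def corpus_dir(corpus: str, *, doctrine_version: str = "", tenant: str = "",
               deployment_instance: str = "") -> PurePosixPath:
    """Return the directory that a tuple of the given corpus type is written to.

    Raises ValueError instead of guessing, so a mis-tagged tuple cannot
    silently land in another corpus.
    """
    if corpus == "constitutional":
        if not doctrine_version:
            raise ValueError("constitutional tuples need a doctrine version")
        return PurePosixPath("data/training-corpus/doctrine") / doctrine_version
    if corpus == "engineering":
        # Vendor-scoped: only vendor-tenant tuples are accepted.
        if tenant != VENDOR_TENANT:
            raise ValueError("engineering corpus is vendor-scoped")
        return PurePosixPath("data/training-corpus/engineering")
    if corpus == "tenant-runtime":
        # Lives inside the customer deployment instance, never the vendor workspace.
        if not tenant or not deployment_instance:
            raise ValueError("tenant-runtime tuples need a tenant and an instance")
        return PurePosixPath(deployment_instance) / "training-corpus" / tenant
    raise ValueError(f"unknown corpus type: {corpus!r}")
```

Raising on any ambiguous input, rather than falling back to a default directory, reflects the point that mixing corpora is the failure mode.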

Capture mechanics

Five scripts handle capture:

Script | Trigger | Writes
capture-edit.py | git post-commit hook on cluster branches and the main branch | engineering/<cluster>/<sha>.jsonl
capture-trajectory.sh | Session end | engineering/sessions/<id>.jsonl
capture-doctrine.sh | Per platform MINOR bump | doctrine/<version>/<tuple>.jsonl
capture-feedback.sh | Inbox-archive of rejected or redo messages | engineering/feedback/<record>.jsonl
capture-tenant-runtime.sh | Scheduled inside deployment instance | <instance>/training-corpus/<tenant>/<shard>.jsonl

Sanitize-outbound discipline applies at every capture point: private keys, PII, and customer-identifying details are redacted before write. Redaction runs at the capture script, not at the downstream consumer. Diff output is truncated at 1,000 lines.
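
Redaction and truncation at the capture point might look roughly like the sketch below. The two regular expressions are illustrative assumptions covering PEM-style private keys and email addresses only; they are not the platform's actual redaction rules, and the function names are hypothetical. The 1,000-line limit is the one stated above.

```python
import re

MAX_DIFF_LINES = 1000  # truncation limit stated in the text

# Hypothetical patterns for illustration; a real rule set would be broader.
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def sanitize_outbound(text: str) -> str:
    """Replace private-key blocks and email addresses with placeholders."""
    text = PRIVATE_KEY.sub("[REDACTED:private-key]", text)
    return EMAIL.sub("[REDACTED:email]", text)


def truncate_diff(diff: str, limit: int = MAX_DIFF_LINES) -> str:
    """Keep at most `limit` lines of diff output."""
    lines = diff.splitlines()
    if len(lines) <= limit:
        return diff
    return "\n".join(lines[:limit])


def prepare_diff(diff: str) -> str:
    """Redact first, then truncate.

    Redacting first means a key block that straddles the cut line is
    matched whole, rather than leaving a partial key in the record.
    """
    return truncate_diff(sanitize_outbound(diff))
```

The ordering inside prepare_diff is a design choice of this sketch: truncating before redacting could split a multi-line secret so that the pattern no longer matches it.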

Every JSONL record carries a provenance header with fields tuple_type, doctrine_version, tenant, moduleId, cluster, role, scope, redaction_class, evidence_class, source_commit, session_id, and created. The training pipeline filters on (tenant, redaction_class, evidence_class) to assemble each corpus. Any adapter version is re-derivable from its source records, and any record can be traced to the adapter versions it trained.
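
As a rough sketch of how filtering on the provenance header could work: the field names below are the ones listed above, but the record layout (a "provenance" object next to a "payload") and every example value are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    # Field names as listed in the text; types are assumed to be strings.
    tuple_type: str
    doctrine_version: str
    tenant: str
    moduleId: str
    cluster: str
    role: str
    scope: str
    redaction_class: str
    evidence_class: str
    source_commit: str
    session_id: str
    created: str


def select_records(lines, *, tenant, redaction_classes, evidence_classes):
    """Yield parsed records whose provenance passes the corpus filter.

    `lines` is any iterable of JSONL strings. A record with missing or
    unexpected provenance fields raises TypeError rather than being
    included on trust.
    """
    for line in lines:
        record = json.loads(line)
        prov = Provenance(**record["provenance"])
        if (prov.tenant == tenant
                and prov.redaction_class in redaction_classes
                and prov.evidence_class in evidence_classes):
            yield record
```

A call such as select_records(open_file, tenant="pointsav", redaction_classes={"public"}, evidence_classes={"commit"}) would then assemble one corpus from a shard; the class values shown are placeholders, since the text does not enumerate them.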

Adapter composition at request time

At inference time the Doorman (service-slm) composes adapters per request:

composed_weights =
  base_model[OLMo-3-1125-7B-Q4]
  ⊕ constitutional[doctrine_v0.0.x] ← always
  ⊕ engineering[pointsav_vN]? ← if request is platform-build context
  ⊕ tenant[<tenant>_vK]? ← if request is tenant-data context
  ⊕ role[<role>] ← what role is asking
  ⊕ cluster[<cluster>_vJ]? ← if cluster scope applies

Multi-LoRA serving infrastructure ([s-lora-2024], [lorax-predibase]) can serve thousands of concurrent adapters with per-request hot-swap, here via the Yo-Yo GPU tier. The composition algebra is specified in adapter-composition.
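
The selection logic in the composition above can be sketched as a function that returns the ordered adapter stack for a request. This is an illustration only: the Request type, its field names, and adapter_stack are hypothetical, adapter version resolution (vN, vK, vJ) is left out, and the actual merging of weights is out of scope.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_MODEL = "OLMo-3-1125-7B-Q4"  # base model named in the composition above


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape, for illustration.
    role: str
    doctrine_version: str
    platform_build: bool = False      # request is platform-build context
    tenant: Optional[str] = None      # set when request is tenant-data context
    cluster: Optional[str] = None     # set when cluster scope applies


def adapter_stack(req: Request) -> List[str]:
    """Return adapter identifiers in composition order for one request."""
    stack = [
        f"base_model[{BASE_MODEL}]",
        f"constitutional[doctrine_{req.doctrine_version}]",  # always loaded
    ]
    if req.platform_build:
        stack.append("engineering[pointsav]")
    if req.tenant is not None:
        stack.append(f"tenant[{req.tenant}]")
    stack.append(f"role[{req.role}]")  # always: what role is asking
    if req.cluster is not None:
        stack.append(f"cluster[{req.cluster}]")
    return stack
```

For example, adapter_stack(Request(role="editor", doctrine_version="v0.0.1")) yields only the base model, the constitutional adapter, and the role adapter, since none of the optional contexts apply.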

Each cluster manifest declares adapter_routing.trains: (which adapters this cluster's commits and sessions feed) and adapter_routing.consumes: (which adapters the Doorman composes when this cluster's sessions query the SLM). Every cluster defaults to training the engineering-pointsav adapter, so every cluster's work contributes to the substrate.
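
A minimal sketch of resolving these two keys from a parsed manifest follows. It assumes the manifest has already been loaded into a Python dict and that "defaults to" means the engineering-pointsav adapter is used when trains is absent or empty; the manifest file format and the function name adapter_routing are not specified in the text and are assumptions here.

```python
# Default stated in the text: every cluster trains engineering-pointsav.
DEFAULT_TRAINS = ("engineering-pointsav",)


def adapter_routing(manifest: dict) -> dict:
    """Resolve the trains/consumes lists for one cluster manifest.

    Assumption: a missing or empty `trains` falls back to the default;
    a missing `consumes` means no adapters are declared for composition.
    """
    routing = manifest.get("adapter_routing") or {}
    trains = list(routing.get("trains") or DEFAULT_TRAINS)
    consumes = list(routing.get("consumes") or [])
    return {"trains": trains, "consumes": consumes}
```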

Negative-trajectory distillation

When an operator rejects a suggestion, asks for a revision, or marks a result as wrong, that exchange is treated as a high-value training signal. The capture-feedback.sh script records the triple:

(rejected_trajectory, corrected_trajectory, constraint_violation_tag)

The rejected and corrected trajectories in each triple form a preference pair that feeds Direct Preference Optimisation (DPO) training, per the RLAIF literature lineage [constitutional-ai-2212-08073]. The adapter family learns, per cluster and per role, which outputs to avoid and which corrections to prefer.
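
To make the link between the captured triple and DPO concrete, the sketch below turns one triple into a preference pair and evaluates the standard per-pair DPO loss from sequence log-probabilities. The prompt argument, the dictionary key names, and the default beta are assumptions for illustration; the triple as described above carries no prompt field, and the text does not specify training hyperparameters.

```python
import math
from typing import NamedTuple


class FeedbackTriple(NamedTuple):
    # The three elements recorded by capture-feedback.sh, per the text.
    rejected_trajectory: str
    corrected_trajectory: str
    constraint_violation_tag: str


def to_preference_pair(prompt: str, triple: FeedbackTriple) -> dict:
    """Map a feedback triple to a (prompt, chosen, rejected) record.

    `prompt` is hypothetical: it stands for whatever context produced
    the rejected output.
    """
    return {
        "prompt": prompt,
        "chosen": triple.corrected_trajectory,
        "rejected": triple.rejected_trajectory,
        "tag": triple.constraint_violation_tag,
    }


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The margin is how much more the policy favours the chosen output
    over the rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) equals log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))
```

When the policy and the reference agree on both outputs the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the corrected trajectory.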

This per-contributor inverse of aggregate preference learning is structurally inaccessible to platforms whose preference signal is averaged across all users. An operator's specific correction pattern shapes their own cluster's adapter — no other tenant benefits or is burdened by it.

Configuration

The Trajectory Substrate currently operates at L1 (edit-corpus capture, live since v0.1.1). Per [ni-51-102] continuous-disclosure language, and in accordance with the forward-looking information principles of [osc-sn-51-721], the statements below describe planned and intended development. Material assumptions: continued-pretraining technology matures on the current trajectory; OLMo 3 base model [olmo3-allenai] remains openly licensed; engineering corpus accumulates at the current platform development cadence. Actual outcomes may differ.

Planned subsequent tiers:

  • L2 — Trajectory capture (bin/capture-trajectory.sh + sanitize-outbound): intended for the v0.2.x release window; adds full session-log tuples to the corpus.
  • L3 — Fine-tuning prototype: intended for the v0.5.0 target; the router-trainer service reads the accumulated corpus and produces the first trained adapter against the OLMo 3 base.
  • L4 — Federated marketplace: long-horizon; depends on L3 operational maturity and the federated-LoRA research lineage [federated-lora-2502-05087] proving out in production at scale.

A quarterly pretraining cadence is intended to begin once L3 is operational. Substrate baselines are designed to improve monotonically as the corpus accumulates; the capture mechanism is in place and the signal is accumulating.
