Diff: substrate/tui-corpus-producer.es

From f82faeb to f82faeb

+0 / −0 lines
---
schema: foundry-doc-v1
title: "TUI as corpus producer"
slug: tui-corpus-producer
category: substrate
type: topic
quality: complete
short_description: "Every terminal interaction with service-slm through the operator TUI is a curated training corpus contribution for the per-tenant adapter."
status: active
bcsc_class: public-disclosure-safe
last_edited: 2026-05-15
editor: pointsav-engineering
cites: []
references:
  - id: 1
    text: "Rafailov, R. et al. 'Direct Preference Optimization: Your Language Model is Secretly a Reward Model.' NeurIPS, 2023."
    url: "https://arxiv.org/abs/2305.18290"
  - id: 2
    text: "Zhou, C. et al. 'LIMA: Less Is More for Alignment.' NeurIPS, 2023."
    url: "https://arxiv.org/abs/2305.11206"
paired_with: tui-corpus-producer.es.md
---
The **TUI-as-Corpus-Producer** pattern designates the operator terminal interface (`slm-cli`) as a primary source of high-quality training data for the per-tenant model adapter. Every interaction with the Doorman through this interface is a curated corpus contribution.
## Why terminal interactions are high-quality training data

Three properties distinguish system administration and IT-support interactions from general training data:

**Verifiable ground truth.** When an operator follows AI advice (running a suggested command, applying a proposed configuration change), the system either recovers or it does not, so the operator knows immediately whether the response was correct. Other domains such as creative writing or strategic reasoning lack this immediate-feedback property. IT support has it by default.

**Narrow domain.** Archive operations, system conventions, and customer-specific workflow vocabulary form a bounded command set and failure-mode space. Models train more efficiently on bounded domains than on general corpora because the signal-to-noise ratio is higher.

**Domain-expert feedback.** The operator issuing a verdict is the person who knows whether the response was correct, not a proxy labeler separated from the actual work. Published reinforcement-learning-from-human-feedback literature consistently reports that high-quality verdict-signed interaction tuples train an order of magnitude more efficiently than observation-only tuples. [^1]
## The /feedback mechanism

After every assistant response in the TUI, the operator is offered three explicit verdicts:

**Good.** The response was correct and useful. The tuple is flagged as a positive direct preference optimisation example.

**Refine.** The response was close but needed adjustment. The operator provides a correction inline; the tuple captures the response-and-refinement pair as training signal.

**Bad.** The response was wrong. The tuple is flagged as a negative direct preference optimisation example.

If the operator dismisses without providing a verdict, the tuple is captured as unsigned and contributes to supervised fine-tuning but not to direct preference optimisation.
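
The verdict rules above can be sketched as a small routing function. This is an illustrative sketch only, not the Doorman's actual code: the names `Verdict`, `InteractionTuple`, and `training_signal`, and the returned dictionary keys, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    GOOD = "good"
    REFINE = "refine"
    BAD = "bad"


@dataclass
class InteractionTuple:
    # Hypothetical shape of one captured TUI exchange.
    prompt: str
    response: str
    verdict: Optional[Verdict] = None  # None: operator dismissed without a verdict
    refinement: Optional[str] = None   # inline correction, used only with REFINE


def training_signal(t: InteractionTuple) -> dict:
    """Map one tuple to the training signal described in the text."""
    if t.verdict is None:
        # Unsigned tuples feed supervised fine-tuning only.
        return {"signed": False, "use": "sft", "label": None}
    if t.verdict is Verdict.GOOD:
        return {"signed": True, "use": "dpo", "label": "positive"}
    if t.verdict is Verdict.BAD:
        return {"signed": True, "use": "dpo", "label": "negative"}
    # REFINE: keep the response together with the operator's correction.
    if not t.refinement:
        raise ValueError("a Refine verdict needs an inline correction")
    return {
        "signed": True,
        "use": "refinement_pair",
        "label": None,
        "pair": (t.response, t.refinement),
    }
```

How a refinement pair is consumed downstream is not specified in this document, so the sketch only tags it and leaves the choice of pipeline open.
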
## Adapter quality budget

Published fine-tuning literature suggests 200 to 500 high-quality verdict-signed interactions are sufficient for a first adapter training cycle in a narrow domain. [^2] The platform's intended sequence for each tenant is: accumulate signed interactions from dogfood operations, train the first per-tenant adapter, apply a validation quality gate, and promote the adapter to the deployment. Each subsequent training cycle incorporates additional interactions, progressively tuning the adapter to the customer's specific environment: its systemd units, its seed taxonomy, its workflow vocabulary.
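
The budget can be read as a simple per-tenant readiness check. The sketch below is a minimal illustration under stated assumptions: tuples are plain dictionaries with hypothetical `tenant` and `verdict` keys, and the threshold is set to the lower bound of the 200 to 500 range quoted above. It does not model the validation quality gate or promotion.

```python
from collections import Counter

# Assumption: lower bound of the 200-500 range cited in the text.
FIRST_CYCLE_MIN_SIGNED = 200


def tenants_ready_for_first_cycle(tuples, min_signed=FIRST_CYCLE_MIN_SIGNED):
    """Return tenants with enough verdict-signed tuples to train a first adapter.

    Each tuple is a dict with hypothetical keys "tenant" and "verdict";
    a verdict of None means the tuple is unsigned and is not counted.
    """
    signed = Counter(t["tenant"] for t in tuples if t.get("verdict") is not None)
    return sorted(tenant for tenant, count in signed.items() if count >= min_signed)
```

Counting per tenant rather than globally reflects the per-tenant adapter design: one customer's signed interactions never count toward another customer's budget.
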
## Per-tenant adapter ownership

The corpus produced by a customer's operators trains that customer's adapter, not a general adapter. Per the [[customer-owned-graph-ip]] convention, the trained adapter weights are the customer's property. The platform distributes the model architecture and the training pipeline; the customer retains the trained adapter that results.
## Verdict capture discipline

Some terminal sessions should not contribute to the training corpus. Test sessions initiated with a no-corpus flag, sessions interrupted by unavailable tiers before completion, and sessions using forced-tier debug mode are audit-logged but excluded from normal training data. The boundary between operational corpus and test corpus is enforced at the Doorman's verdict intake endpoint.
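
The three exclusion rules can be sketched as a single intake check. This is an assumption-laden illustration, not the real endpoint: `SessionFlags`, its field names, and `corpus_decision` are hypothetical, and the actual flag names used by `slm-cli` are not given in this document.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SessionFlags:
    # Hypothetical per-session markers; real field names may differ.
    no_corpus: bool = False          # test session started with a no-corpus flag
    tier_unavailable: bool = False   # interrupted by an unavailable tier
    forced_tier_debug: bool = False  # run in forced-tier debug mode


def corpus_decision(flags: SessionFlags) -> Tuple[bool, List[str]]:
    """Return (include_in_corpus, exclusion_reasons) for one session."""
    reasons = []
    if flags.no_corpus:
        reasons.append("no-corpus flag")
    if flags.tier_unavailable:
        reasons.append("tier unavailable before completion")
    if flags.forced_tier_debug:
        reasons.append("forced-tier debug mode")
    # Excluded sessions are still audit-logged; they are only kept out of training data.
    return (not reasons, reasons)
```

Returning the reasons alongside the decision lets the caller write them to the audit log, which matches the requirement that excluded sessions remain audit-logged.
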
## See also

- [[single-boundary-compute-discipline]]: the TUI never calls inference tiers directly; all calls route through the Doorman
- [[customer-owned-graph-ip]]: per-tenant adapter weights are the customer's intellectual property
- [[knowledge-graph-grounded-apprenticeship]]: training tuples carry graph context when the Doorman grounds the request