Diff: substrate/tier-zero-customer-side-sovereign-specialist.es
From f82faeb to f82faeb
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "Tier 0 customer-side sovereign specialist" | title: "Tier 0 customer-side sovereign specialist" |
| slug: tier-zero-customer-side-sovereign-specialist | slug: tier-zero-customer-side-sovereign-specialist |
| category: substrate | category: substrate |
| type: topic | type: topic |
| quality: complete | quality: complete |
| short_description: "The Tier 0 Totebox is a sovereign specialist deployment running on the customer's own hardware with no required cloud dependency and a 1 GB total footprint." | short_description: "The Tier 0 Totebox is a sovereign specialist deployment running on the customer's own hardware with no required cloud dependency and a 1 GB total footprint." |
| status: active | status: active |
| bcsc_class: public-disclosure-safe | bcsc_class: public-disclosure-safe |
| last_edited: 2026-05-01 | last_edited: 2026-05-01 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| cites: [] | cites: [] |
| paired_with: tier-zero-customer-side-sovereign-specialist.es.md | paired_with: tier-zero-customer-side-sovereign-specialist.es.md |
| --- | --- |
| The **Tier 0 Customer-Side Sovereign Specialist** is the reference deployment model for the platform: the complete platform stack running on the customer's own hardware, with no required cloud dependency, no required internet connectivity, and a total disk footprint of approximately one gigabyte. | The **Tier 0 Customer-Side Sovereign Specialist** is the reference deployment model for the platform: the complete platform stack running on the customer's own hardware, with no required cloud dependency, no required internet connectivity, and a total disk footprint of approximately one gigabyte. |
| ## The reference unit | ## The reference unit |
| The reference Tier 0 deployment is a Totebox, a small-form-factor x86 or ARM appliance. The full stack occupies approximately one gigabyte of disk and needs a working set of two to four gigabytes of memory on two to four CPU cores. No GPU is required. | The reference Tier 0 deployment is a Totebox, a small-form-factor x86 or ARM appliance. The full stack occupies approximately one gigabyte of disk and needs a working set of two to four gigabytes of memory on two to four CPU cores. No GPU is required. |
| The stack includes the WORM file ledger (`service-fs`), the knowledge runtime (`service-content`), the Doorman boundary (`service-slm`), the local specialist model (OLMo 2 1B at roughly 600 MB on disk), the operator TUI (`slm-cli`), and the input, extraction, and egress services. All components are self-contained binaries with no runtime dependencies beyond the operating system. | The stack includes the WORM file ledger (`service-fs`), the knowledge runtime (`service-content`), the Doorman boundary (`service-slm`), the local specialist model (OLMo 2 1B at roughly 600 MB on disk), the operator TUI (`slm-cli`), and the input, extraction, and egress services. All components are self-contained binaries with no runtime dependencies beyond the operating system. |
| Hardware at this scale costs in the range of three hundred to fifteen hundred dollars depending on the customer's size and requirements. The intended monthly operating cost is zero — there is no subscription, no recurring cloud fee, and no per-seat charge. | Hardware at this scale costs in the range of three hundred to fifteen hundred dollars depending on the customer's size and requirements. The intended monthly operating cost is zero — there is no subscription, no recurring cloud fee, and no per-seat charge. |
| ## Why a specialist rather than a generalist | ## Why a specialist rather than a generalist |
| The local model on the Totebox is a purpose-routed sysadmin specialist. It handles system administration and IT-support questions, mechanical edits such as commit messages and schema validation, routine queries against the customer's audit ledger and knowledge graph, and short-output tasks. | The local model on the Totebox is a purpose-routed sysadmin specialist. It handles system administration and IT-support questions, mechanical edits such as commit messages and schema validation, routine queries against the customer's audit ledger and knowledge graph, and short-output tasks. |
| It is not intended for editorial work, bilingual generation, or long-form reasoning. Those tasks route to the optional GPU burst tier when available, or return a graceful "tier unavailable" response when not. The specialist's value is that it handles a large fraction of daily operational queries quickly and with zero marginal cost — questions that would otherwise consume expensive API calls or require a heavier model. | It is not intended for editorial work, bilingual generation, or long-form reasoning. Those tasks route to the optional GPU burst tier when available, or return a graceful "tier unavailable" response when not. The specialist's value is that it handles a large fraction of daily operational queries quickly and with zero marginal cost — questions that would otherwise consume expensive API calls or require a heavier model. |
| ## Empirical basis for CPU-only inference | ## Empirical basis for CPU-only inference |
| The Tier A claim, that the local specialist is usable on CPU alone, rests on measured performance rather than theoretical capability: on a four-vCPU CPU-only deployment, the 1B-parameter model at four-bit quantization produces approximately seven tokens per second, so a typical 40-token answer takes about six seconds end to end. This is fast enough for conversational use: the operator types a question and the specialist responds in seconds. | The Tier A claim, that the local specialist is usable on CPU alone, rests on measured performance rather than theoretical capability: on a four-vCPU CPU-only deployment, the 1B-parameter model at four-bit quantization produces approximately seven tokens per second, so a typical 40-token answer takes about six seconds end to end. This is fast enough for conversational use: the operator types a question and the specialist responds in seconds. |
| No GPU acquisition, no driver maintenance, and no thermal management are required. The hardware profile is the same class as any other internal appliance the customer already operates. | No GPU acquisition, no driver maintenance, and no thermal management are required. The hardware profile is the same class as any other internal appliance the customer already operates. |
| ## Sovereignty properties | ## Sovereignty properties |
| The Totebox operates without the platform's servers, without any continuing relationship with the model's original authors (existing files work indefinitely), without external API keys (Tier C is opt-in and off by default), without internet connectivity, and without any cloud subscription. The substrate works fully offline. | The Totebox operates without the platform's servers, without any continuing relationship with the model's original authors (existing files work indefinitely), without external API keys (Tier C is opt-in and off by default), without internet connectivity, and without any cloud subscription. The substrate works fully offline. |
| The [[substrate-without-inference-base-case]] convention extends this: even the AI tier itself is optional. The deterministic Ring 1 and Ring 2 services — the ledger, the knowledge graph, and the processing services — operate independently of the AI tier. The Totebox is the customer's property in the strongest sense. | The [[substrate-without-inference-base-case]] convention extends this: even the AI tier itself is optional. The deterministic Ring 1 and Ring 2 services — the ledger, the knowledge graph, and the processing services — operate independently of the AI tier. The Totebox is the customer's property in the strongest sense. |
| ## Hardware scale | ## Hardware scale |
| For a five-person business, a mini-PC class appliance is sufficient. For a thirty-person firm, a slightly larger appliance handles concurrent Ring 1, Ring 2, and AI tier operations. For a three-hundred-person firm or a regional hospital, the intended configuration is a multi-unit cluster with an optional GPU box. The platform's commercial focus is the first two scales; larger deployments are possible but not the primary market. | For a five-person business, a mini-PC class appliance is sufficient. For a thirty-person firm, a slightly larger appliance handles concurrent Ring 1, Ring 2, and AI tier operations. For a three-hundred-person firm or a regional hospital, the intended configuration is a multi-unit cluster with an optional GPU box. The platform's commercial focus is the first two scales; larger deployments are possible but not the primary market. |
| ## Optional tiers | ## Optional tiers |
| Tier B (GPU burst capacity) is opt-in per tenant. The customer chooses between arranged GPU cloud capacity and a customer-owned GPU box. Tier B routes through the customer's local Doorman, preserving audit and boundary discipline. It is used for tasks the local specialist cannot handle efficiently: editorial, bilingual, and long-form reasoning work. | Tier B (GPU burst capacity) is opt-in per tenant. The customer chooses between arranged GPU cloud capacity and a customer-owned GPU box. Tier B routes through the customer's local Doorman, preserving audit and boundary discipline. It is used for tasks the local specialist cannot handle efficiently: editorial, bilingual, and long-form reasoning work. |
| Tier C (external API) is opt-in per tenant and off by default. When configured, external API calls are limited to an explicit allowlist of purposes, are audit-logged at the customer's ledger rather than the vendor's, and are disclosed to the operator. The intent is for most customers to operate without Tier C entirely. | Tier C (external API) is opt-in per tenant and off by default. When configured, external API calls are limited to an explicit allowlist of purposes, are audit-logged at the customer's ledger rather than the vendor's, and are disclosed to the operator. The intent is for most customers to operate without Tier C entirely. |
| ## See also | ## See also |
| - [[substrate-without-inference-base-case]] — deterministic-only operation when all AI tiers are unavailable | - [[substrate-without-inference-base-case]] — deterministic-only operation when all AI tiers are unavailable |
| - [[single-boundary-compute-discipline]] — all inference, including the local specialist, routes through the Doorman | - [[single-boundary-compute-discipline]] — all inference, including the local specialist, routes through the Doorman |
| - [[seed-taxonomy-as-smb-bootstrap]] — the per-tenant taxonomy that the Tier 0 deployment boots with | - [[seed-taxonomy-as-smb-bootstrap]] — the per-tenant taxonomy that the Tier 0 deployment boots with |
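
As a rough check on the CPU-only throughput figures quoted in the document, generation time can be estimated as answer length divided by token rate. The sketch below is illustrative only: `decode_seconds` is a hypothetical helper, not part of the platform, and it deliberately ignores prompt processing and service overhead, which is why the result lands slightly under the quoted six-second end-to-end figure.

```python
def decode_seconds(answer_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time only; prompt processing and overhead are ignored."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return answer_tokens / tokens_per_second


# Figures quoted in the document: a 40-token answer at about 7 tokens per second.
estimate = decode_seconds(40, 7.0)
print(f"{estimate:.1f} s")  # 40 / 7 = 5.71..., so this prints "5.7 s"
```

The same helper can be used to sanity-check other answer lengths against a measured token rate before deciding whether a task belongs on the local specialist.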