Tier 0 customer-side sovereign specialist

Topic

From the PointSav Documentation

The Tier 0 Totebox is a sovereign specialist deployment running on the customer's own hardware with no required cloud dependency and a 1 GB total footprint.

Updated 2026-05-01

The Tier 0 Customer-Side Sovereign Specialist is the reference deployment model for the platform: the complete platform stack running on the customer's own hardware, with no required cloud dependency, no required internet connectivity, and a total disk footprint of approximately one gigabyte. It is the operational form of customer-hostability.

The reference unit

The reference Tier 0 deployment is a Totebox, a small-form-factor x86 or ARM appliance. The full stack occupies approximately one gigabyte of disk and runs in a working set of two to four gigabytes of memory on two to four CPU cores. No GPU is required.
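The reference profile can be read as a simple pre-install check. The sketch below is illustrative only: the `HostSpec` type, the `reference_profile_shortfalls` function, the architecture labels, and the choice to treat the low end of each range as a hard minimum are all assumptions, not part of any documented installer.

```python
from dataclasses import dataclass

# Hypothetical minimums taken from the low end of the reference profile:
# about 1 GB of disk for the stack, a 2-4 GB working set, 2-4 CPU cores.
MIN_DISK_GB = 1.0
MIN_RAM_GB = 2.0
MIN_CORES = 2
# Hypothetical labels for the "x86 or ARM" architectures named on this page.
SUPPORTED_ARCHES = ("x86_64", "aarch64")


@dataclass
class HostSpec:
    """Hypothetical description of a candidate Totebox host."""
    free_disk_gb: float
    ram_gb: float
    cpu_cores: int
    arch: str


def reference_profile_shortfalls(host: HostSpec) -> list[str]:
    """List the ways a host misses the reference profile; empty means it fits.

    There is deliberately no GPU check: the reference unit needs none.
    """
    problems = []
    if host.arch not in SUPPORTED_ARCHES:
        problems.append(f"unsupported architecture: {host.arch}")
    if host.free_disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {host.free_disk_gb} GB free, need {MIN_DISK_GB} GB")
    if host.ram_gb < MIN_RAM_GB:
        problems.append(f"memory: {host.ram_gb} GB, need {MIN_RAM_GB} GB")
    if host.cpu_cores < MIN_CORES:
        problems.append(f"cores: {host.cpu_cores}, need {MIN_CORES}")
    return problems


mini_pc = HostSpec(free_disk_gb=64.0, ram_gb=8.0, cpu_cores=4, arch="x86_64")
print(reference_profile_shortfalls(mini_pc))  # → []
```

A real check would probably also reserve headroom for ledger growth, since a write-once ledger only accumulates; this page does not specify that margin.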

The stack includes the WORM (write-once, read-many) file ledger (`service-fs`), the knowledge runtime (`service-content`), the Doorman boundary (`service-slm`), the local specialist model (OLMo 2 1B, roughly 600 MB on disk), the operator TUI (`slm-cli`), and the input, extraction, and egress services. All components are self-contained binaries with no runtime dependencies beyond the operating system.
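The roughly 600 MB model size is consistent with back-of-envelope arithmetic. Assuming a nominal N = 10^9 weights stored at b = 4 bits each (matching the four-bit quantization cited in the performance measurements on this page), the raw weight storage, and the effective bits per weight implied by the 600 MB figure, are:

```latex
\begin{aligned}
\text{size} &\approx \frac{N \cdot b}{8}
  = \frac{10^{9} \times 4}{8}\ \text{bytes}
  = 5 \times 10^{8}\ \text{bytes} \approx 500\ \text{MB} \\
b_{\text{eff}} &= \frac{8 \times 600\ \text{MB}}{N}
  = \frac{8 \times 6 \times 10^{8}}{10^{9}} = 4.8\ \text{bits per weight}
\end{aligned}
```

The extra 100 MB or so would typically come from quantization metadata, such as per-block scale factors, and from any tensors kept at higher precision. The exact split depends on the quantization format and on the model's true parameter count, neither of which this page specifies.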

Hardware at this scale costs in the range of three hundred to fifteen hundred dollars depending on the customer's size and requirements. The intended monthly operating cost is zero — there is no subscription, no recurring cloud fee, and no per-seat charge.

Why a specialist rather than a generalist

The local model on the Totebox is a purpose-routed sysadmin specialist. It handles system administration and IT-support questions, mechanical tasks such as drafting commit messages and validating schemas, routine queries against the customer's audit ledger and knowledge graph, and other short-output tasks.

It is not intended for editorial work, bilingual generation, or long-form reasoning. Those tasks route to the optional GPU burst tier (Tier B) when it is available, or return a graceful "tier unavailable" response when it is not. The specialist's value is that it answers a large fraction of daily operational queries quickly and at zero marginal cost; without it, those queries would consume paid API calls or require a heavier model.
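The routing rule described in the two paragraphs above can be sketched as a small decision function. This is a hypothetical illustration: the `TaskKind` categories, the tier names, and the assumption that each query has already been classified upstream are invented for the example and do not describe the Doorman's actual interface.

```python
from enum import Enum, auto


class TaskKind(Enum):
    """Hypothetical task categories; classification is assumed to happen upstream."""
    # Work the local specialist is intended for.
    SYSADMIN = auto()
    MECHANICAL_EDIT = auto()      # e.g. commit messages, schema validation
    LEDGER_QUERY = auto()         # audit ledger and knowledge graph lookups
    SHORT_OUTPUT = auto()
    # Work the specialist is not intended for.
    EDITORIAL = auto()
    BILINGUAL = auto()
    LONG_FORM_REASONING = auto()


LOCAL_KINDS = frozenset({
    TaskKind.SYSADMIN,
    TaskKind.MECHANICAL_EDIT,
    TaskKind.LEDGER_QUERY,
    TaskKind.SHORT_OUTPUT,
})
TIER_UNAVAILABLE = "tier unavailable"


def route(kind: TaskKind, gpu_burst_available: bool) -> str:
    """Name the tier that should serve a task of this kind."""
    if kind in LOCAL_KINDS:
        return "local-specialist"
    if gpu_burst_available:
        return "tier-b"
    # Degrade gracefully instead of forcing the 1B model onto work it is not meant for.
    return TIER_UNAVAILABLE


print(route(TaskKind.SYSADMIN, gpu_burst_available=False))   # → local-specialist
print(route(TaskKind.EDITORIAL, gpu_burst_available=True))   # → tier-b
print(route(TaskKind.BILINGUAL, gpu_burst_available=False))  # → tier unavailable
```

Keeping the decision a pure function of task kind and tier availability makes the fallback behavior easy to audit; whether the real Doorman is structured this way is not stated on this page.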

Empirical basis for CPU-only inference

The claim that Tier A, the local CPU-only specialist, is practical rests on measured performance rather than theoretical capability. On a four-vCPU, CPU-only deployment, the 1B-parameter model at four-bit quantization produces approximately seven tokens per second, giving end-to-end response times of about six seconds for a typical 40-token answer. That is fast enough for conversational use: the operator types a question and the specialist answers within seconds.
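The six-second figure is consistent with the measured rate. Modeling generation time as output length n divided by decode rate r (a simplification that ignores prompt processing), a 40-token answer at about 7 tokens per second takes:

```latex
t_{\text{gen}} \approx \frac{n}{r}
  = \frac{40\ \text{tokens}}{7\ \text{tokens/s}}
  \approx 5.7\ \text{s}
```

Attributing the remaining fraction of a second to prompt ingestion and process overhead is an assumption rather than a separate measurement. The same linear model suggests that a 200-token answer would take roughly 29 seconds, which is one way to see why long-form work is not intended for the specialist.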

No GPU purchase, GPU driver maintenance, or GPU thermal management is required. The hardware profile is in the same class as any other internal appliance the customer already operates.

Sovereignty properties

The Totebox operates without the platform's servers, without any continuing relationship with the model's original authors (model files already on the device keep working indefinitely), without external API keys (Tier C is opt-in and off by default), without internet connectivity, and without any cloud subscription. The substrate works fully offline.
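These properties amount to defaults that keep every outbound path closed until the tenant opts in. A minimal sketch, assuming a hypothetical `TenantConfig` type with invented backend labels, and assuming that a customer-owned GPU box sits on the customer's own network rather than across the internet:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Hypothetical Tier B backend labels; None means Tier B is not enabled.
TierBBackend = Optional[Literal["gpu-cloud", "owned-gpu-box"]]


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant settings whose defaults match the base Totebox."""
    tier_b_backend: TierBBackend = None      # GPU burst is opt-in per tenant
    tier_c_enabled: bool = False             # external API is off by default
    external_api_keys: tuple[str, ...] = ()  # the base stack needs no keys


def needs_internet(cfg: TenantConfig) -> bool:
    """True only if an opted-in tier reaches beyond the customer's site.

    Assumes an owned GPU box is reachable on the local network.
    """
    return cfg.tier_c_enabled or cfg.tier_b_backend == "gpu-cloud"


print(needs_internet(TenantConfig()))                                # → False
print(needs_internet(TenantConfig(tier_b_backend="owned-gpu-box")))  # → False
print(needs_internet(TenantConfig(tier_b_backend="gpu-cloud")))      # → True
```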

The substrate-without-inference-base-case convention extends this: even the AI tier itself is optional. The deterministic Ring 1 and Ring 2 services — the ledger, the knowledge graph, and the processing services — operate independently of the AI tier. The Totebox is the customer's property in the strongest sense.

Hardware scale

For a five-person business, a mini-PC-class appliance is sufficient. For a thirty-person firm, a slightly larger appliance handles concurrent Ring 1, Ring 2, and AI tier operations. For a three-hundred-person firm or a regional hospital, the intended configuration is a multi-unit cluster with an optional GPU box. The platform's commercial focus is the first two scales; larger deployments are possible but are not the primary market.

Optional tiers

Tier B (GPU burst capacity) is opt-in per tenant. The customer chooses between arranged GPU cloud capacity and a customer-owned GPU box. In either case, Tier B traffic routes through the customer's local Doorman, preserving audit and boundary discipline. It is used for tasks the local specialist cannot handle efficiently: editorial, bilingual, and long-form reasoning work.
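The requirement that Tier B traffic pass through the local Doorman can be illustrated by a wrapper that records each request locally before forwarding it. Everything here is hypothetical: the `Doorman` class, its fields, and the record format are assumptions, and the in-memory list merely stands in for the customer's WORM ledger without implementing one.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doorman:
    """Hypothetical local boundary that audits Tier B traffic before it leaves."""
    backend_name: str                  # e.g. "gpu-cloud" or "owned-gpu-box"
    send: Callable[[str], str]         # transport to the chosen GPU backend
    ledger: list[str] = field(default_factory=list)  # stand-in for the local ledger

    def _record(self, **entry) -> None:
        entry["ts"] = time.time()
        entry["backend"] = self.backend_name
        self.ledger.append(json.dumps(entry, sort_keys=True))

    def forward(self, task_kind: str, prompt: str) -> str:
        # Write the outbound record first, so the audit trail exists even if
        # the backend call fails. Only metadata is logged, not the prompt text.
        self._record(tier="B", direction="out", task=task_kind, prompt_chars=len(prompt))
        reply = self.send(prompt)
        self._record(tier="B", direction="in", task=task_kind, reply_chars=len(reply))
        return reply


# Usage with a stand-in backend that upper-cases its input.
doorman = Doorman(backend_name="owned-gpu-box", send=lambda p: p.upper())
print(doorman.forward("editorial", "draft this"))  # → DRAFT THIS
print(len(doorman.ledger))                         # → 2
```

The same wrapper serves either backend choice, which is the point of routing both through one local boundary.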

Tier C (external API) is opt-in per tenant and off by default. When configured, external API calls are limited to an explicit allowlist of purposes, are audit-logged at the customer's ledger rather than the vendor's, and are disclosed to the operator. The intent is that most customers operate without Tier C at all.
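The three Tier C conditions (opt-in, a purpose allowlist, and local audit plus operator disclosure) can be expressed as a single gate checked before any external call. A minimal sketch; `TierCGate`, `TierCRefused`, and the example purpose string are hypothetical, and the lists stand in for the customer's ledger and the operator's display:

```python
from dataclasses import dataclass, field


class TierCRefused(Exception):
    """Raised when an external API call is not permitted for this tenant."""


@dataclass
class TierCGate:
    """Hypothetical check run by the local boundary before any Tier C call."""
    enabled: bool = False                               # opt-in, off by default
    allowlist: frozenset[str] = frozenset()             # explicit permitted purposes
    ledger: list[dict] = field(default_factory=list)    # stand-in for the customer's ledger
    operator_notices: list[str] = field(default_factory=list)  # stand-in for disclosure

    def authorize(self, purpose: str) -> None:
        if not self.enabled:
            raise TierCRefused("Tier C is disabled for this tenant")
        if purpose not in self.allowlist:
            raise TierCRefused(f"purpose not on allowlist: {purpose}")
        # Audit at the customer's ledger, then disclose to the operator.
        self.ledger.append({"tier": "C", "purpose": purpose})
        self.operator_notices.append(f"External API call authorized for: {purpose}")


default_gate = TierCGate()
try:
    default_gate.authorize("translation")
except TierCRefused as err:
    print(err)  # → Tier C is disabled for this tenant

gate = TierCGate(enabled=True, allowlist=frozenset({"translation"}))
gate.authorize("translation")
print(len(gate.ledger), len(gate.operator_notices))  # → 1 1
```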

