Skip to content

Diff: substrate/four-tier-slm-substrate

From 6546238 to 6546238

+0 / −0 lines
BeforeAfter
--- ---
schema: foundry-doc-v1 schema: foundry-doc-v1
title: "The four-tier SLM substrate ladder" title: "The four-tier SLM substrate ladder"
slug: four-tier-slm-substrate slug: four-tier-slm-substrate
category: substrate category: substrate
type: topic type: topic
quality: complete quality: complete
short_description: "A graduated sovereignty path for AI deployment: four customer tiers from a lightweight API gateway with no local model up through a domain-specialist AI service trained on the vendor's aggregated corpus, each tier adding capability without breaking the lower-tier guarantee." short_description: "A graduated sovereignty path for AI deployment: four customer tiers from a lightweight API gateway with no local model up through a domain-specialist AI service trained on the vendor's aggregated corpus, each tier adding capability without breaking the lower-tier guarantee."
status: active status: active
bcsc_class: public-disclosure-safe bcsc_class: public-disclosure-safe
last_edited: 2026-05-15 last_edited: 2026-05-15
editor: pointsav-engineering editor: pointsav-engineering
cites: [] cites: []
references: references:
- id: 1 - id: 1
text: "Federated LoRA research. arXiv:2502.05087, 2025." text: "Federated LoRA research. arXiv:2502.05087, 2025."
url: "https://arxiv.org/abs/2502.05087" url: "https://arxiv.org/abs/2502.05087"
- id: 2 - id: 2
text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025." text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025."
url: "https://allenai.org/blog/olmo3" url: "https://allenai.org/blog/olmo3"
paired_with: four-tier-slm-substrate.es.md paired_with: four-tier-slm-substrate.es.md
--- ---
The PointSav platform structures AI deployment as a four-tier ladder. Customers start at the tier that matches their current hardware, budget, and sovereignty requirements. Each higher tier adds capability. Dropping back to a lower tier at any point does not break the substrate the customer already operates — it simply removes the premium capability. The PointSav platform structures AI deployment as a four-tier ladder. Customers start at the tier that matches their current hardware, budget, and sovereignty requirements. Each higher tier adds capability. Dropping back to a lower tier at any point does not break the substrate the customer already operates — it simply removes the premium capability.
The ladder is the operational form of the platform's designed-for-breakout principle: every customer owns their substrate end-to-end at every tier, and the commercial value the platform adds is in the capabilities above the floor, not in lock-in to vendor infrastructure. The ladder is the operational form of the platform's designed-for-breakout principle: every customer owns their substrate end-to-end at every tier, and the commercial value the platform adds is in the capabilities above the floor, not in lock-in to vendor infrastructure.
## Tier 0 — API Gateway Abstraction ## Tier 0 — API Gateway Abstraction
At Tier 0 the [[compounding-doorman|Doorman]] operates as a pure API gateway. No language model runs locally. The Doorman holds the customer's keys for whichever Tier C external service they have configured, routes requests through a per-purpose allowlist, and logs every call to the [[worm-ledger-architecture|local audit ledger]]. At Tier 0 the [[compounding-doorman|Doorman]] operates as a pure API gateway. No language model runs locally. The Doorman holds the customer's keys for whichever Tier C external service they have configured, routes requests through a per-purpose allowlist, and logs every call to the [[worm-ledger-architecture|local audit ledger]].
Tier 0 is available on any hardware that can run [[totebox-orchestration|ToteboxOS]]. It is appropriate for solo operators, community contributors, and customers evaluating the platform before committing to local hardware. The Ring 1 and Ring 2 services — all deterministic knowledge and processing — function fully without a language model at Ring 3. Intelligence is optional. Tier 0 is available on any hardware that can run [[totebox-orchestration|ToteboxOS]]. It is appropriate for solo operators, community contributors, and customers evaluating the platform before committing to local hardware. The Ring 1 and Ring 2 services — all deterministic knowledge and processing — function fully without a language model at Ring 3. Intelligence is optional.
## Tier 1 — Local Edge Inference ## Tier 1 — Local Edge Inference
At Tier 1 the customer runs OLMo 3 7B Think locally. A consumer GPU with 8 GB of VRAM is sufficient; the quantised model works on CPU with 8 GB of RAM at reduced throughput. The Doorman routes most requests to the local model at Tier A; it routes heavier requests to Tier C external services when configured; and it routes requests to the vendor-hosted 32B model if the customer has subscribed to Tier 2. At Tier 1 the customer runs OLMo 3 7B Think locally. A consumer GPU with 8 GB of VRAM is sufficient; the quantised model works on CPU with 8 GB of RAM at reduced throughput. The Doorman routes most requests to the local model at Tier A; it routes heavier requests to Tier C external services when configured; and it routes requests to the vendor-hosted 32B model if the customer has subscribed to Tier 2.
Tier 1 is the baseline for SMB Customer deployments. It provides offline-capable narrow AI participation and selective access to larger inference capacity, without giving up data locality for routine operations. Tier 1 is the baseline for SMB Customer deployments. It provides offline-capable narrow AI participation and selective access to larger inference capacity, without giving up data locality for routine operations.
At Tier 1, the customer's per-tenant LoRA adapter training is available (see [[adapter-composition]]). A first adapter can be trained on a corpus of roughly 1,000 to 5,000 high-quality preference pairs from the customer's own operational history. That adapter lives on the customer's [[totebox-archive|ToteboxOS instance]] and does not leave it unless the customer explicitly opts into the [[sovereign-ai-commons|federated marketplace]]. [^1] At Tier 1, the customer's per-tenant LoRA adapter training is available (see [[adapter-composition]]). A first adapter can be trained on a corpus of roughly 1,000 to 5,000 high-quality preference pairs from the customer's own operational history. That adapter lives on the customer's [[totebox-archive|ToteboxOS instance]] and does not leave it unless the customer explicitly opts into the [[sovereign-ai-commons|federated marketplace]]. [^1]
## Tier 2 — Vendor-Hosted Burst Compute ## Tier 2 — Vendor-Hosted Burst Compute
At Tier 2 the vendor operates a 32B model on a GPU burst instance with idle-shutdown discipline. From the customer's perspective, this is a Tier C service accessed by their local Doorman — a hosted endpoint that handles requests the local 7B model is not well-suited to. From the vendor's perspective, it is the same infrastructure the vendor uses for its own heavyweight inference tasks, made available to customers as a subscription service. At Tier 2 the vendor operates a 32B model on a GPU burst instance with idle-shutdown discipline. From the customer's perspective, this is a Tier C service accessed by their local Doorman — a hosted endpoint that handles requests the local 7B model is not well-suited to. From the vendor's perspective, it is the same infrastructure the vendor uses for its own heavyweight inference tasks, made available to customers as a subscription service.
The Tier 2 model is OLMo 3.1 32B Think. At runtime, a [[adapter-composition|composition of adapters]] is applied per request: a constitutional adapter reflecting current platform doctrine, an engineering adapter trained on accumulated platform corpus, and tenant-specific adapters where applicable. No external API keys are held by the inference engine itself; key custody remains exclusively at the Doorman boundary. The Tier 2 model is OLMo 3.1 32B Think. At runtime, a [[adapter-composition|composition of adapters]] is applied per request: a constitutional adapter reflecting current platform doctrine, an engineering adapter trained on accumulated platform corpus, and tenant-specific adapters where applicable. No external API keys are held by the inference engine itself; key custody remains exclusively at the Doorman boundary.
The pricing structure is intended to sit structurally below that of fully-managed AI platforms requiring annual contracts in the six-figure range. This is a planned commercial position, not yet published rate-card pricing. The pricing structure is intended to sit structurally below that of fully-managed AI platforms requiring annual contracts in the six-figure range. This is a planned commercial position, not yet published rate-card pricing.
## Tier 3 — Domain-Specialist Inference ## Tier 3 — Domain-Specialist Inference
Tier 3 is a planned standalone AI service trained via continued pretraining on the vendor's aggregated multi-tenant corpus. It is not a LoRA adapter applied to a base model — it is a new base model, produced by following the published AI2 continued-pretraining recipe: 100 billion tokens of midtraining, long-context extension, and post-training alignment. [^2] Tier 3 is a planned standalone AI service trained via continued pretraining on the vendor's aggregated multi-tenant corpus. It is not a LoRA adapter applied to a base model — it is a new base model, produced by following the published AI2 continued-pretraining recipe: 100 billion tokens of midtraining, long-context extension, and post-training alignment. [^2]
The intended result, at the planned scale of 32B parameters, is a model with deep operational familiarity with the PointSav platform: [[totebox-archive|archive]] deployment and configuration, standard editorial patterns, code generation aligned to platform conventions, and the mechanics of [[sovereign-ai-commons|federated contribution]]. It would operate as a multi-tenant API service, accessible from customer [[compounding-doorman|Doorman]] instances via a per-customer authentication token with the same shape as any Tier C external key. The intended result, at the planned scale of 32B parameters, is a model with deep operational familiarity with the PointSav platform: [[totebox-archive|archive]] deployment and configuration, standard editorial patterns, code generation aligned to platform conventions, and the mechanics of [[sovereign-ai-commons|federated contribution]]. It would operate as a multi-tenant API service, accessible from customer [[compounding-doorman|Doorman]] instances via a per-customer authentication token with the same shape as any Tier C external key.
Tier 3 incorporates a structured escalation path: queries the model cannot handle with adequate confidence are flagged for human review. Human responses to flagged queries are captured as training signal that feeds the next continued-pretraining cycle, closing the loop between customer support and model improvement (see [[apprenticeship-substrate]]). Tier 3 incorporates a structured escalation path: queries the model cannot handle with adequate confidence are flagged for human review. Human responses to flagged queries are captured as training signal that feeds the next continued-pretraining cycle, closing the loop between customer support and model improvement (see [[apprenticeship-substrate]]).
The first PointSav-OLMo-N continued-pretraining run is planned to begin in 2027, subject to corpus accumulation targets and operational readiness. The timeline carries material uncertainty. The first PointSav-OLMo-N continued-pretraining run is planned to begin in 2027, subject to corpus accumulation targets and operational readiness. The timeline carries material uncertainty.
## Cryptographic Boundary Custody ## Cryptographic Boundary Custody
A single rule applies across all tiers: API keys are held only at the [[compounding-doorman|Doorman]] boundary. No downstream service, no inference engine, and no Ring 2 process holds a vendor key. The Doorman is the single point of key custody, the single point of call auditing, and the single point of per-purpose allowlist enforcement. This matches the consensus pattern for production AI gateway deployments in 2026. A single rule applies across all tiers: API keys are held only at the [[compounding-doorman|Doorman]] boundary. No downstream service, no inference engine, and no Ring 2 process holds a vendor key. The Doorman is the single point of key custody, the single point of call auditing, and the single point of per-purpose allowlist enforcement. This matches the consensus pattern for production AI gateway deployments in 2026.
## Non-Destructive Tier Transitions ## Non-Destructive Tier Transitions
Tier graduation is additive. Moving from Tier 0 to Tier 1 adds local hardware and the first LoRA training cycle. Moving from Tier 1 to Tier 2 adds the burst subscription. Moving from Tier 2 to Tier 3 adds the specialist service subscription. At each graduation, the capabilities of the lower tier remain fully functional. Tier graduation is additive. Moving from Tier 0 to Tier 1 adds local hardware and the first LoRA training cycle. Moving from Tier 1 to Tier 2 adds the burst subscription. Moving from Tier 2 to Tier 3 adds the specialist service subscription. At each graduation, the capabilities of the lower tier remain fully functional.
Downgrading is equally clean. A customer who drops the Tier 3 subscription retains their local Tier 1 substrate. Their LoRA adapters, their audit ledger, and their [[knowledge-graph-grounded-apprenticeship|knowledge graph]] remain on their own hardware. This is the structural guarantee that the commercial relationship is about capability, not captivity (see [[customer-hostability]]). Downgrading is equally clean. A customer who drops the Tier 3 subscription retains their local Tier 1 substrate. Their LoRA adapters, their audit ledger, and their [[knowledge-graph-grounded-apprenticeship|knowledge graph]] remain on their own hardware. This is the structural guarantee that the commercial relationship is about capability, not captivity (see [[customer-hostability]]).
## Unclaimed Market Positions ## Unclaimed Market Positions
The [[sovereign-ai-commons|federated LoRA marketplace]] — where customers contribute privacy-preserved adapter signal to a commons that improves the base for all participants — has no shipping commercial analogue in 2026. All technical components (privacy-preserving federated learning frameworks, differential privacy primitives, adapter-only exchange protocols) are mature. The marketplace with payment rails is an unclaimed position. The [[sovereign-ai-commons|federated LoRA marketplace]] — where customers contribute privacy-preserved adapter signal to a commons that improves the base for all participants — has no shipping commercial analogue in 2026. All technical components (privacy-preserving federated learning frameworks, differential privacy primitives, adapter-only exchange protocols) are mature. The marketplace with payment rails is an unclaimed position.
The open-substrate customer-service specialist — a domain-expert AI accessible at per-token pricing within reach of SMB contract values, built on a fully open model base — is also unclaimed in 2026. Managed AI services for the customer-service vertical operate at price floors that structurally exclude PointSav's target market. Tier 3 as described is intended to occupy this gap, pending the continued-pretraining timeline above. The open-substrate customer-service specialist — a domain-expert AI accessible at per-token pricing within reach of SMB contract values, built on a fully open model base — is also unclaimed in 2026. Managed AI services for the customer-service vertical operate at price floors that structurally exclude PointSav's target market. Tier 3 as described is intended to occupy this gap, pending the continued-pretraining timeline above.
## See also ## See also
- [[compounding-doorman]] — the Doorman boundary that enforces the key custody rule across all tiers - [[compounding-doorman]] — the Doorman boundary that enforces the key custody rule across all tiers
- [[llm-substrate-decision]] — why OLMo 3 is the base model across all tiers - [[llm-substrate-decision]] — why OLMo 3 is the base model across all tiers
- [[apprenticeship-substrate]] — the training loop that makes higher tiers compound over time - [[apprenticeship-substrate]] — the training loop that makes higher tiers compound over time
- [[economic-model]] — how the four tiers map to Community and SMB Customer commercial tiers - [[economic-model]] — how the four tiers map to Community and SMB Customer commercial tiers