Diff: substrate/four-tier-slm-substrate.es
From 84411fe to 84411fe
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "The four-tier SLM substrate ladder" | title: "The four-tier SLM substrate ladder" |
| slug: four-tier-slm-substrate | slug: four-tier-slm-substrate |
| category: substrate | category: substrate |
| type: topic | type: topic |
| quality: complete | quality: complete |
| short_description: "A graduated sovereignty path for AI deployment: four customer tiers from a lightweight API gateway with no local model up through a domain-specialist AI service trained on the vendor's aggregated corpus, each tier adding capability without breaking the lower-tier guarantee." | short_description: "A graduated sovereignty path for AI deployment: four customer tiers from a lightweight API gateway with no local model up through a domain-specialist AI service trained on the vendor's aggregated corpus, each tier adding capability without breaking the lower-tier guarantee." |
| status: active | status: active |
| bcsc_class: public-disclosure-safe | bcsc_class: public-disclosure-safe |
| last_edited: 2026-05-15 | last_edited: 2026-05-15 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| cites: [] | cites: [] |
| references: | references: |
| - id: 1 | - id: 1 |
| text: "Federated LoRA research. arXiv:2502.05087, 2025." | text: "Federated LoRA research. arXiv:2502.05087, 2025." |
| url: "https://arxiv.org/abs/2502.05087" | url: "https://arxiv.org/abs/2502.05087" |
| - id: 2 | - id: 2 |
| text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025." | text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025." |
| url: "https://allenai.org/blog/olmo3" | url: "https://allenai.org/blog/olmo3" |
| paired_with: four-tier-slm-substrate.es.md | paired_with: four-tier-slm-substrate.es.md |
| --- | --- |
| The PointSav platform structures AI deployment as a four-tier ladder. Customers start at the tier that matches their current hardware, budget, and sovereignty requirements. Each higher tier adds capability. Dropping back to a lower tier at any point does not break the substrate the customer already operates — it simply removes the premium capability. | The PointSav platform structures AI deployment as a four-tier ladder. Customers start at the tier that matches their current hardware, budget, and sovereignty requirements. Each higher tier adds capability. Dropping back to a lower tier at any point does not break the substrate the customer already operates — it simply removes the premium capability. |
| The ladder is the operational form of the platform's designed-for-breakout principle: every customer owns their substrate end-to-end at every tier, and the commercial value the platform adds is in the capabilities above the floor, not in lock-in to vendor infrastructure. | The ladder is the operational form of the platform's designed-for-breakout principle: every customer owns their substrate end-to-end at every tier, and the commercial value the platform adds is in the capabilities above the floor, not in lock-in to vendor infrastructure. |
| ## Tier 0 — API Gateway Abstraction | ## Tier 0 — API Gateway Abstraction |
| At Tier 0 the [[compounding-doorman|Doorman]] operates as a pure API gateway. No language model runs locally. The Doorman holds the customer's keys for whichever Tier C external service they have configured, routes requests through a per-purpose allowlist, and logs every call to the [[worm-ledger-architecture|local audit ledger]]. | At Tier 0 the [[compounding-doorman|Doorman]] operates as a pure API gateway. No language model runs locally. The Doorman holds the customer's keys for whichever Tier C external service they have configured, routes requests through a per-purpose allowlist, and logs every call to the [[worm-ledger-architecture|local audit ledger]]. |
| Tier 0 is available on any hardware that can run [[totebox-orchestration|ToteboxOS]]. It is appropriate for solo operators, community contributors, and customers evaluating the platform before committing to local hardware. The Ring 1 and Ring 2 services — all deterministic knowledge and processing — function fully without a language model at Ring 3. Intelligence is optional. | Tier 0 is available on any hardware that can run [[totebox-orchestration|ToteboxOS]]. It is appropriate for solo operators, community contributors, and customers evaluating the platform before committing to local hardware. The Ring 1 and Ring 2 services — all deterministic knowledge and processing — function fully without a language model at Ring 3. Intelligence is optional. |
| ## Tier 1 — Local Edge Inference | ## Tier 1 — Local Edge Inference |
| At Tier 1 the customer runs OLMo 3 7B Think locally. A consumer GPU with 8 GB of VRAM is sufficient; the quantised model also runs on CPU with 8 GB of RAM at reduced throughput. The Doorman routes most requests to the local model at Tier A, heavier requests to Tier C external services when configured, and requests to the vendor-hosted 32B model if the customer has subscribed to Tier 2. | At Tier 1 the customer runs OLMo 3 7B Think locally. A consumer GPU with 8 GB of VRAM is sufficient; the quantised model also runs on CPU with 8 GB of RAM at reduced throughput. The Doorman routes most requests to the local model at Tier A, heavier requests to Tier C external services when configured, and requests to the vendor-hosted 32B model if the customer has subscribed to Tier 2. |
| Tier 1 is the baseline for SMB Customer deployments. It provides offline-capable narrow AI participation and selective access to larger inference capacity, without giving up data locality for routine operations. | Tier 1 is the baseline for SMB Customer deployments. It provides offline-capable narrow AI participation and selective access to larger inference capacity, without giving up data locality for routine operations. |
| Per-tenant LoRA adapter training becomes available at Tier 1 (see [[adapter-composition]]). A first adapter can be trained on a corpus of roughly 1,000 to 5,000 high-quality preference pairs drawn from the customer's own operational history. That adapter lives on the customer's [[totebox-archive|ToteboxOS instance]] and does not leave it unless the customer explicitly opts into the [[sovereign-ai-commons|federated marketplace]].[^1] | Per-tenant LoRA adapter training becomes available at Tier 1 (see [[adapter-composition]]). A first adapter can be trained on a corpus of roughly 1,000 to 5,000 high-quality preference pairs drawn from the customer's own operational history. That adapter lives on the customer's [[totebox-archive|ToteboxOS instance]] and does not leave it unless the customer explicitly opts into the [[sovereign-ai-commons|federated marketplace]].[^1] |
| ## Tier 2 — Vendor-Hosted Burst Compute | ## Tier 2 — Vendor-Hosted Burst Compute |
| At Tier 2 the vendor operates a 32B model on a GPU burst instance with idle-shutdown discipline. From the customer's perspective, this is a Tier C service accessed by their local Doorman — a hosted endpoint that handles requests the local 7B model is not well-suited to. From the vendor's perspective, it is the same infrastructure the vendor uses for its own heavyweight inference tasks, made available to customers as a subscription service. | At Tier 2 the vendor operates a 32B model on a GPU burst instance with idle-shutdown discipline. From the customer's perspective, this is a Tier C service accessed by their local Doorman — a hosted endpoint that handles requests the local 7B model is not well-suited to. From the vendor's perspective, it is the same infrastructure the vendor uses for its own heavyweight inference tasks, made available to customers as a subscription service. |
| The Tier 2 model is OLMo 3.1 32B Think. At runtime, a [[adapter-composition|composition of adapters]] is applied per request: a constitutional adapter reflecting current platform doctrine, an engineering adapter trained on accumulated platform corpus, and tenant-specific adapters where applicable. No external API keys are held by the inference engine itself; key custody remains exclusively at the Doorman boundary. | The Tier 2 model is OLMo 3.1 32B Think. At runtime, a [[adapter-composition|composition of adapters]] is applied per request: a constitutional adapter reflecting current platform doctrine, an engineering adapter trained on accumulated platform corpus, and tenant-specific adapters where applicable. No external API keys are held by the inference engine itself; key custody remains exclusively at the Doorman boundary. |
| The pricing structure is intended to sit structurally below that of fully-managed AI platforms requiring annual contracts in the six-figure range. This is a planned commercial position, not yet published rate-card pricing. | The pricing structure is intended to sit structurally below that of fully-managed AI platforms requiring annual contracts in the six-figure range. This is a planned commercial position, not yet published rate-card pricing. |
| ## Tier 3 — Domain-Specialist Inference | ## Tier 3 — Domain-Specialist Inference |
| Tier 3 is a planned standalone AI service trained via continued pretraining on the vendor's aggregated multi-tenant corpus. It is not a LoRA adapter applied to a base model — it is a new base model, produced by following the published AI2 continued-pretraining recipe: 100 billion tokens of midtraining, long-context extension, and post-training alignment. [^2] | Tier 3 is a planned standalone AI service trained via continued pretraining on the vendor's aggregated multi-tenant corpus. It is not a LoRA adapter applied to a base model — it is a new base model, produced by following the published AI2 continued-pretraining recipe: 100 billion tokens of midtraining, long-context extension, and post-training alignment. [^2] |
| The intended result, at the planned scale of 32B parameters, is a model with deep operational familiarity with the PointSav platform: [[totebox-archive|archive]] deployment and configuration, standard editorial patterns, code generation aligned to platform conventions, and the mechanics of [[sovereign-ai-commons|federated contribution]]. It would operate as a multi-tenant API service, accessible from customer [[compounding-doorman|Doorman]] instances via a per-customer authentication token with the same shape as any Tier C external key. | The intended result, at the planned scale of 32B parameters, is a model with deep operational familiarity with the PointSav platform: [[totebox-archive|archive]] deployment and configuration, standard editorial patterns, code generation aligned to platform conventions, and the mechanics of [[sovereign-ai-commons|federated contribution]]. It would operate as a multi-tenant API service, accessible from customer [[compounding-doorman|Doorman]] instances via a per-customer authentication token with the same shape as any Tier C external key. |
| Tier 3 incorporates a structured escalation path: queries the model cannot handle with adequate confidence are flagged for human review. Human responses to flagged queries are captured as training signal that feeds the next continued-pretraining cycle, closing the loop between customer support and model improvement (see [[apprenticeship-substrate]]). | Tier 3 incorporates a structured escalation path: queries the model cannot handle with adequate confidence are flagged for human review. Human responses to flagged queries are captured as training signal that feeds the next continued-pretraining cycle, closing the loop between customer support and model improvement (see [[apprenticeship-substrate]]). |
| The first PointSav-OLMo-N continued-pretraining run is planned to begin in 2027, subject to corpus accumulation targets and operational readiness. The timeline carries material uncertainty. | The first PointSav-OLMo-N continued-pretraining run is planned to begin in 2027, subject to corpus accumulation targets and operational readiness. The timeline carries material uncertainty. |
| ## Cryptographic Boundary Custody | ## Cryptographic Boundary Custody |
| A single rule applies across all tiers: API keys are held only at the [[compounding-doorman|Doorman]] boundary. No downstream service, no inference engine, and no Ring 2 process holds a vendor key. The Doorman is the single point of key custody, the single point of call auditing, and the single point of per-purpose allowlist enforcement. This matches the consensus pattern for production AI gateway deployments in 2026. | A single rule applies across all tiers: API keys are held only at the [[compounding-doorman|Doorman]] boundary. No downstream service, no inference engine, and no Ring 2 process holds a vendor key. The Doorman is the single point of key custody, the single point of call auditing, and the single point of per-purpose allowlist enforcement. This matches the consensus pattern for production AI gateway deployments in 2026. |
| ## Non-Destructive Tier Transitions | ## Non-Destructive Tier Transitions |
| Tier graduation is additive. Moving from Tier 0 to Tier 1 adds local hardware and the first LoRA training cycle. Moving from Tier 1 to Tier 2 adds the burst subscription. Moving from Tier 2 to Tier 3 adds the specialist service subscription. At each graduation, the capabilities of the lower tier remain fully functional. | Tier graduation is additive. Moving from Tier 0 to Tier 1 adds local hardware and the first LoRA training cycle. Moving from Tier 1 to Tier 2 adds the burst subscription. Moving from Tier 2 to Tier 3 adds the specialist service subscription. At each graduation, the capabilities of the lower tier remain fully functional. |
| Downgrading is equally clean. A customer who drops the Tier 3 subscription retains their local Tier 1 substrate. Their LoRA adapters, their audit ledger, and their [[knowledge-graph-grounded-apprenticeship|knowledge graph]] remain on their own hardware. This is the structural guarantee that the commercial relationship is about capability, not captivity (see [[customer-hostability]]). | Downgrading is equally clean. A customer who drops the Tier 3 subscription retains their local Tier 1 substrate. Their LoRA adapters, their audit ledger, and their [[knowledge-graph-grounded-apprenticeship|knowledge graph]] remain on their own hardware. This is the structural guarantee that the commercial relationship is about capability, not captivity (see [[customer-hostability]]). |
| ## Unclaimed Market Positions | ## Unclaimed Market Positions |
| The [[sovereign-ai-commons|federated LoRA marketplace]] — where customers contribute privacy-preserved adapter signal to a commons that improves the base for all participants — has no shipping commercial analogue in 2026. All technical components (privacy-preserving federated learning frameworks, differential privacy primitives, adapter-only exchange protocols) are mature. The marketplace with payment rails is an unclaimed position. | The [[sovereign-ai-commons|federated LoRA marketplace]] — where customers contribute privacy-preserved adapter signal to a commons that improves the base for all participants — has no shipping commercial analogue in 2026. All technical components (privacy-preserving federated learning frameworks, differential privacy primitives, adapter-only exchange protocols) are mature. The marketplace with payment rails is an unclaimed position. |
| The open-substrate customer-service specialist — a domain-expert AI accessible at per-token pricing within reach of SMB contract values, built on a fully open model base — is also unclaimed in 2026. Managed AI services for the customer-service vertical operate at price floors that structurally exclude PointSav's target market. Tier 3 as described is intended to occupy this gap, pending the continued-pretraining timeline above. | The open-substrate customer-service specialist — a domain-expert AI accessible at per-token pricing within reach of SMB contract values, built on a fully open model base — is also unclaimed in 2026. Managed AI services for the customer-service vertical operate at price floors that structurally exclude PointSav's target market. Tier 3 as described is intended to occupy this gap, pending the continued-pretraining timeline above. |
| ## See also | ## See also |
| - [[compounding-doorman]] — the Doorman boundary that enforces the key custody rule across all tiers | - [[compounding-doorman]] — the Doorman boundary that enforces the key custody rule across all tiers |
| - [[llm-substrate-decision]] — why OLMo 3 is the base model across all tiers | - [[llm-substrate-decision]] — why OLMo 3 is the base model across all tiers |
| - [[apprenticeship-substrate]] — the training loop that makes higher tiers compound over time | - [[apprenticeship-substrate]] — the training loop that makes higher tiers compound over time |
| - [[economic-model]] — how the four tiers map to Community and SMB Customer commercial tiers | - [[economic-model]] — how the four tiers map to Community and SMB Customer commercial tiers |
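
The routing, key-custody, and tier-transition rules described in the document can be illustrated with a minimal sketch. This is not the Doorman's actual implementation: the `Doorman` class, the `Route` names, the `heavy` flag, and the preference order between routes are all hypothetical and chosen only for illustration. The document states which destinations exist, not how ties between them are resolved.

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    # Hypothetical route names; the document calls these Tier A (local)
    # and Tier C (external, including the vendor-hosted 32B endpoint).
    LOCAL = "local-7b"      # model on the customer's own hardware (Tier 1)
    BURST = "vendor-32b"    # vendor-hosted burst endpoint (Tier 2)
    EXTERNAL = "external"   # customer-configured external service


@dataclass
class Doorman:
    has_local_model: bool = False
    has_burst_subscription: bool = False
    external_keys: dict = field(default_factory=dict)  # keys are held only here
    allowlist: dict = field(default_factory=dict)      # purpose -> set of allowed Routes
    audit_log: list = field(default_factory=list)      # stand-in for the local audit ledger

    def available(self) -> set:
        """Routes the customer's current tier makes reachable."""
        routes = set()
        if self.has_local_model:
            routes.add(Route.LOCAL)
        if self.has_burst_subscription:
            routes.add(Route.BURST)
        if self.external_keys:
            routes.add(Route.EXTERNAL)
        return routes

    def route(self, purpose: str, heavy: bool = False) -> Route:
        # Assumed preference order; the source does not specify one.
        if heavy:
            order = [Route.BURST, Route.EXTERNAL, Route.LOCAL]
        else:
            order = [Route.LOCAL, Route.EXTERNAL, Route.BURST]
        allowed = self.allowlist.get(purpose, set())
        usable = self.available()
        for candidate in order:
            if candidate in usable and candidate in allowed:
                self.audit_log.append((purpose, candidate.value))
                return candidate
        # Denied calls are logged too, then rejected.
        self.audit_log.append((purpose, "denied"))
        raise PermissionError(f"no permitted route for purpose {purpose!r}")


doorman = Doorman(
    has_local_model=True,
    has_burst_subscription=True,
    allowlist={"drafting": {Route.LOCAL, Route.BURST}},
)
first = doorman.route("drafting", heavy=True)    # Route.BURST
doorman.has_burst_subscription = False           # downgrade from Tier 2
second = doorman.route("drafting", heavy=True)   # Route.LOCAL
```

In this sketch, dropping the burst subscription changes only which routes are reachable. The allowlist, the audit log, and the local route are untouched, which mirrors the non-destructive downgrade the document describes.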