Diff: substrate/llm-substrate-decision
From 2169822 to 2169822
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "LLM substrate decision — OLMo 3 family" | title: "LLM substrate decision — OLMo 3 family" |
| slug: llm-substrate-decision | slug: llm-substrate-decision |
| category: substrate | category: substrate |
| type: topic | type: topic |
| quality: complete | quality: complete |
| short_description: "The rationale for selecting OLMo 3 as the local and GPU-burst language model substrate: the only fully open model family — training data, training code, and checkpoints included — that permits continued pretraining and satisfies a Canadian public-company procurement posture." | short_description: "The rationale for selecting OLMo 3 as the local and GPU-burst language model substrate: the only fully open model family — training data, training code, and checkpoints included — that permits continued pretraining and satisfies a Canadian public-company procurement posture." |
| status: active | status: active |
| bcsc_class: public-disclosure-safe | bcsc_class: public-disclosure-safe |
| last_edited: 2026-05-15 | last_edited: 2026-05-15 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| cites: [] | cites: [] |
| references: | references: |
| - id: 1 | - id: 1 |
| text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025." | text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025." |
| url: "https://allenai.org/blog/olmo3" | url: "https://allenai.org/blog/olmo3" |
| - id: 2 | - id: 2 |
| text: "AI2. 'Dolma 3 Dataset.' Open Data Commons License. Allen Institute for AI, 2024." | text: "AI2. 'Dolma 3 Dataset.' Open Data Commons License. Allen Institute for AI, 2024." |
| url: "https://huggingface.co/datasets/allenai/dolma" | url: "https://huggingface.co/datasets/allenai/dolma" |
| paired_with: llm-substrate-decision.es.md | paired_with: llm-substrate-decision.es.md |
| --- | --- |
| The PointSav platform uses the OLMo 3 model family as its language model substrate. OLMo 3 7B runs locally on [[customer-hostability|customer hardware]]. OLMo 3.1 32B Think runs on a short-lived [[yoyo-compute-substrate|GPU burst instance]] for heavier inference tasks. The selection is not primarily about benchmark performance — it is about ownership depth. | The PointSav platform uses the OLMo 3 model family as its language model substrate. OLMo 3 7B runs locally on [[customer-hostability|customer hardware]]. OLMo 3.1 32B Think runs on a short-lived [[yoyo-compute-substrate|GPU burst instance]] for heavier inference tasks. The selection is not primarily about benchmark performance — it is about ownership depth. |
| ## Three levels of openness | ## Three levels of openness |
| The language model market in 2026 offers three distinct depths of openness, and the distinction matters for any organisation planning a five-year compounding trajectory. | The language model market in 2026 offers three distinct depths of openness, and the distinction matters for any organisation planning a five-year compounding trajectory. |
| **Level 1 — Open weights.** The model's trained parameter files are published. You can run inference and perform LoRA fine-tuning. You cannot inspect the training data, reproduce the training run, or continue pretraining the model from a known checkpoint. Most widely discussed open models operate at this level. | **Level 1 — Open weights.** The model's trained parameter files are published. You can run inference and perform LoRA fine-tuning. You cannot inspect the training data, reproduce the training run, or continue pretraining the model from a known checkpoint. Most widely discussed open models operate at this level. |
| **Level 2 — Open weights with a permissive license.** Same as Level 1, with a license clean enough to include in a commercial product. This is the level that makes a model usable in a shipped product without legal exposure. | **Level 2 — Open weights with a permissive license.** Same as Level 1, with a license clean enough to include in a commercial product. This is the level that makes a model usable in a shipped product without legal exposure. |
| **Level 3 — Fully open model.** Training data, training code, and checkpoints at every stage of training are published, alongside the weights and a permissive license. At this level you can continue pretraining — starting from a known checkpoint, on your own corpus, to produce a derivative model that your organisation subsequently owns. | **Level 3 — Fully open model.** Training data, training code, and checkpoints at every stage of training are published, alongside the weights and a permissive license. At this level you can continue pretraining — starting from a known checkpoint, on your own corpus, to produce a derivative model that your organisation subsequently owns. |
| OLMo 3 is the only model in the 2026 non-Chinese open model landscape that operates at Level 3. [^1] Its training data is the Dolma 3 corpus (9.3 trillion tokens, published under the Open Data Commons license). [^2] Its training code is published under Apache 2.0. Checkpoints at intermediate stages are available for download. | OLMo 3 is the only model in the 2026 non-Chinese open model landscape that operates at Level 3. [^1] Its training data is the Dolma 3 corpus (9.3 trillion tokens, published under the Open Data Commons license). [^2] Its training code is published under Apache 2.0. Checkpoints at intermediate stages are available for download. |
| For a platform designed to compound over a five-year horizon — where the intended outcome is a [[customer-hostability|customer-owned]], specialised base model trained on accumulated customer data — Level 3 is the only depth that makes that outcome possible. Level 1 and Level 2 produce fine-tuning capability but not ownership of the base. | For a platform designed to compound over a five-year horizon — where the intended outcome is a [[customer-hostability|customer-owned]], specialised base model trained on accumulated customer data — Level 3 is the only depth that makes that outcome possible. Level 1 and Level 2 produce fine-tuning capability but not ownership of the base. |
| ## Why each alternative was set aside | ## Why each alternative was set aside |
| Models considered and rejected: | Models considered and rejected: |
| **Phi-4 (Microsoft).** MIT license on the weights; training data closed and Microsoft-proprietary. Fine-tuning is possible; continued pretraining is not, because the training data necessary to reproduce or extend the base is not published. | **Phi-4 (Microsoft).** MIT license on the weights; training data closed and Microsoft-proprietary. Fine-tuning is possible; continued pretraining is not, because the training data necessary to reproduce or extend the base is not published. |
| **Granite 4 (IBM).** Apache 2.0 license on the weights; training data is a closed IBM-proprietary corpus. Same constraint as Phi-4. | **Granite 4 (IBM).** Apache 2.0 license on the weights; training data is a closed IBM-proprietary corpus. Same constraint as Phi-4. |
| **Llama 4 (Meta).** The Llama Community License is not OSI-approved and includes restrictions on commercial use above a user threshold. The license constraint disqualifies it independently of the training data question. | **Llama 4 (Meta).** The Llama Community License is not OSI-approved and includes restrictions on commercial use above a user threshold. The license constraint disqualifies it independently of the training data question. |
| **Mistral 7B.** Apache 2.0 license; training data partially documented but no full open release. Older architecture; weaker on code than 2026 alternatives. | **Mistral 7B.** Apache 2.0 license; training data partially documented but no full open release. Older architecture; weaker on code than 2026 alternatives. |
| **Chinese-origin models (Qwen, DeepSeek, and related).** Technically capable and in several cases released under MIT or Apache 2.0. The procurement constraint for a Canadian public company is the disqualifying factor: US National Defense Authorization Act precedent from fiscal year 2026 creates regulatory exposure for Canadian public companies incorporating Chinese-origin model infrastructure, regardless of license terms. | **Chinese-origin models (Qwen, DeepSeek, and related).** Technically capable and in several cases released under MIT or Apache 2.0. The procurement constraint for a Canadian public company is the disqualifying factor: US National Defense Authorization Act precedent from fiscal year 2026 creates regulatory exposure for Canadian public companies incorporating Chinese-origin model infrastructure, regardless of license terms. |
| ## Capability | ## Capability |
| OLMo 3 32B Think, in the updated OLMo 3.1 December 2025 release, reaches 91.4% on HumanEvalPlus — a measure of practical code generation accuracy — and sits within two percentage points of the leading open-weight model on the standard mathematical reasoning and instruction-following benchmarks. The 7B variant is strong on programming, reading comprehension, and mathematics with a 65,000-token context window. | OLMo 3 32B Think, in the updated OLMo 3.1 December 2025 release, reaches 91.4% on HumanEvalPlus — a measure of practical code generation accuracy — and sits within two percentage points of the leading open-weight model on the standard mathematical reasoning and instruction-following benchmarks. The 7B variant is strong on programming, reading comprehension, and mathematics with a 65,000-token context window. |
| These numbers do not place OLMo 3 at the absolute frontier of open-weight performance. They do place it firmly in the range where the [[compounding-doorman|Doorman]] — the service that mediates all AI inference calls in the PointSav architecture — produces useful results on the daily tasks it handles. The 7B local variant handles routine work; the 32B burst variant handles tasks requiring extended reasoning. The same vocabulary, tokenizer, and prompt format apply to both, which means the [[adapter-composition|adapter library]] trained on one is compatible with the other. | These numbers do not place OLMo 3 at the absolute frontier of open-weight performance. They do place it firmly in the range where the [[compounding-doorman|Doorman]] — the service that mediates all AI inference calls in the PointSav architecture — produces useful results on the daily tasks it handles. The 7B local variant handles routine work; the 32B burst variant handles tasks requiring extended reasoning. The same vocabulary, tokenizer, and prompt format apply to both, which means the [[adapter-composition|adapter library]] trained on one is compatible with the other. |
| ## The three compute tiers | ## The three compute tiers |
| The Doorman routes requests among three tiers: | The Doorman routes requests among three tiers: |
| **Tier A — local.** OLMo 3 7B running on the customer's own hardware. Approximately zero marginal cost once the hardware is in place. Default for most operations. | **Tier A — local.** OLMo 3 7B running on the customer's own hardware. Approximately zero marginal cost once the hardware is in place. Default for most operations. |
| **Tier B — GPU burst.** OLMo 3.1 32B Think on a short-lived GPU instance. Approximately $0.84 per hour at list pricing on major cloud providers, significantly less on spot/preemptible instances. Used for requests the local tier cannot handle efficiently. Idle-shutdown discipline means the instance runs only when a request requires it. | **Tier B — GPU burst.** OLMo 3.1 32B Think on a short-lived GPU instance. Approximately $0.84 per hour at list pricing on major cloud providers, significantly less on spot/preemptible instances. Used for requests the local tier cannot handle efficiently. Idle-shutdown discipline means the instance runs only when a request requires it. |
| **Tier C — external API.** Third-party language model services via a per-request allowlist. Used only for narrow precision tasks — [[citation-substrate|citation grounding]], initial [[knowledge-graph-grounded-apprenticeship|knowledge graph]] construction, entity disambiguation — where the precision requirement justifies the cost. Every Tier C call is logged to the customer's [[worm-ledger-architecture|audit ledger]]. | **Tier C — external API.** Third-party language model services via a per-request allowlist. Used only for narrow precision tasks — [[citation-substrate|citation grounding]], initial [[knowledge-graph-grounded-apprenticeship|knowledge graph]] construction, entity disambiguation — where the precision requirement justifies the cost. Every Tier C call is logged to the customer's [[worm-ledger-architecture|audit ledger]]. |
| The customer's routing configuration determines which tier handles which request. No per-request manual selection is required. | The customer's routing configuration determines which tier handles which request. No per-request manual selection is required. |
| ## The intended continued-pretraining path | ## The intended continued-pretraining path |
| The platform's intended multi-year trajectory, as currently planned, is to move from using OLMo 3 as a base through a process of continued pretraining that produces PointSav-OLMo-N — a derivative model trained on accumulated [[trajectory-substrate|platform corpus data]], customer [[adapter-composition|LoRA adapter]] distillation, and curated public material. Year two onwards is the intended start window for the first continued-pretraining run, targeting the 7B variant at an estimated cost of $30,000 to $100,000 on cloud GPU infrastructure. This trajectory is planned but not yet initiated; the current platform operates on the published OLMo 3 base. | The platform's intended multi-year trajectory, as currently planned, is to move from using OLMo 3 as a base through a process of continued pretraining that produces PointSav-OLMo-N — a derivative model trained on accumulated [[trajectory-substrate|platform corpus data]], customer [[adapter-composition|LoRA adapter]] distillation, and curated public material. Year two onwards is the intended start window for the first continued-pretraining run, targeting the 7B variant at an estimated cost of $30,000 to $100,000 on cloud GPU infrastructure. This trajectory is planned but not yet initiated; the current platform operates on the published OLMo 3 base. |
| The material assumption underlying this trajectory is that the Open Data Commons license on Dolma 3 and the Apache 2.0 license on OLMo 3's training code remain in effect and permit commercial continued pretraining. That assumption holds as of May 2026; it would require re-evaluation if the license terms changed. | The material assumption underlying this trajectory is that the Open Data Commons license on Dolma 3 and the Apache 2.0 license on OLMo 3's training code remain in effect and permit commercial continued pretraining. That assumption holds as of May 2026; it would require re-evaluation if the license terms changed. |
| ## See also | ## See also |
| - [[four-tier-slm-substrate]] — the four deployment tiers built on this substrate | - [[four-tier-slm-substrate]] — the four deployment tiers built on this substrate |
| - [[apprenticeship-substrate]] — how continued pretraining signal is generated from production work | - [[apprenticeship-substrate]] — how continued pretraining signal is generated from production work |
| - [[trajectory-substrate]] — the corpus capture mechanism that feeds continued pretraining | - [[trajectory-substrate]] — the corpus capture mechanism that feeds continued pretraining |
| - [[compounding-doorman]] — the service that routes all inference calls across the three compute tiers | - [[compounding-doorman]] — the service that routes all inference calls across the three compute tiers |
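
The three-tier routing described in the document can be illustrated with a minimal sketch. This is not the Doorman's actual implementation or configuration schema, neither of which appears in the source. The class names, task names, and the in-memory `audit_log` list are all hypothetical stand-ins. The only behaviour taken from the text is that Tier A is the default, Tier B handles requests the local tier cannot handle efficiently, Tier C is restricted to an allowlist, and every Tier C call is logged.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    LOCAL = "A"     # OLMo 3 7B on the customer's own hardware (default)
    BURST = "B"     # OLMo 3.1 32B Think on a short-lived GPU instance
    EXTERNAL = "C"  # third-party API, allowlisted tasks only


@dataclass
class RoutingConfig:
    # Hypothetical task names; the real configuration format is an assumption.
    burst_tasks: set = field(default_factory=lambda: {"extended-reasoning"})
    external_allowlist: set = field(default_factory=lambda: {
        "citation-grounding",
        "knowledge-graph-construction",
        "entity-disambiguation",
    })


@dataclass
class Doorman:
    config: RoutingConfig
    # Stand-in for the customer's audit ledger; a plain list for illustration.
    audit_log: list = field(default_factory=list)

    def route(self, task: str) -> Tier:
        """Pick a tier from configuration alone, with no per-request manual choice."""
        if task in self.config.external_allowlist:
            # Every Tier C call is recorded before it is dispatched.
            self.audit_log.append({"task": task, "tier": Tier.EXTERNAL.value})
            return Tier.EXTERNAL
        if task in self.config.burst_tasks:
            return Tier.BURST
        return Tier.LOCAL


doorman = Doorman(RoutingConfig())
print(doorman.route("summarise-note"))      # falls through to the local default
print(doorman.route("citation-grounding"))  # allowlisted, so external and logged
```

Keeping the allowlist check first means a task can reach Tier C only if the configuration names it explicitly; anything unrecognised falls back to the local tier.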