Diff: substrate/llm-substrate-decision

From 2169822 to 2169822

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "LLM substrate decision — OLMo 3 family"	title: "LLM substrate decision — OLMo 3 family"
slug: llm-substrate-decision	slug: llm-substrate-decision
category: substrate	category: substrate
type: topic	type: topic
quality: complete	quality: complete
short_description: "The rationale for selecting OLMo 3 as the local and GPU-burst language model substrate: the only fully open model family — training data, training code, and checkpoints included — that permits continued pretraining and satisfies a Canadian public-company procurement posture."	short_description: "The rationale for selecting OLMo 3 as the local and GPU-burst language model substrate: the only fully open model family — training data, training code, and checkpoints included — that permits continued pretraining and satisfies a Canadian public-company procurement posture."
status: active	status: active
bcsc_class: public-disclosure-safe	bcsc_class: public-disclosure-safe
last_edited: 2026-05-15	last_edited: 2026-05-15
editor: pointsav-engineering	editor: pointsav-engineering
cites: []	cites: []
references:	references:
- id: 1	- id: 1
text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025."	text: "AI2. 'OLMo 3.' Allen Institute for AI, 2025."
url: "https://allenai.org/blog/olmo3"	url: "https://allenai.org/blog/olmo3"
- id: 2	- id: 2
text: "AI2. 'Dolma 3 Dataset.' Open Data Commons License. Allen Institute for AI, 2024."	text: "AI2. 'Dolma 3 Dataset.' Open Data Commons License. Allen Institute for AI, 2024."
url: "https://huggingface.co/datasets/allenai/dolma"	url: "https://huggingface.co/datasets/allenai/dolma"
paired_with: llm-substrate-decision.es.md	paired_with: llm-substrate-decision.es.md
---	---

The PointSav platform uses the OLMo 3 model family as its language model substrate. OLMo 3 7B runs locally on [[customer-hostability\|customer hardware]]. OLMo 3.1 32B Think runs on a short-lived [[yoyo-compute-substrate\|GPU burst instance]] for heavier inference tasks. The selection is not primarily about benchmark performance — it is about ownership depth.	The PointSav platform uses the OLMo 3 model family as its language model substrate. OLMo 3 7B runs locally on [[customer-hostability\|customer hardware]]. OLMo 3.1 32B Think runs on a short-lived [[yoyo-compute-substrate\|GPU burst instance]] for heavier inference tasks. The selection is not primarily about benchmark performance — it is about ownership depth.

## Three levels of openness	## Three levels of openness

The language model market in 2026 offers three distinct depths of openness, and the distinction matters for any organisation planning a five-year compounding trajectory.	The language model market in 2026 offers three distinct depths of openness, and the distinction matters for any organisation planning a five-year compounding trajectory.

Level 1 — Open weights. The model's trained parameter files are published. You can run inference and perform LoRA fine-tuning. You cannot inspect the training data, reproduce the training run, or continue pretraining the model from a known checkpoint. Most widely discussed open models operate at this level.	Level 1 — Open weights. The model's trained parameter files are published. You can run inference and perform LoRA fine-tuning. You cannot inspect the training data, reproduce the training run, or continue pretraining the model from a known checkpoint. Most widely discussed open models operate at this level.

Level 2 — Open weights with a permissive license. Same as Level 1, with a license clean enough to include in a commercial product. This is the level that makes a model usable in a shipped product without legal exposure.	Level 2 — Open weights with a permissive license. Same as Level 1, with a license clean enough to include in a commercial product. This is the level that makes a model usable in a shipped product without legal exposure.

Level 3 — Fully open model. Training data, training code, and checkpoints at every stage of training are published, alongside the weights and a permissive license. At this level you can continue pretraining — starting from a known checkpoint, on your own corpus, to produce a derivative model that your organisation subsequently owns.	Level 3 — Fully open model. Training data, training code, and checkpoints at every stage of training are published, alongside the weights and a permissive license. At this level you can continue pretraining — starting from a known checkpoint, on your own corpus, to produce a derivative model that your organisation subsequently owns.

OLMo 3 is the only model in the 2026 non-Chinese open model landscape that operates at Level 3. [^1] Its training data is the Dolma 3 corpus (9.3 trillion tokens, published under the Open Data Commons license). [^2] Its training code is published under Apache 2.0. Checkpoints at intermediate stages are available for download.	OLMo 3 is the only model in the 2026 non-Chinese open model landscape that operates at Level 3. [^1] Its training data is the Dolma 3 corpus (9.3 trillion tokens, published under the Open Data Commons license). [^2] Its training code is published under Apache 2.0. Checkpoints at intermediate stages are available for download.

For a platform designed to compound over a five-year horizon — where the intended outcome is a [[customer-hostability\|customer-owned]], specialised base model trained on accumulated customer data — Level 3 is the only depth that makes that outcome possible. Level 1 and Level 2 produce fine-tuning capability but not ownership of the base.	For a platform designed to compound over a five-year horizon — where the intended outcome is a [[customer-hostability\|customer-owned]], specialised base model trained on accumulated customer data — Level 3 is the only depth that makes that outcome possible. Level 1 and Level 2 produce fine-tuning capability but not ownership of the base.

## Why each alternative was set aside	## Why each alternative was set aside

Models considered and rejected:	Models considered and rejected:

Phi-4 (Microsoft). MIT license on the weights; training data closed and Microsoft-proprietary. Fine-tuning is possible; continued pretraining is not, because the training data necessary to reproduce or extend the base is not published.	Phi-4 (Microsoft). MIT license on the weights; training data closed and Microsoft-proprietary. Fine-tuning is possible; continued pretraining is not, because the training data necessary to reproduce or extend the base is not published.

Granite 4 (IBM). Apache 2.0 license on the weights; training data is a closed IBM-proprietary corpus. Same constraint as Phi-4.	Granite 4 (IBM). Apache 2.0 license on the weights; training data is a closed IBM-proprietary corpus. Same constraint as Phi-4.

Llama 4 (Meta). The Llama Community License is not OSI-approved and includes restrictions on commercial use above a user threshold. The license constraint disqualifies it independently of the training data question.	Llama 4 (Meta). The Llama Community License is not OSI-approved and includes restrictions on commercial use above a user threshold. The license constraint disqualifies it independently of the training data question.

Mistral 7B. Apache 2.0 license; training data partially documented but no full open release. Older architecture; weaker on code than 2026 alternatives.	Mistral 7B. Apache 2.0 license; training data partially documented but no full open release. Older architecture; weaker on code than 2026 alternatives.

Chinese-origin models (Qwen, DeepSeek, and related). Technically capable and in several cases released under MIT or Apache 2.0. The procurement constraint for a Canadian public company is the disqualifying factor: US National Defense Authorization Act precedent from fiscal year 2026 creates regulatory exposure for Canadian public companies incorporating Chinese-origin model infrastructure, regardless of license terms.	Chinese-origin models (Qwen, DeepSeek, and related). Technically capable and in several cases released under MIT or Apache 2.0. The procurement constraint for a Canadian public company is the disqualifying factor: US National Defense Authorization Act precedent from fiscal year 2026 creates regulatory exposure for Canadian public companies incorporating Chinese-origin model infrastructure, regardless of license terms.
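
The selection rule described in the two sections above can be sketched as a small classifier. This is an illustrative sketch, not part of the platform: the names `Candidate`, `openness_level`, and `eligible` are hypothetical, and any flag the article does not state explicitly is left at its default of `False`, meaning "not stated as open here" rather than a verified fact about the model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Openness(IntEnum):
    CLOSED = 0
    OPEN_WEIGHTS = 1        # Level 1: weights published
    PERMISSIVE_WEIGHTS = 2  # Level 2: weights plus a permissive license
    FULLY_OPEN = 3          # Level 3: data, code, and checkpoints as well


@dataclass(frozen=True)
class Candidate:
    name: str
    open_weights: bool = False
    permissive_license: bool = False
    open_training_data: bool = False
    open_training_code: bool = False
    open_checkpoints: bool = False
    procurement_ok: bool = True  # Canadian public-company procurement posture


def openness_level(c: Candidate) -> Openness:
    """Map a candidate's published artefacts onto the three levels."""
    if not c.open_weights:
        return Openness.CLOSED
    if not c.permissive_license:
        return Openness.OPEN_WEIGHTS
    if c.open_training_data and c.open_training_code and c.open_checkpoints:
        return Openness.FULLY_OPEN
    return Openness.PERMISSIVE_WEIGHTS


def eligible(c: Candidate) -> bool:
    """Only Level 3 models that pass procurement qualify as the substrate."""
    return c.procurement_ok and openness_level(c) == Openness.FULLY_OPEN


# Flags follow the article's own descriptions; unstated flags stay False.
CANDIDATES = [
    Candidate("OLMo 3", True, True, True, True, True),
    Candidate("Phi-4", open_weights=True, permissive_license=True),
    Candidate("Granite 4", open_weights=True, permissive_license=True),
    Candidate("Llama 4", open_weights=True, permissive_license=False),
    Candidate("Mistral 7B", open_weights=True, permissive_license=True),
    Candidate("Chinese-origin (Qwen, DeepSeek)", open_weights=True,
              permissive_license=True, procurement_ok=False),
]

if __name__ == "__main__":
    for c in CANDIDATES:
        print(f"{c.name}: level {int(openness_level(c))}, eligible={eligible(c)}")
```

Under these assumptions only OLMo 3 passes both tests. Keeping the procurement check separate from the openness level mirrors the article's reasoning: a license or procurement constraint can disqualify a model independently of the training data question.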

## Capability	## Capability

OLMo 3 32B Think, in the updated OLMo 3.1 December 2025 release, reaches 91.4% on HumanEvalPlus, a benchmark of practical code generation accuracy, and sits within two percentage points of the leading open-weight model on the standard mathematical reasoning and instruction-following benchmarks. The 7B variant is strong on programming, reading comprehension, and mathematics, and has a 65,000-token context window.	OLMo 3 32B Think, in the updated OLMo 3.1 December 2025 release, reaches 91.4% on HumanEvalPlus, a benchmark of practical code generation accuracy, and sits within two percentage points of the leading open-weight model on the standard mathematical reasoning and instruction-following benchmarks. The 7B variant is strong on programming, reading comprehension, and mathematics, and has a 65,000-token context window.

These numbers do not place OLMo 3 at the absolute frontier of open-weight performance. They do place it firmly in the range where the [[compounding-doorman\|Doorman]] — the service that mediates all AI inference calls in the PointSav architecture — produces useful results on the daily tasks it handles. The 7B local variant handles routine work; the 32B burst variant handles tasks requiring extended reasoning. The same vocabulary, tokenizer, and prompt format apply to both, which means the [[adapter-composition\|adapter library]] trained on one is compatible with the other.	These numbers do not place OLMo 3 at the absolute frontier of open-weight performance. They do place it firmly in the range where the [[compounding-doorman\|Doorman]] — the service that mediates all AI inference calls in the PointSav architecture — produces useful results on the daily tasks it handles. The 7B local variant handles routine work; the 32B burst variant handles tasks requiring extended reasoning. The same vocabulary, tokenizer, and prompt format apply to both, which means the [[adapter-composition\|adapter library]] trained on one is compatible with the other.

## The three compute tiers	## The three compute tiers

The Doorman routes requests among three tiers:	The Doorman routes requests among three tiers:

Tier A — local. OLMo 3 7B running on the customer's own hardware. Approximately zero marginal cost once the hardware is in place. Default for most operations.	Tier A — local. OLMo 3 7B running on the customer's own hardware. Approximately zero marginal cost once the hardware is in place. Default for most operations.

Tier B — GPU burst. OLMo 3.1 32B Think on a short-lived GPU instance. Approximately $0.84 per hour at list pricing on major cloud providers, significantly less on spot/preemptible instances. Used for requests the local tier cannot handle efficiently. Idle-shutdown discipline means the instance runs only when a request requires it.	Tier B — GPU burst. OLMo 3.1 32B Think on a short-lived GPU instance. Approximately $0.84 per hour at list pricing on major cloud providers, significantly less on spot/preemptible instances. Used for requests the local tier cannot handle efficiently. Idle-shutdown discipline means the instance runs only when a request requires it.

Tier C — external API. Third-party language model services via a per-request allowlist. Used only for narrow precision tasks — [[citation-substrate\|citation grounding]], initial [[knowledge-graph-grounded-apprenticeship\|knowledge graph]] construction, entity disambiguation — where the precision requirement justifies the cost. Every Tier C call is logged in the customer's [[worm-ledger-architecture\|audit ledger]].	Tier C — external API. Third-party language model services via a per-request allowlist. Used only for narrow precision tasks — [[citation-substrate\|citation grounding]], initial [[knowledge-graph-grounded-apprenticeship\|knowledge graph]] construction, entity disambiguation — where the precision requirement justifies the cost. Every Tier C call is logged in the customer's [[worm-ledger-architecture\|audit ledger]].

The customer's routing configuration determines which tier handles which request. No per-request manual selection is required.	The customer's routing configuration determines which tier handles which request. No per-request manual selection is required.
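
A minimal sketch of configuration-driven routing across the three tiers follows. It is not the Doorman's actual implementation: `RoutingConfig`, `route`, `burst_cost_usd`, and the task-kind strings are hypothetical names chosen for illustration, and a plain Python list stands in for the customer's audit ledger. The only figure taken from the text is the approximate $0.84 per hour list price for Tier B.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    A_LOCAL = "OLMo 3 7B on customer hardware"
    B_BURST = "OLMo 3.1 32B Think on a short-lived GPU instance"
    C_EXTERNAL = "third-party API via allowlist"


@dataclass(frozen=True)
class RoutingConfig:
    # Task kinds permitted to reach an external API (Tier C).
    external_allowlist: frozenset
    # Task kinds the local tier cannot handle efficiently (Tier B).
    burst_task_kinds: frozenset


def route(task_kind: str, config: RoutingConfig, audit_log: list) -> Tier:
    """Pick a tier from configuration alone; no per-request manual choice."""
    if task_kind in config.external_allowlist:
        # Every Tier C call is recorded (the list is a stand-in for the ledger).
        audit_log.append({"tier": Tier.C_EXTERNAL.name, "task_kind": task_kind})
        return Tier.C_EXTERNAL
    if task_kind in config.burst_task_kinds:
        return Tier.B_BURST
    return Tier.A_LOCAL  # default for most operations


def burst_cost_usd(hours: float, rate_per_hour: float = 0.84) -> float:
    """Approximate Tier B cost at list pricing for the hours actually run."""
    return hours * rate_per_hour


# Hypothetical customer configuration.
EXAMPLE_CONFIG = RoutingConfig(
    external_allowlist=frozenset(
        {"citation_grounding", "kg_construction", "entity_disambiguation"}
    ),
    burst_task_kinds=frozenset({"extended_reasoning"}),
)

if __name__ == "__main__":
    log = []
    for kind in ("summarise", "extended_reasoning", "citation_grounding"):
        print(kind, "->", route(kind, EXAMPLE_CONFIG, log).name)
    print("audit entries:", len(log))
    print("10 burst hours, USD:", round(burst_cost_usd(10), 2))
```

Because the burst instance shuts down when idle, the cost helper takes hours actually run rather than wall-clock time; under that assumption ten active hours at list price come to roughly $8.40.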

## The intended continued-pretraining path	## The intended continued-pretraining path

The platform's planned multi-year trajectory is to move from the published OLMo 3 base to PointSav-OLMo-N, a derivative model produced by continued pretraining on accumulated [[trajectory-substrate\|platform corpus data]], customer [[adapter-composition\|LoRA adapter]] distillation, and curated public material. The first continued-pretraining run is intended to start in year two or later, targeting the 7B variant at an estimated cost of $30,000 to $100,000 on cloud GPU infrastructure. This trajectory is planned but not yet initiated; the current platform operates on the published OLMo 3 base.	The platform's planned multi-year trajectory is to move from the published OLMo 3 base to PointSav-OLMo-N, a derivative model produced by continued pretraining on accumulated [[trajectory-substrate\|platform corpus data]], customer [[adapter-composition\|LoRA adapter]] distillation, and curated public material. The first continued-pretraining run is intended to start in year two or later, targeting the 7B variant at an estimated cost of $30,000 to $100,000 on cloud GPU infrastructure. This trajectory is planned but not yet initiated; the current platform operates on the published OLMo 3 base.

The material assumption underlying this trajectory is that the Open Data Commons license on Dolma 3 and the Apache 2.0 license on OLMo 3's training code remain in effect and permit commercial continued pretraining. That assumption holds as of May 2026; it would require re-evaluation if the license terms changed.	The material assumption underlying this trajectory is that the Open Data Commons license on Dolma 3 and the Apache 2.0 license on OLMo 3's training code remain in effect and permit commercial continued pretraining. That assumption holds as of May 2026; it would require re-evaluation if the license terms changed.

## See also	## See also

- [[four-tier-slm-substrate]] — the four deployment tiers built on this substrate	- [[four-tier-slm-substrate]] — the four deployment tiers built on this substrate
- [[apprenticeship-substrate]] — how continued pretraining signal is generated from production work	- [[apprenticeship-substrate]] — how continued pretraining signal is generated from production work
- [[trajectory-substrate]] — the corpus capture mechanism that feeds continued pretraining	- [[trajectory-substrate]] — the corpus capture mechanism that feeds continued pretraining
- [[compounding-doorman]] — the service that routes all inference calls across the three compute tiers	- [[compounding-doorman]] — the service that routes all inference calls across the three compute tiers