Diff: substrate/yoyo-compute-substrate

From 3f798bf to 3f798bf

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "Yo-Yo compute substrate"	title: "Yo-Yo compute substrate"
slug: yoyo-compute-substrate	slug: yoyo-compute-substrate
category: substrate	category: substrate
type: topic	type: topic
quality: complete	quality: complete
short_description: "The three-ring compute substrate that lets service-slm spin GPU inference capacity up and down while retaining state, accumulating skill, and producing an audit ledger of every compute event."	short_description: "The three-ring compute substrate that lets service-slm spin GPU inference capacity up and down while retaining state, accumulating skill, and producing an audit ledger of every compute event."
status: active	status: active
bcsc_class: public-disclosure-safe	bcsc_class: public-disclosure-safe
last_edited: 2026-05-15	last_edited: 2026-05-15
editor: pointsav-engineering	editor: pointsav-engineering
cites: []	cites: []
references:	references:
- id: 1	- id: 1
text: "Open Source Security Foundation. 'SLSA: Supply chain Levels for Software Artifacts v1.0.' SLSA.dev, 2023."	text: "Open Source Security Foundation. 'SLSA: Supply chain Levels for Software Artifacts v1.0.' SLSA.dev, 2023."
url: "https://slsa.dev/spec/v1.0/"	url: "https://slsa.dev/spec/v1.0/"
paired_with: yoyo-compute-substrate.es.md	paired_with: yoyo-compute-substrate.es.md
---	---

The Yo-Yo Compute Substrate is the specification for how `service-slm` manages GPU inference across teardowns. A GPU inference node is expensive at idle. But a node that discards all state on shutdown forces a full re-computation on the next spin-up — slow, wasteful, and commercially corrosive at scale. The Yo-Yo substrate resolves this by decomposing compute state into three rings, each with a different persistence strategy, so that spin-up is fast, state is retained where it is worth retaining, and every event is recorded in a SOC 3-grade audit ledger.	The Yo-Yo Compute Substrate is the specification for how `service-slm` manages GPU inference across teardowns. A GPU inference node is expensive at idle. But a node that discards all state on shutdown forces a full re-computation on the next spin-up — slow, wasteful, and commercially corrosive at scale. The Yo-Yo substrate resolves this by decomposing compute state into three rings, each with a different persistence strategy, so that spin-up is fast, state is retained where it is worth retaining, and every event is recorded in a SOC 3-grade audit ledger.

The name is literal: the compute tier comes down and goes back up, repeatedly, without losing what matters.	The name is literal: the compute tier comes down and goes back up, repeatedly, without losing what matters.

## The three-ring memory model	## The three-ring memory model

\| Ring \| Name \| Storage \| Survives teardown? \|	\| Ring \| Name \| Storage \| Survives teardown? \|
\|---\|---\|---\|---\|	\|---\|---\|---\|---\|
\| 1 \| Bootstrap \| Container image + GCS-cached model weights \| Yes (as artefacts in cold storage) \|	\| 1 \| Bootstrap \| Container image + GCS-cached model weights \| Yes (as artefacts in cold storage) \|
\| 2 \| Working memory (KV cache) \| LMCache + Mooncake Store \| Yes (pooled, `moduleId`-isolated) \|	\| 2 \| Working memory (KV cache) \| LMCache + Mooncake Store \| Yes (pooled, `moduleId`-isolated) \|
\| 3a \| Long-term graph memory \| LadybugDB in `service-content` \| Yes (authoritative) \|	\| 3a \| Long-term graph memory \| LadybugDB in `service-content` \| Yes (authoritative) \|
\| 3b \| Long-term skill (adapters) \| LoRA adapter stack as OCI Artefacts \| Yes (portable, signed) \|	\| 3b \| Long-term skill (adapters) \| LoRA adapter stack as OCI Artefacts \| Yes (portable, signed) \|

Everything outside these rings is ephemeral and intentionally discarded.	Everything outside these rings is ephemeral and intentionally discarded.
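
As a rough illustration of the table above, the ring model can be written down as a small lookup structure that decides what is retained at teardown. This is a minimal sketch, not the `service-slm` implementation: the `Ring` class, the `teardown_plan` helper, and the state item names in the usage note are all hypothetical and introduced here only for illustration. The ring names, storage backends, and persistence notes are copied from the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Ring:
    """One row of the three-ring table (hypothetical representation)."""
    ring_id: str
    name: str
    storage: str
    survives_teardown: bool
    note: str


# Values copied from the table; the structure itself is an assumption.
RINGS = (
    Ring("1", "Bootstrap", "Container image + GCS-cached model weights",
         True, "artefacts in cold storage"),
    Ring("2", "Working memory (KV cache)", "LMCache + Mooncake Store",
         True, "pooled, moduleId-isolated"),
    Ring("3a", "Long-term graph memory", "LadybugDB in service-content",
         True, "authoritative"),
    Ring("3b", "Long-term skill (adapters)", "LoRA adapter stack as OCI Artefacts",
         True, "portable, signed"),
)


def teardown_plan(state_items: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Split state items into retained and discarded sets.

    state_items maps an item name to the id of the ring that owns it,
    or None when the item sits outside every ring (ephemeral).
    """
    persistent = {r.ring_id for r in RINGS if r.survives_teardown}
    plan: Dict[str, List[str]] = {"retain": [], "discard": []}
    for item, ring_id in state_items.items():
        bucket = "retain" if ring_id in persistent else "discard"
        plan[bucket].append(item)
    return plan
```

Called with, say, `{"kv_blocks": "2", "scratch_tensors": None, "lora_stack": "3b"}`, the helper places `kv_blocks` and `lora_stack` under `retain` and `scratch_tensors` under `discard`, which mirrors the rule that anything outside the rings is dropped on purpose.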

## Ring 1 — Bootstrap: sub-thirty-second warm starts at zero idle cost	## Ring 1 — Bootstrap: sub-thirty-second warm starts at zero idle cost

The standard trade-off presented by managed GPU services is a false binary: pay continuously for a warm endpoint, or accept sixty-to-one-hundred-twenty-second cold starts for a fully serverless one. Neither is correct for a workload pattern that is bursty but predictable — weekly batch runs plus opportunistic query-time calls.	The standard trade-off presented by managed GPU services is a false binary: pay continuously for a warm endpoint, or accept sixty-to-one-hundred-twenty-second cold starts for a fully serverless one. Neither is correct for a workload pattern that is bursty but predictable — weekly batch runs plus opportunistic query-time calls.

The Yo-Yo substrate resolves this through four pre-staged bootstrap layers:	The Yo-Yo substrate resolves this through four pre-staged bootstrap layers: