Diff: services/service-slm-yoyo-operational

From 9106fc7 to 9106fc7

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "SLM and Yo-Yo operational state"	title: "SLM and Yo-Yo operational state"
slug: service-slm-yoyo-operational	slug: service-slm-yoyo-operational
category: services	category: services
type: topic	type: topic
quality: complete	quality: complete
short_description: "How service-SLM's three-tier inference router and the Yo-Yo GPU burst VM operate, including the Doorman boundary, Tier A/B configuration, apprenticeship brief queue, and idle-shutdown cost ceiling."	short_description: "How service-SLM's three-tier inference router and the Yo-Yo GPU burst VM operate, including the Doorman boundary, Tier A/B configuration, apprenticeship brief queue, and idle-shutdown cost ceiling."
status: active	status: active
bcsc_class: public-disclosure-safe	bcsc_class: public-disclosure-safe
last_edited: 2026-05-25	last_edited: 2026-05-25
editor: pointsav-engineering	editor: pointsav-engineering
cites:	cites:
- ni-51-102	- ni-51-102
- osc-sn-51-721	- osc-sn-51-721
- olmo3-allenai	- olmo3-allenai
paired_with: service-slm-yoyo-operational.es.md	paired_with: service-slm-yoyo-operational.es.md
---	---

service-SLM is the platform's Ring 3 component — the Optional Intelligence layer. It is a three-tier inference router that clusters and contributors use to delegate routine work: editorial polish, mechanical schema-conforming edits, bilingual translation drafts, and structured-output generation. The work is handled locally or on a dedicated GPU burst VM, without routing to a third-party API. Rings 1 and 2 (boundary ingest and knowledge processing) function fully without it; Ring 3 is structurally optional.	service-SLM is the platform's Ring 3 component — the Optional Intelligence layer. It is a three-tier inference router that clusters and contributors use to delegate routine work: editorial polish, mechanical schema-conforming edits, bilingual translation drafts, and structured-output generation. The work is handled locally or on a dedicated GPU burst VM, without routing to a third-party API. Rings 1 and 2 (boundary ingest and knowledge processing) function fully without it; Ring 3 is structurally optional.

The Yo-Yo is the name for the platform's on-demand GPU burst instance — a GCE VM that runs a 32-billion-parameter instruction-tuned model at approximately 50-100 tokens per second. It starts on demand, shuts down after 30 minutes of inactivity, and accumulates a brief queue through its idle windows. The combination — a lightweight always-available local model on the workspace VM and a capable on-demand burst VM — defines the two active inference tiers. A third tier (external API) is configured for future use; Tier C has no active keys in the current operational period.	The Yo-Yo is the name for the platform's on-demand GPU burst instance — a GCE VM that runs a 32-billion-parameter instruction-tuned model at approximately 50-100 tokens per second. It starts on demand, shuts down after 30 minutes of inactivity, and accumulates a brief queue through its idle windows. The combination — a lightweight always-available local model on the workspace VM and a capable on-demand burst VM — defines the two active inference tiers. A third tier (external API) is configured for future use; Tier C has no active keys in the current operational period.
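
The idle-shutdown rule described above can be sketched as a small predicate. This is an illustrative Python sketch, not the platform's implementation: the names `IDLE_LIMIT` and `should_shut_down` are hypothetical, and only the 30-minute threshold comes from the text. Treating exactly 30 minutes as "idle long enough" is also an assumption.

```python
from datetime import datetime, timedelta

# From the text: the Yo-Yo shuts down after 30 minutes of inactivity.
IDLE_LIMIT = timedelta(minutes=30)


def should_shut_down(last_request_at: datetime, now: datetime,
                     idle_limit: timedelta = IDLE_LIMIT) -> bool:
    """Return True once the VM has been idle for at least `idle_limit`.

    Hypothetical helper: the inclusive boundary is an assumption.
    """
    return now - last_request_at >= idle_limit
```

A scheduler could call such a check on a timer and stop the VM when it returns `True`; briefs that arrive while the VM is stopped would then wait in the queue until the next start.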

This document describes how service-SLM and the Yo-Yo operate as of the current operational period, in which the inference-substrate design was marked complete.	This document describes how service-SLM and the Yo-Yo operate as of the current operational period, in which the inference-substrate design was marked complete.

## The Doorman boundary	## The Doorman boundary

Every inference request crosses the Doorman before reaching a model tier. The Doorman is a Rust binary running as systemd unit `local-doorman.service`, binding `127.0.0.1:9080`. Its responsibilities cover the full request lifecycle:	Every inference request crosses the Doorman before reaching a model tier. The Doorman is a Rust binary running as systemd unit `local-doorman.service`, binding `127.0.0.1:9080`. Its responsibilities cover the full request lifecycle:

- Hold all API keys — Tier C provider tokens and the Tier B bearer token. Keys exist nowhere else in the request path. This is the API-key boundary discipline: no key dispersal across call sites.	- Hold all API keys — Tier C provider tokens and the Tier B bearer token. Keys exist nowhere else in the request path. This is the API-key boundary discipline: no key dispersal across call sites.
- Route requests to the correct tier based on complexity heuristics: request size, structured-output requirements, and audit-ledger semantics.	- Route requests to the correct tier based on complexity heuristics: request size, structured-output requirements, and audit-ledger semantics.
- Sanitise outbound requests before they reach any external API (strip workspace identifiers; rehydrate on inbound).	- Sanitise outbound requests before they reach any external API (strip workspace identifiers; rehydrate on inbound).
- Append every transit to a per-tenant audit ledger at `/var/lib/local-doorman/audit/<tenant>/<YYYY-MM>.jsonl`.	- Append every transit to a per-tenant audit ledger at `/var/lib/local-doorman/audit/<tenant>/<YYYY-MM>.jsonl`.
- Drain the apprenticeship brief queue (described below).	- Drain the apprenticeship brief queue (described below).
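
The per-tenant audit ledger layout listed above can be illustrated with a short sketch. The Doorman itself is a Rust binary; the Python below is only a hedged illustration of the path convention and append-only JSONL writes. The function names, the `ts` field, and the record shape are assumptions, since the source gives only the path pattern.

```python
import json
from datetime import datetime
from pathlib import Path

# Ledger root taken from the path pattern given in the text.
AUDIT_ROOT = Path("/var/lib/local-doorman/audit")


def ledger_path(tenant: str, when: datetime, root: Path = AUDIT_ROOT) -> Path:
    """Build <root>/<tenant>/<YYYY-MM>.jsonl for the month containing `when`."""
    return root / tenant / f"{when:%Y-%m}.jsonl"


def append_transit(tenant: str, record: dict, when: datetime,
                   root: Path = AUDIT_ROOT) -> Path:
    """Append one transit record as a single JSON line; return the file used."""
    path = ledger_path(tenant, when, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical record shape: the source does not specify ledger fields.
    line = json.dumps({"ts": when.isoformat(), **record}, sort_keys=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return path
```

Under these assumptions, one file per tenant per month keeps each ledger append-only and easy to rotate, and each line can be parsed independently of the others.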

The `/readyz` endpoint returns live tier-availability flags. An example response when all tiers are operational:	The `/readyz` endpoint returns live tier-availability flags. An example response when all tiers are operational:

```json	```json
{	{
"ready": true,	"ready": true,