Diff: architecture/slm-stack-architecture

From e6d5f15 to e6d5f15

+0 / −0 lines

Before	After
---	---
schema: foundry-doc-v1	schema: foundry-doc-v1
title: "service-slm Rust Stack Architecture"	title: "service-slm Rust Stack Architecture"
slug: slm-stack-architecture	slug: slm-stack-architecture
category: architecture	category: architecture
type: topic	type: topic
quality: published	quality: published
short_description: "The full Rust dependency graph and binary architecture for service-slm, the Doorman service that mediates every inference call in the PointSav platform."	short_description: "The full Rust dependency graph and binary architecture for service-slm, the Doorman service that mediates every inference call in the PointSav platform."
status: active	status: active
bcsc_class: public-disclosure-safe	bcsc_class: public-disclosure-safe
last_edited: 2026-05-01	last_edited: 2026-05-01
editor: pointsav-engineering	editor: pointsav-engineering
cites: []	cites: []
paired_with: slm-stack-architecture.es.md	paired_with: slm-stack-architecture.es.md
---	---

`service-slm` is built as a single, statically-linked Rust binary. Every direct dependency in the stack is either pure Rust or Rust bindings to a permissively licensed native library. No copyleft licenses appear anywhere in the dependency graph, which means PointSav holds an unrestricted right to fork, modify, and redistribute the entire codebase. This property is called the "We Own It" criterion.	`service-slm` is built as a single, statically-linked Rust binary. Every direct dependency in the stack is either pure Rust or Rust bindings to a permissively licensed native library. No copyleft licenses appear anywhere in the dependency graph, which means PointSav holds an unrestricted right to fork, modify, and redistribute the entire codebase. This property is called the "We Own It" criterion.

The choice of Rust is not a language preference. It is an engineering constraint imposed by the intended deployment target — ToteboxOS appliance hardware, where a CPython interpreter plus a large ML framework does not fit in the available memory envelope, and where cold-start predictability and the absence of a garbage collector are operational requirements rather than optional improvements.	The choice of Rust is not a language preference. It is an engineering constraint imposed by the intended deployment target — ToteboxOS appliance hardware, where a CPython interpreter plus a large ML framework does not fit in the available memory envelope, and where cold-start predictability and the absence of a garbage collector are operational requirements rather than optional improvements.

## Why L2 Rust, not L3 Rust	## Why L2 Rust, not L3 Rust

Three distinct levels of "Rust-ness" are commonly conflated:	Three distinct levels of "Rust-ness" are commonly conflated:

\| Level \| Meaning \| Achievable for service-slm? \|	\| Level \| Meaning \| Achievable for service-slm? \|
\|---\|---\|---\|	\|---\|---\|---\|
\| L1 — Source Rust \| All code written by PointSav is Rust \| Yes \|	\| L1 — Source Rust \| All code written by PointSav is Rust \| Yes \|
\| L2 — Direct-deps Rust \| Every crate directly depended on is a Rust crate (may internally FFI to C/C++) \| Yes \|	\| L2 — Direct-deps Rust \| Every crate directly depended on is a Rust crate (may internally FFI to C/C++) \| Yes \|
\| L3 — Transitive Rust \| Every line in the entire dependency tree, including GPU kernels, is Rust \| No — and not the right goal \|	\| L3 — Transitive Rust \| Every line in the entire dependency tree, including GPU kernels, is Rust \| No — and not the right goal \|

L3 is unachievable for GPU inference and graph databases in 2026 because CUDA kernels and columnar storage engines have a twenty-year C++ inheritance. L2 is achievable and sufficient. The "We Own It" test is a license question, not a language question: `MIT + Apache-2.0 = we own it`.	L3 is unachievable for GPU inference and graph databases in 2026 because CUDA kernels and columnar storage engines have a twenty-year C++ inheritance. L2 is achievable and sufficient. The "We Own It" test is a license question, not a language question: `MIT + Apache-2.0 = we own it`.

## The canonical stack	## The canonical stack

### Inference layer	### Inference layer

The inference runtime for service-slm is mistral.rs, a statically-linked Rust binary that ships with FlashAttention V2/V3, PagedAttention, prefix caching, and LoRA hot-swap per token. It exposes an OpenAI-compatible HTTP endpoint, which is the wire protocol the Doorman uses for Tier A (local) and Tier B (GPU burst) calls.	The inference runtime for service-slm is mistral.rs, a statically-linked Rust binary that ships with FlashAttention V2/V3, PagedAttention, prefix caching, and LoRA hot-swap per token. It exposes an OpenAI-compatible HTTP endpoint, which is the wire protocol the Doorman uses for Tier A (local) and Tier B (GPU burst) calls.

The foundation ML framework underneath mistral.rs is candle (Apache-2.0/MIT dual license, Hugging Face). If mistral.rs ever diverges from platform requirements, candle provides a clean rebuild path without re-architecting the stack.	The foundation ML framework underneath mistral.rs is candle (Apache-2.0/MIT dual license, Hugging Face). If mistral.rs ever diverges from platform requirements, candle provides a clean rebuild path without re-architecting the stack.