Institutional small language model

An AI request that leaves the building cannot be audited and cannot be recalled. The moment institutional intent reaches a frontier model in another company's cloud, the organization has surrendered both the record of the decision and control over it.

service-slm is the language-model service of the PointSav family. It is deliberately a Small Language Model — quantised, narrow, fast — and its job is not conversation but semantic translation: turning institutional intent into deterministic outputs.

The service runs in three compute tiers, and every inference call — local, burst, or external — transits the Doorman audit boundary, where each prompt and completion is captured to the per-tenant ledger before the response returns.

For a regulated buyer the consequence is concrete. Every AI decision is logged, and no request reaches a third-party API without crossing a boundary the operator controls. This article covers the four operations, the three compute tiers, the Doorman boundary, model selection, and why a small model is a structural choice rather than a cost compromise.

What service-slm does

The service is invisible — there is no chat window, and the operator never types into service-slm directly. The surface above it presents a structured workflow; service-slm is the silent intermediary. It performs four operations, in order of increasing institutional weight.

Operation	Inputs	Output
Semantic command parsing	English intent from the F8 Terminal	Binary UDP command for `service-udp`
Gravity verification	50-word Gravity Vector from service-content	`VALID` or `REJECT` single token
Socket assignment	Entity bundle from service-extraction + Chart of Accounts	Sovereign-ID with Chart-of-Accounts socket
Theme suggestion	Recurring patterns the Gravity Engine flags	Proposed new entries to the Themes Seed Vault, for operator approval

The model never publishes structured data autonomously. Every output transits a human-in-the-loop verification step before it can be written to a verified ledger.
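As an illustration of the Gravity verification row and the human-in-the-loop rule, the sketch below maps a raw completion onto the `VALID` / `REJECT` token and gates any ledger write on operator approval. The names `Verdict`, `parse_verdict`, and `may_write_to_ledger` are hypothetical, and the fail-closed behaviour (anything other than an exact `VALID` counts as `REJECT`) is an assumption, not something the service is documented to do.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "VALID"
    REJECT = "REJECT"


def parse_verdict(raw_completion: str) -> Verdict:
    """Map a raw model completion to a verdict, failing closed.

    Assumption: only the exact token VALID (ignoring surrounding
    whitespace) is accepted; every other completion becomes REJECT.
    """
    token = raw_completion.strip()
    if token == Verdict.VALID.value:
        return Verdict.VALID
    return Verdict.REJECT


def may_write_to_ledger(verdict: Verdict, operator_approved: bool) -> bool:
    """Allow a write only when the model said VALID and a human approved."""
    return verdict is Verdict.VALID and operator_approved
```

Under these assumptions a chatty completion such as `"valid, probably"` is rejected, and even a clean `VALID` is not written until the operator approves it.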

The three compute tiers

The same service-slm interface adapts to the host hardware through three execution modes.

Tier	Where it runs	Model size	Use case
Local	Operator workstation or `os-totebox` with at least 16 GB RAM	1B–7B-parameter quantised model loaded locally	Sovereign Iron Vault — institutional customers; no cloud egress
Elastic burst	Operator-provisioned ephemeral GPU node	Larger model on rented hardware; data tunnelled over an encrypted link	Cost-optimised heavy batch processing; the node is torn down after the run
External API	Licensed third-party API endpoint	Frontier model	Last-resort routing for tasks where local capacity is insufficient

All three tiers transit the Doorman audit boundary. No tier bypasses it.
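One way to read the tier table is as a preference order. The sketch below is a hypothetical routing policy, not the service's actual one: the function `choose_tier` and its three boolean inputs are assumptions introduced for illustration. It sends heavy batch work to a burst node when one is available, keeps everything else local when it fits, and falls back to the external API only as a last resort.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"
    BURST = "burst"
    EXTERNAL = "external"


def choose_tier(fits_local: bool, heavy_batch: bool, burst_available: bool) -> Tier:
    """Pick a compute tier for one task (illustrative policy only).

    Assumed order of preference:
      1. heavy batch work goes to an ephemeral burst node if one exists;
      2. anything the local model can handle stays local;
      3. the external API is the last resort.
    """
    if heavy_batch and burst_available:
        return Tier.BURST
    if fits_local:
        return Tier.LOCAL
    return Tier.EXTERNAL
```

Whichever tier is chosen, the call would still pass through the Doorman; the policy decides only where the inference runs, not whether it is audited.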

The Doorman boundary

The Doorman is the audit-routing checkpoint between service-slm and the rest of the system. Every prompt and every completion is captured before the response returns to the caller. The audit trail lives in the local per-tenant ledger and forms the institutional record of every AI decision.
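The capture-before-return ordering can be sketched as a thin wrapper around whatever backend serves the request. This is a minimal illustration under stated assumptions: the `Doorman` class, its `call` method, the record fields, and the in-memory dictionary standing in for the per-tenant ledger are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List


class Doorman:
    """Illustrative audit wrapper; an in-memory dict stands in for the ledger."""

    def __init__(self) -> None:
        # One list of audit records per tenant.
        self.ledger: Dict[str, List[dict]] = defaultdict(list)

    def call(self, tenant: str, tier: str, prompt: str,
             backend: Callable[[str], str]) -> str:
        completion = backend(prompt)
        # Record the prompt and completion BEFORE handing the result back,
        # so no response reaches the caller without a ledger entry.
        self.ledger[tenant].append({
            "ts": time.time(),
            "tier": tier,
            "prompt": prompt,
            "completion": completion,
        })
        return completion
```

A caller would use it as `Doorman().call("tenant-a", "local", prompt, backend)`, where `backend` is any function from prompt to completion. A production version would need durable storage and a decision about what to record when the backend fails.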

The Doorman exists for three reasons.

Regulatory. ISO/IEC 42001, the AI management-system standard ¹, calls for documented, traceable records of AI system behaviour; an immutable log of AI-assisted decisions is how service-slm meets that expectation.
Operational. A self-healing system needs a corpus of its own past behaviour; the Doorman captures it.
Sovereign. No request reaches a third-party API without passing through a local boundary the operator controls.
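An audit trail is only useful as a record if later tampering can be detected. One common technique, shown here purely as an assumption about how such a ledger could be built and not as the Doorman's documented design, is a hash chain: each entry stores the SHA-256 digest of the previous entry's hash plus its own payload. The helper names `append_entry` and `verify` are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Serialise with sorted keys so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[dict], payload: dict) -> dict:
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify(chain: List[dict]) -> bool:
    """Recompute every link; any edited payload or hash breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A hash chain makes alteration evident but does not prevent it; whoever holds the ledger can still rewrite the whole chain unless the latest hash is also anchored somewhere outside their control.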

Model selection

The canonical local model is from the OLMo family, which ships with fully open weights and training-data documentation ². Open weights and documented training data are a prerequisite for continued pre-training on an operator's own corpus — the long-term path to a domain-specialised institutional model.

Profile	Model	RAM target
Edge	OLMo-2-0425-1B-Instruct	~2 GB
Standard	OLMo-3-1125-7B-Think-Q4_K_M	~6 GB
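The profile table lends itself to a simple selection rule: use the largest profile whose RAM target fits in the memory available. The sketch below assumes this rule and a 1 GB safety headroom; `Profile`, `PROFILES`, `pick_profile`, and the headroom value are hypothetical, while the model names and RAM targets are taken from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    model: str
    ram_target_gb: float


# Ordered largest first, so the first match is the most capable profile.
PROFILES = [
    Profile("Standard", "OLMo-3-1125-7B-Think-Q4_K_M", 6.0),
    Profile("Edge", "OLMo-2-0425-1B-Instruct", 2.0),
]


def pick_profile(free_ram_gb: float, headroom_gb: float = 1.0) -> Optional[Profile]:
    """Return the largest profile that fits, or None if nothing fits.

    headroom_gb is an assumed margin for the OS and other services.
    """
    for profile in PROFILES:
        if free_ram_gb >= profile.ram_target_gb + headroom_gb:
            return profile
    return None
```

With these assumptions a host with 16 GB free selects the Standard profile, a host with 4 GB free selects Edge, and a host with 2.5 GB free gets `None`, leaving the caller to decide whether to route to another tier.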

Why a small model

A frontier-scale model imposes three costs service-slm cannot accept: it requires cloud egress, it consumes tens of gigabytes of RAM, and, with closed weights and undocumented training data, it cannot be audited in any meaningful sense. A 1B-parameter quantised model is sufficient for the one narrow task, translating institutional English into deterministic outputs, and fits inside the cost envelope of a low-cost cloud node alongside a Totebox.

Specialisation, not scale, is the design principle.
