services/service-slm

From the PointSav Documentation

An AI request that leaves the building cannot be audited and cannot be recalled. The moment institutional intent reaches a frontier model in another company's cloud, the organization has surrendered both the record of the decision and control over it.

service-slm is the language-model service of the PointSav family. It is deliberately a Small Language Model (quantised, narrow, and fast), and its job is not conversation but semantic translation: turning institutional intent into deterministic outputs.

The service runs in three compute tiers, and every inference call, whether local, burst, or external, transits the Doorman audit boundary, where each prompt and completion is captured to the per-tenant ledger before the response returns.

For a regulated buyer the consequence is concrete. No AI decision is unlogged, and no request reaches a third-party API without crossing a boundary the operator controls. This article covers the four operations, the three compute tiers, the Doorman boundary, and why a small model is a structural choice rather than a cost compromise.

What service-slm does

The service is invisible: there is no chat window, and the operator never types into service-slm directly. The surface above it presents a structured workflow; service-slm is the silent intermediary. It performs four operations, in order of increasing institutional weight.

| Operation | Inputs | Output |
|---|---|---|
| Semantic command parsing | English intent from the F8 Terminal | Binary UDP command for service-udp |
| Gravity verification | 50-word Gravity Vector from service-content | VALID or REJECT single token |
| Socket assignment | Entity bundle from service-extraction + [[archetypes-and-chart-of-accounts Chart of Accounts]] | |
| Theme suggestion | Recurring patterns the Gravity Engine flags | Proposed new entries to the Themes Seed Vault, for operator approval |

The model never publishes structured data autonomously. Every output transits a human-in-the-loop verification step before it can be written to a verified ledger.
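As a minimal sketch of what "deterministic output" and "human-in-the-loop" could mean in practice, the Python below parses a gravity-verification completion strictly as one of the two tokens named in the table, and refuses to write any output that an operator has not approved. The names `Verdict`, `parse_gravity_verdict`, `PendingOutput`, and `write_to_ledger` are hypothetical and do not come from the service-slm codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    VALID = "VALID"
    REJECT = "REJECT"


def parse_gravity_verdict(completion: str) -> Verdict:
    """Accept only an exact single-token verdict; anything else is an error."""
    token = completion.strip()
    try:
        return Verdict(token)
    except ValueError:
        raise ValueError(f"expected VALID or REJECT, got {completion!r}") from None


@dataclass
class PendingOutput:
    """A model output held back until an operator approves it (hypothetical type)."""
    operation: str
    payload: str
    approved: bool = False

    def approve(self) -> None:
        self.approved = True


def write_to_ledger(output: PendingOutput, ledger: list) -> None:
    """Refuse to write anything that has not passed human verification."""
    if not output.approved:
        raise PermissionError("output has not passed human verification")
    ledger.append((output.operation, output.payload))
```

Treating any completion other than the two expected tokens as an error, rather than guessing at the model's intent, is one way to keep a probabilistic model behind a deterministic interface.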

The three compute tiers

The same service-slm interface adapts to the host hardware through three execution modes.

| Tier | Where it runs | Model size | Use case |
|---|---|---|---|
| Local | Operator workstation or [[totebox-os os-totebox]] with at least 16 GB RAM | 1B–7B-parameter quantised model loaded locally | |
| Elastic burst | Operator-provisioned ephemeral GPU node | Larger model on rented hardware; data tunnelled over an encrypted link | Cost-optimised heavy batch processing; the node is torn down after the run |
| External API | Licensed third-party API endpoint | Frontier model | Last-resort routing for tasks where local capacity is insufficient |

All three tiers transit the Doorman audit boundary. No tier bypasses it.
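The routing idea can be illustrated with a short Python sketch. The selection policy in `choose_tier` is an assumption made for illustration (burst for heavy batches, local when the task fits, external only as a last resort); the article does not specify the actual precedence. The `dispatch` function and the in-memory `audit` list are likewise hypothetical stand-ins for the single audited entry point.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Tier(Enum):
    LOCAL = "local"
    ELASTIC_BURST = "elastic_burst"
    EXTERNAL_API = "external_api"


def choose_tier(fits_locally: bool, heavy_batch: bool, burst_available: bool) -> Tier:
    """Assumed policy: burst for heavy batches, local when possible, external last."""
    if heavy_batch and burst_available:
        return Tier.ELASTIC_BURST
    if fits_locally:
        return Tier.LOCAL
    if burst_available:
        return Tier.ELASTIC_BURST
    return Tier.EXTERNAL_API


Backend = Callable[[str], str]
AuditLog = List[Tuple[str, str, str]]


def dispatch(tier: Tier, prompt: str, backends: Dict[Tier, Backend], audit: AuditLog) -> str:
    """Single entry point: the call is recorded before the completion is returned."""
    completion = backends[tier](prompt)
    audit.append((tier.value, prompt, completion))
    return completion
```

Because callers only ever reach a backend through `dispatch`, there is no code path in this sketch by which a tier could skip the audit step.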

The Doorman boundary

The Doorman is the audit-routing checkpoint between service-slm and the rest of the system. Every prompt and every completion is captured before the response returns to the caller. The audit trail lives in the local per-tenant ledger and forms the institutional record of every AI decision.

The Doorman exists for three reasons.

  1. Regulatory. ISO/IEC 42001, the AI management-system standard [^1], requires an immutable log of AI-assisted decisions.
  2. Operational. A self-healing system needs a corpus of its own past behaviour; the Doorman captures it.
  3. Sovereign. No request reaches a third-party API without passing through a local boundary the operator controls.
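The capture-before-return behaviour described above might look like the following Python sketch. It assumes one append-only JSON Lines file per tenant under a local directory; the `Doorman` class, its `call` method, the record fields, and the file layout are all illustrative assumptions, not the documented doorman-protocol.

```python
import json
import time
from pathlib import Path
from typing import Callable


class Doorman:
    """Hypothetical audit wrapper: writes the ledger record, then returns."""

    def __init__(self, ledger_dir: Path):
        self._ledger_dir = ledger_dir
        self._ledger_dir.mkdir(parents=True, exist_ok=True)

    def call(self, tenant: str, tier: str, prompt: str,
             backend: Callable[[str], str]) -> str:
        completion = backend(prompt)
        record = {
            "ts": time.time(),
            "tenant": tenant,
            "tier": tier,
            "prompt": prompt,
            "completion": completion,
        }
        # Assumes `tenant` is a safe identifier, since it is used as a file name.
        path = self._ledger_dir / f"{tenant}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        # The caller only sees the completion after the record has been written.
        return completion
```

A call such as `Doorman(Path("ledger")).call("tenant-a", "local", "ping", backend)` would append one line to `ledger/tenant-a.jsonl` before handing the completion back. An append-only file is not tamper-proof on its own; a production ledger would need additional integrity guarantees that this sketch does not attempt.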

Model selection

The canonical local model is from the OLMo family, which ships with fully open weights and training-data documentation [^2]. Open weights and documented training data are a prerequisite for continued pre-training on an operator's own corpus, which is the long-term path to a domain-specialised institutional model.

| Profile | Model | RAM target |
|---|---|---|
| Edge | OLMo-2-0425-1B-Instruct | ~2 GB |
| Standard | OLMo-3-1125-7B-Think-Q4_K_M | ~6 GB |
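One way to read the profile table is as a lookup keyed on available memory. The Python sketch below copies the model names and RAM targets from the table; treating the RAM target as a hard threshold, and the names `Profile`, `PROFILES`, and `pick_profile`, are assumptions made for illustration.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    name: str
    model: str
    ram_target_gb: float


# Values copied from the profile table, ordered largest first.
PROFILES = [
    Profile("Standard", "OLMo-3-1125-7B-Think-Q4_K_M", 6.0),
    Profile("Edge", "OLMo-2-0425-1B-Instruct", 2.0),
]


def pick_profile(available_ram_gb: float) -> Profile:
    """Return the largest profile whose RAM target fits (assumed rule)."""
    for profile in PROFILES:
        if available_ram_gb >= profile.ram_target_gb:
            return profile
    raise RuntimeError(f"no profile fits in {available_ram_gb} GB of RAM")
```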

Why a small model

A frontier-scale model imposes three costs service-slm cannot accept: it requires cloud egress, it consumes tens of gigabytes of RAM, and it cannot be audited in any meaningful sense. A 1B-parameter quantised model is sufficient for the one narrow task, translating institutional English into deterministic outputs, and fits inside the cost envelope of a low-cost cloud node alongside a Totebox.

Specialisation, not scale, is the design principle.
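The memory argument can be checked with back-of-envelope arithmetic: the weights of a model occupy roughly parameters × bits per weight ÷ 8 bytes. The Python sketch below applies that formula. The bits-per-weight figures and the 70B comparison model are illustrative assumptions, and the estimate covers weights only, excluding the KV cache and runtime overhead.

```python
def weight_memory_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return parameters * bits_per_weight / 8 / 1e9


# Illustrative inputs; the bits-per-weight values are assumptions.
small_16bit = weight_memory_gb(1e9, 16)     # 1B parameters at 16 bits: about 2 GB
mid_4bit = weight_memory_gb(7e9, 4.5)       # 7B parameters at ~4.5 bits: about 3.9 GB
large_16bit = weight_memory_gb(70e9, 16)    # hypothetical 70B model at 16 bits: about 140 GB
```

Under these assumptions the small models fit comfortably within a 16 GB workstation, while the larger model does not.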

See also

  • service-content: the upstream Gravity Engine; primary caller of service-slm for gravity verification
  • os-network-admin: the F8 Terminal where semantic command parsing originates
  • totebox-os: the Totebox that hosts service-slm in Sovereign Iron mode
  • SYS-ADR-07: structured data never routes through AI; service-slm implements this boundary
  • doorman-protocol: the Doorman audit-routing protocol in detail
  • run-local-slm-inference: step-by-step guide to starting the SLM service and submitting inference requests from the console or API
  • run-first-slm-query: step-by-step guide to reading the Doorman health dashboard and submitting your first prompt