Skip to content

service-input

Topic

From the PointSav Documentation

service-input is the Ring 1 document ingest service that accepts files at the per-tenant boundary, routes them through format-specific parsers, and writes normalized output to the WORM ledger via service-fs.

Updated 2026-05-25 · HistoryEspañol

service-input is the Ring 1 document-intake service in the PointSav system architecture. It accepts files at the per-tenant boundary, routes them through format-specific parsers, and writes the normalized output into the per-tenant WORM Immutable Ledger via `service-fs`.

[edit]The Anchor position

The Ring 1 boundary is the trust perimeter of the system. Every artifact that crosses this boundary must pass through a deterministic, auditable processing step before it is committed to the WORM ledger. service-input occupies that position for structured documents.

SYS-ADR-10 designates this position The Anchor (function F12 on `os-console`): the single, mandatory intake point where document bytes enter Ring 1, undergo deterministic parsing, and are handed off to service-fs for permanent append-only storage. Nothing about parsing is delegated to an AI model; nothing about the output schema varies non-deterministically. The same input bytes, on the same parser version, always produce the same ParsedDocument.

This is the Anchor property: the document's presence in the WORM ledger is anchored to a reproducible deterministic transformation, not to a probabilistic AI inference.

[edit]Architecture

The service is structured as an MCP handler that receives document bytes via POST /mcp, dispatches them through a format detector and per-format parser, and forwards the parsed result to service-fs via its FsClient. The HTTP boundary exposes three endpoints: GET /healthz, GET /readyz, and POST /mcp. The MCP endpoint is the only write surface. There is no read surface in this service; reads happen at service-fs via its own MCP resource interface.

[edit]Format detection

Format detection runs before parser dispatch. The detect_format function applies two strategies in order.

Extension-first matching checks the filename extension against the known set: .pdf, .md or .markdown, .docx, .xlsx. Extension matching is case-insensitive.

Magic-byte fallback: when the extension is absent or unrecognized, the first four bytes of the document are inspected for known magic sequences. PDF documents begin with %PDF (hex 25 50 44 46). DOCX and XLSX documents are ZIP files sharing the PK header (50 4B 03 04).

Format detection returns an Option<Format>. A None result causes the MCP handler to return a -32602 Invalid params error to the caller; the file is not written to the ledger. This detection algorithm is entirely deterministic — no AI inference — per SYS-ADR-07.

[edit]Parsers

Four format-specific parsers are registered at daemon startup:

PDF — uses oxidize-pdf 2.x. Writes the input bytes to a temporary file, opens it via oxidize-pdf, extracts all page text, and deletes the temporary file via an RAII drop guard. The temporary file is never left on disk after the method returns.

Markdown — uses pulldown-cmark 0.12. Operates on the in-memory byte slice directly. Collects all Text and Code events from the pulldown-cmark event stream, strips HTML tags, and concatenates results.

DOCX — uses docx-rust 0.1.x. Confirms the magic-byte signature before attempting ZIP extraction. Extracts paragraph text with newline separators.

XLSX — uses calamine 0.34. Opens the workbook from an in-memory Cursor, iterates all sheets, and serializes all rows as tab-separated columns.

All four parsers return a normalized ParsedDocument with a format field, a source_id passed through from the caller, text as extracted text content, and a metadata JSON object including a "parser" key.

[edit]MCP interface

The MCP endpoint accepts JSON-RPC 2.0 requests via POST /mcp with a required X-Foundry-Module-ID header. The document.ingest tool accepts filename (used for format detection), source_id (the caller-side document identifier), and bytes_base64 (base64-encoded document bytes).

On success, the response includes the cursor — a monotonically increasing integer assigned by service-fs — as the stable ledger reference for this entry. On error, -32602 Invalid params indicates a format detection failure, and -32603 Internal error indicates a parser failure or service-fs transport error.

[edit]ADR-07 compliance

ADR-07 prohibits AI inference in Ring 1. service-input maintains this constraint throughout: format detection uses extension matching followed by magic-byte inspection; parsing uses purpose-built libraries that apply deterministic algorithms; text normalization is purely structural. A consequence of this constraint is that service-input does not attempt to extract meaning from the parsed text. Semantic interpretation is the responsibility of Ring 2 services downstream.

[edit]Deployment configuration

Variable Required Default Description
INPUT_MODULE_ID Yes Per-tenant module identifier; must match FS_MODULE_ID on the service-fs instance
INPUT_FS_URL Yes service-fs base URL
INPUT_BIND_ADDR No 0.0.0.0:9200 Address and port the MCP HTTP server binds to

[edit]Relations to other Ring 1 services

service-input is one of four Ring 1 boundary-ingest services. Each addresses a distinct document intake channel: service-input handles generic documents (PDF, DOCX, XLSX, Markdown); service-email handles Microsoft Exchange mailboxes; service-people handles identity records; and service-fs is the WORM ledger itself that all three write through. service-extraction in Ring 2 reads from service-fs and applies AI-assisted analysis downstream, outside the Ring 1 boundary.

[edit]See also

Category:Services
Last edited:
Edit this page · View source