PPN Hypervisor Resource Pool
From the PointSav Documentation
The PointSav Private Network (PPN) hypervisor layer manages a per-node pool of CPU and RAM and dynamically allocates those resources across the virtual machines it runs. It uses virtio_balloon for memory reclaim and cgroups v2 for CPU scheduling weights. This is the mechanism by which the PPN gives each Totebox Archive VM more or less compute capacity in response to workload demand.
One pool per physical node
Each physical PPN node (a GCP instance, an on-premises server, or a leased machine) controls a pool bounded by its own hardware. The pool is not shared across nodes: a node with 31 GB of RAM manages 31 GB and does not borrow from a neighbouring node.
Cross-node workload placement is a separate concern: the Totebox Orchestration Layer (gateway-orchestration-command-1) decides which physical node a cluster-totebox instance runs on, based on MBA pairing and available capacity signals. Once that decision is made, the receiving node's hypervisor manages the local resource pool for that VM. The PPN pool and the Totebox scheduler are orthogonal.
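The two decisions described above can be sketched as separate steps. This is a minimal illustration, not the PPN implementation: `NodePool` and `place` are hypothetical names, `place` stands in for gateway-orchestration-command-1 with the MBA pairing check left out, and capacity is tracked in whole GB. Each pool only ever draws on its own node's RAM.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Hypothetical per-node pool, bounded by this node's own RAM only."""
    name: str
    physical_ram_gb: int
    reserved: dict[str, int] = field(default_factory=dict)

    def free_gb(self) -> int:
        return self.physical_ram_gb - sum(self.reserved.values())

    def admit(self, vm: str, ram_gb: int) -> None:
        # No borrowing from neighbours: refuse anything this node cannot hold.
        if ram_gb > self.free_gb():
            raise MemoryError(f"{self.name} cannot hold {vm} ({ram_gb} GB)")
        self.reserved[vm] = ram_gb


def place(nodes: list[NodePool], vm: str, ram_gb: int) -> NodePool:
    """Hypothetical placement step: choose one node, then hand off to its pool."""
    candidates = [n for n in nodes if n.free_gb() >= ram_gb]
    if not candidates:
        raise MemoryError(f"no single node can host {vm}")
    target = max(candidates, key=lambda n: n.free_gb())
    target.admit(vm, ram_gb)  # from here on, only the local pool is involved
    return target
```

With a 31 GB node and a 16 GB node, a 20 GB VM lands on the 31 GB node, and a 40 GB VM is refused outright rather than split across both pools. Keeping placement and local allocation in separate functions mirrors the orthogonality the section describes.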
Memory pool: virtio_balloon
The primary memory reclaim mechanism is the virtio_balloon paravirtual device. Every VM provisioned by os-infrastructure is started with a virtio-balloon device attached. Inside the guest, the virtio_balloon driver (a standard Linux kernel module) binds to that device and carries out the hypervisor's inflate and deflate requests.
How inflation works (reclaiming memory):
- The hypervisor (balloon controller) signals the balloon driver to inflate by N pages.
- The driver allocates those pages inside the guest, removing them from the guest's usable address space.
- The driver reports those pages to the hypervisor, which releases their backing physical memory to the node-level pool.
- The pool grows by N pages; the guest's available RAM shrinks by N pages.
How deflation works (giving memory back):
- The hypervisor signals the balloon driver to deflate.
- The driver releases balloon pages back into the guest's free list.
- The guest's available RAM grows; the pool shrinks.
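The inflation and deflation steps amount to moving page counts between the guest and the node pool. The toy `Balloon` class below is hypothetical and only does that bookkeeping for a single VM, counted in 4 KiB pages (the page size the virtio balloon protocol uses). The real page allocation happens in the guest driver, and a real controller would drive it through QEMU.

```python
from dataclasses import dataclass


@dataclass
class Balloon:
    """Toy model of one VM's balloon; all quantities are 4 KiB pages."""
    guest_available: int   # pages the guest can use
    node_pool: int         # pages the node can hand to other VMs
    balloon: int = 0       # pages currently held by the balloon driver

    def inflate(self, n: int) -> None:
        # Driver allocates n guest pages; the host reclaims their backing memory.
        if n > self.guest_available:
            raise ValueError("cannot inflate past the guest's available RAM")
        self.guest_available -= n
        self.balloon += n
        self.node_pool += n

    def deflate(self, n: int) -> None:
        # Driver frees n balloon pages back into the guest's free list,
        # and the host must back them again from the pool.
        if n > self.balloon or n > self.node_pool:
            raise ValueError("cannot deflate more than the balloon holds")
        self.balloon -= n
        self.guest_available += n
        self.node_pool -= n
```

For example, inflating a 1 GiB guest (262144 pages) by 65536 pages leaves it 196608 pages and adds 65536 to the pool. Deflating by the same amount restores both.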
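The memory the pool can redistribute is bounded by the VMs' guaranteed minimums:
$$\text{pool\_available} = \text{physical\_ram} - \sum_{\text{all VMs}} \text{balloon\_minimum}$$
Each VM has a minimum memory reservation (balloon_minimum). The controller will not inflate that VM's balloon so far that the guest's available RAM drops below it. This prevents a VM from being starved of memory when the node is under pressure.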
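The following is a worked sketch of the pool formula and the minimum-reservation guard, in whole GB. The VM names and sizes are invented for illustration, and `clamp_inflation` is a hypothetical stand-in for whatever check the planned controller performs.

```python
def pool_available(physical_ram: int, balloon_minimums: dict[str, int]) -> int:
    """Physical RAM less the sum of every VM's guaranteed minimum."""
    return physical_ram - sum(balloon_minimums.values())


def clamp_inflation(guest_ram: int, minimum: int, requested: int) -> int:
    """Largest inflation (same units) that keeps the guest at or above its minimum."""
    return max(0, min(requested, guest_ram - minimum))


# Hypothetical 31 GB node running three VMs.
minimums = {"totebox-a": 4, "totebox-b": 4, "totebox-inference": 8}
print(pool_available(31, minimums))                           # 15
print(clamp_inflation(guest_ram=6, minimum=4, requested=5))   # 2
```

A request to reclaim 5 GB from a 6 GB guest with a 4 GB minimum is cut to 2 GB. The formula as written leaves out the host's own memory and per-VM QEMU overhead, which a real node would also subtract.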
CPU pool: vCPU scheduling weights
CPU pool management uses the Linux cgroups v2 cpu.weight interface. Each QEMU process (one per VM) is placed in a cgroup with a weight drawn from the capability ledger. Under CPU contention, the scheduler distributes vCPU time proportionally to those weights. When the node is not under contention, all VMs run at full speed regardless of weight.
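The proportional-share behaviour can be illustrated with a weighted max-min fair split. This is a simplification: the kernel's fair scheduler reaches these shares over time rather than computing them directly, and `cpu_shares`, the demands (in cores), and the weights are all illustrative. The only detail taken from cgroups v2 itself is that cpu.weight accepts integers from 1 to 10000 (default 100).

```python
def cpu_shares(capacity: float, demand: dict[str, float],
               weight: dict[str, int]) -> dict[str, float]:
    """Weighted max-min fair split of `capacity` cores (work-conserving)."""
    for w in weight.values():
        if not 1 <= w <= 10000:
            raise ValueError("cpu.weight must be in [1, 10000]")
    alloc: dict[str, float] = {}
    active = set(demand)
    remaining = capacity
    while active:
        total_w = sum(weight[vm] for vm in active)
        # VMs whose demand fits inside their proportional share get all of it.
        satisfied = {vm for vm in active
                     if demand[vm] <= remaining * weight[vm] / total_w}
        if not satisfied:
            # True contention: split what is left strictly by weight.
            for vm in active:
                alloc[vm] = remaining * weight[vm] / total_w
            break
        for vm in satisfied:
            alloc[vm] = demand[vm]
            remaining -= demand[vm]
        active -= satisfied
    return alloc
```

On 4 cores with two saturating VMs weighted 300 and 100, the split is 3 and 1 cores. On 8 cores with two VMs each wanting 2, both get their full demand whatever their weights. This is the "no contention, full speed" case, and it holds because leftover capacity is redistributed rather than held back.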
A cluster-totebox VM running an active inference workload (via service-slm) can be assigned a higher weight than an idle archive VM. The ledger entry is the authoritative weight; os-infrastructure applies it at VM launch and can adjust it live.
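Applying ledger weights live comes down to writing integers into each VM's cpu.weight file. The sketch below assumes a hypothetical layout of one cgroup directory per VM, `<cgroup_root>/<vm>/cpu.weight`, with the ledger passed in as a plain dict. On a real host the QEMU cgroup path depends on how os-infrastructure (or systemd/libvirt) lays out the hierarchy, and writing requires the appropriate privileges.

```python
from pathlib import Path


def apply_ledger_weights(cgroup_root: Path, ledger: dict[str, int]) -> list[str]:
    """Write each VM's ledger weight to <cgroup_root>/<vm>/cpu.weight if it differs.

    Returns the VMs whose weight changed. A write to cpu.weight takes effect
    immediately, so this works both at launch and for live adjustment.
    """
    changed = []
    for vm, weight in sorted(ledger.items()):
        if not 1 <= weight <= 10000:
            raise ValueError(f"{vm}: cpu.weight must be in [1, 10000], got {weight}")
        path = cgroup_root / vm / "cpu.weight"
        current = int(path.read_text().strip()) if path.exists() else None
        if current != weight:
            path.write_text(f"{weight}\n")
            changed.append(vm)
    return changed
```

Making the function a reconcile step (compare, then write only what differs) means it can be rerun safely whenever the ledger changes.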
Relationship to os-orchestration
os-orchestration is a data-layer aggregator. It aggregates data access across Totebox Archives using the PointSav Protocol (PSP): capability-based queries that return only result rows, never raw records. It is stateless and holds no keys to archives.
os-orchestration does not allocate CPU. It does not adjust memory. It does not communicate with the hypervisor balloon controller. The two layers are designed to be blind to each other:
- The hypervisor knows a VM is consuming N pages and Y percent of vCPU time. It does not know whether the VM is running os-totebox, os-orchestration, or anything else.
- The Totebox Archive inside the VM knows nothing about balloon inflation, cgroup weights, or which physical node it is on.
This is the isolation invariant: the hypervisor has zero read capability over VM-internal state.
Freely transferable archives
Because the hypervisor manages only VM lifecycle and resource allocation, not the data inside the VMs, a Totebox Archive can be stopped, its disk image copied to another node, and restarted there without any change to its data or its identity. The destination node's hypervisor allocates resources from its own pool for the relocated VM.
This is the freely transferable property of Totebox Archives: the bootable disk image is the archive; the resource pool is the node's infrastructure. Moving the image moves the archive. The new node's pool absorbs the workload.
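The relocation steps can be sketched as a verified copy. This sketch assumes the VM is already shut down and that both nodes' storage is reachable as local paths; a real move would go over the network with a tool such as scp or rsync. `relocate_image` and `sha256_file` are hypothetical helpers. The matching digest shows that the archive's content is unchanged by the move.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def relocate_image(src: Path, dst_dir: Path) -> Path:
    """Copy a stopped VM's disk image to the destination node and verify it."""
    before = sha256_file(src)
    dst = dst_dir / src.name
    shutil.copy2(src, dst)
    if sha256_file(dst) != before:
        raise OSError("image changed in transit")
    return dst
```

After the copy, the destination node boots the image and its hypervisor admits the VM into its local pool. Nothing about the source node's pool travels with the image.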
Implementation status
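QEMU (including the 7.x series) provides the virtio-balloon device. Adding -device virtio-balloon to the VM launch command exposes the device to the guest. A guest kernel with virtio_balloon support (built in or as a module) then binds its driver to it.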
The balloon controller (the component inside os-infrastructure that decides when to inflate or deflate each VM's balloon in response to demand signals) is a planned milestone. Until the controller is implemented, operators can exercise the mechanism manually via the QEMU monitor:
(qemu) info balloon # show current guest-visible RAM
(qemu) balloon 128 # request guest to give back memory down to 128 MB
(qemu) info balloon # confirm reclaim
The infrastructure/virt/vm-prove.sh script includes -device virtio-balloon so that the test VM has a balloon device, and the guest driver can bind to it, from the first boot.
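The same operations are available programmatically through QMP, QEMU's JSON machine protocol, which is one plausible interface for the planned controller. The following is a minimal client sketch. It assumes QEMU was started with a QMP socket such as `-qmp unix:/tmp/vm.qmp,server=on,wait=off` (the path is illustrative), and the `QMP` class and `connect_unix` helper are hypothetical. The `balloon` and `query-balloon` commands are real QMP commands. Unlike the monitor's `balloon 128`, which takes megabytes, they express sizes in bytes.

```python
import json
import socket
from typing import Any, Callable, Optional, TextIO


class QMP:
    """Minimal QMP client: one command at a time, asynchronous events skipped."""

    def __init__(self, reader: TextIO, send: Callable[[str], None]) -> None:
        self._reader = reader
        self._send = send
        greeting = self._read()
        if "QMP" not in greeting:
            raise RuntimeError(f"unexpected greeting: {greeting}")
        self.execute("qmp_capabilities")  # leave capabilities-negotiation mode

    def _read(self) -> dict:
        line = self._reader.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        return json.loads(line)

    def execute(self, command: str, arguments: Optional[dict] = None) -> Any:
        msg: dict = {"execute": command}
        if arguments is not None:
            msg["arguments"] = arguments
        self._send(json.dumps(msg) + "\n")
        while True:
            reply = self._read()
            if "event" in reply:  # e.g. BALLOON_CHANGE, not a reply to this command
                continue
            if "error" in reply:
                raise RuntimeError(reply["error"].get("desc", reply["error"]))
            return reply.get("return")

    def set_balloon_mib(self, mib: int) -> None:
        # QMP `balloon` takes the target guest size in bytes.
        self.execute("balloon", {"value": mib * 1024 * 1024})

    def balloon_mib(self) -> int:
        # `query-balloon` reports the current guest size in bytes as `actual`.
        return self.execute("query-balloon")["actual"] // (1024 * 1024)


def connect_unix(path: str) -> QMP:
    """Open a QMP session on a UNIX socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return QMP(sock.makefile("r", encoding="utf-8"),
               lambda s: sock.sendall(s.encode("utf-8")))
```

Usage would look like `qmp = connect_unix("/tmp/vm.qmp"); qmp.set_balloon_mib(128)`. The guest applies the target asynchronously, so `balloon_mib()` called immediately afterwards may still report the old size.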
Planned: cross-node resource extension
The per-node pool is the implemented layer. The planned distributed extension is intended to allow VMs to borrow compute from other physical nodes in the mesh when local capacity is under pressure.
Reboot not required. Standard pool operations (balloon inflation, deflation, and cgroups v2 weight changes) are dynamic. The balloon controller signals the in-guest driver, the driver responds, and the node pool adjusts. No guest restart or host reboot is needed. This holds for both the current manual-operator flow (QEMU monitor) and the planned automated controller.
virtio-mem (upstream in the Linux kernel since 5.8 and in QEMU since 5.1) is the intended mechanism for the cross-node layer. virtio_balloon reclaims individual pages from within a guest's fixed RAM allocation. virtio-mem, by contrast, hot-plugs and hot-unplugs memory in units of a device-defined block size, so the amount of RAM the guest sees actually grows and shrinks. The intended model: a lending node advertises unused blocks to a requesting VM on another node over the WireGuard mesh. The seL4 capability model is intended to ensure that the lending node retains no read capability over the blocks it lends: the physical pages are mapped exclusively into the borrowing VM's address space.
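The local half of that model, resizing a virtio-mem device, can already be sketched. The example assumes a `virtio-mem-pci` device created with the illustrative id `vm0` and a 2 MiB block size, which is a common default on x86-64 but is treated here as an assumption. `aligned_request` and `qom_set_requested_size` are hypothetical helpers. The `requested-size` property and the QMP `qom-set` command are real. How lendable blocks would be advertised across the mesh is planned and not shown.

```python
MIB = 1024 * 1024


def aligned_request(current: int, delta: int, block_size: int, region_size: int) -> int:
    """New virtio-mem requested-size after adding (or removing) `delta` bytes.

    virtio-mem plugs and unplugs whole blocks, so the target is clamped to
    [0, region_size] and rounded down to a multiple of block_size.
    """
    target = max(0, min(region_size, current + delta))
    return target - target % block_size


def qom_set_requested_size(device_id: str, size: int) -> dict:
    """QMP message asking QEMU to resize a virtio-mem device given by its -device id."""
    return {
        "execute": "qom-set",
        "arguments": {
            "path": f"/machine/peripheral/{device_id}",
            "property": "requested-size",
            "value": size,
        },
    }
```

A request for 3 MiB more against a 2 MiB block size yields 2 MiB. This difference in granularity is the one the paragraph draws between page-level ballooning and block-level hot-plug.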
Cross-node placement decisions are intended to remain with gateway-orchestration-command-1 (the Totebox Orchestration Layer). The distributed capability ledger, planned for development in moonshot-protocol and moonshot-database, is intended to carry cryptographically signed lending grants keyed to each node's pairing-ceremony identity. Revocation is intended to propagate across the mesh by gossip over a Merkle DAG, without relying on a central authority.
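A content-addressed revocation log is one way such gossip could converge without a coordinator. The sketch below is entirely hypothetical: `Entry`, `merge`, and `revoked` are illustrative names, issuer identities are placeholder strings, and the signatures the ledger is meant to carry are omitted, as is any check that an entry's parents have already arrived. Each entry's address is the SHA-256 of its contents plus its parents' addresses, so merging two nodes' views is a set union that can detect forged keys.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One revocation record in a hypothetical content-addressed DAG."""
    grant_id: str
    issuer: str                 # placeholder for a pairing-ceremony identity
    parents: tuple[str, ...]    # content addresses of earlier entries

    @property
    def cid(self) -> str:
        # Content address: hash of the record together with its parents' hashes.
        body = json.dumps([self.grant_id, self.issuer, sorted(self.parents)])
        return hashlib.sha256(body.encode()).hexdigest()


def merge(local: dict[str, Entry], incoming: dict[str, Entry]) -> dict[str, Entry]:
    """Gossip merge: union keyed by content address, rejecting mismatched keys."""
    merged = dict(local)
    for cid, entry in incoming.items():
        if entry.cid != cid:
            raise ValueError(f"hash mismatch for {cid}")
        merged[cid] = entry
    return merged


def revoked(dag: dict[str, Entry]) -> set[str]:
    """Grant IDs revoked anywhere in the known DAG."""
    return {e.grant_id for e in dag.values()}
```

Because the merge is a union, it is commutative and idempotent: nodes that exchange entries in any order end up with the same revocation set. This property is what removes the need for a central authority.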
The automated balloon controller in os-infrastructure, which would trigger inflation and deflation in response to demand signals, is a planned milestone that precedes the cross-node lending layer.
See also
- infrastructure-os: the Type I hypervisor in which the balloon controller is to be implemented
- totebox-archive: the sovereign data vault running inside each VM
- ppn-distributed-vm-fabric: the planned cross-node extension (virtio-mem lending, distributed capability ledger, cross-node scheduler)
- sovereign-mesh: the WireGuard transport layer connecting PPN nodes
- PointSav Private Network: infrastructure overview; the resource pool is one component in the PPN stack