PPN Distributed VM Fabric
TopicFrom the PointSav Documentation
The PPN Distributed VM Fabric is the planned extension of the per-node PPN hypervisor layer to a multi-node resource pool, intended to allow VMs to borrow compute from other mesh nodes and migrate across the fleet without per-move operator involvement.
The PPN Distributed VM Fabric is the planned extension of the per-node PPN hypervisor layer to a multi-node resource pool. Where the current hypervisor layer manages CPU and RAM within a single physical node, the distributed fabric is intended to allow VMs to borrow compute from other nodes in the mesh and to place and migrate VMs across the fleet without per-move operator involvement.
This topic describes the planned architecture. The per-node layer β virtio_balloon, cgroups v2 weights, and the vm-prove.sh proof β is implemented and proven as of 2026-05-28. The four distributed components described below are planned milestones; none is built yet.
[edit]Current state: per-node only
The implemented hypervisor layer allocates CPU and RAM within one physical node. The pool is bounded by that node's hardware. Expanding a VM's memory means taking from the same node's free pool. Placing a VM on a different node requires stopping it, transferring the disk image, and restarting β a manual operation the Totebox Orchestration layer is intended to automate.
The virtio_balloon mechanism proven in infrastructure/virt/vm-prove.sh operates entirely within a single node. No network communication is involved. No reboot of host or guest is required for balloon operations β inflation, deflation, and cgroups v2 weight changes are all dynamic adjustments to a running system.
[edit]Component 1 β Virtio-mem lending over WireGuard (planned)
virtio-mem (upstream Linux kernel since 5.8; QEMU since 5.1) supports hot-plug and hot-unplug of individual memory blocks in a running guest. Unlike virtio_balloon β which inflates a single device to reclaim pages β virtio-mem exposes a set of granular blocks that the guest kernel can accept or return one at a time. A VM can grow beyond its initial allocation without a restart.
The intended cross-node extension:
- A node with surplus RAM advertises available
virtio-memblocks over the WireGuard mesh - A VM on a different node that needs more RAM receives those blocks as a hot-plugged device
- The seL4 capability model on the lending node is intended to ensure no read capability is retained over the lent blocks β the physical pages are mapped exclusively into the borrowing VM's address space
- When the lending period ends, the blocks are hot-unplugged and returned to the lending node's pool
This approach differs from CXL 3.0 (Compute Express Link), which provides hardware-coherent memory sharing over PCIe fabric. CXL requires physical proximity β it does not work over a WAN or internet link. The WireGuard-based lending approach is intended to work over any network, including the existing encrypted mesh, at the cost of higher latency than PCIe.
[edit]Component 2 β Distributed capability ledger (planned; moonshot-protocol, moonshot-database)
The seL4 capability model operates within a single address space: capabilities are unforgeable handles that mediate all object access, but they do not cross machine boundaries natively. The distributed extension requires a capability protocol that works across nodes.
The intended design:
- Capability tokens are HMAC-signed grants issued by a lending node, keyed to the node's pairing-ceremony identity (established via CPace PAKE at join time). Each token is intended to encode:
{grantee_node, resource_type, resource_id, expires, sequence_number}. - Revocation is intended to be a signed revocation record appended to a Merkle DAG. Every node would maintain a local copy of the DAG. Nodes would gossip delta records over the WireGuard mesh; a revocation is intended to reach all peers in sub-second time on a LAN mesh.
- No central revocation authority is required. The pairing ceremony establishes trust roots; subsequent grants and revocations are intended to flow peer-to-peer.
The moonshot-protocol and moonshot-database project directories are reserved for this work. The existing 16-byte binary wire format (for app-network-admin mesh commands) is the prototype for the compact token encoding.
[edit]Component 3 β Cross-node VM scheduler (planned; os-orchestration)
gateway-orchestration-command-1 is the intended home for cross-node placement logic. It is stateless today β it aggregates PSP data queries across Totebox Archives and returns result rows. The planned extension is intended to add a resource scheduling layer:
- Placement: when a new VM needs to be launched, the scheduler would read resource-availability advertisements from each node's
os-network-admin, check the capability ledger for node trust status, and place the VM on the best-fit node. - Migration: QEMU live migration transfers a running VM's state (memory, CPU registers, device state) over a TCP connection tunnelled through WireGuard. The VM remains available during the transfer; the cutover pause is typically under one second on a LAN.
- Sovereignty constraint: the operator is intended to be able to pin any VM to a specific trusted node. A VM pinned to Laptop A would not be migrated to Laptop B or the GCP node regardless of load imbalance. Operator sovereignty over placement is not overridden by automated optimisation.
No new transport is needed. QEMU live migration over WireGuard uses the existing encrypted mesh. The scheduler is intended to remain stateless β placement decisions derived from the capability ledger and node advertisements, no persistent scheduler state.
[edit]Component 4 β Sovereign attestation chain (planned)
Intel TDX (5th-Gen Xeon, Azure GA 2025) and AMD SEV-SNP (EPYC 9000 Turin) provide hardware-enforced VM isolation where the hypervisor cannot read guest memory. Both require the CPU manufacturer's attestation infrastructure to verify isolation claims β a certificate chain that passes through Intel Trust Authority or AMD Key Management Service. The operator trusts the silicon vendor, not only their own hardware.
The PPN's intended attestation design is different:
- Attestation root: the pairing ceremony itself. When a node joins the mesh via CPace PAKE and SAS short-code comparison, the operator physically witnesses the exchange. The node's identity key is intended to serve as the attestation anchor β no TPM vendor, no silicon vendor, no cloud vendor in the trust chain.
- Guest image attestation:
dm-verity(Linux device mapper, standard since kernel 3.4) is intended to anchor the guest OS root filesystem to a hash committed at VM provision time. The hash would be signed by the provisioning node's pairing-ceremony key. A guest modified since provisioning would fail thedm-veritycheck and not boot. - Attestation report: a guest is intended to generate a signed statement β signed with the provisioning node's identity key β asserting that it is running an unmodified image. External auditors could verify this statement without contacting Intel, AMD, or any cloud provider.
This chain is shorter and more operator-controlled than TDX/SEV-SNP. The trade-off is intentional: the sovereignty model places the operator, not the silicon vendor, at the root of trust.
[edit]Comparison with major cloud provider capabilities
The table below describes what major cloud providers have deployed today and what the PPN distributed fabric is intended to provide. Competitor capabilities in the first column are present-tense factual statements. PPN items use planned/intended language.
| Capability | AWS / Azure / GCP (today) | PPN distributed fabric (planned) |
|---|---|---|
| Per-VM memory isolation | TDX (Azure GA Nov 2025), SEV-SNP (AMD EPYC 9005); hypervisor cannot read guest RAM in hardware | seL4 formal proof: machine-checked Isabelle/HOL invariant that hypervisor has zero read capability over VM state |
| Cross-node memory sharing | CXL 3.0 (PCIe fabric, data-centre internal; not exposed to tenants as a programmable API) | virtio-mem lending over WireGuard (intended to work over any network, including internet) |
| Capability revocation | IAM policies; centrally propagated; typically 10β60 seconds | Merkle DAG peer-to-peer gossip; intended sub-second revocation; no central authority |
| Attestation root | Silicon vendor CA (Intel Trust Authority / AMD Key Management Service) | Operator-witnessed pairing ceremony; operator intended as root of trust |
| Formal isolation proof | None in production hypervisors at any major cloud provider | seL4 Isabelle/HOL functional correctness and information-flow security proof (in vendor-sel4-kernel today) |
| SMB deployment time | Hours: cloud console, IAM, VPC networking configuration | Intended: under five minutes, two questions, short-code pairing ceremony |
| Operator sovereignty | None: cloud provider controls physical hardware and hypervisor | Full: operator owns the substrate; hypervisor cannot read VM state; cloud provider not in the trust chain |
[edit]Build sequence and reserved directories
The per-node layer is the foundation. The distributed fabric is intended to build on top of it in order:
| Step | State | Reserved directory |
|---|---|---|
| Per-node virtio_balloon + cgroups v2 | Complete β proven 2026-05-28 | os-infrastructure/ |
Ceremony deployment (service-ppn-pairing on GCP VM) |
Pending | service-ppn-pairing/ |
First real node join (os-network-admin on Laptop A) |
Pending | os-network-admin/ |
Genesis Protocol os-infrastructure rewrite |
Planned | os-infrastructure/, moonshot-hypervisor/ |
| Automated balloon controller | Planned | os-infrastructure/ |
| virtio-mem lending daemon | Planned | moonshot-network/ |
| Distributed capability ledger | Planned | moonshot-protocol/, moonshot-database/ |
| Cross-node VM scheduler | Planned | os-orchestration/ |
| Sovereign attestation chain (dm-verity + ceremony key) | Planned | os-infrastructure/, moonshot-kernel/ |
[edit]See also
- ppn-architecture-overview β four-layer PPN overview; the distributed fabric is the planned extension of the hypervisor layer
- ppn-hypervisor-resource-pool β the implemented per-node pool: virtio_balloon, cgroups v2, balloon controller milestone
- sovereign-mesh β the WireGuard transport the distributed fabric runs over
- genesis-protocol β the first-boot ceremony; the intended attestation root for the distributed fabric
- infrastructure-os β the Type I hypervisor; home of the balloon controller and intended virtio-mem lending daemon
- os-orchestration β the intended home for the cross-node VM scheduler
- PointSav Private Network β infrastructure overview; the distributed VM fabric extends this substrate