Diff: services/yoyo-daily-enrichment-cycle.es

From 1c02ec1 to 1c02ec1

+0 / −0 lines
---
schema: foundry-doc-v1
title: "Yo-Yo Daily Enrichment Cycle"
slug: yoyo-daily-enrichment-cycle
category: services
type: topic
content_type: topic
status: stable
bcsc_class: no-disclosure-implication
last_edited: 2026-06-11
editor: pointsav-engineering
paired_with: yoyo-daily-enrichment-cycle.es.md
---

The Yo-Yo daily enrichment cycle is the automated batch window that runs a GPU-accelerated
inference VM once per day to enrich the DataGraph and accumulate training data for the
local language model. The cycle runs at a fixed time, enforces a hard cost cap, and
stops the VM whether the work finishes early or reaches the cap.

## Purpose

The workspace VM runs a 7-billion-parameter language model (OLMo 2 7B) on CPU for
interactive use. This model performs adequately for short prompts but extracts entities
from documents with lower accuracy than a larger GPU-resident model. The daily cycle
addresses this gap by starting a separate GPU VM, the Yo-Yo batch node, that loads
a 32-billion-parameter model and processes the queue of documents that accumulated during
the day.

The products of each cycle are:

- Additional named entities added to the DataGraph (graph store)
- Direct Preference Optimisation (DPO) training pairs written to the enrichment corpus

Each DPO pair records what the 32B model extracted as the preferred output and what the
7B model extracted as the baseline, so that the 7B model can be fine-tuned toward the
larger model's extraction quality over successive training runs.

## The eight phases

The cycle is a single Bash script (`yoyo-daily-cycle.sh`) that executes eight sequential
phases. The script writes a timestamped log file for each run.

**Phase 1: VM start.** If the batch VM is not already running, the script issues a
`gcloud compute instances start` command. The VM boots from a persistent disk that retains
the model weights and the inference server configuration from the previous cycle.

**Phase 2: Inference server health.** The script polls the llama-server health endpoint
(`/health`) at ten-second intervals until it returns `{"status":"ok"}`. Startup consistently
takes approximately 170 seconds from power-on to first healthy response. If the server
does not respond within ten minutes, the cycle aborts and stops the VM.
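
The Phase 2 polling loop can be sketched as follows. The production script is Bash; this Python version is only an illustration of the timing logic (ten-second interval, ten-minute deadline). The function names, the injectable `fetch`, `sleep`, and `clock` parameters, and the `http_fetch` helper are assumptions made for the example, not part of `yoyo-daily-cycle.sh`.

```python
import json
import time
import urllib.request
from typing import Callable


def http_fetch(url: str) -> str:
    """Hypothetical helper: return the body of a GET request as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def wait_for_health(
    fetch: Callable[[], str],
    interval_s: float = 10.0,   # poll every ten seconds
    timeout_s: float = 600.0,   # give up after ten minutes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the health body reports status "ok", else False at the deadline."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        try:
            body = json.loads(fetch())
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Server not listening yet, or the body is not valid JSON.
            pass
        sleep(interval_s)
    return False
```

A caller would pass something like `lambda: http_fetch(health_url)`, where `health_url` is whatever address the batch VM exposes; on a `False` result the cycle would abort and stop the VM.
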

**Phase 3: Tier B circuit.** The local inference gateway maintains a circuit breaker for
the Yo-Yo node. The script waits up to two minutes for the circuit to close, confirming
that the gateway has registered the VM as reachable. If the circuit does not close, the
cycle continues and logs a Tier A fallback warning.

**Phase 4: Enrichment drain.** For 40 percent of the cycle budget (18 minutes at the
45-minute cap), the script waits while the gateway processes the pending enrichment queue.
During this window, the content service sends document chunks to the Yo-Yo node for
entity extraction and writes DPO pairs to the enrichment corpus. Progress is logged every
60 seconds with entity counts, enrichment pair counts, GPU utilisation, and VRAM usage.

**Phase 5: Corpus threshold check.** After enrichment, `corpus-threshold.py` runs to
count accumulated training-ready data. If counts exceed the configured threshold, the
script writes dated training marker files to `data/training-pending/`. These markers
are the input to Phase 6.
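
A minimal sketch of the Phase 5 threshold-and-marker step is shown below. It is not the contents of `corpus-threshold.py`: the function name, the rule of counting `*.json` files, and the `<date>.marker` file name are all assumptions chosen for illustration.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def write_marker_if_ready(
    corpus_dir: Path,
    pending_dir: Path,
    threshold: int,
    today: Optional[date] = None,
) -> Optional[Path]:
    """Write a dated marker file when the corpus count exceeds the threshold."""
    # Assumption: one JSON file per training-ready item.
    count = sum(1 for _ in corpus_dir.glob("*.json"))
    if count <= threshold:
        return None  # not enough data yet, so Phase 6 finds no marker
    pending_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: ISO date plus a ".marker" suffix.
    marker = pending_dir / f"{(today or date.today()).isoformat()}.marker"
    marker.write_text(f"{count}\n", encoding="utf-8")
    return marker
```
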

**Phase 6: LoRA training trigger.** Three gates must all pass for training to run:
training markers must be present, the ML libraries must be installed in the training
virtual environment on the batch VM, and an operator-authored approval tag must exist
for the current date. If all three pass, the script stops the inference server to free
approximately 16 gigabytes of VRAM, then invokes `run-dpo-training.py` over SSH with a
45-percent budget (about 20 minutes at the 45-minute cap). The `--resume` flag accumulates
daily checkpoints so each run extends the previous day's training rather than starting
from scratch.
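
The three-gate rule in Phase 6 amounts to a logical AND, with the useful side effect that a failed run can report which gate blocked it. The sketch below assumes each gate has already been evaluated to a boolean; the class and field names are illustrative and do not come from the cycle script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingGates:
    """Outcome of the three Phase 6 checks (field names are hypothetical)."""
    markers_present: bool
    ml_libraries_installed: bool
    approval_tag_for_today: bool

    def failed(self) -> list[str]:
        """Names of the gates that did not pass, in declaration order."""
        return [name for name, ok in vars(self).items() if not ok]

    def training_allowed(self) -> bool:
        """Training runs only when every gate passes."""
        return not self.failed()
```

For example, `TrainingGates(True, True, False).failed()` names the approval gate, which is the kind of detail worth writing to the run log before skipping training.
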

**Phase 7: GCS sync.** If the `SLM_YOYO_WEIGHTS_GCS_BUCKET` environment variable is
set and training markers are present, the enrichment corpus is synchronised to the
configured Cloud Storage bucket. This step is currently disabled until the bucket is
configured in a future session.

**Phase 8: Hard stop.** The inference server is stopped via SSH, the VM is stopped via
`gcloud compute instances stop`, and the script waits up to three minutes for the VM to
reach `TERMINATED` status. A summary line records total elapsed time, entity delta, DPO
pair delta, and VM final status.

## Budget and cost

The daily cycle operates under a 45-minute hard cap. The VM is stopped unconditionally
at the end of Phase 8 regardless of whether the earlier phases completed normally.

| Item | Value |
|---|---|
| VM type | g2-standard-4 with NVIDIA L4 24 GB |
| Zone | us-central1-a |
| Running cost | approximately $0.71 per hour |
| Cycle cost at 45-minute cap | approximately $0.53 per cycle |
| TERMINATED cost | $0.00 |
| Monthly cost (daily cycles) | approximately $16 per month |
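
The figures in the table, and the phase budgets quoted earlier, follow from the 45-minute cap and the hourly rate. The short calculation below reproduces them; the 30-day month and the helper names are assumptions made for the example.

```python
CAP_MINUTES = 45          # hard cap per cycle
HOURLY_RATE_USD = 0.71    # approximate running cost from the table
DAYS_PER_MONTH = 30       # assumption: one cycle per day, 30-day month


def phase_budget_minutes(share: float, cap_minutes: float = CAP_MINUTES) -> float:
    """Minutes available to a phase that receives `share` of the cap."""
    return cap_minutes * share


def cycle_cost_usd(minutes: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Compute cost of keeping the VM running for `minutes`."""
    return hourly_rate * minutes / 60


drain = phase_budget_minutes(0.40)        # Phase 4: 18 minutes
training = phase_budget_minutes(0.45)     # Phase 6: 20.25 minutes, quoted as about 20
per_cycle = cycle_cost_usd(CAP_MINUTES)   # roughly $0.53
monthly = per_cycle * DAYS_PER_MONTH      # roughly $16

print(f"drain={drain:.2f} min, training={training:.2f} min")
print(f"per cycle=${per_cycle:.2f}, per month=${monthly:.2f}")
```
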

A kill switch file (`/srv/foundry/data/yoyo-disabled`) suppresses all VM lifecycle
operations immediately. Creating the file prevents Phase 1 from issuing a start command.
Removing the file resumes normal operation on the next scheduled cycle.
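
The kill switch is a presence check on a single path, which is what makes it immediate: there is nothing to parse and no service to restart. A minimal sketch, assuming the check runs before any VM lifecycle command (the function name is hypothetical):

```python
from pathlib import Path

# Path taken from the text above.
KILL_SWITCH = Path("/srv/foundry/data/yoyo-disabled")


def vm_lifecycle_allowed(kill_switch: Path = KILL_SWITCH) -> bool:
    """Return False while the kill switch file exists; its contents are ignored."""
    return not kill_switch.exists()
```
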

An idle monitor timer checks every five minutes whether the VM has been running idle for
more than 30 minutes. If the daily cycle fails to stop the VM, the idle monitor will stop
it as a safety backstop, preventing uncapped cost accumulation.
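
The idle monitor's decision can be sketched as a pure function that the five-minute timer would call. How idleness is measured is not stated here, so the `last_activity` timestamp is an assumption, as are the function and parameter names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

IDLE_LIMIT = timedelta(minutes=30)


def should_stop(
    vm_status: str,
    last_activity: datetime,
    now: Optional[datetime] = None,
    idle_limit: timedelta = IDLE_LIMIT,
) -> bool:
    """True when a running VM has been idle for longer than the limit."""
    now = now or datetime.now(timezone.utc)
    return vm_status == "RUNNING" and (now - last_activity) > idle_limit
```

With a five-minute check interval, the worst case under this sketch is a VM that stays up for just under 35 minutes of idle time before the backstop stops it.
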

## DPO pair format

Each enrichment DPO pair is a JSON file written to the feedback directory. The format
is compatible with the TRL DPOTrainer:

```json
{
  "prompt": "<document chunk text>",
  "chosen": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]",
  "rejected": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]",
  "source_type": "datagraph-enrichment",
  "worm_id": "<document identifier>",
  "timestamp": "<ISO 8601>"
}
```

`chosen` is the 32B model's extraction. `rejected` is the 7B model's extraction. A pair
is written only when both models found at least one entity and the results differ after
normalisation. Pairs where the 7B model found nothing are discarded because they contain
no genuine preference signal.
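
The write rule above can be sketched as a small filter. The normalisation used by the content service is not specified here, so the rule below (case-insensitive, whitespace-collapsed, order-independent comparison) is an assumption, as are the function names.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def normalise(entities: list[dict]) -> list[tuple[str, str]]:
    """Assumed normalisation: lower-case, collapse whitespace, ignore order and duplicates."""
    return sorted({
        (e["classification"].strip().lower(),
         " ".join(e["entity_name"].split()).lower())
        for e in entities
    })


def build_pair(chunk: str, worm_id: str,
               large: list[dict], small: list[dict]) -> Optional[dict]:
    """Return a DPO record, or None when the pair carries no preference signal."""
    if not large or not small:
        return None  # one model found nothing
    if normalise(large) == normalise(small):
        return None  # both models agree after normalisation
    return {
        "prompt": chunk,
        "chosen": json.dumps(large),     # 32B extraction, stored as a JSON string
        "rejected": json.dumps(small),   # 7B extraction, stored as a JSON string
        "source_type": "datagraph-enrichment",
        "worm_id": worm_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```
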

## Verified test results (2026-06-09)

Three 10-minute test cycles confirmed the pipeline operates correctly end-to-end.

| Cycle | Duration | Entity delta | DPO pairs added | VM final status |
|---|---|---|---|---|
| 1 | 10 min 43 s | +7 | +6 | TERMINATED |
| 2 | 9 min 12 s | +8 | +4 | TERMINATED |
| 3 | 10 min 38 s | +22 | +8 | TERMINATED |

GPU diagnostics in cycle 3: 99% utilisation, 16,151 of 23,034 MB VRAM in use, 73°C.