Diff: services/yoyo-daily-enrichment-cycle
From 3f1e0da to 3f1e0da
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "Yo-Yo Daily Enrichment Cycle" | title: "Yo-Yo Daily Enrichment Cycle" |
| slug: yoyo-daily-enrichment-cycle | slug: yoyo-daily-enrichment-cycle |
| category: services | category: services |
| type: topic | type: topic |
| status: stable | status: stable |
| bcsc_class: no-disclosure-implication | bcsc_class: no-disclosure-implication |
| last_edited: 2026-06-11 | last_edited: 2026-06-11 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| paired_with: yoyo-daily-enrichment-cycle.es.md | paired_with: yoyo-daily-enrichment-cycle.es.md |
| --- | --- |
| The Yo-Yo daily enrichment cycle is the automated batch window that runs a GPU-accelerated | The Yo-Yo daily enrichment cycle is the automated batch window that runs a GPU-accelerated |
| inference VM once per day to enrich the DataGraph and accumulate training data for the | inference VM once per day to enrich the DataGraph and accumulate training data for the |
| local language model. The cycle runs at a fixed time, enforces a hard cost cap, and | local language model. The cycle runs at a fixed time, enforces a hard cost cap, and |
| terminates the VM whether the work finishes early or reaches the cap. | terminates the VM whether the work finishes early or reaches the cap. |
| ## Purpose | ## Purpose |
| The workspace VM runs a 7-billion-parameter language model (OLMo 2 7B) on CPU for | The workspace VM runs a 7-billion-parameter language model (OLMo 2 7B) on CPU for |
| interactive use. This model performs adequately for short prompts but extracts entities | interactive use. This model performs adequately for short prompts but extracts entities |
| from documents with lower accuracy than a larger GPU-resident model. The daily cycle | from documents with lower accuracy than a larger GPU-resident model. The daily cycle |
| addresses this gap by starting a separate GPU VM — the Yo-Yo batch node — that loads | addresses this gap by starting a separate GPU VM — the Yo-Yo batch node — that loads |
| a 32-billion-parameter model and processes a queue of documents that accumulated during | a 32-billion-parameter model and processes a queue of documents that accumulated during |
| the day. | the day. |
| The products of each cycle are: | The products of each cycle are: |
| - Additional named entities added to the DataGraph (graph store) | - Additional named entities added to the DataGraph (graph store) |
| - Direct Preference Optimisation (DPO) training pairs written to the enrichment corpus | - Direct Preference Optimisation (DPO) training pairs written to the enrichment corpus |
| Each DPO pair records what the 32B model extracted as the preferred output and what the | Each DPO pair records what the 32B model extracted as the preferred output and what the |
| 7B model extracted as the baseline, enabling the 7B model to be fine-tuned toward the | 7B model extracted as the baseline, enabling the 7B model to be fine-tuned toward the |
| larger model's extraction quality over successive training runs. | larger model's extraction quality over successive training runs. |
| ## The eight phases | ## The eight phases |
| The cycle is a single Bash script (`yoyo-daily-cycle.sh`) that executes eight sequential | The cycle is a single Bash script (`yoyo-daily-cycle.sh`) that executes eight sequential |
| phases. The script writes a timestamped log file for each run. | phases. The script writes a timestamped log file for each run. |
| **Phase 1 — VM start.** If the batch VM is not already running, a `gcloud instances start` | **Phase 1 — VM start.** If the batch VM is not already running, a `gcloud instances start` |
| command is issued. The VM boots from a persistent disk that retains the model weights and | command is issued. The VM boots from a persistent disk that retains the model weights and |
| the inference server configuration from the previous cycle. | the inference server configuration from the previous cycle. |
| **Phase 2 — Inference server health.** The script polls the llama-server health endpoint | **Phase 2 — Inference server health.** The script polls the llama-server health endpoint |
| (`/health`) at ten-second intervals until it returns `{"status":"ok"}`. Startup consistently | (`/health`) at ten-second intervals until it returns `{"status":"ok"}`. Startup consistently |
| takes approximately 170 seconds from power-on to first healthy response. If the server | takes approximately 170 seconds from power-on to first healthy response. If the server |
| does not respond within ten minutes, the cycle aborts and stops the VM. | does not respond within ten minutes, the cycle aborts and stops the VM. |
| **Phase 3 — Tier B circuit.** The local inference gateway maintains a circuit breaker for | **Phase 3 — Tier B circuit.** The local inference gateway maintains a circuit breaker for |
| the Yo-Yo node. The script waits up to two minutes for the circuit to close, confirming | the Yo-Yo node. The script waits up to two minutes for the circuit to close, confirming |
| the gateway has registered the VM as reachable. If the circuit does not close, the cycle | the gateway has registered the VM as reachable. If the circuit does not close, the cycle |
| continues with a Tier A fallback warning logged. | continues with a Tier A fallback warning logged. |
| **Phase 4 — Enrichment drain.** For 40 percent of the cycle budget (18 minutes at the | **Phase 4 — Enrichment drain.** For 40 percent of the cycle budget (18 minutes at the |
| 45-minute cap), the script waits while the gateway processes the pending enrichment queue. | 45-minute cap), the script waits while the gateway processes the pending enrichment queue. |
| During this window, the content service sends document chunks to the Yo-Yo node for | During this window, the content service sends document chunks to the Yo-Yo node for |
| entity extraction and writes DPO pairs to the enrichment corpus. Progress is logged every | entity extraction and writes DPO pairs to the enrichment corpus. Progress is logged every |
| 60 seconds with entity counts, enrichment pair counts, GPU utilisation, and VRAM usage. | 60 seconds with entity counts, enrichment pair counts, GPU utilisation, and VRAM usage. |
| **Phase 5 — Corpus threshold check.** After enrichment, `corpus-threshold.py` runs to | **Phase 5 — Corpus threshold check.** After enrichment, `corpus-threshold.py` runs to |
| count accumulated training-ready data. If counts exceed the configured threshold, the | count accumulated training-ready data. If counts exceed the configured threshold, the |
| script writes dated training marker files to `data/training-pending/`. These markers | script writes dated training marker files to `data/training-pending/`. These markers |
| are the input to Phase 6. | are the input to Phase 6. |
| **Phase 6 — LoRA training trigger.** Three gates must all pass for training to run: | **Phase 6 — LoRA training trigger.** Three gates must all pass for training to run: |
| training markers must be present, the ML libraries must be installed in the training | training markers must be present, the ML libraries must be installed in the training |
| virtual environment on the batch VM, and an operator-authored approval tag must exist | virtual environment on the batch VM, and an operator-authored approval tag must exist |
| for the current date. If all three pass, the script stops the inference server to free | for the current date. If all three pass, the script stops the inference server to free |
| approximately 16 gigabytes of VRAM, then invokes `run-dpo-training.py` over SSH with a | approximately 16 gigabytes of VRAM, then invokes `run-dpo-training.py` over SSH with a |
| 45-percent budget (about 20 minutes at the 45-minute cap). The `--resume` flag accumulates | 45-percent budget (about 20 minutes at the 45-minute cap). The `--resume` flag accumulates |
| daily checkpoints so each run extends the previous day's training rather than starting | daily checkpoints so each run extends the previous day's training rather than starting |
| from scratch. | from scratch. |
| **Phase 7 — GCS sync.** If the `SLM_YOYO_WEIGHTS_GCS_BUCKET` environment variable is | **Phase 7 — GCS sync.** If the `SLM_YOYO_WEIGHTS_GCS_BUCKET` environment variable is |
| set and training markers are present, the enrichment corpus is synchronised to the | set and training markers are present, the enrichment corpus is synchronised to the |
| configured Cloud Storage bucket. This step is currently disabled pending a future session | configured Cloud Storage bucket. This step is currently disabled pending a future session |
| that configures the bucket. | that configures the bucket. |
| **Phase 8 — Hard stop.** The inference server is stopped via SSH, the VM is stopped via | **Phase 8 — Hard stop.** The inference server is stopped via SSH, the VM is stopped via |
| `gcloud instances stop`, and the script waits up to three minutes for the VM to reach | `gcloud instances stop`, and the script waits up to three minutes for the VM to reach |
| `TERMINATED` status. A summary line records total elapsed time, entity delta, DPO pair | `TERMINATED` status. A summary line records total elapsed time, entity delta, DPO pair |
| delta, and VM final status. | delta, and VM final status. |
| ## Budget and cost | ## Budget and cost |
| The daily cycle operates under a 45-minute hard cap. The VM is stopped unconditionally | The daily cycle operates under a 45-minute hard cap. The VM is stopped unconditionally |
| at the end of Phase 8 regardless of whether phases completed normally. | at the end of Phase 8 regardless of whether phases completed normally. |
| | Item | Value | | | Item | Value | |
| |---|---| | |---|---| |
| | VM type | g2-standard-4 with NVIDIA L4 24 GB | | | VM type | g2-standard-4 with NVIDIA L4 24 GB | |
| | Zone | us-central1-a | | | Zone | us-central1-a | |
| | Running cost | approximately $0.71 per hour | | | Running cost | approximately $0.71 per hour | |
| | Cycle cost at 45-minute cap | approximately $0.53 per cycle | | | Cycle cost at 45-minute cap | approximately $0.53 per cycle | |
| | TERMINATED cost | $0.00 | | | TERMINATED cost | $0.00 | |
| | Monthly cost (daily cycles) | approximately $16 per month | | | Monthly cost (daily cycles) | approximately $16 per month | |
| A kill switch file (`/srv/foundry/data/yoyo-disabled`) suppresses all VM lifecycle | A kill switch file (`/srv/foundry/data/yoyo-disabled`) suppresses all VM lifecycle |
| operations immediately. Creating the file prevents Phase 1 from issuing a start command. | operations immediately. Creating the file prevents Phase 1 from issuing a start command. |
| Removing the file resumes normal operation on the next scheduled cycle. | Removing the file resumes normal operation on the next scheduled cycle. |
| An idle monitor timer checks every five minutes whether the VM has been running idle for | An idle monitor timer checks every five minutes whether the VM has been running idle for |
| more than 30 minutes. If the daily cycle fails to stop the VM, the idle monitor will stop | more than 30 minutes. If the daily cycle fails to stop the VM, the idle monitor will stop |
| it as a safety backstop, preventing uncapped cost accumulation. | it as a safety backstop, preventing uncapped cost accumulation. |
| ## DPO pair format | ## DPO pair format |
| Each enrichment DPO pair is a JSON file written to the feedback directory. The format | Each enrichment DPO pair is a JSON file written to the feedback directory. The format |
| is compatible with the TRL DPOTrainer: | is compatible with the TRL DPOTrainer: |
| ```json | ```json |
| { | { |
| "prompt": "<document chunk text>", | "prompt": "<document chunk text>", |
| "chosen": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]", | "chosen": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]", |
| "rejected": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]", | "rejected": "[{\"classification\":\"Person\",\"entity_name\":\"...\"}]", |
| "source_type": "datagraph-enrichment", | "source_type": "datagraph-enrichment", |
| "worm_id": "<document identifier>", | "worm_id": "<document identifier>", |
| "timestamp": "<ISO 8601>" | "timestamp": "<ISO 8601>" |
| } | } |
| ``` | ``` |
| `chosen` is the 32B model's extraction. `rejected` is the 7B model's extraction. A pair | `chosen` is the 32B model's extraction. `rejected` is the 7B model's extraction. A pair |
| is written only when both models found at least one entity and the results differ after | is written only when both models found at least one entity and the results differ after |
| normalisation. Pairs where the 7B model found nothing are discarded — they contain no | normalisation. Pairs where the 7B model found nothing are discarded — they contain no |
| genuine preference signal. | genuine preference signal. |
| ## Verified test results (2026-06-09) | ## Verified test results (2026-06-09) |
| Three 10-minute test cycles confirmed the pipeline operates correctly end-to-end. | Three 10-minute test cycles confirmed the pipeline operates correctly end-to-end. |
| | Cycle | Duration | Entity delta | DPO pairs added | VM final status | | | Cycle | Duration | Entity delta | DPO pairs added | VM final status | |
| |---|---|---|---|---| | |---|---|---|---|---| |
| | 1 | 10 min 43 s | +7 | +6 | TERMINATED | | | 1 | 10 min 43 s | +7 | +6 | TERMINATED | |
| | 2 | 9 min 12 s | +8 | +4 | TERMINATED | | | 2 | 9 min 12 s | +8 | +4 | TERMINATED | |
| | 3 | 10 min 38 s | +22 | +8 | TERMINATED | | | 3 | 10 min 38 s | +22 | +8 | TERMINATED | |
| GPU diagnostics in cycle 3: 99% utilisation, 16,151 of 23,034 MB VRAM in use, 73°C. | GPU diagnostics in cycle 3: 99% utilisation, 16,151 of 23,034 MB VRAM in use, 73°C. |
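
The pair-writing rule in the "DPO pair format" section (both models found at least one entity, and the results differ after normalisation) can be illustrated with a minimal Python sketch. The document does not define the normalisation step or name any functions, so `normalise` and `build_dpo_pair` are hypothetical, and the lowercase, trim, de-duplicate behaviour is an assumption, not the content service's actual implementation.

```python
import json
from datetime import datetime, timezone


def normalise(entities):
    """Assumed normalisation: lowercase, trim, de-duplicate, and sort entities."""
    return sorted({
        (e["classification"].strip().lower(), e["entity_name"].strip().lower())
        for e in entities
    })


def build_dpo_pair(chunk_text, entities_32b, entities_7b, worm_id):
    """Return a DPO pair dict, or None when there is no preference signal.

    entities_32b / entities_7b are lists of dicts with the keys
    "classification" and "entity_name", as in the documented format.
    """
    # Both models must have found at least one entity.
    if not entities_32b or not entities_7b:
        return None
    # The two extractions must differ after normalisation.
    if normalise(entities_32b) == normalise(entities_7b):
        return None
    return {
        "prompt": chunk_text,
        # chosen/rejected are JSON-encoded strings, matching the documented format.
        "chosen": json.dumps(entities_32b),
        "rejected": json.dumps(entities_7b),
        "source_type": "datagraph-enrichment",
        "worm_id": worm_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Under these assumptions, a call with an empty 7B result returns `None`, as does a call where the two lists differ only in letter case or surrounding whitespace. In both cases no pair file would be written.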