D4 — post-deploy flow (template manifest + scp + setup-binaries dispatch + verify) #6

Open
opened 2026-05-20 21:42:10 +00:00 by mik-tf · 0 comments

Sub-issue of #? (v0.1 scope). The bridge between "VM exists" and "user has a working Hero OS environment".

What this does

After D3's VmBackend::deploy_vm returns a reachable VM, the deployer:

  1. Templates the per-user component manifest at ~driver/hero/cfg/cockpit/services.toml on the VM, format defined in hero_cockpit#1 §6. Profile selected by admin from the form (default = demo).

  2. Copies setup-binaries.sh (from the deployer's local checkout of hero_demo/deploy/single-vm/scripts/setup-binaries.sh), the manifest, and d07_set.txt to /root/ on the VM via scp.

  3. Runs the bootstrap via SSH:

    ssh -i <user_ssh_key> root@<mycelium_ip> \
        FORGE_TOKEN="<user_forge_token>" \
        bash /root/setup-binaries.sh
    

    Streams stdout back to the deployer; the admin UI shows progress as it runs (~5-30 min, depending on the per-user profile and cache state).

  4. Verifies by curl'ing https://<user>.<node>.grid.tf/health (via the deployed gateway) — expects HTTP 200 once hero_proxy + hero_router + cockpit_web are up.

  5. Updates sqlite — vm.state = 'ready', stores gateway_fqdn, provisioned_at; logs event 'bootstrap_run'.
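As a rough illustration of step 3, the sketch below builds the ssh invocation with `std::process::Command` without running it. The `BootstrapTarget` struct, its field names, and the `shell_quote` helper are hypothetical and not taken from the deployer's code. The quoting is an added precaution, on the assumption that the token could contain shell-special characters.

```rust
use std::process::Command;

/// Inputs for the bootstrap step. Names are illustrative only.
pub struct BootstrapTarget<'a> {
    pub ssh_key_path: &'a str,
    pub mycelium_ip: &'a str,
    pub forge_token: &'a str,
}

/// Wrap a value in single quotes for the remote POSIX shell,
/// escaping any embedded single quote as '\''.
fn shell_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Build (but do not run) the ssh command from step 3.
pub fn bootstrap_command(t: &BootstrapTarget) -> Command {
    let mut cmd = Command::new("ssh");
    cmd.arg("-i")
        .arg(t.ssh_key_path)
        .arg(format!("root@{}", t.mycelium_ip))
        // ssh joins the trailing arguments into one remote command line,
        // so the env assignment and the script go in as a single string.
        .arg(format!(
            "FORGE_TOKEN={} bash /root/setup-binaries.sh",
            shell_quote(t.forge_token)
        ));
    cmd
}
```

To stream progress, a caller would typically set `stdout(Stdio::piped())` on the returned command, spawn it, and read the child's stdout line by line.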

SSH vs vm_exec

Initial implementation uses SSH (proven). Once Mahmoud confirms ComputeService.vm_exec streams stdout cleanly for long-running scripts (G2 in hero_compute#? (deployer integration)), we add an alternate path. The trait surface is the same — only the transport differs.
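One way to picture "same trait surface, different transport" is the sketch below. The trait name `RemoteExec`, its method, and the `CannedExec` stand-in are all hypothetical; the real deployer trait and the `ComputeService.vm_exec` signature are not given in this issue. An SSH-backed and a vm_exec-backed implementation would each implement the same trait.

```rust
use std::io;

/// Transport-agnostic "run a script on the VM and stream its output".
/// Hypothetical name and signature.
pub trait RemoteExec {
    /// Run `script`, call `on_line` for each stdout line, return the exit code.
    fn run_streaming(&self, script: &str, on_line: &mut dyn FnMut(&str)) -> io::Result<i32>;
}

/// Stand-in transport that replays canned output instead of touching a VM.
pub struct CannedExec {
    pub lines: Vec<String>,
    pub exit_code: i32,
}

impl RemoteExec for CannedExec {
    fn run_streaming(&self, _script: &str, on_line: &mut dyn FnMut(&str)) -> io::Result<i32> {
        for line in &self.lines {
            on_line(line);
        }
        Ok(self.exit_code)
    }
}

/// Caller code depends only on the trait, so swapping the transport
/// requires no change here. Returns true when the script exits with 0.
pub fn run_bootstrap(exec: &dyn RemoteExec, log: &mut Vec<String>) -> io::Result<bool> {
    let code = exec.run_streaming("/root/setup-binaries.sh", &mut |line: &str| {
        log.push(line.to_string());
    })?;
    Ok(code == 0)
}
```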

Per-user component manifest templating

The deployer holds default profiles (per cockpit spec §6):

  • demo — proxy + router + proc + cockpit + embedder-small + db + books
  • lightweight — proxy + router + proc + cockpit + books
  • books-only — proxy + router + proc + cockpit + books (same as lightweight; alias for now)
  • custom — proxy + router + proc + cockpit (always-on only; user adds via cockpit later)
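The profile list above can be read as a small lookup table. The following is a minimal sketch of that mapping; the `Profile` enum and its methods are illustrative names, and in the actual design the component sets live in the per-profile .toml templates rather than in code.

```rust
/// Default profiles from the list above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Profile {
    Demo,
    Lightweight,
    BooksOnly,
    Custom,
}

/// Components present in every profile.
const ALWAYS_ON: [&str; 4] = ["proxy", "router", "proc", "cockpit"];

impl Profile {
    /// Map the name chosen in the admin form to a profile.
    pub fn parse(name: &str) -> Option<Profile> {
        match name {
            "demo" => Some(Profile::Demo),
            "lightweight" => Some(Profile::Lightweight),
            "books-only" => Some(Profile::BooksOnly),
            "custom" => Some(Profile::Custom),
            _ => None,
        }
    }

    /// Components enabled by this profile, always-on ones first.
    pub fn components(self) -> Vec<&'static str> {
        let mut list = ALWAYS_ON.to_vec();
        match self {
            Profile::Demo => list.extend(["embedder-small", "db", "books"]),
            // books-only is currently an alias for lightweight.
            Profile::Lightweight | Profile::BooksOnly => list.push("books"),
            Profile::Custom => {}
        }
        list
    }
}

impl Default for Profile {
    /// The form defaults to the demo profile.
    fn default() -> Self {
        Profile::Demo
    }
}
```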

Templates live at crates/hero_os_tfgrid_deployer_server/templates/profiles/<name>.toml. At deploy time:

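// Render the chosen profile template with the per-user values filled in.
let manifest = profile_template(&profile)
    .with_user_metadata(user, vm.id)
    .with_byok_unset()
    .render();
// Write the rendered manifest to the path the cockpit reads on the VM.
upload_file(&ssh, &manifest, "/home/driver/hero/cfg/cockpit/services.toml")?;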

The cockpit on the VM reads + edits this file from there on.

Setup-binaries.sh dependency

This work needs hero_demo/setup-binaries.sh to be manifest-aware. Currently (s132) it loops over a fixed d07_set.txt. A sibling hero_demo issue/PR refactors setup-binaries.sh to read the per-user manifest's [enabled] table and run lab build $name --download --install only for entries where enabled = true. Tracked separately (see References).
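The selection logic the refactor needs can be sketched as follows. The real script is bash and the real manifest format is defined in hero_cockpit#1 §6, so the layout assumed here (an `[enabled]` table of `name = true|false` lines) is a guess made for illustration. A production implementation would use a proper TOML parser rather than line matching.

```rust
/// Return the component names marked `true` under an `[enabled]` table.
/// Assumes a flat `name = true|false` layout, which is hypothetical.
pub fn enabled_components(manifest: &str) -> Vec<String> {
    let mut in_enabled = false;
    let mut names = Vec::new();
    for raw in manifest.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        // A table header switches the current section.
        if line.starts_with('[') {
            in_enabled = line == "[enabled]";
            continue;
        }
        if !in_enabled {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if value.trim() == "true" {
                names.push(key.trim().to_string());
            }
        }
    }
    names
}
```

Each returned name would then be passed to `lab build $name --download --install`, replacing the fixed loop over d07_set.txt.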

Acceptance criteria

  • Given a fresh VM from D3, a username, and a profile, the post-deploy flow completes in ≤ 30 min
  • The user can curl https://<user>.<node>.grid.tf/health and get HTTP 200
  • Admin UI shows the bootstrap progress live (SSE-streamed from deployer_server)
  • VM record in sqlite has state='ready', gateway_fqdn populated, provisioned_at set
  • Re-running on a partially-bootstrapped VM is idempotent (skips already-installed binaries via lab's content-hash skip logic)
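The health criterion implies some polling, since /health only returns 200 once hero_proxy, hero_router, and cockpit_web are up. Below is a minimal sketch of such a loop. The function name, the `probe` closure (standing in for an HTTP GET on the /health URL), and the interval and deadline parameters are assumptions, not part of the issue.

```rust
use std::time::{Duration, Instant};

/// Poll `probe` until it reports HTTP 200 or `deadline` elapses.
/// `probe` returns the status code, or None when the request itself failed.
pub fn wait_until_healthy<F>(mut probe: F, interval: Duration, deadline: Duration) -> bool
where
    F: FnMut() -> Option<u16>,
{
    let start = Instant::now();
    loop {
        if probe() == Some(200) {
            return true;
        }
        // Give up once the time budget is spent; otherwise wait and retry.
        if start.elapsed() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

Injecting the probe keeps the retry logic testable without a network, and only a `true` result would lead to setting state='ready' in sqlite.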

References

  • Cockpit manifest format: hero_cockpit#1 §6 (https://forge.ourworld.tf/lhumina_code/hero_cockpit/issues/1)
  • s132 proof: hero_demo/deploy/single-vm/scripts/setup-binaries.sh (https://forge.ourworld.tf/lhumina_code/hero_demo/src/branch/development/deploy/single-vm/scripts/setup-binaries.sh)
  • setup-binaries refactor for manifest awareness: see hero_demo follow-up issue/PR (https://forge.ourworld.tf/lhumina_code/hero_demo/issues), to file when D4 lands
  • Umbrella: #? (v0.1 scope) (https://forge.ourworld.tf/lhumina_code/hero_os_tfgrid_deployer/issues/2)
  • Skill: /forge_api