D3 — VM-deploy adapter (OpenTofu fallback + hero_compute primary) #5

Open
opened 2026-05-20 21:42:10 +00:00 by mik-tf · 0 comments
Owner

D3 — VM-deploy adapter (OpenTofu fallback + hero_compute primary)

Sub-issue of #? (v0.1 scope). Abstracts the VM lifecycle behind a trait so the deployer can speak to either backend transparently.

What this does

Today's s132 work proved the OpenTofu path end-to-end: make deploy ENV=herolab in hero_demo/deploy/single-vm/ provisions a fresh TFGrid VM in ~60s, captures mycelium_ip + gateway_fqdn, and supports teardown via make destroy ENV=herolab.

Mahmoud's hero_compute exposes the same surface via OpenRPC (see hero_compute#? (deployer integration)).

We want both available behind a single Rust trait so the deployer doesn't depend on which one is wired up.

use std::time::Duration;

// `Result`, `VmSpec`, `VmHandle`, `VmInfo` and `GatewayHandle` are not defined
// in this issue; they are assumed to come from the deployer crate.

/// Backend-agnostic VM lifecycle. The deployer only talks to this trait.
#[async_trait::async_trait]
pub trait VmBackend: Send + Sync {
    // VM lifecycle
    async fn deploy_vm(&self, spec: VmSpec) -> Result<VmHandle>;
    async fn get_vm(&self, vm_id: &str) -> Result<VmInfo>;
    async fn wait_vm_ready(&self, vm_id: &str, timeout: Duration) -> Result<()>;
    async fn delete_vm(&self, vm_id: &str) -> Result<()>;
    // Web gateway in front of a VM
    async fn deploy_webgateway(&self, vm_id: &str, fqdn: &str, port: u16) -> Result<GatewayHandle>;
    async fn delete_webgateway(&self, gateway_id: &str) -> Result<()>;
    // Access
    async fn inject_ssh_key(&self, vm_id: &str, pubkey: &str) -> Result<()>;
}

pub struct OpenTofuBackend { /* shells out to `tofu` + reads tfstate */ }
pub struct HeroComputeBackend { /* hero_compute OpenRPC client */ }

Selection via config:

# ~/hero/cfg/deployer/config.toml
[vm_backend]
kind = "opentofu"   # or "hero_compute"
[vm_backend.opentofu]
hero_demo_path = "/home/driver/hero/code/hero_demo"
[vm_backend.hero_compute]
rpc_socket = "/home/driver/hero/var/sockets/hero_compute/rpc.sock"
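
One way the config-driven selection could look is sketched below. It is a minimal, std-only illustration and not the deployer's actual code: the trait is reduced to a synchronous stand-in with a single hypothetical `name` method, and `VmBackendConfig`, `BackendKind` and `select_backend` are names invented for this example. TOML parsing is left out; the struct stands for the already-parsed `[vm_backend]` table.

```rust
use std::str::FromStr;

/// Which backend the `[vm_backend] kind = "..."` key selects.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    OpenTofu,
    HeroCompute,
}

impl FromStr for BackendKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "opentofu" => Ok(BackendKind::OpenTofu),
            "hero_compute" => Ok(BackendKind::HeroCompute),
            other => Err(format!("unknown vm_backend kind: {other:?}")),
        }
    }
}

/// Simplified, synchronous stand-in for the async `VmBackend` trait
/// (illustration only; `name` is a hypothetical method).
pub trait VmBackend {
    fn name(&self) -> &'static str;
}

pub struct OpenTofuBackend {
    pub hero_demo_path: String,
}

pub struct HeroComputeBackend {
    pub rpc_socket: String,
}

impl VmBackend for OpenTofuBackend {
    fn name(&self) -> &'static str {
        "opentofu"
    }
}

impl VmBackend for HeroComputeBackend {
    fn name(&self) -> &'static str {
        "hero_compute"
    }
}

/// Hypothetical, already-parsed form of the `[vm_backend]` table.
pub struct VmBackendConfig {
    pub kind: String,
    pub hero_demo_path: String,
    pub rpc_socket: String,
}

/// Build the backend named by `kind`. Callers only ever see `dyn VmBackend`,
/// so swapping backends stays a config-only change.
pub fn select_backend(cfg: &VmBackendConfig) -> Result<Box<dyn VmBackend>, String> {
    match cfg.kind.parse::<BackendKind>()? {
        BackendKind::OpenTofu => Ok(Box::new(OpenTofuBackend {
            hero_demo_path: cfg.hero_demo_path.clone(),
        })),
        BackendKind::HeroCompute => Ok(Box::new(HeroComputeBackend {
            rpc_socket: cfg.rpc_socket.clone(),
        })),
    }
}
```

Rejecting an unknown `kind` at startup, rather than silently defaulting, keeps a typo in the config from deploying through the wrong backend.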

Why both

  • OpenTofu is proven end-to-end today (s132). It is the fallback for when hero_compute isn't running or its API is still stabilizing.
  • hero_compute is the long-term target. Once Mahmoud confirms deploy_vm + deploy_webgateway are production-stable on TFGrid mainnet, we flip the default config to hero_compute and eventually retire OpenTofu.

This decouples deployer-arc progress from hero_compute-arc progress. Neither blocks the other.

OpenTofuBackend implementation notes

  • Shells out to tofu -chdir=<hero_demo_path>/deploy/single-vm/tf apply with environment-templated tfvars
  • Parses tfstate JSON for outputs (mycelium_ip, gateway_fqdn, node_id) — same as the Makefile's tofu output -raw <name> mechanic, just done in Rust
  • For per-user VMs, generates a per-user env directory: envs/u_<user_id>/ with templated app.env + tfvars (overrides node_id + ssh_key + gateway_name based on user)
  • Teardown via tofu destroy
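
The shell-out described in these notes could be assembled roughly as follows. This is a sketch under stated assumptions. The issue does not say where `envs/` lives or what the tfvars file is called, so the location under `deploy/single-vm/` and the name `terraform.tfvars` are placeholders. `TofuAction`, `tofu_args` and `run_tofu` are hypothetical helper names. `-chdir`, `-auto-approve` and `-var-file` are standard OpenTofu CLI options.

```rust
use std::path::{Path, PathBuf};
use std::process::{Command, ExitStatus};

/// The two subcommands the notes mention.
#[derive(Debug, Clone, Copy)]
pub enum TofuAction {
    Apply,
    Destroy,
}

impl TofuAction {
    fn as_str(self) -> &'static str {
        match self {
            TofuAction::Apply => "apply",
            TofuAction::Destroy => "destroy",
        }
    }
}

/// Directory holding the OpenTofu configuration, as given in the notes.
pub fn tf_dir(hero_demo_path: &Path) -> PathBuf {
    hero_demo_path.join("deploy/single-vm/tf")
}

/// Per-user env directory `envs/u_<user_id>/`.
/// Assumption: `envs/` sits under `deploy/single-vm/`.
pub fn user_env_dir(hero_demo_path: &Path, user_id: &str) -> Result<PathBuf, String> {
    // Refuse ids that could escape the envs directory (e.g. "../etc").
    let ok = !user_id.is_empty()
        && user_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    if !ok {
        return Err(format!("invalid user_id: {user_id:?}"));
    }
    Ok(hero_demo_path
        .join("deploy/single-vm/envs")
        .join(format!("u_{user_id}")))
}

/// Argument list for `tofu`. The global `-chdir` option must precede the subcommand.
pub fn tofu_args(
    hero_demo_path: &Path,
    user_id: &str,
    action: TofuAction,
) -> Result<Vec<String>, String> {
    // Hypothetical tfvars file name inside the per-user env directory.
    let var_file = user_env_dir(hero_demo_path, user_id)?.join("terraform.tfvars");
    Ok(vec![
        format!("-chdir={}", tf_dir(hero_demo_path).display()),
        action.as_str().to_string(),
        "-auto-approve".to_string(),
        format!("-var-file={}", var_file.display()),
    ])
}

/// Run `tofu` with the given arguments and return its exit status.
pub fn run_tofu(args: &[String]) -> std::io::Result<ExitStatus> {
    Command::new("tofu").args(args).status()
}
```

Building the argument list separately from spawning the process makes the command line unit-testable without `tofu` installed.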

HeroComputeBackend implementation notes

  • Uses the auto-generated hero_compute_sdk for typed RPC calls (ComputeService.deploy_vm, etc.)
  • wait_vm_ready polls get_vm for mycelium_ip + open TCP on its rpc.sock dir (later: replaced by ComputeService.wait_vm_ready if Mahmoud adds it per G1 in the integration issue)
  • Tradeoff: hero_compute is reachable only over a local hero_proc socket. The deployer runs on a different machine than hero_compute today; needs hero_router-mediated discovery or a remote-hero_compute SSH-tunnel pattern. TBD with Mahmoud (Q4 in the integration issue).
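
The polling behaviour of wait_vm_ready can be sketched as a generic poll-until-deadline helper. This is a simplified, synchronous illustration. The real trait method is async and would use an async sleep, and `poll_until`, `WaitError` and the probe closure are hypothetical names. In the backend, the probe would call `get_vm` and return `Some(..)` once mycelium_ip is present and the readiness check passes.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    /// The deadline passed before the probe reported readiness.
    Timeout,
    /// The probe itself failed (e.g. the RPC call returned an error).
    Probe(String),
}

/// Calls `probe` every `interval` until it returns `Ok(Some(value))`,
/// returns an error, or `timeout` elapses.
pub fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Result<Option<T>, String>,
) -> Result<T, WaitError> {
    let deadline = Instant::now() + timeout;
    loop {
        match probe() {
            Ok(Some(value)) => return Ok(value),
            Ok(None) => {} // not ready yet, keep polling
            Err(e) => return Err(WaitError::Probe(e)),
        }
        let now = Instant::now();
        if now >= deadline {
            return Err(WaitError::Timeout);
        }
        // Never sleep past the deadline.
        sleep(interval.min(deadline - now));
    }
}
```

The probe always runs at least once, so a zero timeout still returns a ready VM immediately instead of reporting a spurious timeout.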

Acceptance criteria

  • OpenTofuBackend produces an end-to-end VM identical to what s132's make deploy ENV=herolab produces, in under 90s
  • HeroComputeBackend produces an equivalent VM via Mahmoud's ComputeService.deploy_vm, in under 90s (or stays gated on hero_compute readiness — the test is conditional)
  • Backend swap is a config-only change — no deployer code changes
  • deployer.deploy_vm (the deployer's own RPC) returns identical state to its caller regardless of backend

References

  • s132 OpenTofu proof: hero_demo/deploy/single-vm/ (https://forge.ourworld.tf/lhumina_code/hero_demo/src/branch/development/deploy/single-vm/)
  • hero_compute methods: crates/my_compute_zos_server/src/cloud/openrpc.json in hero_compute (https://forge.ourworld.tf/lhumina_code/hero_compute)
  • Coord: hero_compute#? (deployer integration) (https://forge.ourworld.tf/lhumina_code/hero_compute/issues/116)
  • Umbrella: #? (v0.1 scope) (https://forge.ourworld.tf/lhumina_code/hero_os_tfgrid_deployer/issues/2)
Reference
lhumina_code/hero_os_tfgrid_deployer#5