hero_os_tfgrid_deployer integration: methods we'll consume + small gaps #116
Reference: lhumina_code/hero_compute#116
The new admin tool `hero_os_tfgrid_deployer` (scope under discussion at hero_os_tfgrid_deployer#1) will consume the `ComputeService` OpenRPC API (currently in `crates/my_compute_zos_server/src/cloud/openrpc.json`) as its only VM-lifecycle backend. Reviewed the spec: most of what we need is already there. Filing this issue to (a) confirm intended usage so we don't drift, and (b) surface a few small gaps that would make the deployer's flow easier.
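
The deployer talks to `ComputeService` only through its OpenRPC spec, which means JSON-RPC 2.0 on the wire. Every call in the flow below therefore reduces to the same request/response envelope. Here is a minimal Python sketch of that envelope. It is transport-agnostic because the socket/auth question is still open. It assumes that method names on the wire are `ComputeService.<method>` and that params are passed by name; both need checking against `openrpc.json`. `make_request`, `parse_response`, `RpcError` and the `vm_id` parameter are hypothetical names.

```python
import itertools
import json
from typing import Any

# Monotonic request ids for this client process.
_ids = itertools.count(1)


class RpcError(Exception):
    """Raised when a JSON-RPC response carries an error or a wrong id."""


def make_request(method: str, params: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request for a ComputeService method.

    Assumption: methods are registered as "ComputeService.<method>" and
    take by-name params; verify against openrpc.json.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": f"ComputeService.{method}",
        "params": params,
    })


def parse_response(raw: str, expected_id: int) -> Any:
    """Return the `result` member of a JSON-RPC 2.0 response, or raise."""
    msg = json.loads(raw)
    if msg.get("id") != expected_id:
        raise RpcError(f"id mismatch: {msg.get('id')} != {expected_id}")
    if "error" in msg:
        err = msg["error"]
        raise RpcError(f"{err.get('code')}: {err.get('message')}")
    return msg["result"]
```

Transport stays outside these helpers, so they work whether the auth answer turns out to be a local-only socket or a bearer-token network endpoint.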
Methods the deployer will call
For each demo user we provision:
1. `ComputeService.inject_ssh_keys`: the deployer generates a per-user ED25519 key, registers the public half via this method, and retains the private half in its sqlite for SSH-back-in.
2. `ComputeService.deploy_vm` with spec `{ cpu: 16, memory: 8 GB, disk: 200 GB, rootfs: 16 GB, flist: "ubuntu-24.04-latest", publicip: false, node_id: <pinned> }`. Today's s132 work proves this spec is sufficient (16 vCPUs is an overcommit for an 8 GB VM, but it matches what's in flight via the OpenTofu path).
3. `ComputeService.get_vm`, polled until `mycelium_ip` appears and the VM is reachable.
4. `ComputeService.deploy_webgateway` mapping `<user>.<node>.grid.tf` → `http://<vm_ip>:9988` (where hero_router listens).
5. SSH in with the retained private key and run the setup script (`hero_demo/deploy/single-vm/scripts/setup-binaries.sh`). Alternative: pipe it through `ComputeService.vm_exec` if that handles long-running scripts cleanly; see Gap 2 below.
6. `ComputeService.list_vms` / `get_vm` / `vm_stats` for the admin UI's per-user state view.

Confirmation questions (low-cost; flag each with "yes" / "no" / "TBD")
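
Several of the questions below concern the per-user flow listed above. As a reference point, here is a minimal Python sketch of that sequence. It is written against an injected `call(method, params)` function rather than a concrete transport. The following are all assumptions, not confirmed API: every parameter and result field name (`keys`, `vm_id`, `memory_gb`, `disk_gb`, `rootfs_gb`, `name`, `backend`), the GB units, the use of `mycelium_ip` as the gateway backend address, and the key-before-deploy ordering. `provision_demo_user` and `backend_url` are hypothetical helpers.

```python
import ipaddress
import time
from typing import Any, Callable

# call(method, params) -> result; transport (socket, auth) is the caller's job.
Call = Callable[[str, dict[str, Any]], Any]

# Spec from this issue. Field names and GB units are assumptions until
# checked against deploy_vm's params in openrpc.json.
DEMO_VM_SPEC: dict[str, Any] = {
    "cpu": 16, "memory_gb": 8, "disk_gb": 200, "rootfs_gb": 16,
    "flist": "ubuntu-24.04-latest", "publicip": False,
}


def backend_url(ip: str, port: int = 9988) -> str:
    """http://<vm_ip>:9988 (hero_router), bracketing IPv6 literals."""
    host = f"[{ip}]" if ipaddress.ip_address(ip).version == 6 else ip
    return f"http://{host}:{port}"


def provision_demo_user(call: Call, user: str, node: str, node_id: int,
                        pubkey: str, poll_s: float = 5.0,
                        timeout_s: float = 600.0) -> dict[str, str]:
    # 1. Register the public half of the per-user key. Whether this must
    #    come before or after deploy_vm is still an open question.
    call("inject_ssh_keys", {"keys": [pubkey]})
    # 2. Deploy the demo VM pinned to a specific node.
    vm_id = call("deploy_vm", {**DEMO_VM_SPEC, "node_id": node_id})["vm_id"]
    # 3. Poll get_vm until mycelium_ip is populated (unnecessary if
    #    deploy_vm turns out to return only once the VM is reachable).
    deadline = time.monotonic() + timeout_s
    while True:
        vm = call("get_vm", {"vm_id": vm_id})
        if vm.get("mycelium_ip"):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{vm_id}: no mycelium_ip after {timeout_s}s")
        time.sleep(poll_s)
    # 4. Map <user>.<node>.grid.tf to hero_router on the VM.
    fqdn = f"{user}.{node}.grid.tf"
    call("deploy_webgateway",
         {"name": fqdn, "backend": backend_url(vm["mycelium_ip"])})
    return {"vm_id": vm_id, "fqdn": fqdn}
```

Injecting `call` keeps the ordering explicit and easy to test with a fake. If the answers change the ordering or remove the polling step, only this function moves.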
1. Is `deploy_vm` ready for production use on TFGrid mainnet? (s132 used OpenTofu directly against TFGrid, and that works. We want to swap to `deploy_vm` once it's stable for our flow.)
2. Does `deploy_vm` return synchronously once the VM is fully reachable (SSH-able), or does it return early and require polling `get_vm`? Documenting this in the OpenRPC `summary` field would resolve it for any caller.
3. `{ user: "<forge_id>", profile: "demo", provisioned_at: ... }` per-VM, so the admin UI can join VMs back to users without round-tripping its own sqlite.
4. `inject_ssh_keys`: is this called before or after `deploy_vm`? Order matters for our deployer flow.

Small gaps (what would help us)
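
The `wait_vm_ready` and DNS-propagation gaps below both reduce to polling with a timeout. For reference, here is a minimal Python sketch of the client-side logic each caller would otherwise reimplement, using only the standard library. `wait_until`, `tcp_open` and `resolves` are hypothetical helpers. An open TCP port 22 is a weaker readiness signal than a key-authenticated SSH login, so treat it as an approximation of "SSH-able".

```python
import socket
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 5.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True on success, False on timeout. A client-side stand-in for
    the proposed (hypothetical) ComputeService.wait_vm_ready.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))


def tcp_open(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds (e.g. sshd is up)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def resolves(fqdn: str) -> bool:
    """True once `fqdn` resolves via the local resolver."""
    try:
        return bool(socket.getaddrinfo(fqdn, None))
    except socket.gaierror:
        return False
```

For example, `wait_until(lambda: tcp_open(vm_ip), timeout_s=600)` covers readiness, and `wait_until(lambda: resolves(fqdn), timeout_s=120)` covers the wait after `deploy_webgateway`; the timeouts are illustrative values only. `resolves` only reflects the local resolver, not propagation elsewhere.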
1. A `ComputeService.wait_vm_ready(vm_id, timeout)` method that blocks until the VM is SSH-able (or the timeout expires). Today we'd poll `get_vm` from the deployer. That works, but every caller reimplements the same readiness logic. Not a blocker; nice-to-have.
2. `vm_exec`: does it stream stdout incrementally (good for our `setup-binaries.sh`, which prints ~1500 lines of `lab build` progress over 5-30 min), or does it buffer until the command exits? If buffered, we keep the SSH path; if streamed, we can drop the SSH dependency on the deployer side entirely.
3. `deploy_webgateway`: does it return the publicly resolvable FQDN immediately, or does DNS propagation need an extra wait? s132 saw the gateway resolve within ~30 s of `tofu apply` completing; if hero_compute mirrors that, no action is needed.
4. Is the `ComputeService` socket reachable only locally, or does it expect bearer-token auth over the network? The deployer's host (the deployer admin UI) is not on the same machine as `hero_compute`.

None of these are blockers; happy to file separate issues for any of them if that's easier. Mostly this is a tee-up for the deployer work that starts in the next few sessions (current plan in hero_os_tfgrid_deployer#1 and the follow-up scope issues we're about to file there).

What's NOT a gap
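
The list below can be rechecked mechanically whenever `openrpc.json` changes. Here is a minimal Python sketch that assumes the file follows the OpenRPC layout, with a top-level `methods` array whose entries carry a `name`. `REQUIRED`, `spec_method_names` and `missing_methods` are hypothetical names. The code strips any prefix because it is not confirmed whether methods are registered as `deploy_vm` or `ComputeService.deploy_vm`.

```python
from typing import Any

# Methods this issue relies on or lists as already present.
REQUIRED: set[str] = {
    "deploy_vm", "start_vm", "stop_vm", "restart_vm", "delete_vm",
    "list_vms", "get_vm",
    "deploy_webgateway", "list_webgateways", "get_webgateway",
    "delete_webgateway",
    "inject_ssh_keys", "vm_logs", "vm_stats", "vm_exec",
}


def spec_method_names(spec: dict[str, Any]) -> set[str]:
    """Names from an OpenRPC doc's `methods`, minus any 'Service.' prefix.

    Entries without a `name` (e.g. unresolved $ref objects) are skipped.
    """
    return {m["name"].rsplit(".", 1)[-1]
            for m in spec.get("methods", []) if "name" in m}


def missing_methods(spec: dict[str, Any],
                    required: set[str] = REQUIRED) -> set[str]:
    """Required method names that the spec does not declare."""
    return required - spec_method_names(spec)
```

Loading `crates/my_compute_zos_server/src/cloud/openrpc.json` with `json.load` and passing the result to `missing_methods` should give an empty set while the list below still holds.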
- `deploy_vm` / `start_vm` / `stop_vm` / `restart_vm` / `delete_vm` / `list_vms` / `get_vm`: all present.
- `deploy_webgateway` / `list_webgateways` / `get_webgateway` / `delete_webgateway`: present.
- `inject_ssh_keys`: present.
- `vm_logs`, `vm_stats`, `vm_exec`: present.
- `migrate_secret`, `list_images` and `attach_hypervisor` are also there. They go beyond what we need immediately but will be useful later.

Context
- `hero_demo` `setup-binaries.sh`: 34/34 PASS on lab download/install, plus hero_proc + hero_router GREEN on a fresh TFGrid VM.
- hero_cockpit#1.
- hero_os_tfgrid_deployer#1.

cc @mahmoud, no rush; answers can come incrementally.