[--from-ci] service_X start commands purge binaries and re-cargo-build, defeating --from-ci installs #64

Open
opened 2026-05-03 19:06:47 +00:00 by mik-tf · 0 comments
Owner

Summary

service_X start (and start --reset) currently always invokes the cargo install path internally: it purges existing binaries and rebuilds from source. This wipes out binaries installed via service_X install --from-ci, and the command then fails on hosts without the source repos / ROOTDIR configured (every --from-ci-only deploy host).

Reproduction

On heroci.gent01.grid.tf (a fresh TFGrid VM with no Hero source repos, only nu + hero_skills):

service_proc install --from-ci --root         # OK — 3 binaries land in /root/hero/bin/
service_proc start --root                     # purges the 3 binaries, then errors:
   → stopping any existing hero_proc instance...
   ✓ hero_proc stopped
   → purging old binaries in /root/hero/bin...
   ✓ removed /root/hero/bin/hero_proc
   ✓ removed /root/hero/bin/hero_proc_server
   ✓ removed /root/hero/bin/hero_proc_ui
   → ensuring hero_proc binaries are installed...
   Error: ROOTDIR not set. Run `init` first.

The same shape applies to every service_X.nu's start function — they all call svc_drop_registration → svc_wait_processes_gone → svc_purge_binaries → install → register → start. The install step inside start is the cargo path; there's no branch for --from-ci.
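The ordering problem can be modelled in a few lines. This is only a sketch in Python, not the actual Nushell code: the function names are hypothetical stand-ins for the svc_* helpers, and the only things taken from this issue are the step order and the ROOTDIR error message.

```python
import tempfile
from pathlib import Path

BINARIES = ["hero_proc", "hero_proc_server", "hero_proc_ui"]


def purge_binaries(bin_dir, names):
    """Remove each named binary if it exists; return the names removed."""
    removed = []
    for name in names:
        target = Path(bin_dir) / name
        if target.exists():
            target.unlink()
            removed.append(name)
    return removed


def install_from_source(env):
    """Model of the cargo path: it cannot proceed without ROOTDIR."""
    if not env.get("ROOTDIR"):
        raise RuntimeError("ROOTDIR not set. Run `init` first.")


def start(bin_dir, names, env):
    """Current ordering: the purge runs before the install step can fail."""
    steps = ["drop_registration", "wait_processes_gone"]
    purge_binaries(bin_dir, names)
    steps.append("purge_binaries")
    install_from_source(env)  # raises on a host with no source checkout
    steps += ["install", "register", "start"]
    return steps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in BINARIES:
            (Path(tmp) / name).write_text("stub")
        try:
            start(tmp, BINARIES, env={})
        except RuntimeError as err:
            print("start failed:", err)
        print("binaries left:", sorted(p.name for p in Path(tmp).iterdir()))
```

Because the purge is unconditional and precedes the install, a failed start leaves the host worse off than before: the CI-installed binaries are gone and nothing has replaced them.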

Why this matters

This blocks the natural goal of hero_demo#54: stand up a full Hero stack on a CI-paved VM with no source builds. We can install services with --from-ci (already done for 7 of them in PRs #195 / #196 / #197 etc.), but as soon as we want to run them under hero_proc supervision, we are back to needing source + cargo.

Two ways to fix

Option A — extend --from-ci to start commands (mechanical, mirrors install pattern)

Each service_X.nu start function gets --from-ci and --version flags, identical-shape to install. Internally, start would:

  1. Skip svc_purge_binaries when --from-ci is set and the binaries are already in place from a CI install. If --reset is also set, purge and re-fetch from CI instead.
  2. Branch the inner install call between svc_install_from_ci and svc_install based on $from_ci.
  3. Everything else (registration, action wiring, hero_proc start) stays identical; the binary at ~/hero/bin/<name> is used the same way regardless of how it got there.

Per-service patch is ~5 lines. Mirrors PRs #195 / #196 / #197.
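The branching in Option A could look roughly like the following. This is a sketch in Python rather than the real Nushell patch: plan_start and the step labels are hypothetical, and it only returns the planned step order. It assumes the install helper itself decides whether anything needs fetching when the binaries are already present, which this issue does not specify.

```python
def plan_start(from_ci, reset, binaries_present):
    """Return the ordered steps a start call would run under Option A."""
    steps = ["drop_registration", "wait_processes_gone"]

    # Keep CI-installed binaries unless the caller explicitly asks for a reset.
    keep_existing = from_ci and binaries_present and not reset
    if not keep_existing:
        steps.append("purge_binaries")

    # Branch the inner install call on how the service is being installed.
    steps.append("install_from_ci" if from_ci else "install_from_source")

    # Registration and startup do not depend on the install method.
    steps += ["register", "start"]
    return steps


if __name__ == "__main__":
    for flags in [(True, False, True), (True, True, True), (False, False, True)]:
        print(flags, "->", plan_start(*flags))
```

Without --from-ci the plan is the same as today's source-build flow, so existing dev workflows would be unaffected under these assumptions.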

Pros: boring, mechanical, fully consistent with the merged --from-ci install pattern. No new abstractions.
Cons: ~10 service modules need the same patch (one PR per logical batch). Adds another flag to start signatures. start becomes coupled to "how was it installed", which is a smell.

Option B — separate register / up command that doesn't reinstall

Split start into two pieces:

  • service_X register: checks that the binary is present, registers actions + service with hero_proc, and starts it. No install / purge. If the binary is missing, it errors out with a clear message naming install / install --from-ci as the prerequisite.
  • service_X start (existing): keeps the convenience "ensure installed + register + start" behaviour for source-build dev workflows.

register is the one used for CI-deploy paths.
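A minimal sketch of the register contract, again in Python rather than Nushell. The names check_binaries, register, and MissingBinaryError are hypothetical, and the returned step labels stand in for the real hero_proc calls.

```python
import tempfile
from pathlib import Path


class MissingBinaryError(RuntimeError):
    """Raised when register is called before any install."""


def check_binaries(bin_dir, names):
    """Fail fast, naming the prerequisite, if any binary is absent."""
    missing = [n for n in names if not (Path(bin_dir) / n).is_file()]
    if missing:
        raise MissingBinaryError(
            f"missing in {bin_dir}: {', '.join(missing)}. "
            "Run `install` or `install --from-ci` first."
        )


def register(bin_dir, names):
    """Never installs or purges; only verifies, registers, and starts."""
    check_binaries(bin_dir, names)
    return ["register_actions", "register_service", "start"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            register(tmp, ["hero_proc"])
        except MissingBinaryError as err:
            print("register refused:", err)
        (Path(tmp) / "hero_proc").write_text("stub")
        print(register(tmp, ["hero_proc"]))
```

The point of the design is that register has no knowledge of how the binary arrived, so the same command works after a source build or a CI install.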

Pros: clean separation. register is independent of install method. Forces explicit "install before register" on deploy paths (which matches industry-standard practice — apt install foo then systemctl start foo).
Cons: new command name to learn. Two paths to maintain. Documentation cost across all service_* modules.

Recommendation

Option A for the immediate --from-ci rollout: it is the fastest path, mirrors the merged pattern, and keeps the invariant that start is the one-shot "get this service running" command regardless of install method.

Option B is a bigger architectural cleanup that's worth its own design pass once the install side is fully rolled out across all easy-tier services. The two are not mutually exclusive: A unblocks deploys today, B refines the contract once the smoke pass has settled.

Workaround for now

On a --from-ci host:

  1. service_X install --from-ci --root for every needed service.
  2. Manually register actions and start services via direct proc service ... / proc action ... RPC calls — bypassing service_X start's reinstall step.

This is workable for one-off validation but not for production deploys.

Out of scope

  • Adding --from-ci to service_install_all (Phase 3 of hero_demo#54). Same fix shape, but lifted to the rollup.
  • Hard-tier services (service_voice, service_embedder) where the ONNX bundling decision is still pending.

Found while running the install-side smoke pass for 7 services on heroci. Install path for all 7 is verified working; full lifecycle (start under hero_proc supervision) is blocked by this.

Signed-off-by: mik-tf

Reference
lhumina_code/hero_demo#64