feat(service_os): close lifecycle gaps — start_base/start_full bundles + islands_build with preflight #163

Closed
sameh-farouk wants to merge 0 commits from feat/service_os-bundles-and-islands-build into development
Member

Closes #162.

What's in this PR

Two commits adding three new commands + one preflight warn to service_os. Code-only — no behavior changes to existing commands.

Commit 1: feat(service_os): add start_base / start_full bundle commands

service_os start_base    # hero_os + minimum companions (osis, aibroker, proxy)
service_os start_full    # start_base + per-island companions
                         # (livekit, collab, voice, books, biz, foundry,
                         #  indexer, embedder, codescalers, browser)

Both take --reset / --update / --root / --with-core. --with-core chains service_core start first. Each underlying service_X start is idempotent.

Commit 2: feat(service_os): add islands_build with preflight warn for per-island WASMs

service_os islands_build [--root] [--update] [--restart]

Wraps cd ~/hero/code0/hero_archipelagos && make install. Also adds a non-fatal warn in service_os start when the islands dir is missing/empty, mirroring the existing svx_check_assets pattern.

Per the issue body, start does NOT auto-chain islands_build — same separation-of-phases the module already uses for install vs wasm_build. Users compose explicitly:

service_os install
service_os wasm_build
service_os islands_build --restart
service_os start_base --with-core

Tested

Validated on a multi_user_add box (kristof4):

  • service_os start_base --with-core brings up hero_proc + hero_db + hero_router + hero_code + hero_logic + hero_osis + hero_aibroker + hero_proxy + hero_os ✓
  • service_os islands_build builds 40 islands + installs to ~/hero/share/hero_os/islands/; after a hero_os restart, dock items load via /hero_os/ui/islands/<id>/... ✓
  • service_os start warns cleanly when islands dir is empty, and is silent when it's populated ✓
  • Existing service_os install / wasm_build / start / stop / status behavior unchanged ✓

Notes

  • Per the issue, hero_os_ui only registers the /islands HTTP route if the directory exists at process startup. islands_build --restart covers the first-install case; subsequent rebuilds don't need a restart.
  • islands_build errors loudly if wasm-pack is missing (one-shot fix: cargo install wasm-pack) and if the upstream install Makefile target ever disappears.
A complete hero_os stack needs hero_proc + hero_router + several companion
services running, but until now users had to compose `service_X start`
calls by hand. This adds two bundle commands that mirror the existing
service_core start pattern but hero_os-centric:

  service_os start_base — starts hero_os + minimum companions:
    hero_osis, hero_aibroker, hero_proxy
    (the services its built-in features actually use:
     hero_osis_sdk for AI bar / spaces, aibroker for routing,
     proxy for auth per registry.rs comment)

  service_os start_full — start_base + every per-island companion service
    so every dock item routes to a live backend:
    livekit, collab, voice, books, biz, foundry, indexer, embedder,
    codescalers, browser

Both accept --reset, --update, --root, --with-core (chains
service_core start first). Each underlying service_X start is idempotent,
so re-running is safe.

No auto-chain into start_base/start_full from the existing `start`
command — same separation-of-phases the module already uses for
wasm_build vs install.
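
As a rough illustration of how the two bundles nest, here is a minimal Python sketch. It is not the Nushell implementation in service_os.nu: the names `bundle` and `run_bundle` are hypothetical, the service lists are copied from this commit message, and the ordering (core first, companions next, hero_os last) is only an assumption based on the order in which the PR description lists the services it brought up.

```python
# Hypothetical sketch; the real commands live in the Nushell module service_os.nu.
CORE = ["proc", "db", "router", "code", "logic"]   # what `service_core start` brings up
BASE_COMPANIONS = ["osis", "aibroker", "proxy"]     # minimum companions of hero_os
ISLAND_COMPANIONS = [                               # per-island companions (as of this commit)
    "livekit", "collab", "voice", "books", "biz",
    "foundry", "indexer", "embedder", "codescalers", "browser",
]


def bundle(name: str, with_core: bool = False) -> list[str]:
    """Return the ordered list of services a bundle would start."""
    if name not in ("start_base", "start_full"):
        raise ValueError(f"unknown bundle: {name}")
    order = list(CORE) if with_core else []
    order += BASE_COMPANIONS + ["os"]
    if name == "start_full":
        order += ISLAND_COMPANIONS      # start_full is start_base plus the island services
    return order


def run_bundle(name: str, start, with_core: bool = False) -> None:
    """Call `start(service)` for each service; `start` is assumed to be idempotent."""
    for service in bundle(name, with_core):
        start(service)


if __name__ == "__main__":
    run_bundle("start_base", lambda s: print(f"service_{s} start"), with_core=True)
```

Because each underlying start is described as idempotent, running a bundle twice should be harmless; the sketch relies on that property rather than tracking state itself.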
feat(service_os): add islands_build with preflight warn for per-island WASMs
All checks were successful
Build and Publish Skills / build-and-publish (pull_request) Successful in 3s
03ef061a06
Background: a working hero_os UI has TWO WASM artifact phases — the main
hero_os_app shell bundle (built via `service_os wasm_build` from the
hero_os repo) AND ~30 per-island WASM packages (built via wasm-pack from
the hero_archipelagos repo). Without the per-island packages, the desktop
shell loads but every dock click 404s at `/islands/<id>/...`.

Until now there was no `service_*` wrapper for the per-island phase —
users had to know to manually `cd ~/hero/code0/hero_archipelagos &&
make install`. This adds:

  service_os islands_build [--root] [--update] [--restart]

It wraps the hero_archipelagos `make install` target (which itself runs
wasm-pack per island crate + rsyncs pkg/ outputs to $INSTALL_DIR). For
--root, INSTALL_DIR is overridden to /root/hero/share/...; otherwise
defaults to $HOME-relative. Preflights wasm-pack availability with a
clean remediation hint, and bails loudly if the upstream `install`
Makefile target ever disappears.
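
The wasm-pack preflight described above could look roughly like the following Python sketch. The function name `preflight_tool` and the exact wording of the message are assumptions; only the tool name and the `cargo install wasm-pack` remediation come from this PR.

```python
import shutil
from typing import Optional


def preflight_tool(tool: str = "wasm-pack",
                   hint: str = "cargo install wasm-pack") -> Optional[str]:
    """Return None when `tool` is on PATH, otherwise an error message with a fix hint."""
    if shutil.which(tool) is not None:
        return None
    return f"error: {tool} not found on PATH. One-shot fix: {hint}"


if __name__ == "__main__":
    problem = preflight_tool()
    if problem is not None:
        raise SystemExit(problem)   # bail loudly before any build work starts
    print("wasm-pack found, continuing")
```

Checking up front keeps the failure next to its remedy instead of surfacing it halfway through a per-island build.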

Also adds a non-fatal preflight warn in `start`, mirroring the existing
svx_check_assets pattern for the public/index.html path:

  ⚠ per-island WASM bundles not found at ~/hero/share/hero_os/islands
    Dock items will 404 when clicked (the desktop shell still loads).
    Build + install them once:
      service_os islands_build --restart

The --restart hint is non-obvious but real: hero_os_ui only registers
the /islands HTTP route if the directory exists at process startup. So
even after a successful islands_build, the running hero_os_ui won't
serve them until it's restarted.
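
A minimal Python sketch of such a non-fatal preflight, assuming the check is simply "directory missing or empty". The function name `check_islands` is hypothetical and the real svx_check_islands helper may test more than this; the warning text is the one quoted above.

```python
from pathlib import Path


def check_islands(islands_dir: Path) -> list[str]:
    """Return warning lines when the islands dir is missing or empty, else []."""
    if islands_dir.is_dir() and any(islands_dir.iterdir()):
        return []   # populated: stay silent
    return [
        f"⚠ per-island WASM bundles not found at {islands_dir}",
        "  Dock items will 404 when clicked (the desktop shell still loads).",
        "  Build + install them once:",
        "    service_os islands_build --restart",
    ]


if __name__ == "__main__":
    # Path taken from the warning text above; start proceeds either way.
    for line in check_islands(Path.home() / "hero" / "share" / "hero_os" / "islands"):
        print(line)
```

Returning the lines rather than raising keeps the check advisory: the caller prints them and carries on with the start.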

Like wasm_build, islands_build is a discrete command — start does NOT
auto-chain it. Same separation-of-phases the module already uses for
install vs wasm_build.
fix(service_os): drop service_codescalers from start_full
All checks were successful
Build and Publish Skills / build-and-publish (pull_request) Successful in 3s
8d19c2c773
service_codescalers is a per-host admin tool, not a per-user service.
Its `start` errors out without `--root`:

    error: service_codescalers must be started with --root.

    hero_codescalers is a per-server admin tool. There is ONE instance
    per host, owned by root, with TCP access gated by ADMIN_SECRETS in
    root's hero_proc secret store. Per-user instances are not supported.

start_full is the per-user "spin up everything" bundle, so calling
service_codescalers from it always fails. Removing the call (and the
unused module import). Codescalers should be brought up separately by
the box admin: `service_codescalers start --root`.
Owner

Thanks for the careful write-up — the gap you're pointing at is real, and the svx_check_islands preflight is a nice add on its own. Before this lands though I'd like to push back on the start_base / start_full shape, because I think it overlaps a lot with stuff we already have, and pulls service_os.nu somewhere it isn't supposed to go.

What we already have today

Three layers, on purpose:

service_install_all [--core]   # build binaries (per-service install --update)
service_core start             # bring up the 5-service core stack with health probes + retry
service_complete [--core]      # install_all + start every runtime service

Quick demo of just running them on my box right now:

> service_install_all --core
=== service_install_all ===
  7 services to install
→ service_proc install … ✓
→ service_router install … ✓
→ service_mycelium install … ✓
→ service_code install … ✓
→ service_codescalers install … ✓
→ service_lib_rhai install … ✓
→ service_embedder install … ✓
=== Results: 7/7 succeeded ===

> service_core start
=== hero_proc === ✓ already healthy
=== hero_db === ✓ healthy
=== hero_router === ✓ already healthy
=== hero_code === ✓ healthy
=== hero_logic === ✓ healthy

Note service_core is doing the check → start → settle → retry-with-reset dance via core_step. That's the resilience contract we want any "start a stack" command to honour.
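
For readers who have not seen core_step, the contract can be sketched as below. This is an assumption about what "check → start → settle → retry-with-reset" means, written in Python with injected callables; the actual core_step in service_core is not shown in this thread and may differ in signature, timing, and messages.

```python
import time
from typing import Callable


def core_step(name: str,
              is_healthy: Callable[[], bool],
              start: Callable[[], None],
              reset: Callable[[], None],
              settle_seconds: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Check, start, settle, then retry once with a reset; raise if still unhealthy."""
    if is_healthy():                 # check: nothing to do
        return "already healthy"
    start()                          # start
    sleep(settle_seconds)            # settle: give the service time to come up
    if is_healthy():
        return "healthy"
    reset()                          # retry-with-reset: clear state, start again
    start()
    sleep(settle_seconds)
    if is_healthy():
        return "healthy after reset"
    raise RuntimeError(f"{name}: still unhealthy after reset and retry")
```

A flat sequence of starts stops being robust at the first `start()`, while this shape either returns a healthy service or fails with a named culprit.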

Where the PR rubs against that

  1. service_X.nu modules manage one service. The existing precedent for one service module importing another (service_collab → service_livekit) is a hard runtime data dependency: collab reads livekit's runtime.json for credentials. Bundling 13 sibling modules into service_os.nu because they happen to back UI islands is a different shape, and it turns a leaf module into a meta-orchestrator.

  2. start_full duplicates service_complete with a different list. Today service_complete covers proxy/db/os/osis/collab/livekit/biz/aibroker/logic/slides/whiteboard/indexer/foundry/voice/agent. start_full covers a different curated set (no db, no slides, no whiteboard, no agent; commit 3 already had to remove codescalers). Two lists in two places will drift — every new island service would need to be added in both.

  3. No core_step machinery. start_full is a flat sequence of service_X start with print between them. No health probe, no settle wait, no reset retry. So it's strictly less robust than service_core for the same kind of work.

  4. --with-core from a non-core module is a layering inversion. service_core start already exists for this; chaining it from service_os.nu puts core orchestration on a leaf service.

  5. islands_build belongs to a different repo. It builds lhumina_code/hero_archipelagos with a separate toolchain (wasm-pack). Natural home is its own module — service_archipelagos.nu or service_islands.nu — sitting next to the others. The svx_check_islands warn inside service_os start is fine and can stay.
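
The drift in point 2 can be made concrete with a small Python sketch that diffs the two lists as they are written in this thread. The set contents are transcribed from the comment above and from the PR's commit messages (start_full after codescalers was dropped), not read from the modules, so treat them as an approximation of the real code.

```python
# Lists transcribed from this thread, not from the Nushell modules themselves.
SERVICE_COMPLETE = {
    "proxy", "db", "os", "osis", "collab", "livekit", "biz", "aibroker",
    "logic", "slides", "whiteboard", "indexer", "foundry", "voice", "agent",
}
START_FULL = {
    "os", "osis", "aibroker", "proxy",                    # start_base
    "livekit", "collab", "voice", "books", "biz",         # per-island companions
    "foundry", "indexer", "embedder", "browser",          # (codescalers dropped)
}


def drift(a: set[str], b: set[str]) -> tuple[list[str], list[str]]:
    """Return (only in a, only in b), each sorted for stable output."""
    return sorted(a - b), sorted(b - a)


only_complete, only_full = drift(SERVICE_COMPLETE, START_FULL)
print("only in service_complete:", only_complete)
# → only in service_complete: ['agent', 'db', 'logic', 'slides', 'whiteboard']
print("only in start_full:", only_full)
# → only in start_full: ['books', 'browser', 'embedder']
```

Note that db and logic are part of the core stack shown in the `service_core start` demo, so `start_full --with-core` would presumably pick them up; the remaining differences are the ones a second hand-maintained list has to keep in sync.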

despiegk closed this pull request 2026-04-29 04:52:52 +00:00
Reference
lhumina_code/hero_skills!163