feat(service_os): close lifecycle gaps — start_base/start_full bundles + islands_build with preflight #162

Closed
opened 2026-04-28 21:26:40 +00:00 by sameh-farouk · 1 comment
Member

Summary

service_os has two lifecycle gaps that bite anyone setting up hero_os from scratch:

  1. No bundle-start command. A working hero_os deploy needs hero_proc + hero_router + several companion services (osis, aibroker, proxy, plus optional collab/livekit/voice/books/biz/foundry/indexer/embedder/codescalers/browser). Until now users had to compose service_X start calls by hand and remember the right set + ordering.
  2. No service_* wrapper for the per-island WASM build. A complete UI needs THREE artifact phases: hero_os Rust binaries (service_os install), the main hero_os_app shell WASM (service_os wasm_build), and ~30 per-island WASM bundles built from hero_archipelagos via wasm-pack. Phase 3 has no service_os wrapper — users have to know to manually run cd ~/hero/code0/hero_archipelagos && make install. There's also no preflight warning when the islands directory is missing, so the failure mode is "click a dock icon → 404 in the browser console" rather than "see a hint at start time."

Plus a non-obvious operational gotcha for phase 3: hero_os_ui only registers the /islands HTTP route if the directory exists at process startup. So even after a successful islands install, the running hero_os_ui won't serve them until it's restarted.
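To make the restart requirement concrete, here is a toy model of a route table that is computed once at startup. It is a hypothetical Python sketch, not hero_os_ui's actual code: `build_routes` and the placeholder `/` route are assumptions for illustration, and only the "register `/islands` if the directory exists" rule comes from the description above.

```python
import tempfile
from pathlib import Path


def build_routes(share_dir: Path) -> set[str]:
    """Compute the route table once, the way a server would at process startup."""
    routes = {"/"}  # placeholder for the always-present shell route (assumed)
    if (share_dir / "islands").is_dir():
        routes.add("/islands")
    return routes


with tempfile.TemporaryDirectory() as tmp:
    share = Path(tmp)
    running = build_routes(share)      # process starts before islands exist
    (share / "islands").mkdir()        # islands get installed afterwards
    stale_has_islands = "/islands" in running
    restarted = build_routes(share)    # a restart re-evaluates the check
    fresh_has_islands = "/islands" in restarted

print(stale_has_islands, fresh_has_islands)
```

The already-running table never picks up the new directory, which is why `islands_build` offers `--restart`.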

Proposal

Three new commands + one preflight warn, all added to tools/modules/services/service_os.nu:

service_os start_base      # service_os + osis + aibroker + proxy
service_os start_full      # start_base + livekit + collab + voice + books + biz +
                           # foundry + indexer + embedder + codescalers + browser
service_os islands_build   # build + install per-island WASM bundles
                           # (wraps `cd ~/hero/code0/hero_archipelagos && make install`)

All three accept the standard --root / --update / --reset flags from the existing module conventions. islands_build also takes --restart (because of the at-startup route registration caveat above) and preflights wasm-pack availability.

start now has a non-fatal preflight warn for missing islands, mirroring the existing svx_check_assets pattern:

⚠ per-island WASM bundles not found at ~/hero/share/hero_os/islands
  Dock items will 404 when clicked (the desktop shell still loads).
  Build + install them once:
    service_os islands_build --restart
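The two preflights can be sketched as follows. This is an illustrative Python model under stated assumptions, not the Nushell code in `service_os.nu`: the function names `check_islands` and `check_tool` are hypothetical, and the warning text is copied from the block above.

```python
import shutil
from pathlib import Path


def check_islands(islands_dir: Path) -> list[str]:
    """Return warning lines if the islands directory is missing; never raises."""
    if islands_dir.is_dir():
        return []
    return [
        f"⚠ per-island WASM bundles not found at {islands_dir}",
        "  Dock items will 404 when clicked (the desktop shell still loads).",
        "  Build + install them once:",
        "    service_os islands_build --restart",
    ]


def check_tool(name: str) -> bool:
    """True if `name` is on PATH (the kind of check islands_build does for wasm-pack)."""
    return shutil.which(name) is not None


# Non-fatal: print the warning and carry on with startup.
warnings = check_islands(Path("/nonexistent/hero/share/hero_os/islands"))
for line in warnings:
    print(line)
```

The islands check only returns text to print, so `start` can continue; a missing `wasm-pack`, by contrast, is something `islands_build` would presumably treat as a hard stop before building.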

Design choices

  • No auto-chaining. start doesn't auto-call wasm_build/islands_build, and start_base/start_full don't auto-call them either. This is the same separation of phases the module already uses for install vs wasm_build. Each phase logs separately and can be re-run independently.
  • start_base companion set is opinionated. Just the services hero_os_app actually consumes via SDK imports (hero_osis_sdk for AI bar / spaces) plus auth (hero_proxy, per the registry.rs comment "auth moves to hero_proxy"). Everything else is feature-island specific and goes in start_full.
  • start_full does explicit ordering. livekit before collab (collab auto-chains livekit, but explicit ordering keeps output readable).

Related

  • hero_os#44 (open) — symptom report: "OSIS island crashes with WASM 404 error" — same class of gap caught by the new preflight warn
  • hero_demo#34 (open, was home#171) — broader UI-shell discussion about which islands need fallbacks
  • home#190 (open) — architectural question about parallel _app crates and island-*-native features

Acceptance

  • PR with the new commands merged
  • service_os start_base brings up hero_os + companions on a fresh box
  • service_os islands_build --restart produces a working dock without manual cd hero_archipelagos
  • service_os start warns clearly when islands are missing
Owner

Thanks for the careful write-up — the gap you're pointing at is real, and the svx_check_islands preflight is a nice add on its own. Before this lands though I'd like to push back on the start_base / start_full shape, because I think it overlaps a lot with stuff we already have, and pulls service_os.nu somewhere it isn't supposed to go.

What we already have today

Three layers, on purpose:

service_install_all [--core]   # build binaries (per-service install --update)
service_core start             # bring up the 5-service core stack with health probes + retry
service_complete [--core]      # install_all + start every runtime service

Quick demo of just running them on my box right now:

> service_install_all --core
=== service_install_all ===
  7 services to install
→ service_proc install … ✓
→ service_router install … ✓
→ service_mycelium install … ✓
→ service_code install … ✓
→ service_codescalers install … ✓
→ service_lib_rhai install … ✓
→ service_embedder install … ✓
=== Results: 7/7 succeeded ===

> service_core start
=== hero_proc === ✓ already healthy
=== hero_db === ✓ healthy
=== hero_router === ✓ already healthy
=== hero_code === ✓ healthy
=== hero_logic === ✓ healthy

Note service_core is doing the check → start → settle → retry-with-reset dance via core_step. That's the resilience contract we want any "start a stack" command to honour.
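For readers who have not looked at `core_step`, the contract reads roughly like the sketch below. This is a hypothetical Python model of the check → start → settle → retry-with-reset sequence, not the actual Nushell implementation; `core_step_sketch`, the `start(reset)` callback shape, and the status strings are assumptions.

```python
import time
from typing import Callable


def core_step_sketch(
    is_healthy: Callable[[], bool],
    start: Callable[[bool], None],  # start(reset): reset=True wipes state first (assumed)
    settle_secs: float = 0.0,
) -> str:
    """Check -> start -> settle -> retry-with-reset, returning a status string."""
    if is_healthy():
        return "already healthy"
    for reset in (False, True):     # the second attempt retries with a reset
        start(reset)
        time.sleep(settle_secs)     # give the service time to come up
        if is_healthy():
            return "healthy (after reset)" if reset else "healthy"
    return "failed"


class FakeService:
    """Stand-in service that only comes up cleanly after a reset."""

    def __init__(self) -> None:
        self.up = False
        self.calls: list[bool] = []

    def start(self, reset: bool) -> None:
        self.calls.append(reset)
        self.up = reset

    def healthy(self) -> bool:
        return self.up


svc = FakeService()
status = core_step_sketch(svc.healthy, svc.start)
print(f"=== hero_db === {status}")
```

A flat list of `service_X start` calls corresponds to only the first `start(False)` here, with no probe before or after it.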

Where the PR rubs against that

  1. service_X.nu modules manage one service. The existing precedent for one service module importing another (service_collab → service_livekit) is a hard runtime data dependency — collab reads livekit's runtime.json for credentials. Bundling 13 sibling modules into service_os.nu because they happen to back UI islands is a different shape, and it turns a leaf module into a meta-orchestrator.

  2. start_full duplicates service_complete with a different list. Today service_complete covers proxy/db/os/osis/collab/livekit/biz/aibroker/logic/slides/whiteboard/indexer/foundry/voice/agent. start_full covers a different curated set (no db, no slides, no whiteboard, no agent; commit 3 already had to remove codescalers). Two lists in two places will drift — every new island service would need to be added in both.

  3. No core_step machinery. start_full is a flat sequence of service_X start with print between them. No health probe, no settle wait, no reset retry. So it's strictly less robust than service_core for the same kind of work.

  4. --with-core from a non-core module is a layering inversion. service_core start already exists for this; chaining it from service_os.nu puts core orchestration on a leaf service.

  5. islands_build belongs to a different repo. It builds lhumina_code/hero_archipelagos with a separate toolchain (wasm-pack). Natural home is its own module — service_archipelagos.nu or service_islands.nu — sitting next to the others. The svx_check_islands warn inside service_os start is fine and can stay.
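The drift risk in point 2 can be made concrete with a quick set comparison. This Python sketch uses the two lists exactly as they are written in this thread (the `service_complete` list from point 2, the `start_full` list from the issue's proposal); the lists in the actual modules or in later commits of the PR may differ.

```python
# Lists copied from this thread; the modules themselves may differ.
SERVICE_COMPLETE = {
    "proxy", "db", "os", "osis", "collab", "livekit", "biz", "aibroker",
    "logic", "slides", "whiteboard", "indexer", "foundry", "voice", "agent",
}
START_FULL = {  # start_base + extras, as proposed in the issue
    "os", "osis", "aibroker", "proxy",
    "livekit", "collab", "voice", "books", "biz",
    "foundry", "indexer", "embedder", "codescalers", "browser",
}

only_in_complete = sorted(SERVICE_COMPLETE - START_FULL)
only_in_full = sorted(START_FULL - SERVICE_COMPLETE)
print("only in service_complete:", only_in_complete)
print("only in start_full:", only_in_full)
```

Any non-empty difference is a service that has to be tracked by hand in two places, which is the maintenance cost being objected to.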

Reference
lhumina_code/hero_skills#162