[infra][P1] CI-built static-musl binaries + --from-ci install path — make deploys minutes, not hours #54

Open
opened 2026-05-01 17:25:25 +00:00 by mik-tf · 24 comments
Owner

Summary

Today's deploy on herodemo took ~3 hours wall-clock and exposed five deploy-blocking bugs (hero_router#81, hero_proc#91, hero_skills#186, hero_collab#42, and a JobCreateInput regression at hero_embedder), most of them found at compile time on a fresh build. The build-time ones would not have reached production if a CI pipeline had built statically-linked binaries on every commit and the VM had simply downloaded them.

This issue proposes adding a CI-built artifact path to the install pipeline, as an additional option, not a replacement for the current build-on-VM path.

Current model: build on each VM

service_X install --update --release runs forge merge (git pull) then cargo build --release on the VM. Pros: simple, debuggable, no extra infrastructure. Cons:

  • Cold cache build is 5-30 min per service; full service_install_all cascade is 30-60 min.
  • Build deps (rust toolchain, native libs, ONNX, system OpenSSL) live on every deploy target.
  • Disk pressure: the cargo target dir grew to 50+ GB on herodemo today; /data filled before the build even finished a couple of times.
  • Build can fail mid-deploy. We hit five such failures today.
  • The glibc / path-leak class of bugs (/Volumes/T7 in hero_router#81) is invisible until someone tries to build on a different machine.
  • Multi-VM fleets pay the build cost N times — wasted CPU.
  • Not shippable: a glibc-linked binary built against the VM's environment doesn't run cleanly on customer hardware, embedded targets, or arbitrary Linux distros.

Proposed: two-path install

service_X install [--from-ci | --from-source] [--update] [--reset] [--release]
  --from-ci      — download release artifact for the target commit (default once stable)
  --from-source  — forge merge + cargo build (current behavior; always available)

Defaults shift over time: --from-source while we're rolling out, then --from-ci once we trust the artifacts and have a fallback story.
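The flag surface above is a mutually exclusive pair with a default that can be flipped later. Below is a Python argparse illustration only. The real flags live in the nu-shell service modules, and `default_mode` is a hypothetical stand-in for the rollout-time default flip.

```python
import argparse


def build_install_parser(default_mode: str = "source") -> argparse.ArgumentParser:
    """Sketch of `service_X install` flags; default_mode is hypothetical."""
    parser = argparse.ArgumentParser(prog="service_X install")
    mode = parser.add_mutually_exclusive_group()
    # Both flags write the same destination, and the group rejects passing both.
    mode.add_argument("--from-ci", dest="mode", action="store_const", const="ci",
                      help="download the release artifact for the target commit")
    mode.add_argument("--from-source", dest="mode", action="store_const", const="source",
                      help="forge merge + cargo build (current behavior)")
    parser.add_argument("--commit", help="explicit target commit sha")
    parser.add_argument("--update", action="store_true")
    parser.add_argument("--reset", action="store_true")
    parser.add_argument("--release", action="store_true")
    # Parser-level default: flip "source" to "ci" once artifacts are trusted.
    parser.set_defaults(mode=default_mode)
    return parser


args = build_install_parser().parse_args(["--from-ci", "--reset"])
print(args.mode, args.reset)  # ci True
```

The default flip then becomes a one-argument change, while an explicit `--from-source` keeps working regardless of the default.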

--from-ci resolves like this:

  1. Determine target commit (from --commit <sha>, or forge_url HEAD, or latest tag).
  2. Hit Forgejo Releases / package registry for <service>-<commit>-x86_64-musl.
  3. Download, verify (checksum or signature), chmod +x, drop in ~/hero/bin/.
  4. Register / restart via hero_proc as today.
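Steps 1 and 2 reduce to a priority fallback plus a naming convention. The Python sketch below assumes the `<service>-<commit>-x86_64-musl` naming proposed above. Note that the pilot's release workflow later uses `<bin>-linux-amd64-musl` instead. The sketch leaves out how `forge_head` and `latest_tag` are looked up, and the function names are hypothetical.

```python
from typing import Optional


def resolve_target(commit: Optional[str], forge_head: Optional[str],
                   latest_tag: Optional[str]) -> str:
    """Step 1: an explicit --commit wins, then forge HEAD, then the latest tag."""
    for candidate in (commit, forge_head, latest_tag):
        if candidate:
            return candidate
    raise LookupError("no target: pass --commit <sha> or publish a tag")


def artifact_name(service: str, target: str, triple: str = "x86_64-musl") -> str:
    """Step 2: the artifact name to look up in Forgejo Releases."""
    return f"{service}-{target}-{triple}"


target = resolve_target(None, "a1b2c3d", "v0.4.4")
print(artifact_name("hero_proc", target))  # hero_proc-a1b2c3d-x86_64-musl
```

Rollback ("redeploy commit Y") is the same call with `commit="Y"`. That is why the explicit sha is checked first.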

Total time per service: ~5-10 seconds (download + chmod + swap), down from 30-60 seconds warm / 5-30 minutes cold.

Static linking is the real fix

Container matching is a coping strategy: "make the deploy environment match the build environment" punts the problem instead of solving it. Statically-linked musl binaries solve it. The same x86_64 binary runs on any x86_64 Linux distro (and likewise an aarch64 build on any aarch64 host). There is no glibc compatibility table and no "build env must match deploy env" rule, so the binary can ship to any machine running a Linux kernel for that architecture.

Per-service classification:

Service | Static-link readiness | Notes
hero_proc, hero_router, hero_books, hero_biz, hero_collab, hero_foundry, hero_aibroker, hero_logic, hero_indexer, hero_osis, hero_proxy, hero_db, hero_codescalers, hero_code, hero_lib, hero_archipelagos | Easy | Pure Rust. rusqlite uses its bundled feature, and rustls is already in the dep tree (no OpenSSL), so cargo build --target x86_64-unknown-linux-musl --release should work directly.
hero_voice, hero_embedder | Hard | ONNX Runtime is native C++. Three options: (a) bundle libonnxruntime.so next to the binary (still relocatable + distributable, just not single-file), (b) statically link ONNX against musl (extra work, possibly upstream-PR territory), (c) keep these two on glibc as a "near-static" exception. The pragmatic call depends on the demo target.
hero_os (WASM bundle) | Different shape | WASM is its own artifact: build in CI and upload the bundle. No static-linking question.
hero_onlyoffice | Stays Docker | Third-party C++/Node.js stack, not vendored. Keep the docker run model.

Existing infrastructure

The CI side is already partly built. From the skills index:

  • forge-release-workflow — Forgejo workflow that builds Linux binaries (amd64 musl, optionally arm64 gnu) on tag push and uploads to Releases.
  • forge_release — Forgejo Releases management.
  • forge_package — binary publishing to forge.ourworld.tf packages registry.
  • build_lib — build system library for Hero projects.
  • forge_docker_publish — for the OnlyOffice-style cases.

The missing piece is the deploy side: service_X install --from-ci and the resolution / download / verify logic.

Storage and retention

  • Forgejo Releases per repo, tagged by <service>-<sha> or git tag.
  • Retention: keep last N (e.g. 50) commits + all release tags. Prune older nightlies.
  • Verify: checksum at minimum; signature later (cosign or similar) once we care about supply-chain integrity.
  • Per-target architecture: x86_64-musl first; aarch64-gnu or aarch64-musl once Hero deploys to ARM.
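The retention rule (keep the newest N per-commit builds plus every release tag) can be made mechanical. Below is a Python sketch. The `Release` record, and how it would be populated from the Forgejo API, are assumptions rather than an existing Hero helper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Release:
    tag: str
    created: datetime
    is_release_tag: bool  # True for tags like v0.4.4; False for per-commit <service>-<sha> builds


def releases_to_prune(releases: List[Release], keep_last: int = 50) -> List[str]:
    """Return tags of per-commit builds older than the newest `keep_last`.

    Release tags are never returned, so they are kept forever.
    """
    nightlies = sorted(
        (r for r in releases if not r.is_release_tag),
        key=lambda r: r.created,
        reverse=True,  # newest first
    )
    return [r.tag for r in nightlies[keep_last:]]
```

The pruner only decides what to delete. Running it as a scheduled job against the Releases API is a separate step.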

Rollout plan

  1. Pilot (one service): pick hero_proc — already has good CI, smallest blast radius for testing the full pipeline. Add forge-release-workflow, build musl binaries on every push to development, upload as hero_proc-<sha>-x86_64-musl.
  2. --from-ci install path in service_proc.nu, behind an opt-in flag. Existing --from-source stays default.
  3. Easy-tier services (the 16 pure-Rust ones): roll out the same workflow. ~1-2 days per service to land the CI workflow + verify.
  4. Hard-tier services (voice, embedder): decide ONNX strategy per-service. Static-link if feasible, otherwise bundle the .so and document the exception.
  5. Default flip: once 80%+ of services have a stable --from-ci path and we've verified rollback works, flip the default in service_install_all.

Out of scope

  • Replacing the build-on-VM path. --from-source stays forever as the dev-iteration / debug / disaster-recovery path.
  • Container-image deploys. That's a separate architectural shift (kubernetes / podman). This issue is about static binaries served from Forgejo Releases.
  • Cross-arch artifacts beyond x86_64-musl. Add aarch64 when Hero needs ARM deploy targets, not before.
  • Signing infrastructure. Checksum verification first; full signing (cosign) after.

Tradeoffs to acknowledge

  • musl is slightly slower than glibc at allocator-heavy workloads. Probably fine for Hero's services; benchmark if a service is allocation-bound.
  • CI dependency: deploy path depends on CI being up. Mitigation: --from-source always works, and once an artifact is in Forgejo Releases, it's there even if CI is currently broken.
  • Storage cost: per-commit artifacts add up. Retention policy from day one.
  • Local dev: developers still need to build locally. CI artifacts only kick in for production deploys, not for code iteration.

Cross-refs

  • hero_router#81: /Volumes/T7 macOS path leak (would have been caught at CI build time, never reached deploy)
  • hero_proc#91: DB schema change handling (a CI artifact still wouldn't have helped here, but the cascade restart is bounded by load, not build cost)
  • hero_skills#186: service_lib_rhai rename gap (a CI deploy would have surfaced this in pre-prod, not in the live deploy)
  • hero_collab#42: accept-loop EMFILE runaway
  • hero_proc#87: log-store runaway incident
  • hero_demo#52 vision: the sovereignty pitch leans on "Hero runs on customer hardware", and static-linked binaries are the prerequisite.

This is a multi-week project to roll out properly across the stack, but the ROI is clear. Today's 3-hour deploy hit 5 bugs on the VM. Once the artifact pipeline is the default path, that becomes a roughly 5-minute deploy, with build-time breakage caught in CI instead of mid-deploy.

Author
Owner

Rollout sketch (post-demo, not blocking)

Parking some implementation thinking here so whoever picks this up has a starting point. Not the priority right now — demo work comes first.

Order of operations

The install-side change is the high-leverage piece. Every service_X.nu install today is a thin wrapper around svc_install in lib.nu. So --from-ci lives in svc_install — one change, all 17 services that go through that helper inherit the new path.

1. lib.nu: add --from-ci flag to svc_install
   resolve commit → build artifact name → curl Forgejo Releases →
   verify checksum → drop in ~/hero/bin/. ~50-100 lines of nu.
2. Pilot: hero_proc CI workflow + verify --from-ci end-to-end on one service.
3. Fan out: 15 other easy-tier services get the same .forgejo/workflows/build-linux.yaml.
4. Hard-tier (voice, embedder): ONNX strategy decided per-service.
5. Default flip in service_install_all: only after rollback works.

Effort estimate per easy-tier service

  • ~1 hour for single-binary services (paste the workflow, replace name)
  • ~2-4 hours for multi-binary ones (hero_proc has 3 bins; hero_books has 4)
  • ~4-8 hours for build.rs codegen ones (hero_osis is the big one)
  • Plus per-service review/verify cycle

Total for 16 easy-tier services: ~30-50 focused hours spread across PRs. Realistic calendar: ~1 day for lib.nu, ~1 week for hero_proc pilot end-to-end, ~2-3 weeks fanning out.

Gotchas to flag upfront

  1. Cargo.lock must be committed for reproducible CI builds. Today hero_embedder has Cargo.lock gitignored — exactly the source of the lockfile-drift bug we hit during this session. CI building from a moving lockfile is non-reproducible. Step zero on every service: ensure Cargo.lock is committed.

  2. musl-incompatible crates lurk. Most pure-Rust deps work, but anything that links to openssl-sys, libsqlite3-sys without bundled, native-tls, etc. needs swapping. The first service we put through CI reveals the pattern.

  3. Build perf: building 16 services on the same Forgejo Actions runner will be slow without sccache. Hero already has sccache.nu patterns; wire them into the CI workflow once the basics work.

  4. Default-flip discipline: don't flip --from-ci to default until rollback works. "Deploy went bad, redeploy commit Y" must grab the CI artifact for Y, not rebuild it. --from-source stays the always-available out.

  5. Per-service ownership: 16 PRs touching 16 repos means coordinating with whoever owns each repo. Worth a heads-up before the rollout starts so per-service maintainers can flag musl-incompatibility cases ahead of time.

When to start

After the demo ships and the team has cycles. Until then, today's deploy reality (build-on-VM, ~5-10 min warm-cache cycle once we're past the cold start) is workable.

Author
Owner

2026-05-02 — Picking this up. Fresh audit + hero_proc pilot plan.

Context

Session 53 priority shifted to this issue. The 3-hour deploy and 5 deploy-blocking bugs in session 52 made the cost of staying on the build-on-VM model concrete. Goal for this session: prove the loop end-to-end on hero_proc (tag → CI → static-musl artifact in Forgejo Releases → service_proc install --from-ci on a fresh VM). Once one service works, the rest is mechanical.

Related: coopcloud/circle_ops#773. The "Set up job" zombie-network failure mode that was forcing CI re-runs is fixed as of 2026-04-28 (peter deployed a cleanup script, expanded the address pools from 256 to 4352 networks, and added Prometheus zombie alerts). Re-runs should no longer be needed.

Fresh CI audit (last 5 runs on development, 2026-05-02)

Repo | Last 5 runs | release.yaml | build-linux.yaml | buildenv.sh
hero_proc | ✅ 5/5 success | - | Y | Y
hero_skills | ✅ 5/5 success | - | - | Y
hero_books | ✅ 5/5 success | - | Y | -
hero_browser | ✅ 5/5 success | - | Y | Y
hero_whiteboard | ✅ 5/5 success | - | Y | Y
hero_os | ✅ 5/5 success | Y | - | Y
hero_osis | ✅ 5/5 success | Y | Y | Y
hero_voice | ✅ 4/5 + 1 cancelled | - | Y | Y
hero_archipelagos | ✅ 4/5 + 1 cancelled | - | - | Y
hero_proxy | ✅ 4/5 + 1 cancelled | Y | - | Y
hero_agent | 🟡 4/5 success | - | - | Y
hero_lib | 🟡 4/5 success | - | - | Y
hero_foundry | 🟡 4/5 success | - | Y | Y
hero_indexer | 🟡 4/5 success | Y | Y | Y
hero_biz | 🟡 3/5 success | - | Y | Y
hero_aibroker | 🟡 3/5 success | Y | Y | Y
hero_rpc | 🟡 3/5 success | Y | - | Y
hero_router | 🔴 2/5 success (recent regression) | Y | - | Y
hero_embedder | 🔴 1/5 success | - | Y | Y
hero_db | 🔴 1/5 success | Y | Y | Y
hero_lib_rhai | 🔴 0/5 (known, #40) | - | Y | Y
hero_foundry_ui | 🔴 0/5 success | - | Y | Y
hero_demo | 🔴 1/5 success | - | - | -
hero_browser_mcp | ⚫ no dev runs | - | Y | Y
hero_collab | ⚫ no dev runs | - | - | Y
hero_logic | ⚫ no dev runs | - | - | Y
hero_office | ⚫ no dev runs | - | - | Y
hero_codescalers | ⚫ no dev runs | - | - | -
hero_livekit | ⚫ no dev runs | Y | - | Y

Updated state vs 2026-04-25 audit in hero_demo#39:

  • 9 repos already publish on tag push (have release.yaml): hero_router, hero_aibroker, hero_indexer, hero_proxy, hero_db, hero_livekit, hero_os, hero_osis, hero_rpc.
  • 15 repos have build-linux.yaml, 11 of them without a release.yaml. Those 11 cross-compile musl in CI but don't upload artifacts.
  • 5 repos have no firing CI on development at all (the same set #39 flagged).
  • Notable regression: hero_router (whose release.yaml is the canonical template) has 3/5 recent failures on check. Worth a look during the rollout.

Why hero_proc is the right pilot

  • 5/5 green CI on development for the last 5 runs.
  • Has buildenv.sh and build-linux.yaml — only missing piece is the artifact-publishing release.yaml.
  • Self-contained: no native deps, pure Rust workspace, rusqlite bundled.
  • Already has hero_proc#35 explicitly asking for this exact thing.
  • Smallest blast radius — even if the artifact pipeline lands broken, --from-source keeps working.

Pilot plan — concrete steps

Step 1 — Add release.yaml to hero_proc.

  • Branch: development_mik_release_artifacts_hero_proc
  • Port hero_router/.forgejo/workflows/release.yaml — adapt only the binaries list (sourced from buildenv.sh).
  • One check to relax: hero_router's release.yaml refuses any tag not on main. Hero uses development everywhere, so the check needs to allow development (or be removed). Will fix in the port.
  • PR → review → squash-merge with explicit OK.

Step 2 — Tag + verify.

  • After merge, tag v0.x.y-dev on hero_proc.
  • Confirm artifact appears at forge.ourworld.tf/lhumina_code/hero_proc/releases/tag/v0.x.y-dev with <bin>-linux-amd64-musl for each binary in $BINARIES.
  • file <bin> should report static-pie linked, stripped. Smoke-test by running on a fresh container.
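The `file` check above could also run unattended in CI or in the installer. Roughly, what separates file's "statically linked" / "static-pie linked" output from a dynamic binary is the absence of a PT_INTERP program header, meaning no dynamic loader is requested. Below is a dependency-free Python sketch of that single check, limited to 64-bit little-endian ELF (which covers x86_64). It illustrates one signal and is not a replacement for `file`.

```python
import struct
from pathlib import Path

PT_INTERP = 3  # program header type that names the dynamic loader (e.g. ld-linux)


def elf_is_static(data: bytes) -> bool:
    """True if a 64-bit little-endian ELF image requests no dynamic loader."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS=ELFCLASS64, EI_DATA=ELFDATA2LSB
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size, entry count
    for i in range(e_phnum):
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return False
    return True


def check_binary(path: Path) -> bool:
    """Convenience wrapper for a binary on disk, e.g. ~/hero/bin/hero_proc."""
    return elf_is_static(path.expanduser().read_bytes())
```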

Step 3 — service_proc install --from-ci in hero_skills.

  • Branch: development_mik_release_artifacts_hero_skills_from_ci
  • New nu-shell module helper: pkg_url(repo, version, bin) → resolves Forgejo Releases URL.
  • service_proc.nu install learns --from-ci [<version>]:
    • Resolve target version (explicit, or latest).
    • Download each binary in $BINARIES, verify its checksum (computed from the downloaded response, since Forgejo doesn't sign artifacts yet; see #54 §Storage), chmod +x, and drop it in ~/hero/bin/.
    • Restart via hero_proc service restart hero_proc (or service_proc start --reset, since hero_proc is not its own service).
  • Default stays --from-source. --from-ci is opt-in.
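Step 3's download / verify / install sequence is sketched below in Python for readability; the real helper would be nu-shell in lib.nu. The sketch makes three assumptions:

- Forgejo serves assets at `/<owner>/<repo>/releases/download/<tag>/<asset>`, the Gitea-compatible route.
- Assets follow Step 2's `<bin>-linux-amd64-musl` naming.
- "latest" has already been resolved to a concrete tag, for example via the `/api/v1/repos/<owner>/<repo>/releases/latest` endpoint.

The network fetch is left out so the install logic can be tested on its own.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional

FORGE_BASE = "https://forge.ourworld.tf/lhumina_code"


def pkg_url(repo: str, version: str, bin_name: str) -> str:
    """Release-asset URL for one binary (route and naming are assumptions)."""
    return f"{FORGE_BASE}/{repo}/releases/download/{version}/{bin_name}-linux-amd64-musl"


def install_binary(data: bytes, bin_name: str, bin_dir: Path,
                   expected_sha256: Optional[str] = None, reset: bool = False) -> bool:
    """Verify and atomically install one downloaded binary.

    Returns False when the binary is already present and reset is not set
    (skip-if-present), True after a fresh install.
    """
    dest = bin_dir / bin_name
    if dest.exists() and not reset:
        return False
    if expected_sha256 is not None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_sha256.lower():
            raise ValueError(f"{bin_name}: sha256 {actual} != expected {expected_sha256}")
    bin_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    # Writing in place onto a running executable fails with ETXTBSY on Linux;
    # a rename swaps it cleanly and never exposes a half-written binary.
    fd, tmp = tempfile.mkstemp(dir=bin_dir, prefix=f".{bin_name}.")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.chmod(tmp, 0o755)  # chmod +x
        os.replace(tmp, dest)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

The skip-if-present and `reset` behavior mirror what the pilot verified for `service_proc install --from-ci` / `--reset`.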

Step 4 — End-to-end test.

  • Fresh TFGrid VM (small one, not herodemo).
  • service_proc install --from-ci should produce a working hero_proc in <60s (vs 5-30 min cold cargo build).
  • Document the time delta in this issue.

Branch naming for the rollout

Per-repo branches: development_mik_release_artifacts_<repo> (e.g. _hero_proc, _hero_indexer). Each PR scoped to one repo, gated by green CI.

Out of scope for this session

  • The 16 easy-tier service rollout (mechanical replication once hero_proc proves the loop).
  • ONNX-tier strategy for hero_voice + hero_embedder.
  • Meta-release on hero_demo (acceptance criterion of hero_demo#38, not blocking the pilot).
  • Multi-platform install_core work — already partially closed via home#192.

Signed-off-by: mik-tf

Author
Owner

2026-05-02 — Pilot landed.

Consumer side merged: hero_skills 3387d284 via PR #193.

service_proc install --from-ci                    # latest release
service_proc install --from-ci --version v0.4.4   # pinned tag
service_proc install --from-ci --reset            # force refetch

Verified end-to-end against the live lhumina_code/hero_proc/releases/tag/v0.4.4:

  • 3 static-pie ELF binaries land in $HOME/hero/bin/
  • Installed hero_proc --version reports hero_proc 0.4.4
  • Skip-if-present and --reset work as designed
  • Unknown repo errors cleanly with the resolved URL in the message
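For anyone wiring the next module, here is a rough bash sketch of the URL resolution the helper performs. It makes three assumptions: Forgejo's standard release-download path (/<owner>/<repo>/releases/download/<tag>/<asset>), the releases/latest API endpoint, and the <bin>-linux-amd64-musl asset naming used by hero_proc. The function names are illustrative only. The real helper is nushell in hero_skills lib.nu.

```bash
#!/usr/bin/env bash
# Illustrative only: names and structure are assumptions, not the lib.nu API.
FORGE="https://forge.ourworld.tf"
OWNER="lhumina_code"

# Download URL for one release asset, named <bin>-<suffix>.
asset_url() {
  local repo="$1" tag="$2" bin="$3" suffix="${4:-linux-amd64-musl}"
  printf '%s/%s/%s/releases/download/%s/%s-%s\n' \
    "$FORGE" "$OWNER" "$repo" "$tag" "$bin" "$suffix"
}

# API endpoint whose JSON body carries the newest release's tag_name.
latest_api_url() {
  printf '%s/api/v1/repos/%s/%s/releases/latest\n' "$FORGE" "$OWNER" "$1"
}
```

On a live host, "latest" would be resolved by reading tag_name from the JSON returned at latest_api_url. One way is curl -fsSL piped into jq -r .tag_name, assuming jq is installed.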

Side-finding worth recording

The publisher side already exists for far more repos than the 2026-04-25 audit captured. Per a fresh re-audit:

  • 9 repos with release.yaml (the canonical hero_router template): hero_router, hero_aibroker, hero_indexer, hero_proxy, hero_db, hero_livekit, hero_os, hero_osis, hero_rpc.
  • 15 repos with build-linux.yaml doing the same publish-to-Releases work under a different filename: hero_proc, hero_lib_rhai, hero_biz, hero_books, hero_embedder, hero_voice, hero_browser, hero_browser_mcp, hero_whiteboard, hero_foundry, hero_foundry_ui, hero_aibroker (overlap), hero_indexer (overlap), hero_db (overlap), hero_osis (overlap). The 4 marked "overlap" also carry release.yaml.

Union of distinct repos publishing artifacts on tag push: ~20 (9 + 15 listed, minus the 4 overlaps). The cosmetic rename (build-linux.yaml → release.yaml) and naming-convention sweep is a separate cleanup; it doesn't gate consumer rollout.
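The ~20 figure is plain set arithmetic: 9 + 15 - 4 overlaps = 20. The bash check below works over the two lists exactly as written above.

```bash
#!/usr/bin/env bash
# Repo names copied from the two lists above.
release_yaml="hero_router hero_aibroker hero_indexer hero_proxy hero_db hero_livekit hero_os hero_osis hero_rpc"
build_linux="hero_proc hero_lib_rhai hero_biz hero_books hero_embedder hero_voice hero_browser hero_browser_mcp hero_whiteboard hero_foundry hero_foundry_ui hero_aibroker hero_indexer hero_db hero_osis"

# Distinct repos across both lists (one name per line, de-duplicated, counted).
union_count() {
  printf '%s\n' $release_yaml $build_linux | sort -u | wc -l | tr -d ' '
}

# Repos that appear in both lists (each list has no internal duplicates).
overlap_count() {
  printf '%s\n' $release_yaml $build_linux | sort | uniq -d | wc -l | tr -d ' '
}
```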

What's next

  1. Live VM smoke test — provisioning a fresh TFGrid VM (heroci) to validate service_proc install --from-ci end-to-end on a real environment, register + start through the supervisor, confirm full lifecycle. Specs match herodemo so the same VM can graduate to demo duty once CI-paved deploys are proven.
  2. Roll out to easy-tier services — wire --from-ci into service_router, service_aibroker, service_indexer, service_proxy, service_db, service_osis, service_books, service_biz, service_collab, service_foundry, service_logic, service_archipelagos, service_lib, service_code. The helper in lib.nu is already generic — each module needs a one-line wire-up.
  3. service_install_all --from-ci — once individual services work, lift the flag to the whole-stack installer.
  4. Hard-tier (ONNX) decision: hero_voice + hero_embedder. Bundle the .so next to the binary or stay on glibc as a documented exception.
  5. Cosmetic cleanup: rename build-linux.yaml → release.yaml per #39's canonical naming.

Closing as stale

  • hero_proc#35 — its 'binaries are published manually' premise hasn't been true since v0.4.1. Commented; proposing close.

Signed-off-by: mik-tf

Author
Owner

2026-05-02 — Pilot smoke-tested green on a fresh TFGrid VM.

Provisioned heroci.gent01.grid.tf (16 vCPU / 32 GB / 200 GB / 16 GB rootfs, public IPv4 + Mycelium fallback) and ran service_proc install --from-ci --version v0.4.4 end-to-end. All three binaries downloaded, verified ELF, installed, and report the correct version.

Wall-clock numbers

| Step | Time |
|---|---|
| tofu apply (full VM provision) | ~80s |
| apt install baseline (curl/wget/git/file/ca-certs) | ~7s |
| nushell 0.111.0 from upstream tarball | ~3s |
| git clone --depth 1 hero_skills | ~2s |
| service_proc install --from-ci --version v0.4.4 | ~6s |
| TOTAL: bare VM → working hero_proc binary | ~100s |

For reference: session 52's source-build hero_proc deploy took ~10 minutes of cold cargo build. Cold-cache full service_install_all was 30-60 min. Pilot delivers the speedup #54 called for.
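Here is a quick arithmetic check on the table. The listed steps sum to 98 s, which is consistent with the ~100 s total. The ~6 s install step against a ~10 min (600 s) cold cargo build is roughly a 100x speedup for that step alone. That comparison ignores VM provisioning.

```bash
#!/usr/bin/env bash
# Step timings in seconds, copied from the table above:
# tofu apply, apt baseline, nushell, git clone, install --from-ci.
steps="80 7 3 2 6"

# Sum of the listed steps.
total_seconds() {
  echo $steps | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }'
}

# ~600 s cold cargo build divided by the ~6 s CI-artifact install step.
install_speedup() {
  awk 'BEGIN { printf "%d\n", 600 / 6 }'
}
```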

Behavior verified

  • Latest-tag resolution via Forgejo API (latest → v0.4.4)
  • Pinned tag round-trips (--version v0.4.4)
  • All 3 binaries are ELF 64-bit … static-pie linked, stripped, and run on a fresh Ubuntu 24.04 host with no toolchain installed
  • hero_proc --version and hero_proc_server --version both report 0.4.4
  • Skip-if-present guard fires correctly on re-run
  • --reset correctly forces refetch
  • Unknown repo errors cleanly with the resolved API URL in the message
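The skip-if-present and --reset behaviour above reduces to a small guard, sketched here in bash under some assumptions. install_bin and its fetch callback are hypothetical names. Downloading to a .part file and then renaming it is a choice made for this sketch; the source does not confirm that lib.nu does the same.

```bash
#!/usr/bin/env bash
# Hypothetical guard: skip when the binary is already installed, unless reset=1.
# $3 names a command that writes the downloaded binary to the path it is given.
install_bin() {
  local dest="$1" reset="$2" fetch="$3"
  if [ -x "$dest" ] && [ "$reset" != "1" ]; then
    echo "skip: $dest already present"
    return 0
  fi
  local tmp="$dest.part"
  "$fetch" "$tmp" || { rm -f "$tmp"; return 1; }
  chmod +x "$tmp"
  mv -f "$tmp" "$dest"   # rename within the same directory, so no half-written binary
  echo "installed: $dest"
}
```

A re-run without reset prints the skip message. With reset=1 it fetches again and overwrites the binary.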

Strategic rollout plan (next sessions)

Phase 1 (~1-2 sessions): wire --from-ci into the 14 easy-tier service modules with working CI. Each module is a ~4-line patch (the helper in lib.nu is already generic and works across repos). Group as 3-4 PRs of 4-5 services each. Smoke-test each batch on heroci.

Services: service_router, service_aibroker, service_db, service_foundry, service_biz, service_books, service_whiteboard, service_proxy, service_osis, service_indexer, service_browser, service_slides, service_matrixchat, service_editor.

Phase 2 (~1 session): fix CI on 4 currently-broken/missing repos (hero_collab, hero_logic, hero_codescalers, hero_livekit) by porting the canonical hero_router release.yaml per hero_demo#39. Then wire --from-ci into their service modules.

Phase 3 (~0.5 session): wire --from-ci into service_install_all — the strategic payoff. Whole-stack deploy from CI artifacts: minutes, not hours.

Out of scope for this rollout:

  • ONNX hard-tier (hero_voice, hero_embedder) — deferred pending bundling decision (this issue §hard-tier)
  • WASM (hero_os) — different shape, separate pipeline
  • Docker (hero_office) — third-party stack stays containerized

Followups filed during this session

  • hero_proc#35: proposed close; the 'binaries are published manually' premise has been stale since v0.4.1
  • This issue's consumer side: hero_skills 3387d284 (PR #193) merged
  • heroci VM config: hero_demo ac0363c + 370fa4d on development

Signed-off-by: mik-tf

Author
Owner

2026-05-03 — Phase 1 partial: 5 services live with --from-ci, blocker surfaced

What landed

hero_skills#195 merged at a13c9ef0. Adds --from-ci to 4 more service modules, mirroring the pilot pattern in service_proc.nu:

  • service_router (asset suffix linux-amd64-musl, hero_router v0.2.2)
  • service_proxy (asset suffix linux-amd64-musl, hero_proxy v0.5.0)
  • service_db (asset suffix linux-amd64, hero_db v0.3.2)
  • service_indexer (asset suffix linux-amd64-musl, hero_indexer v0.1.3)

Verified end-to-end on heroci.gent01.grid.tf — all 10 binaries land in ~/hero/bin/, sized 1.3-11 MB, hero_router --version reports 0.2.1, server/UI binaries launch.

Live --from-ci coverage so far: 5 services (hero_proc from pilot + the 4 above).
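The only per-module difference so far is the asset suffix. hero_db ships linux-amd64, while the other three ship linux-amd64-musl. The one-line wire-up can therefore be pictured as a lookup. This bash sketch is hypothetical: in hero_skills, each service_X.nu holds its own value.

```bash
#!/usr/bin/env bash
# Hypothetical lookup of the per-module asset suffixes from hero_skills#195.
asset_suffix() {
  case "$1" in
    service_db) echo "linux-amd64" ;;   # hero_db publishes without the -musl suffix
    service_router|service_proxy|service_indexer) echo "linux-amd64-musl" ;;
    *) echo "no asset suffix recorded for $1" >&2; return 1 ;;
  esac
}

# Asset file name a module would request for one of its binaries.
asset_name() {
  local suffix
  suffix=$(asset_suffix "$1") || return 1
  echo "$2-$suffix"
}
```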

Blocker for the rest of Phase 1

The originally-planned 14 easy-tier services do NOT all have a usable release on the forge today. Concretely, of the 9 services NOT yet wired:

| Service | Repo state | Block |
|---|---|---|
| service_aibroker | release v0.1.0 (2026-04-02) | Module's SVX_BINARIES includes hero_aibroker_services (added since v0.1.0 via commit 591e071); latest release would 404. Needs re-tag. |
| service_osis | release v1.0.0-rc5 (2026-04-05) | buildenv.sh adds hero_osis_seed after the tag; same 404 pattern. Needs re-tag. |
| service_biz | tags pushed but no Forgejo release | Recent CI runs failing; tag-push → release upload step never succeeded. |
| service_whiteboard | tag pushed but no Forgejo release | Same as biz. |
| service_editor | no v* tag pushed | Workflow exists; just never tag-released. |
| service_foundry | no release | First release needed. |
| service_browser | no release | First release needed. |
| service_slides | no release | First release needed. |
| service_matrixchat | no build-linux.yaml / release.yaml at all | Phase 2 (CI fix) per #39. |

The consumer wiring is now a one-line patch per service. The actual blocker is upstream: each repo needs a working tag-triggered release pipeline producing the binaries the module expects.
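The 404 pattern in the first two rows can be caught before a module is wired. The idea is to compare the module's expected binaries with the asset names on the release. This bash sketch assumes space-separated inputs and the <bin>-<suffix> naming. In practice the asset list would come from the Forgejo releases API; missing_assets is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print each expected binary that has no <bin>-<suffix> asset on the release.
# $1: expected binaries (SVX_BINARIES), $2: release asset names, $3: asset suffix.
missing_assets() {
  local expected="$1" assets="$2" suffix="$3" bin
  for bin in $expected; do
    case " $assets " in
      *" $bin-$suffix "*) ;;   # published
      *) echo "$bin" ;;        # download would 404
    esac
  done
}
```

Take the aibroker module's four binaries and a hypothetical release that carries only the first three. The helper reports hero_aibroker_services, which is the binary the re-tag has to publish.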

What's next (in order)

  1. hero_aibroker re-tag (in flight) — bumping to v0.1.1 to get hero_aibroker_services published. Once green, service_aibroker joins on a one-line patch (asset suffix linux-amd64-musl).
  2. hero_osis re-tag — same shape: bump to v1.0.0-rc6 to get hero_osis_seed published.
  3. First-release sweep on the never-tagged repos: hero_editor, hero_foundry, hero_browser, hero_slides each need a v0.1.0 cut once their CI is verified green on a tag.
  4. CI debug on the failing-tag repos: hero_biz, hero_whiteboard have tags pushed but the build-linux.yaml runs failed; needs investigation per repo.
  5. Phase 2: hero_matrixchat needs a build-linux.yaml workflow added, classed as a #39 cleanup item.
  6. Phase 3: service_install_all --from-ci only after the per-service rollout is complete.

The pattern is now boring and mechanical: tag a fresh v*, wait for CI to upload assets, add a one-line --from-ci branch + asset_suffix to the corresponding service_X.nu. PR-1 is the template; subsequent PRs will look identical.

Signed-off-by: mik-tf

Author
Owner

2026-05-03 (later) — hero_aibroker joins, plus replicable recipe for the rest

What landed since the last update

  1. hero_aibroker#57: release.yaml gate relaxed to allow tagging on development or main (mirrors hero_proxy, the working template). 4-line diff.
  2. hero_aibroker v0.1.1 — fresh tag at the new development HEAD. release.yaml published 4 binaries with linux-amd64-musl suffix:
    • hero_aibroker, hero_aibroker_server, hero_aibroker_ui, hero_aibroker_services
  3. hero_skills#196 — one-line service_aibroker consumer wiring. Merged at 9cad828.
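The relaxed gate amounts to accepting a tag when its commit is reachable from development or main. Below is a hypothetical bash sketch of that check. The branch list would come from something like git branch -r --contains <sha>, and the exact logic in hero_proxy's release.yaml may differ.

```bash
#!/usr/bin/env bash
# Hypothetical gate: succeed if any branch containing the tagged commit is
# development or main (with or without the origin/ prefix).
tag_branch_allowed() {
  local b
  for b in "$@"; do
    case "$b" in
      origin/development|origin/main|development|main) return 0 ;;
    esac
  done
  return 1
}
```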

Smoke-tested end-to-end on heroci: all 4 binaries land in /root/hero/bin/, sized 4–12 MB, file reports static-pie ELF stripped.

--from-ci coverage now: 6 services (hero_proc, hero_router, hero_proxy, hero_db, hero_indexer, hero_aibroker).

Tangential issue filed

hero_aibroker#58: test_server_rpc_methods in build.yaml fails because it needs a live hero_db, which CI doesn't provide. Pre-existing, not blocking the release pipeline. Two recommended fixes documented (#[ignore]-by-default vs. start hero_db in the CI job).

Generalised recipe for the next service

The aibroker work fell into a 3-step pattern that should generalise to most of the remaining easy-tier services. Per repo:

  1. Audit release.yaml — does the "Verify tag is on..." gate accept development? hero_proxy is the canonical example. If main-only, ship a 4-line mirror hero_proxy gate PR.
  2. Cut a fresh tag at the current development HEAD that publishes whatever binaries the corresponding service_X.nu module's SVX_BINARIES expects today (the binary list often drifts ahead of the last release).
  3. One-line consumer wiring in hero_skills, smoke service_X install --from-ci on heroci.
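For step 2, the two re-tags so far followed a simple rule. Bump the patch number (v0.1.0 to v0.1.1), or bump the -rcN counter on a pre-release (v1.0.0-rc5 to v1.0.0-rc6). Here is a bash sketch of that rule. next_tag is a hypothetical helper, and picking the tag remains a human decision.

```bash
#!/usr/bin/env bash
# Hypothetical helper: propose the next tag by bumping -rcN or the patch number.
next_tag() {
  local tag="$1"
  case "$tag" in
    v*-rc*)
      local base="${tag%-rc*}" n="${tag##*-rc}"
      echo "$base-rc$((n + 1))" ;;
    v*.*.*)
      local prefix="${tag%.*}" patch="${tag##*.}"
      echo "$prefix.$((patch + 1))" ;;
    *) echo "unsupported tag: $tag" >&2; return 1 ;;
  esac
}
```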

Per-repo readiness audit (where I can tell from outside the repo):

| Repo | Latest release | Gate | release.yaml exists | Module match? |
|---|---|---|---|---|
| hero_router | v0.2.2 | main-only | ✓ | ✓ |
| hero_indexer | v0.1.3 | main-only | ✓ | ✓ |
| hero_osis | v1.0.0-rc5 | unchecked | ✓ | ✗ (hero_osis_seed missing; needs re-tag) |
| hero_biz | none on forge | unchecked | ? | needs first release |
| hero_whiteboard | none on forge | unchecked | ? | needs first release |
| hero_editor | none on forge | unchecked | ✓ | needs first tag |
| hero_foundry | none on forge | unchecked | ? | needs first release |
| hero_browser | none on forge | unchecked | ? | needs first release |
| hero_slides | none on forge | unchecked | ? | needs first release |
| hero_matrixchat | n/a | n/a | ✗ | Phase 2 (workflow-add) |

router and indexer already have working releases. Their main-only gate only matters at the next release-cut, so both can stay as-is until then.

The "8 needing first/fresh release" set is now a sequenced workflow, one repo at a time, identical shape every time. Suggest tackling them in priority order — happy to take the next one whenever you're ready.

Signed-off-by: mik-tf

Author
Owner

2026-05-03 (later still) — hero_osis joins, plus a load-bearing finding

What landed since the last update

  1. hero_osis v1.0.0-rc6 — fresh tag at the development tip. Publishes the current binary set (hero_osis, hero_osis_ui, hero_osis_seed, hero_bot) which the v1.0.0-rc5 release was missing. No gate fix needed (hero_osis only gates the upload step on startsWith(github.ref, 'refs/tags/v'), not on branch).
  2. hero_skills#197: service_osis consumer wiring, one-line patch. Merged at b94bd7e.
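hero_osis's upload condition gates on the ref, not the branch. The shell equivalent is sketched below. It assumes the runner exposes the ref as GITHUB_REF, as Forgejo Actions does for GitHub-compatible workflows; is_release_ref is a hypothetical name.

```bash
#!/usr/bin/env bash
# Shell analogue of startsWith(github.ref, 'refs/tags/v'); call as is_release_ref "$GITHUB_REF".
is_release_ref() {
  case "$1" in
    refs/tags/v*) return 0 ;;
    *) return 1 ;;
  esac
}
```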

Smoke-tested end-to-end on heroci. The 3 module-expected binaries land cleanly.

--from-ci install coverage now: 7 services (hero_proc, hero_router, hero_proxy, hero_db, hero_indexer, hero_aibroker, hero_osis). 20 binaries totalling ~108 MB sit in /root/hero/bin/ on heroci, all from CI artifacts, with no cargo run anywhere on the box.

Bumps along the way (worth recording)

  • hero_osis repo's FORGEJO_TOKEN secret was unset → tag-push CI failed silently on the create-release POST (curl -sf swallowed the 401, python crashed on empty stdin). Token added → tag deleted + re-pushed → run #494 succeeded.
  • One CI runner appears to occasionally wedge on docker create (21 min on Set up job for one attempt). Other repos' jobs flowed through fine in parallel — looked like single-runner saturation rather than pool-wide failure.
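The FORGEJO_TOKEN failure suggests making the create-release call fail loudly. The sketch checks the token up front and keeps the HTTP status and response body, where curl -sf would discard them. It is a bash sketch: create_release, the helper names, and the minimal JSON payload are assumptions. The POST /api/v1/repos/{owner}/{repo}/releases endpoint and the "Authorization: token ..." header are standard Forgejo API.

```bash
#!/usr/bin/env bash
# Refuse to call the API at all when the secret is missing.
require_token() {
  if [ -z "${FORGEJO_TOKEN:-}" ]; then
    echo "FORGEJO_TOKEN is not set; refusing to call the releases API" >&2
    return 1
  fi
}

# $1 = HTTP status code, $2 = file holding the response body.
check_status() {
  case "$1" in
    2??) return 0 ;;
    *) echo "create-release failed: HTTP $1" >&2; cat "$2" >&2; return 1 ;;
  esac
}

create_release() {
  local owner="$1" repo="$2" tag="$3" body status rc
  require_token || return 1
  body=$(mktemp)
  # -sS hides progress but keeps errors; -w prints the status (000 on connection failure).
  status=$(curl -sS -o "$body" -w '%{http_code}' -X POST \
    -H "Authorization: token $FORGEJO_TOKEN" \
    -H 'Content-Type: application/json' \
    -d "{\"tag_name\":\"$tag\",\"name\":\"$tag\"}" \
    "https://forge.ourworld.tf/api/v1/repos/$owner/$repo/releases")
  check_status "$status" "$body"; rc=$?
  rm -f "$body"
  return "$rc"
}
```

With this shape, an unset secret stops the job with a clear message, and a 401 prints the server's response instead of passing empty stdin to the next step.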

Load-bearing finding: --from-ci is install-only today

While trying to drive service_proc start --root on heroci as a stress test, I hit a hard wall: service_X start always purges existing binaries and reinstalls via cargo, regardless of how the binary got on disk. On a CI-paved host (no source repos, no ROOTDIR), it errors out with ROOTDIR not set after wiping the just-installed CI binaries.

This means the --from-ci install path doesn't yet enable a full CI-paved stack lifecycle. We can --from-ci install the binaries; we cannot --from-ci start the supervised stack.

Filed hero_demo#64 with two recommended fixes:

  • Option A — extend --from-ci flag to each service_X.nu start function (mirrors install pattern, ~5 lines per service).
  • Option B — split start into a separate register / up command that doesn't reinstall (cleaner separation, matches apt install foo && systemctl start foo shape).

Recommendation: A for immediate rollout (mechanical, mirrors merged install pattern), B as a follow-on architectural cleanup.

Tracking as limitation L-05 in the workspace pipeline.
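Option A's control flow can be sketched in bash to make the intended difference concrete. With --from-ci, start checks that the CI binaries are already present and goes straight to register/start, with no purge or rebuild. start_service, BIN_DIR and the echoed plan strings are hypothetical placeholders for the real service_X.nu logic.

```bash
#!/usr/bin/env bash
# Hypothetical Option A: $1 = from_ci flag (1/0); remaining args = expected binaries.
start_service() {
  local from_ci="$1"; shift
  local bin missing=0
  if [ "$from_ci" = "1" ]; then
    for bin in "$@"; do
      [ -x "$BIN_DIR/$bin" ] || { echo "missing $bin; run install --from-ci first" >&2; missing=1; }
    done
    [ "$missing" = 0 ] || return 1
    echo "register+start (no rebuild)"
  else
    # Current source path: needs ROOTDIR to rebuild before starting.
    [ -n "${ROOTDIR:-}" ] || { echo "ROOTDIR not set" >&2; return 1; }
    echo "purge, cargo build, register+start"
  fi
}
```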

Updated rollout map

| Status | Service | Notes |
|---|---|---|
| ✅ install live | proc, router, proxy, db, indexer, aibroker, osis | 7/14 easy-tier |
| ⚠ install needs first/fresh release | biz, whiteboard, editor, foundry, browser, slides | tag/CI work per repo |
| ⚠ install needs CI workflow added | matrixchat | Phase 2 |
| ⚠ start lifecycle blocked everywhere | all 7 above | hero_demo#64 |

What's next

  1. Continue install-side rollout — pick one of the 6 unreleased easy-tier repos, audit its CI workflows, fix what's broken, cut a first tag, wire service_X.nu. hero_editor is the lowest-effort candidate (workflow exists, just never tag-pushed). Each service is now a self-contained ~10-30 min unit if no CI debugging is needed; longer if first-tag CI uncovers issues.
  2. Phase 1.5: extend --from-ci to start (hero_demo#64 Option A). This is an independent track from #1. Once it lands, it unlocks the full "fresh VM → working stack from CI artifacts" demo.
  3. Phase 2 — CI-fix on hero_matrixchat (no release workflow at all).
  4. Phase 3: service_install_all --from-ci.

Signed-off-by: mik-tf

Author
Owner

Session 55 — Phase 2 audit plan (cluster-by-cluster)

Session 54 reverted --from-ci from the 8 services without published artifacts. Session 55 audits why their CI doesn't publish. Surveyed all 8 .forgejo/workflows/ + Forge API tags/releases/runs first to avoid duplicating work.

Findings

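| Repo | Tags | Release | Assets |
|---|---|---|---|
| hero_matrixchat | none | none | n/a |
| hero_editor | none | none | n/a |
| hero_slides | none | none | n/a |
| hero_biz | v0.1.0/1/2 | none | n/a |
| hero_browser | v0.1.1/2/3 | none | n/a |
| hero_foundry | v0.0.1-rc1, v0.1.0 | none | n/a |
| hero_whiteboard | v0.1.0 | none | n/a |
| hero_books | v0.1.3/4/5 | v0.1.4 | 0 |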

hero_biz, hero_browser, hero_editor, hero_foundry ship near-identical 68-74-line build-linux.yaml templates that all source the same scripts/build_lib.sh (~2370 lines, present in every repo) and call shared helpers setup_linux_toolchain / build_binaries / publish_binaries. They'll fail or succeed for the same reason.

Audit clustering

Rather than 8 independent audits, group by likely shared root cause:

  • Cluster 1 — no tag pushed (2 repos): hero_editor, hero_slides. Workflow is fine; just hasn't been triggered. Fix: tag + watch a run.
  • Cluster 2 — no release workflow at all (1 repo): hero_matrixchat. Has ci.yml test/lint only, no build-linux.yaml. Fix: port a working template.
  • Cluster 3 — tag exists, no release got created, shared-helper template (4 repos): hero_biz, hero_browser, hero_foundry, hero_whiteboard (whiteboard is partial outlier — inline release logic, not shared helper). Investigate hero_biz first; finding likely propagates to 3 siblings.
  • Cluster 4 — release got created, asset-upload failed (1 repo): hero_books. Distinct symptom; standalone deep-dive.

Order this session

  1. hero_books (Cluster 4) — partial-success state has the richest diagnostics
  2. hero_biz (Cluster 3) — find the shared template failure, propagate diagnosis to browser/foundry/whiteboard
  3. hero_editor + hero_slides (Cluster 1) — quick: tag-and-watch
  4. hero_matrixchat (Cluster 2) — template-port recommendation

Output

8 per-repo issues filed (one per service), cross-linked where they share a root cause, plus a closing summary comment back here with effort estimates for Phase 2 implementation.

Out of scope this session: any actual CI fixes — audit + issues only.

### Session 55 — Phase 2 audit plan (cluster-by-cluster)

Session 54 reverted `--from-ci` from the 8 services without published artifacts. Session 55 audits **why** their CI doesn't publish. Surveyed all 8 `.forgejo/workflows/` + Forge API tags/releases/runs first to avoid duplicating work.

#### Findings

| Repo | Tags | Release | Assets |
|---|---|---|---|
| `hero_matrixchat` | none | — | — |
| `hero_editor` | none | — | — |
| `hero_slides` | none | — | — |
| `hero_biz` | v0.1.0/1/2 | — | — |
| `hero_browser` | v0.1.1/2/3 | — | — |
| `hero_foundry` | v0.0.1-rc1, v0.1.0 | — | — |
| `hero_whiteboard` | v0.1.0 | — | — |
| `hero_books` | v0.1.3/4/5 | v0.1.4 | **0** |

`hero_biz`, `hero_browser`, `hero_editor`, `hero_foundry` ship **near-identical 68-74-line `build-linux.yaml`** templates that all source the same `scripts/build_lib.sh` (~2370 lines, present in every repo) and call shared helpers `setup_linux_toolchain` / `build_binaries` / `publish_binaries`. They'll fail or succeed for the same reason.

#### Audit clustering

Rather than 8 independent audits, group by likely shared root cause:

- **Cluster 1 — no tag pushed (2 repos):** `hero_editor`, `hero_slides`. Workflow is fine; just hasn't been triggered. Fix: tag + watch a run.
- **Cluster 2 — no release workflow at all (1 repo):** `hero_matrixchat`. Has `ci.yml` test/lint only, no `build-linux.yaml`. Fix: port a working template.
- **Cluster 3 — tag exists, no release got created, shared-helper template (4 repos):** `hero_biz`, `hero_browser`, `hero_foundry`, `hero_whiteboard` (whiteboard is a partial outlier — inline release logic, not shared helper). Investigate `hero_biz` first; the finding likely propagates to 3 siblings.
- **Cluster 4 — release got created, asset-upload failed (1 repo):** `hero_books`. Distinct symptom; standalone deep-dive.

#### Order this session

1. **hero_books** (Cluster 4) — partial-success state has the richest diagnostics
2. **hero_biz** (Cluster 3) — find the shared template failure, propagate diagnosis to browser/foundry/whiteboard
3. **hero_editor + hero_slides** (Cluster 1) — quick: tag-and-watch
4. **hero_matrixchat** (Cluster 2) — template-port recommendation

#### Output

8 per-repo issues filed (one per service), cross-linked where they share a root cause, plus a closing summary comment back here with effort estimates for Phase 2 implementation.

Out of scope this session: any actual CI fixes — audit + issues only.
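The clustering rule can be expressed as a small decision function. The Python sketch below assumes a hypothetical `RepoState` record filled in from the Forge API survey. The checks are ordered so that each repo lands in the first cluster whose precondition fails. The states are copied from the findings table in this comment.

```python
from dataclasses import dataclass, field


@dataclass
class RepoState:
    """Observed CI state for one repo (field names are illustrative)."""
    name: str
    has_release_workflow: bool
    tags: list[str] = field(default_factory=list)
    releases: list[str] = field(default_factory=list)
    release_assets: int = 0


def classify(repo: RepoState) -> str:
    """Map a repo's observed state to the first audit cluster it fails."""
    if not repo.has_release_workflow:
        return "cluster 2: no release workflow"
    if not repo.tags:
        return "cluster 1: no tag pushed"
    if not repo.releases:
        return "cluster 3: tag exists, no release"
    if repo.release_assets == 0:
        return "cluster 4: release exists, no assets"
    return "ok"


FINDINGS = [
    RepoState("hero_matrixchat", has_release_workflow=False),
    RepoState("hero_editor", True),
    RepoState("hero_slides", True),
    RepoState("hero_biz", True, ["v0.1.0", "v0.1.1", "v0.1.2"]),
    RepoState("hero_browser", True, ["v0.1.1", "v0.1.2", "v0.1.3"]),
    RepoState("hero_foundry", True, ["v0.0.1-rc1", "v0.1.0"]),
    RepoState("hero_whiteboard", True, ["v0.1.0"]),
    RepoState("hero_books", True, ["v0.1.3", "v0.1.4", "v0.1.5"], ["v0.1.4"], 0),
]

for repo in FINDINGS:
    print(f"{repo.name:16} {classify(repo)}")
```

The function reproduces the four clusters above. The surface symptom alone cannot show that whiteboard uses inline logic rather than the shared helper, so that still needs a manual read of its workflow.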
Author
Owner

### Session 55 — Phase 2 audit complete

8 per-repo issues filed. Root cause analysis revealed a **shared-helper bug** that explains 4 of 8 services at once.

#### Per-repo issues

| Cluster | Repo | Issue | State found | Fix |
|---|---|---|---|---|
| A | hero_biz | [#13](https://forge.ourworld.tf/lhumina_code/hero_biz/issues/13) | tags + pkg-registry binary, no Release | shared-helper fix |
| A | hero_books | [#118](https://forge.ourworld.tf/lhumina_code/hero_books/issues/118) | tags + pkg-registry binaries + empty Release | shared-helper fix |
| A | hero_browser | [#16](https://forge.ourworld.tf/lhumina_code/hero_browser/issues/16) | tags + pkg-registry binary, no Release | shared-helper fix |
| B | hero_foundry | [#26](https://forge.ourworld.tf/lhumina_code/hero_foundry/issues/26) | stale Feb-2026 success runs, empty everywhere | re-tag, then shared-helper fix |
| C | hero_whiteboard | [#136](https://forge.ourworld.tf/lhumina_code/hero_whiteboard/issues/136) | 9 failed runs on v0.1.0; inline release logic | run-log debug |
| D | hero_editor | [#5](https://forge.ourworld.tf/lhumina_code/hero_editor/issues/5) | no tag pushed; standard template | tag + (cluster A fix prereq) |
| D | hero_slides | [#42](https://forge.ourworld.tf/lhumina_code/hero_slides/issues/42) | no tag pushed; inline release logic | tag (no prereq) |
| E | hero_matrixchat | [#4](https://forge.ourworld.tf/lhumina_code/hero_matrixchat/issues/4) | no `build-linux.yaml` at all | port from hero_proc |

#### Root cause for clusters A + B (4 of 8 services)

`scripts/build_lib.sh::publish_binaries` writes binaries only to the Forgejo **package registry** (`/api/packages/<owner>/generic/<pkg>/<version>/<asset>`). It neither creates a Forgejo **Release** nor uploads to `/api/v1/repos/<repo>/releases/<id>/assets`.

`svc_install_from_ci` in [hero_skills/tools/modules/services/lib.nu:510](https://forge.ourworld.tf/lhumina_code/hero_skills/src/branch/development/tools/modules/services/lib.nu#L510) downloads from `forge.ourworld.tf/<repo>/releases/download/<tag>/<asset>` — **release assets only**, not the pkg registry. Net: 4 services have working CI but are invisible to `--from-ci`.

The 7 services in Phase 1 with working CI (hero_proc, hero_router, hero_proxy, hero_db, hero_indexer, hero_aibroker, hero_osis) all have **inline release-creation + asset-upload logic** in their workflows, *not* the shared helper.

#### Phase 2 effort estimates

| Class | Repos | Effort |
|---|---|---|
| Add `publish_release_assets` helper to `scripts/build_lib.sh`, propagate to 4 repos, re-tag each, validate on heroci | biz, books, browser, foundry | **4-6 h total** (one helper + 4 propagations + 4 re-tag/validate cycles) |
| Add `--from-ci` consumer wiring in service_*.nu modules for unblocked services | biz, books, browser, foundry, slides, editor, matrixchat | ~30 min/each = **3-4 h** |
| Standalone debug | whiteboard | **2-4 h** (read run logs, fix build) |
| Add new release workflow | matrixchat | **1-2 h** (port template) |
| Push tags + validate | editor, slides | **30 min each = 1 h** |

**Phase 2 total estimate: ~12-18 h of work.** Highest-leverage starting point is the shared-helper fix — unblocks 4 services in one PR.

#### Out of scope this session

- All implementation. Each Phase 2 fix may be a follow-up session.
- Modifying the 7 working-CI services.

Ready to ship — closing audit phase.
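The mismatch is easiest to see by writing both URL shapes side by side. Below is a Python sketch built from the two path templates quoted above. The owner, the version string, and the asset name `hero_biz-linux-amd64` are assumptions used for illustration.

```python
FORGE = "https://forge.ourworld.tf"


def pkg_registry_url(owner: str, pkg: str, version: str, asset: str) -> str:
    """Producer side: where build_lib.sh::publish_binaries puts the binary."""
    return f"{FORGE}/api/packages/{owner}/generic/{pkg}/{version}/{asset}"


def release_download_url(repo: str, tag: str, asset: str) -> str:
    """Consumer side: where svc_install_from_ci looks for the binary."""
    return f"{FORGE}/{repo}/releases/download/{tag}/{asset}"


produced = pkg_registry_url("lhumina_code", "hero_biz", "v0.1.2", "hero_biz-linux-amd64")
consumed = release_download_url("lhumina_code/hero_biz", "v0.1.2", "hero_biz-linux-amd64")
print(produced)
print(consumed)
```

No choice of arguments makes the two URLs coincide. A green CI run on the shared-helper template therefore never yields something `--from-ci` can fetch.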
Author
Owner

### Decision: Releases is canonical, work from the 7 already-working services

After broader assessment of where Hero sits relative to industry-standard binary distribution:

#### Industry signal

Static-Linux-binary distribution from a forge is overwhelmingly **Releases-based** in the OSS world: kubectl, gh, ripgrep, hugo, terraform, docker-compose, nu, deno, bun, foundry, cargo-binstall — all expect Release assets. Generic package registries (Forgejo `/api/packages/`, GitHub Packages generic) are used for *typed* packages consumed by *typed* package managers (npm/cargo/pip/docker), not bare binaries pulled by deploy scripts.

#### Hero-specific reasons Releases wins

1. **No-auth bootstrap.** Release assets are world-readable. A fresh TFGrid VM with `curl` can pull without `FORGEJO_TOKEN`. Pkg registry needs token plumbing on every VM.
2. **UI signal.** `/releases` page tells humans what shipped and when. Pkg registry pages are machine-only.
3. **Tag-bound semantics.** A Release is bound to a tag — strong invariant. Pkg-registry versions can drift from git tags.
4. **Tooling alignment.** Any future `gh release download` / `cargo binstall` / install.sh story works out of the box.

#### Where we already stand

The **7 working services** (hero_proc, hero_router, hero_proxy, hero_db, hero_indexer, hero_aibroker, hero_osis) all do exactly this: their `build-linux.yaml` has **inline** `Create Release` + `Upload Release Assets` steps, plus an optional pkg-registry mirror. **They are the canonical Hero pattern.** No new helper needed — copy the working pattern into the 4 broken repos.

#### Updated Phase 2 plan

- **Cluster A+B (biz, books, browser, foundry):** copy the inline release-asset upload pattern from hero_proc's `build-linux.yaml` into each repo's workflow. Drop reliance on `build_lib.sh::publish_binaries` for release-asset publishing (keep it for the optional pkg-registry mirror or remove it entirely). Re-tag, validate on heroci. **~2-3 h total** (one PR pattern, applied 4x).
- **Cluster C (whiteboard):** unchanged — independent build-failure debug per [#136](https://forge.ourworld.tf/lhumina_code/hero_whiteboard/issues/136).
- **Cluster D (editor, slides):** push tag (after Cluster A pattern lands in shared template), validate.
- **Cluster E (matrixchat):** port hero_proc's `build-linux.yaml` directly.

#### Skill ecosystem follow-up (not blocking)

- `build_lib_ci` SKILL.md template currently calls only `publish_binaries` (pkg registry). Should be updated to match the canonical hero_proc pattern (inline release-asset upload + optional pkg-registry mirror).
- `tfgrid_deploy` should switch its consumer to Release URLs to drop the auth-token requirement on TFGrid VMs.

Filed for visibility; not blocking the 4-repo Phase 2 fix above.
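The two inline steps correspond to two Forgejo REST calls. The Python sketch below only builds the requests and does not send them. The endpoint shapes follow the Gitea/Forgejo API: first `POST /repos/{owner}/{repo}/releases`, then `POST .../releases/{id}/assets`, with the file sent in a multipart field named `attachment`. The following choices are assumptions, not a transcription of hero_proc's `build-linux.yaml`:

- the `token` authorization scheme
- treating `-rc` tags as prereleases
- the asset naming

```python
import json
import urllib.parse
import urllib.request

API = "https://forge.ourworld.tf/api/v1"


def create_release_request(owner: str, repo: str, tag: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the 'Create Release' call for an already-pushed tag."""
    payload = {
        "tag_name": tag,
        "name": tag,
        # Assumption: release candidates are flagged as prereleases.
        "prerelease": "-rc" in tag,
    }
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )


def upload_asset_url(owner: str, repo: str, release_id: int, asset: str) -> str:
    """'Upload Release Assets' endpoint; the binary is sent as multipart field 'attachment'."""
    query = urllib.parse.urlencode({"name": asset})
    return f"{API}/repos/{owner}/{repo}/releases/{release_id}/assets?{query}"
```

In a workflow, the `id` field of the create-release response feeds `upload_asset_url`, once per built binary. Only these calls need the token. The later downloads from `releases/download/...` do not.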
Author
Owner

### Phase 3 scope: WASM artifacts (hero_os + hero_archipelagos)

Adding to the roadmap. After Phase 2 (binary cluster A/B/C/D/E) lands, the next major chunk is WASM artifact distribution for the browser-side stack. **Same architectural gap as cluster A, plus an additional consumer-side gap.**

#### Producer state

| Repo | Workflow | Lines | Trigger | Publishes to | Tags exist? | Releases? |
|---|---|---|---|---|---|---|
| hero_os | `release.yaml` | 91 | `v*` tag | **pkg registry only** (`hero_os-web-<v>.tar.gz`) | v0.1.1/2/3 | **none** |
| hero_archipelagos | `build-release.yaml` | 137 | `v*` tag | **pkg registry only** (`hero_archipelagos-wasm.tar.gz`) | **none** | **none** |

Both have the same Cluster A bug — publish to pkg registry, never to Releases. hero_archipelagos additionally has never been tagged.

#### Consumer state — bigger gap

`service_os.nu` line 26: *"service_os install — fetch source, **cargo build**, copy binaries"*. The deploy script builds locally (~25 min cold per CLAUDE.md) and never attempts to fetch the WASM tarball CI produces.

So even fixing the producer side gives us nothing until a `svc_install_wasm_from_ci` helper exists in `hero_skills/tools/modules/services/lib.nu`, alongside the existing `svc_install_from_ci` (binary-shaped). The shapes differ: the WASM artifact is a `.tar.gz` of a directory tree that extracts into `~/hero/share/hero_os/public/` (or `/islands/`), not a single binary copied to `~/hero/bin/`.

#### Why Phase 3 is the highest-leverage piece of the whole CI roadmap

| Component | Cold deploy time saved per VM |
|---|---|
| Binary services (`--from-ci` Phase 2) | ~10 min cargo → ~30 sec download |
| **hero_os WASM (Phase 3)** | **~25 min `dx build --release` → ~30 sec download + tar extract** |

hero_os is the front door for every demo VM and every contributor onboarding. A sub-minute fresh deploy of it depends entirely on Phase 3.

#### Phase 3 effort estimate (~8-12 h)

- ~1 h: workflow diffs to both repos (near-copy of cluster A pattern, single `.tar.gz` asset instead of multiple binaries)
- ~2-3 h: `svc_install_wasm_from_ci` helper in hero_skills/lib.nu (new shape, not a one-line addition — handles tarball download + extract + content-hash bookkeeping)
- ~1 h: rewire `service_os.nu` install path to prefer `--from-ci`, fall back to local `dx build`
- ~1 h: wire archipelagos consumer (where do islands install? same flow or separate?)
- ~1-2 h: tag hero_archipelagos for the first time, validate full pipeline end-to-end on heroci
- buffer for cold-cache surprises

#### Updated roadmap

| Phase | What | Status |
|---|---|---|
| 1 | 7 binary services with working `--from-ci` | ✅ done (PRs [#193](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/193)/[#195](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/195)/[#196](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/196)/[#197](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/197)) |
| 2 | 8 broken binary services (cluster A/B/C/D/E) | 🟡 in progress — cluster A pilot on hero_browser running now |
| **3** | **hero_os + hero_archipelagos WASM `--from-ci`** | **⏳ next major chunk; per-repo issues filed for tracking** |
| 4 | Skill ecosystem cleanup (`build_lib_ci` template, `tfgrid_deploy` Releases default) | ⏳ lower priority, mostly docs |

**End-state ("complete CI via Hero OS nu-shell"):** every Hero service (binary or WASM) ships via tag → CI → Releases → `service_<name> install --from-ci`. Bare TFGrid VM → fully working Hero OS in <2 min wall-clock. Today's 25-min cold demo deploy → ~90 sec.
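The following sketch helps scope the `svc_install_wasm_from_ci` work. It is a Python approximation; the real helper would be nu in `lib.nu`. The function name, the SHA-256 stamp file standing in for "content-hash bookkeeping", and the refuse-links policy are assumptions. The URL can be any `releases/download/...` asset URL. No token is needed because Release assets are public.

```python
import hashlib
import io
import tarfile
import urllib.request
from pathlib import Path


def install_wasm_from_ci(url: str, dest: Path, stamp: Path) -> bool:
    """Download a WASM .tar.gz and extract it into dest.

    Returns False without touching dest when the tarball's SHA-256 equals the
    hash recorded in `stamp` by the previous install; True after extracting.
    """
    with urllib.request.urlopen(url) as resp:
        blob = resp.read()
    digest = hashlib.sha256(blob).hexdigest()
    if stamp.exists() and stamp.read_text().strip() == digest:
        return False  # identical artifact already deployed

    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Refuse absolute paths, '..' escapes and links before extracting anything.
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"link in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):  # Python 3.12+ and recent backports
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    stamp.write_text(digest + "\n")
    return True
```

Files removed between releases are left behind by this sketch. A production helper might extract into a temporary directory and swap it into place instead.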
Author
Owner

## 2026-05-04 — Session 55 producer-side check-in: 2 of 8 unblocked

Re-audited the 8 cluster A/B/C/D/E targets. Two have shipped Release assets since the audit comment ([28672](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/54#issuecomment-28672)):

| Repo | Latest release | Assets | Suffix | Notes |
|---|---|---|---|---|
| `hero_books` | [v0.1.6-rc1](https://forge.ourworld.tf/lhumina_code/hero_books/releases/tag/v0.1.6-rc1) | 5 | `linux-amd64` | hero_books, hero_books_admin, hero_books_server, hero_books_ui, hero_docs |
| `hero_browser` | [v0.1.4-rc5](https://forge.ourworld.tf/lhumina_code/hero_browser/releases/tag/v0.1.4-rc5) | 6 | `linux-amd64` + `linux-arm64` | hero_browser, hero_browser_server, hero_browser_ui (each in both arches) |

Both unblock the consumer-side wiring that was reverted in session 54. Working on that wiring next:

- `service_books` — re-add `--from-ci`/`--version`, asset suffix `linux-amd64`, target tag `v0.1.6-rc1`. Smoke on heroci.
- `service_browser` — same shape, target tag `v0.1.4-rc5`.

Two PRs on `hero_skills`, one per service, mirroring the [#196](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/196) / [#197](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/197) cadence.

### Updated rollout map

| Phase | What | Status |
|---|---|---|
| 1 | 7 binary services with `--from-ci` (proc, router, proxy, db, indexer, aibroker, osis) | ✅ done |
| 2 | books + browser unblocked, wiring incoming this session | 🟡 in progress |
| 2 cont. | biz, foundry (cluster A propagation needed) | ⏳ producer-blocked |
| 2 cont. | whiteboard (build-failure debug per [#136](https://forge.ourworld.tf/lhumina_code/hero_whiteboard/issues/136)) | ⏳ producer-blocked |
| 2 cont. | editor, slides (no tag pushed yet) | ⏳ producer-blocked |
| 2 cont. | matrixchat (no release workflow) | ⏳ producer-blocked |
| 3 | hero_os + hero_archipelagos WASM | ⏳ next major chunk |

End-state target unchanged: every Hero service ships via tag → CI → Releases → `service_<name> install --from-ci`.

Signed-off-by: mik-tf
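Pinning `--version` to an explicit tag is straightforward. Resolving an unpinned version is where release-candidate tags such as `v0.1.6-rc1` need care. The Python sketch below assumes the consumer picks the highest SemVer 2.0 tag among published releases and counts prereleases. The actual `svc_install_from_ci` resolution may differ, and build metadata (`+...`) is not handled.

```python
def semver_key(tag: str) -> tuple:
    """Sort key implementing SemVer 2.0 precedence for tags like 'v0.1.6-rc1'."""
    core, _, pre = tag.lstrip("v").partition("-")
    major, minor, patch = (int(x) for x in core.split("."))
    if not pre:
        # A final release outranks every prerelease of the same version.
        return (major, minor, patch, 1, ())
    # Dot-separated identifiers: numeric ones compare numerically and rank
    # below alphanumeric ones, which compare as ASCII strings.
    idents = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split("."))
    return (major, minor, patch, 0, idents)


def resolve_latest(tags: list[str]) -> str:
    """Pick the highest-precedence tag."""
    return max(tags, key=semver_key)


print(resolve_latest(["v0.1.4", "v0.1.6-rc1", "v0.1.5"]))
```

Under strict SemVer, `rc10` sorts below `rc2` because `rc10` is a single alphanumeric identifier. A dotted form like `rc.10` avoids that surprise if a release ever goes past nine candidates.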
Author
Owner

## 2026-05-04 — Complete current-state recap

Posting the full picture in one place after session 55's smoke validation. State below reflects what's live on heroci + Forgejo Releases + `hero_skills/development` right now.

### heroci.gent01.grid.tf (CI-validation VM)

| Check | Result |
|---|---|
| HTTPS gateway | 502 — expected; no nginx/hero_router stack on heroci (CI-validation only per session 53 manifest) |
| SSH via IPv4 `178.251.27.21` | OK |
| Uptime | 2 days |
| nu | 0.111.0 |
| `/root/hero/bin/` | **25 `hero_*` binaries**, all `ELF static-pie`, every one of them got there via `--from-ci` (no cargo has run on this box) |
| `hero_proc` daemon | installed but **not running** — no rpc.sock present (L-05 / [#64](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/64) `start --from-ci` gap) |

### Producer × consumer matrix (all 15 services)

| Service | Producer (Forgejo Release) | Consumer (`--from-ci` wired) | E2E |
|---|---|---|---|
| `hero_proc` | ✅ [v0.4.4](https://forge.ourworld.tf/lhumina_code/hero_proc/releases/tag/v0.4.4) / 3 assets | ✅ [#193](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/193) | ✅ |
| `hero_router` | ✅ [v0.2.2](https://forge.ourworld.tf/lhumina_code/hero_router/releases/tag/v0.2.2) / 1 asset | ✅ [#195](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/195) | ✅ |
| `hero_proxy` | ✅ [v0.5.0](https://forge.ourworld.tf/lhumina_code/hero_proxy/releases/tag/v0.5.0) / 3 assets | ✅ [#195](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/195) | ✅ |
| `hero_db` | ✅ [v0.3.2](https://forge.ourworld.tf/lhumina_code/hero_db/releases/tag/v0.3.2) / 6 assets | ✅ [#195](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/195) | ✅ |
| `hero_indexer` | ✅ [v0.1.3](https://forge.ourworld.tf/lhumina_code/hero_indexer/releases/tag/v0.1.3) / 3 assets | ✅ [#195](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/195) | ✅ |
| `hero_aibroker` | ✅ [v0.1.1](https://forge.ourworld.tf/lhumina_code/hero_aibroker/releases/tag/v0.1.1) / 4 assets | ✅ [#196](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/196) | ✅ |
| `hero_osis` | ✅ [v1.0.0-rc6](https://forge.ourworld.tf/lhumina_code/hero_osis/releases/tag/v1.0.0-rc6) / 4 assets | ✅ [#197](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/197) | ✅ |
| `hero_books` | ✅ [v0.1.6-rc1](https://forge.ourworld.tf/lhumina_code/hero_books/releases/tag/v0.1.6-rc1) / 5 assets | 🟡 [#200](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/200) (open, smoke-tested green) | 🟡 pending merge |
| `hero_browser` | ✅ [v0.1.4-rc5](https://forge.ourworld.tf/lhumina_code/hero_browser/releases/tag/v0.1.4-rc5) / 6 assets *(amd64+arm64)* | ⏳ next | ⏳ |
| `hero_biz` | ❌ no releases | ❌ producer-blocked | ❌ |
| `hero_foundry` | ❌ no releases | ❌ producer-blocked | ❌ |
| `hero_whiteboard` | ❌ no releases | ❌ producer-blocked | ❌ |
| `hero_editor` | ❌ no releases | ❌ producer-blocked | ❌ |
| `hero_slides` | ❌ no releases | ❌ producer-blocked | ❌ |
| `hero_matrixchat` | ❌ no release workflow at all | ❌ producer-blocked | ❌ |

### Coverage summary

- **8/15 services fully E2E** today (producer ships → consumer pulls → smoke passes on heroci).
- **1/15 producer-ready, consumer-not-wired** (hero_browser — wiring next this session).
- **6/15 producer-blocked**:
  - biz and foundry need the cluster A release-asset pattern.
  - whiteboard needs its build failure debugged.
  - editor and slides need a first tag.
  - matrixchat needs a CI workflow added.
- After PR #200 merges + hero_browser PR lands: **coverage → 9/15 E2E**.

### Spot-checked binaries on heroci (all static-pie ELF, version-correct)

```text
hero_proc 0.4.4              hero_books 0.1.5              hero_db 0.3.2
hero_router 0.2.1            hero_books_server 0.1.5       hero_indexer 0.1.3
hero_proxy (no --version)    hero_books_admin/ui/_docs     hero_osis (rpc 0.5.0)
hero_aibroker (4 bins)
```

### Outstanding architectural gaps (independent of binary rollout)

- **L-05** — `service_X start` always purges & rebuilds via cargo, defeating `--from-ci` installs on CI-paved hosts → [hero_demo#64](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/64).
- **Phase 3** — WASM artifacts (hero_os + hero_archipelagos), same producer-side cluster A bug + new consumer shape needed → already scoped in this issue's [c28695](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/54#issuecomment-28695).

### Session 55 in flight

- ✅ PR [#200](https://forge.ourworld.tf/lhumina_code/hero_skills/pulls/200) — service_books wiring opened, smoke green
- ⏳ hero_browser wiring next (mirrors books, asset suffix `linux-amd64`, target tag `v0.1.4-rc5`)

Signed-off-by: mik-tf
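The `ELF static-pie` spot check can also be scripted without `file(1)`. The Python sketch below reads the ELF64 header and program headers. A position-independent executable has type `ET_DYN`, and a statically linked one has no `PT_INTERP` (dynamic-loader) entry. The sketch assumes little-endian ELF64 binaries (x86-64 or arm64). Unlike `file`, it does not consult the `DF_1_PIE` flag, so a shared library would also be reported as `static-pie`.

```python
import struct
from pathlib import Path

ET_EXEC, ET_DYN = 2, 3
PT_INTERP = 3


def link_kind(path: Path) -> str:
    """Classify an ELF64 little-endian file as static / dynamic / (static-|dynamic-)pie."""
    data = path.read_bytes()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not an ELF64 little-endian file")
    # ELF64 header fields after the 16-byte e_ident.
    (e_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, *_rest) = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    # p_type is the first 4-byte field of each program header.
    has_interp = any(
        struct.unpack_from("<I", data, e_phoff + i * e_phentsize)[0] == PT_INTERP
        for i in range(e_phnum)
    )
    if e_type == ET_EXEC:
        return "dynamic" if has_interp else "static"
    if e_type == ET_DYN:
        return "dynamic-pie" if has_interp else "static-pie"
    return "other"
```

Running it over every file in `/root/hero/bin/` and flagging anything other than `static-pie` would turn the spot check into a repeatable smoke step.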
Author
Owner

2026-05-04 — Session 55 close: 9/15 services live with --from-ci

Both PRs merged. Coverage now 9/15 end-to-end.

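Landed

  • hero_skills#200: service_books re-add --from-ci (linux-amd64, v0.1.6-rc1), squash-merged at 2f38fc89
  • hero_skills#201: service_browser add --from-ci (linux-amd64, v0.1.4-rc5), squash-merged at 2ed37497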

Re-validated end-to-end on heroci from merged development HEAD

=== service_books install --from-ci --reset ===
  ⤓ 5 binaries (hero_books, _server, _ui, _admin, hero_docs) → /root/hero/bin/
  ✓ hero_books installed from CI artifacts (release v0.1.6-rc1)

=== service_browser install --from-ci --reset ===
  ⤓ 3 binaries (hero_browser, _server, _ui) → /root/hero/bin/
  ✓ hero_browser installed from CI artifacts (release v0.1.4-rc5)

Both --version latest resolutions correct. All binaries ELF 64-bit LSB pie executable, statically linked.
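The static-linkage spot check can be scripted. This is a minimal sketch that assumes `file` is installed on heroci. `is_static_desc` and `check_bin_dir` are hypothetical helper names, not existing hero_skills functions. Depending on the `file` version, a static-pie binary is reported either with `statically linked` (as in the output above) or as `static-pie linked`, so the sketch accepts both spellings.

```bash
# Hypothetical helpers for spot-checking --from-ci installs; not part of hero_skills.

# Return 0 if a `file -b` description looks like a fully static ELF binary.
# Depending on the `file` version, static-pie shows up as "..., statically linked"
# or as "static-pie linked".
is_static_desc() {
  case "$1" in
    ELF*"statically linked"*|ELF*"static-pie linked"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check every regular file in a directory (default: /root/hero/bin) and
# report any binary that is not statically linked. Returns 1 on any failure.
check_bin_dir() {
  local dir="${1:-/root/hero/bin}" bad=0 f desc
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    desc="$(file -b "$f")"
    if is_static_desc "$desc"; then
      echo "ok   $(basename "$f")"
    else
      echo "FAIL $(basename "$f"): $desc"
      bad=1
    fi
  done
  return "$bad"
}
```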

Updated coverage

| Service | Producer | Consumer | E2E |
|---|---|---|---|
| hero_proc, router, proxy, db, indexer, aibroker, osis | ✅ | ✅ | ✅ |
| hero_books | ✅ v0.1.6-rc1 / 5 assets | ✅ #200 | ✅ |
| hero_browser | ✅ v0.1.4-rc5 / 6 assets | ✅ #201 (linux-amd64 only; arm64 deferred) | ✅ |
| hero_biz, foundry, whiteboard, editor, slides, matrixchat | ❌ | ❌ | ❌ producer-blocked |

9/15 services fully E2E. 6/15 still producer-blocked — natural session 56 entry point is cluster A propagation to hero_biz + hero_foundry (port the inline release-asset upload pattern from hero_books's working build-linux.yaml into each repo's workflow, then re-tag).

Next session entry points (in suggested order)

  1. Cluster A propagation — hero_biz + hero_foundry. ~2-3 h. Pattern is now boring.
  2. Cluster D — first tags — hero_editor + hero_slides. ~30 min each. Workflow exists; just push a tag.
  3. Cluster C — debug — hero_whiteboard build failure on existing tags. 2-4 h.
  4. Cluster E — workflow port — hero_matrixchat. 1-2 h.
  5. Phase 1.5 / L-05: start --from-ci lifecycle gap, hero_demo#64. Independent track.
  6. Phase 3 — WASM (hero_os + hero_archipelagos). Highest leverage per cold-deploy time saved.

Signed-off-by: mik-tf

Author
Owner

Sibling cleanup — asset naming convention

Filed home#212 — standardize CI release-asset naming on Rust target triples (honest libc per repo).

Three conventions in use today across the 9 working repos, and several assets misrepresent their libc (hero_proc-linux-amd64 is musl; hero_books-linux-amd64 is gnu; hero_browser-linux-arm64 is gnu). Migration is pure-rename via Forgejo PATCH API — no rebuilds for any of the 9. Future producer-side work (the 6 still-blocked repos under this issue) adopts the new convention from the first tag-cut.

This is independent of this issue's --from-ci rollout but touches the same surface area. Cross-posting so that anyone working on Phase 2 cluster-A propagation uses *-x86_64-unknown-linux-musl (or -gnu) directly in the workflow and service module, rather than perpetuating linux-amd64 / linux-amd64-musl.
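Here is a minimal sketch of the naming shape. It assumes assets are published as `<bin>-<target triple>` and downloaded through Forgejo's `/<owner>/<repo>/releases/download/<tag>/<asset>` route. The helper names are hypothetical, and the route should be checked against the instance. `libc_of_triple` makes the "honest libc" rule checkable: legacy names such as `linux-amd64` carry no libc information and are rejected.

```bash
# Hypothetical helpers illustrating the home#212 naming shape:
#   <bin>-<rust target triple>, e.g. hero_proc-x86_64-unknown-linux-musl

# Build the asset name from a binary name and a Rust target triple.
asset_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Report which libc a target triple promises; fail for triple-less legacy names.
libc_of_triple() {
  case "$1" in
    *-linux-musl*) echo musl ;;
    *-linux-gnu*) echo gnu ;;
    *) echo unknown; return 1 ;;
  esac
}

# Download URL for a release asset, assuming Forgejo's
# /<owner>/<repo>/releases/download/<tag>/<asset> route.
asset_url() {
  local repo="$1" tag="$2" bin="$3" triple="$4"
  printf 'https://forge.ourworld.tf/lhumina_code/%s/releases/download/%s/%s\n' \
    "$repo" "$tag" "$(asset_name "$bin" "$triple")"
}
```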

Signed-off-by: mik-tf

Author
Owner

Phase 2 — execution plan for the remaining 6 services

Tiered easiest → hardest, lowest-variance first. Cumulative effort ~12-15h focused work, splittable across 3-5 sessions.

Tier 1 — push tag, watch CI, ship (~1h each)

Workflow exists, has inline release logic (or close to it), no prerequisite blockers. Each is essentially git tag v0.1.0 && git push origin v0.1.0 after sanity-checking the workflow's binary list matches the corresponding service_X.nu SVX_BINARIES, then a one-line consumer wiring + heroci smoke.
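The binary-list sanity check reduces to a set comparison. This sketch assumes both lists have already been extracted as whitespace-separated strings; pulling them out of the workflow or `buildenv.sh` and out of `service_X.nu` is repo-specific and not shown. `binary_drift` is a hypothetical name. Only names the consumer requests but the producer never publishes are fatal, because those are the ones that break `--from-ci`.

```bash
# Hypothetical pre-tag check: $1 = binaries the producer publishes,
# $2 = binaries the consumer (SVX_BINARIES) expects. Both are whitespace-separated.
binary_drift() {
  # Normalise both lists to " a b c " so matching on " name " is exact.
  local producer=" $(echo $1) " consumer=" $(echo $2) " missing="" extra="" b
  for b in $consumer; do
    case "$producer" in *" $b "*) ;; *) missing="$missing $b" ;; esac
  done
  for b in $producer; do
    case "$consumer" in *" $b "*) ;; *) extra="$extra $b" ;; esac
  done
  # Published but never installed: worth noting, not fatal.
  if [ -n "$extra" ]; then
    echo "published but unused:$extra"
  fi
  # Requested but never published: --from-ci would fail on these.
  if [ -n "$missing" ]; then
    echo "requested but not published:$missing"
    return 1
  fi
}
```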

| # | Repo | Audit issue | Why this position |
|---|---|---|---|
| 1 | hero_slides | #42 | Simplest possible path: workflow exists, inline-release-logic already there, no shared-helper bug, just push tag. Lowest variance. |
| 2 | hero_editor | #5 | Same shape but uses cluster-A shared-helper template. If it "just works" we win cheap; if not, the failure mode informs Tier 2. |

Tier 2 — port inline-upload pattern from hero_books (~2-3h each)

Producer-side workflow is wired and tag-triggered, but uses the broken shared-helper that writes to pkg registry instead of Releases. Each needs the inline release-asset upload pattern from hero_books's build-linux.yaml (which just landed and works).

| # | Repo | Audit issue | Notes |
|---|---|---|---|
| 3 | hero_biz | #13 | First Tier-2: designs the port from the hero_books pattern. Pattern then reusable for foundry. |
| 4 | hero_foundry | #26 | Mechanical replication of biz's pattern. Sanity-validates that the template is portable. |

Tier 3 — full template port (~1-2h)

| # | Repo | Audit issue | Notes |
|---|---|---|---|
| 5 | hero_matrixchat | #4 | No release workflow at all; needs new build-linux.yaml ported from hero_books's working version + first tag. |

Tier 4 — debug (~2-4h)

| # | Repo | Audit issue | Notes |
|---|---|---|---|
| 6 | hero_whiteboard | #136 | Highest-variance: tag exists but 9 prior CI runs failed. Needs run-log inspection to find the actual build error before any naming/upload work matters. Saved for last to benefit from the muscle memory built across Tiers 1-3. |

Cross-cutting decisions adopted

  • Naming convention: all new tags adopt the home#212 target-triple shape from day one (<bin>-x86_64-unknown-linux-musl for musl-built, -gnu for glibc-built — honest about what the workflow actually compiles). This means the new producer-side workflows include the new asset naming directly, no future rename pass needed for these 6 repos.
  • Architecture scope: amd64-only for the 6 producer-blocked repos. arm64 can be added per-repo when there's an actual ARM deploy target, but adds 2-4× per-repo work and Hero has no ARM target today.
  • Tag-cut authority: each git push origin v*.*.* is an action visible to others, so it needs per-repo authorization at the moment of tagging.

Path to "complete CI via Hero OS nu-shell"

After this Phase 2 finishes (all 15 services with --from-ci):

| Phase | What | Effort |
|---|---|---|
| 2.5 | home#212 — rename existing 9 repos' assets to target-triple naming | ~4-5h |
| 1.5 / L-05 / #64 | Extend --from-ci to start lifecycle (no purge-and-rebuild) | ~5-8h |
| 3 | hero_os + hero_archipelagos WASM via --from-ci | ~8-12h |
| 4 | service_install_all --from-ci (whole-stack default) | ~2-4h |

Then a fresh TFGrid VM → fully working Hero OS in ~90 sec wall-clock vs today's 25-min cold demo deploy.

Starting Tier 1 now: hero_slides first.

Signed-off-by: mik-tf

Author
Owner

2026-05-04 — Slides E2E complete: 10/15 services live + lessons captured

Slides landed end-to-end

| Layer | What | Reference |
|---|---|---|
| Producer fix | hero_slides_lib reqwest → rustls (musl unblock), workspace fmt + clippy clean | hero_slides#45 merged at 71221e1 |
| Asset naming | release.yaml adopts target-triple convention per home#212 | hero_slides#46 merged at 54646cf |
| Producer release | v0.1.0-rc2 — 3 assets named hero_slides{,_server,_ui}-x86_64-unknown-linux-musl | first to use new naming convention |
| Consumer | service_slides.nu --from-ci wired with x86_64-unknown-linux-musl suffix | hero_skills#202 merged |
| heroci smoke | 3 binaries downloaded, static-pie ELF, install path verified | ✅ |

Live --from-ci coverage now: 10/15 services — proc, router, proxy, db, indexer, aibroker, osis, books, browser, slides (new).

Lessons captured (apply to remaining 5 producer-blocked repos)

These cost us ~2-3 hours of friction on slides that we don't have to repeat:

  1. FORGEJO_TOKEN secret must exist in each new repo BEFORE the first tag push. hero_slides hit the exact same bump as hero_osis (c28650) — the release.yaml POSTs to /api/v1/repos/.../releases with the secret; if it's unset, curl -sf swallows the 401 and the python parser crashes on empty stdin. The build itself succeeds, but the release is never created. Cosmetic failed-run noise persists on the tag's check status forever after.

    Action for next 5 repos: before tagging, verify with:

    curl -s -H "Authorization: token $FORGEJO_TOKEN" \
      "https://forge.ourworld.tf/api/v1/repos/lhumina_code/<repo>/actions/secrets" \
      | grep FORGEJO_TOKEN
    

    If absent: set it via UI before tagging.

  2. Run the workspace gate locally before pushing any cleanup PR. Per feedback_workspace_build_before_merge.md: cargo fmt --check && cargo clippy --workspace --all-targets -- -D warnings && cargo build --workspace --release. Skipping it cost runner cycles and two PR rounds on slides (initially separate fmt and musl PRs) before the user redirected us to bundle them. Most of the producer-blocked repos likely have similar chronic fmt+clippy drift, so bundle the hygiene with the bug fix in one PR.

  3. Adopt the home#212 target-triple naming from day one in each new repo's release.yaml. The matrix artifact: field in release.yaml should match the target: field exactly. No future rename needed when the repo's first tag ships.

  4. Match service_X.nu SVX_BINARIES to buildenv.sh BINARIES before tagging. If they drift, --from-ci fails because the consumer asks for binaries the producer doesn't publish.

  5. Consumer wiring suffix matches the new convention. service_X.nu calls svc_install_from_ci ... "x86_64-unknown-linux-musl" (or -gnu if the workflow is glibc), not the old linux-amd64 / linux-amd64-musl shapes.
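For lesson 1, the root problem is that `curl -sf` hides the HTTP status, and the JSON parser then fails on empty input. One possible hardening is to capture the status with `curl -w '%{http_code}'`, check for `201` (assumed here to be what the create-release endpoint returns on success), and only then parse the body with `python3`. This is a sketch under assumptions: the release step's exact shape in `release.yaml` is not reproduced here, and `release_id_from_response` is a hypothetical name.

```bash
# Hypothetical replacement for a `curl -sf ... | python3 ...` release step.
# Check the HTTP status before handing the body to the JSON parser, so a
# missing FORGEJO_TOKEN surfaces as "HTTP 401" instead of a parser crash.
release_id_from_response() {
  local status="$1" body_file="$2"
  if [ "$status" != "201" ]; then
    echo "create release failed: HTTP $status" >&2
    head -c 500 "$body_file" >&2
    return 1
  fi
  # python3 rather than jq: jq is not installed on the builder image.
  python3 -c 'import json, sys; print(json.load(sys.stdin)["id"])' < "$body_file"
}

# Usage inside the release step (not executed here):
#   body=$(mktemp)
#   status=$(curl -s -o "$body" -w '%{http_code}' -X POST \
#     -H "Authorization: token $FORGEJO_TOKEN" -H 'Content-Type: application/json' \
#     -d "{\"tag_name\": \"$TAG\", \"name\": \"$TAG\"}" \
#     "https://forge.ourworld.tf/api/v1/repos/lhumina_code/<repo>/releases")
#   release_id=$(release_id_from_response "$status" "$body") || exit 1
```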

Updated rollout map

| Phase | Status |
|---|---|
| 1 — 7 binary services with --from-ci (proc, router, proxy, db, indexer, aibroker, osis) | ✅ done |
| 2.A — books + browser (cluster A pilot, old linux-amd64 naming) | ✅ done |
| 2.B — slides (cluster D, target-triple naming, sets precedent for next 5) | ✅ done |
| 2.C — editor (different bug: bun missing in CI runner) | ⏳ next |
| 2.D — biz, foundry (cluster A propagation, port hero_books inline pattern) | ⏳ |
| 2.E — matrixchat (no release workflow at all, port template) | ⏳ |
| 2.F — whiteboard (build-failure debug per #136) | ⏳ |
| 2.5 — home#212 rename existing 9 repos to target-triple naming (~4-5h, pure API rename) | ⏳ |
| 1.5 / #64 — extend --from-ci to start lifecycle | ⏳ |
| 3 — hero_os + hero_archipelagos WASM --from-ci | ⏳ |
| 4 — service_install_all --from-ci | ⏳ |

Next session (56) suggested order

  1. hero_editor — same musl/rustls treatment if needed, plus the bun runner gap (likely a Containerfile or apt-install fix in the CI image). Workflow already uses target-triple internally; just apply lessons-learned checklist.
  2. hero_biz — cluster A producer-side fix (port hero_books inline pattern, with target-triple naming from day one).
  3. hero_foundry — mechanical replication of hero_biz.
  4. hero_matrixchat — full template port (also using target-triple naming from day one).
  5. hero_whiteboard — debug existing failed runs.

Each repo should take ~30 min to 2 h. Estimate ~6-8 h remaining for full Phase 2 completion → coverage 15/15.

Signed-off-by: mik-tf

Author
Owner

Tiering correction — hero_editor is hard-tier (ONNX)

Investigated hero_editor for the next round of producer-side fixes and discovered it actually belongs in the hard tier alongside hero_voice and hero_embedder. Updating the rollout map accordingly.

What we found

The v0.1.0-rc3 CI run stopped at bun: command not found, since fixed by hero_editor#4. Had it gone further, the cargo build would have hit the same musl/openssl-sys wall as hero_slides did, but for a fundamentally different reason:

openssl-sys
  via native-tls
    via ureq                 (build-dependency)
      via ort-sys            ← ONNX Runtime sys crate (downloads libonnxruntime at build time)
        via ort
          via voice_activity_detector
            via hero_editor_ui

hero_editor_ui uses voice_activity_detector for VAD in crates/hero_editor_ui/src/voice/audio.rs (not feature-gated). VAD pulls in ort (ONNX Runtime Rust bindings). ort-sys has a build-dependency on ureq (defaults to native-tls) to download ONNX Runtime libraries during compilation. That's how openssl-sys gets pulled into the build environment.

This is the same architectural class as hero_voice and hero_embedder, called out in this issue's body:

| hero_voice, hero_embedder | Hard | ONNX Runtime is C++ / native. Three options: (a) bundle libonnxruntime.so next to the binary, (b) statically link ONNX against musl, (c) keep these on glibc as a "near-static" exception.

The audit comment (c28672) and Phase 2 plan (c28994) misclassified hero_editor as easy-tier — the v0.1.0-rc3 build failed at bun BEFORE reaching the cargo build step that would have surfaced the ONNX issue, so the audit couldn't see it.

Corrected tiering for the remaining 5 producer-blocked repos

| Repo | Real tier | openssl-sys via | Strategy |
|---|---|---|---|
| hero_biz | easy | reqwest (project's own dep) | rustls-tls fix — same as hero_slides |
| hero_foundry | easy | reqwest | same as hero_biz |
| hero_matrixchat | easy + needs full workflow port | reqwest | same fix + port build-linux.yaml |
| hero_whiteboard | special | 0 openssl-sys / 0 native-tls in lockfile | different debug — build was failing for unrelated reason |
| hero_editor | hard | ort-sys/ureq build-dep | deferred — same ONNX strategy as hero_voice/hero_embedder |
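The lockfile claim for hero_whiteboard is quick to reproduce on any repo before tagging. This sketch counts `openssl-sys` and `native-tls` package entries in `Cargo.lock`; the function name is hypothetical. When a count is non-zero, `cargo tree -i openssl-sys` shows the inverted dependency path, like the hero_editor chain above.

```bash
# Hypothetical pre-flight: count OpenSSL-backed crates in a Cargo.lock.
# A non-zero count means a musl build likely needs the rustls treatment
# (or, when ort-sys is the source, the ONNX strategy).
openssl_crates_in_lock() {
  local lock="${1:-Cargo.lock}" crate n
  for crate in openssl-sys native-tls; do
    # grep -c prints 0 and exits 1 when nothing matches; `|| true` keeps set -e happy.
    n=$(grep -c "^name = \"$crate\"\$" "$lock" || true)
    echo "$crate $n"
  done
}
```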

Hard-tier consolidated

These three now share an architectural dependency on a libonnxruntime strategy:

  • hero_voice — already hard-tier per #54 body
  • hero_embedder — same
  • hero_editor — moved here today

Three options remain unchanged: (a) bundle libonnxruntime.so next to the binary, (b) statically link ONNX against musl (likely upstream PR territory), (c) keep on glibc as documented near-static exception.

Updated session order

  1. hero_biz — next this session (slides playbook applies directly, plus cluster A inline-upload port)
  2. hero_foundry — mechanical replication of biz
  3. hero_matrixchat — full template port
  4. hero_whiteboard — debug existing failed runs
  5. hero_editor — defer to a dedicated ONNX-strategy session covering all three hard-tier repos at once

After step 4: easy/special tier complete (14/15). Step 5 closes out the last 1/15 along with hero_voice + hero_embedder.

Signed-off-by: mik-tf

Author
Owner

2026-05-04 — Session 55 close: 11/15 services live + 8-item Phase 2 playbook

Closing out session 55. Coverage moved 7 → 11/15 services with --from-ci end-to-end: four E2E completions (slides and biz, plus books and browser earlier in the session). Plus architectural wins: home#212 target-triple naming standard adopted, hero_editor moved to hard-tier.

Producer × consumer matrix (full)

| Service | Producer | Consumer (--from-ci wired) | E2E |
|---|---|---|---|
| hero_proc | ✅ v0.4.4 | ✅ #193 | ✅ |
| hero_router | ✅ v0.2.2 | ✅ #195 | ✅ |
| hero_proxy | ✅ v0.5.0 | ✅ #195 | ✅ |
| hero_db | ✅ v0.3.2 | ✅ #195 | ✅ |
| hero_indexer | ✅ v0.1.3 | ✅ #195 | ✅ |
| hero_aibroker | ✅ v0.1.1 | ✅ #196 | ✅ |
| hero_osis | ✅ v1.0.0-rc6 | ✅ #197 | ✅ |
| hero_books | ✅ v0.1.6-rc1 | ✅ #200 | ✅ |
| hero_browser | ✅ v0.1.4-rc5 | ✅ #201 | ✅ |
| hero_slides | ✅ v0.1.0-rc2 (4 assets, target-triple naming — first to use new convention) | ✅ #202 | ✅ |
| hero_biz | ✅ v0.1.3-rc4 (4 assets, both archs, target-triple naming) | ✅ #204 | ✅ |
| hero_foundry | ❌ no release | ❌ | ❌ next session |
| hero_whiteboard | ❌ no release | ❌ | ❌ |
| hero_matrixchat | ❌ no release workflow | ❌ | ❌ |
| hero_editor | ❌ (hard-tier — ONNX) | ❌ | ❌ deferred to ONNX strategy session |

8-item Phase 2 pre-flight playbook (apply to hero_foundry / hero_whiteboard / hero_matrixchat)

These 8 distinct CI gotchas surfaced during this session — pre-flight all of them on each remaining repo BEFORE pushing the first tag, to compress the multi-iteration debug cycle we just went through with hero_biz (5 tag rounds, 8h elapsed) down to 1-2 iterations:

| # | Pre-flight check | Where it surfaced |
|---|---|---|
| 1 | Verify FORGEJO_TOKEN repo secret exists in repo settings (set via Forgejo UI before tagging) | hero_slides |
| 2 | Verify reqwest declarations in workspace use default-features = false, features = ["rustls-tls", ...] — NEVER native-tls (pulls openssl-sys → musl breaks) | hero_slides, hero_biz |
| 3 | Verify buildenv.sh::ALL_FEATURES references real, currently-existing workspace features. A stale value referencing renamed/removed features will fail at cargo build --features ... time. Set to "default" if unsure. | hero_biz |
| 4 | In release.yaml / build-linux.yaml, JSON parsing should use python3 (universally available in ghcr.io/despiegk/builder:latest), NEVER jq (not installed). | hero_biz |
| 5 | Forgejo workflow_dispatch ref resolution can flake: runs sometimes pick up an OLD sha. Prefer push trigger via tag-cut for verification. | hero_biz |
| 6 | CI runner apt-mirror reachability is occasionally flaky. Transient retry-on-fail acceptable; if persistent, runner-side issue. | hero_biz |
| 7 | Verify buildenv.sh path used by inline release-upload steps matches the repo's actual layout. hero_books has scripts/buildenv.sh; hero_biz has buildenv.sh at root. | hero_biz |
| 8 | Add explicit rustup target add "${{ matrix.target }}" in Setup toolchain step. setup_linux_toolchain from build_lib.sh can silently skip the musl target on the despiegk/builder runner → E0463 "can't find crate for core". | hero_biz |
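Items 4 and 8 can be linted mechanically. This sketch assumes workflows live under `.forgejo/workflows/`; both that directory and the function name are assumptions. It flags any workflow that mentions `jq`, and any workflow that mentions `musl` without an explicit `rustup target add`.

```bash
# Hypothetical pre-flight lint for playbook items 4 and 8, run from a repo root.
# Assumes workflows live in .forgejo/workflows/ (pass another dir if not).
preflight_workflows() {
  local dir="${1:-.forgejo/workflows}" f rc=0
  for f in "$dir"/*.yaml "$dir"/*.yml; do
    [ -f "$f" ] || continue
    # Item 4: jq is not installed on the builder image; parse JSON with python3.
    if grep -Eq '(^|[^[:alnum:]_])jq([^[:alnum:]_]|$)' "$f"; then
      echo "$f: uses jq (not on runner image)"; rc=1
    fi
    # Item 8: a musl build must add its target explicitly.
    if grep -q 'musl' "$f" && ! grep -q 'rustup target add' "$f"; then
      echo "$f: musl build without explicit 'rustup target add'"; rc=1
    fi
  done
  return "$rc"
}
```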

Plus the structural decisions also adopted this session:

  • Asset naming: full Rust target triple (x86_64-unknown-linux-musl, aarch64-unknown-linux-gnu) per home#212. Honest about libc per repo. Already in slides + biz; precedent set.
  • Inline release-upload pattern: Create Release + Upload Release Assets curl steps mirroring hero_books's working build-linux.yaml. Replaces shared-helper publish_binaries that only writes to pkg registry (cluster A bug).
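A minimal sketch of the naming rule from the consumer side, assuming assets are named <binary>-<target-triple> and that the old Phase 2.A assets follow <binary>-linux-<arch>. Both layouts and the helper names are assumptions for illustration; home#212 is the authority. libc stays an explicit per-repo input, matching "honest about libc per repo".

```python
# Hypothetical naming helpers; the "<binary>-<triple>" layout is an assumption.
TRIPLES = {
    ("x86_64", "musl"): "x86_64-unknown-linux-musl",
    ("x86_64", "gnu"): "x86_64-unknown-linux-gnu",
    ("aarch64", "musl"): "aarch64-unknown-linux-musl",
    ("aarch64", "gnu"): "aarch64-unknown-linux-gnu",
}
# Normalize Debian-style and platform.machine()-style arch names.
ARCH_ALIASES = {"amd64": "x86_64", "x86_64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}


def target_triple(machine: str, libc: str) -> str:
    arch = ARCH_ALIASES.get(machine.lower())
    if arch is None or (arch, libc) not in TRIPLES:
        raise ValueError(f"no release target for {machine}/{libc}")
    return TRIPLES[(arch, libc)]


def asset_name(binary: str, machine: str, libc: str) -> str:
    return f"{binary}-{target_triple(machine, libc)}"


def rename_legacy(old: str, libc: str) -> str:
    """Map an old "<binary>-linux-<arch>" asset name to the target-triple form."""
    binary, sep, arch = old.rpartition("-linux-")
    if not sep:
        raise ValueError(f"not a legacy linux-<arch> name: {old}")
    return asset_name(binary, arch, libc)
```

Because libc is chosen per repo and per arch (hero_biz ships x86_64 musl and aarch64 gnu), the caller, not the helper, decides which libc to ask for.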

Updated rollout map

| Phase | Status |
|---|---|
| 1 — 7 binary services with --from-ci (proc, router, proxy, db, indexer, aibroker, osis) | ✅ done |
| 2.A — books + browser (cluster A pilot, old linux-amd64 naming) | ✅ done |
| 2.B — slides (target-triple naming, sets precedent for next 3) | ✅ done |
| 2.C — biz (cluster A propagation, full musl + arm64-gnu, 8 gotchas surfaced & captured) | ✅ done |
| 2.D — foundry (next, should be much faster with playbook) | ⏳ session 56 |
| 2.E — whiteboard (different shape, no openssl issue, debug existing failed runs) | ⏳ |
| 2.F — matrixchat (no release workflow at all, full template port) | ⏳ |
| 2.5 — home#212 rename existing 9 repos' assets to target-triple naming | ⏳ pure-rename, ~4-5h |
| 1.5 / #64 — extend --from-ci to start lifecycle | ⏳ |
| 3 — hero_os + hero_archipelagos WASM --from-ci | ⏳ |
| 3.5 — hard-tier ONNX strategy (hero_voice + hero_embedder + hero_editor) | ⏳ deferred |
| 4 — service_install_all --from-ci | ⏳ |

Side artifacts filed this session

  • home#212 — workspace-wide asset-naming standard (target-triple)
  • hero_biz#22 — clippy drift hygiene (~99 warnings, separate from rollout)

Session 56 entry point

hero_foundry — same cluster A shape as hero_biz, should land much faster with the 8-item playbook applied up front. After foundry, whiteboard (different shape — no openssl issue, debug existing failed CI runs) and matrixchat (no release workflow, full template port).

Estimated remaining for full Phase 2 (3 services): ~3-5h with playbook discipline, vs the ~8h hero_biz took without it.

Signed-off-by: mik-tf


Session 56 close — Phase 2 cluster A complete (12/15 services E2E)

hero_foundry full E2E — v0.2.3-rc2 published with 6 target-triple-named assets (3 binaries × 2 archs).

Playbook validation (8 items applied verbatim)

| # | Check | Foundry |
|---|---|---|
| 1 | FORGEJO_TOKEN secret exists | ❌ → set via API (HTTP 204) |
| 2 | reqwest features = rustls-tls | ❌ → webdav_client/Cargo.toml fixed (3 → 0 openssl-sys) |
| 3 | ALL_FEATURES references real | ✅ already "default" |
| 4 | python3 not jq | ❌ → workflow rewritten + apt install jq reverted in build_lib.sh |
| 5 | tag-push trigger (not dispatch) | ✅ |
| 6 | apt-mirror retry | ✅ no flake hit |
| 7 | buildenv.sh path correct | ✅ at repo root |
| 8 | explicit rustup target add | ❌ → added defensively |

Result: single successful tag (rc2 — rc1 was a stale session-53 tag pointing to old commit, skipped not retagged). Compare biz: 4 rc rounds. The playbook compressed cluster A propagation from ~8h debug to ~110min including squash-merge + smoke test.

Surprises

  • Pre-existing branch development_mik_release_assets from session 53 (commit 1cbf71a) was layered on top rather than reset — preserved audit trail of pre-playbook state. Squash-merge collapsed both at land time.
  • ~30 pre-existing clippy -D warnings errors across the workspace. Filed hero_foundry#28 (mirrors hero_biz#22 deferral). Workspace cargo build --workspace --release green = structural gate met.
  • target/ had 4033 root-owned files (1.9GB) from prior sudo cargo run; required user intervention.

Coverage

| Phase | Wired | Remaining |
|---|---|---|
| 1 | proc, router, proxy, db, indexer, aibroker, osis (7) | — |
| 2 | books, browser, slides, biz (session 55) + foundry (this session) (5) | whiteboard, matrixchat, editor |
| Total | 12/15 | 3 |

Pinned sessions

  • 57 = hero_whiteboard (cluster C). Already 0 openssl-sys, no ONNX. Debug PR #128 build-linux failures (runs 23457/23458) + apply playbook. LOW-MED effort.
  • 58 = hero_matrixchat (cluster E). No build-linux.yaml at all → full template port from hero_biz canonical + reqwest rustls fix (3 openssl-sys). MED effort.
  • 59 = hero_editor (cluster D hard-tier). voice_activity_detector → ort → ONNX runtime cross-compile. Strategically blocked; likely swaps with a dedicated ONNX session covering hero_voice + hero_embedder + hero_editor together.

PRs landed: hero_foundry#29 + hero_skills#205. Manifest: sessions/56.yml.


Session 58 status — cluster E shipped (matrixchat) → 14/15 services E2E

What landed

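Producer (hero_matrixchat):

  • lhumina_code/hero_matrixchat#5 — full canonical workflow port from hero_biz + workspace reqwest→rustls (drops openssl-sys + 440 lock lines)
  • lhumina_code/hero_matrixchat#7 — features-default + cargo fmt unblock (closes lhumina_code/hero_matrixchat#6)
  • Tag v0.1.0-rc2 — 6 target-triple-named assets (3 binaries × 2 archs, both legs green)

Consumer (hero_skills):

  • lhumina_code/hero_skills#207 — service_matrixchat install --from-ci, mirrors service_biz/foundry/whiteboard verbatim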
  • Smoke-tested on heroci.gent01.grid.tf — 3 statically-linked musl binaries land in ~/hero/bin/ (rustls fix verified, no openssl drag)
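One way to make the "statically-linked" part of that smoke test mechanical: a dynamically linked ELF carries a PT_INTERP program header naming its loader, while a static (including static-pie) musl binary has none. A minimal sketch that reads only the ELF header and program header table. It assumes 64-bit little-endian ELF (true for the x86_64 and aarch64 Linux targets) and is an illustration, not what the consumer module actually runs.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interp(elf: bytes) -> bool:
    """True if a 64-bit little-endian ELF image requests a dynamic loader."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # e_phoff (u64) at 0x20; e_phentsize and e_phnum (u16) at 0x36 and 0x38.
    (phoff,) = struct.unpack_from("<Q", elf, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(phnum):
        # p_type is the first u32 of each program header entry.
        (p_type,) = struct.unpack_from("<I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def is_static(path: str) -> bool:
    with open(path, "rb") as f:
        return not has_interp(f.read())
```

Usage: run is_static on each binary in ~/hero/bin/ after install; any False points at a glibc-linked or otherwise dynamic build.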

Coverage now (14/15 services E2E)

 1. ✓ hero_proc
 2. ✓ hero_router
 3. ✓ hero_proxy
 4. ✓ hero_db
 5. ✓ hero_indexer
 6. ✓ hero_aibroker
 7. ✓ hero_osis
 8. ✓ hero_books
 9. ✓ hero_browser
10. ✓ hero_slides
11. ✓ hero_biz
12. ✓ hero_foundry
13. ✓ hero_whiteboard
14. ✓ hero_matrixchat   ← session 58
15. ✗ hero_editor       ← BLOCKED on ONNX cross-compile (cluster D hard-tier)

Bugs found / fixed beyond the 8-item playbook

Matrixchat needed two fixes the playbook didn't cover (now part of the cluster-E learnings):

  1. No workspace member declared [features] — cargo build --features default errored "none of the selected packages contains this feature: default". Fix: add a [features] table with default = [] to one library crate (hero_matrixchat_sdk, mirroring the hero_foundry_core / hero_biz_app convention). Repos that already had a default-bearing library crate (foundry, biz, whiteboard) didn't hit this.

  2. Pre-existing fmt debt from upstream 0907fca — ci.yml's cargo fmt --all --check step had been failing on every push since the herolib_core integration commit. It surfaced now because it blocked our PRs from going green. Fix: mechanical cargo fmt, no behaviour change.

v0.1.0-rc1 ate both before they were diagnosed; v0.1.0-rc2 was clean first try after the unblock PR.

Workspace-build gate

Clean: cargo fmt --check && cargo clippy --workspace --all-targets -- -D warnings && cargo build --workspace --features default --release. No clippy debt — no follow-up issue needed (unlike hero_biz#22 / hero_foundry#28).

Next session (59)

Pinned: hero_editor (cluster D hard-tier). Strategically blocked on ONNX cross-compile (voice_activity_detector → ort → onnxruntime). Likely swap with a dedicated ONNX-strategy session covering hero_voice + hero_embedder + hero_editor together per session 55's hard-tier note. If session 59 starts on editor and immediately hits the ONNX wall, retreat and convert to ONNX-strategy first.

After editor: Phase 2 is complete at 15/15 and we move to Phase 3 (deploy --from-ci on herodemo).

Signed-off-by: mik-tf


Complete roadmap — lhumina_code hero OS to green

Scope: end-to-end from Phase 2 finish through deployable AI-native demo. Sequenced for maximum leverage early. Honest effort estimates; flagged uncertainties as such.


Where we are after session 58

  • Phase 2 release pipeline rollout: 14/15 killer-demo services E2E with --from-ci.
  • Reality check on "15": the killer-demo set is a subset. lhumina_code/hero_skills/tools/modules/services/ has 34 service modules total — 14 wired, 20 not yet wired. Plus the hero_os WASM shell, hero_archipelagos native islands, and ~10 unscoped repos.
  • Active scope creep risk: Phase 2 was scoped narrow (the demo path). Phases 4-5 below cover the other 17 services (5 infra + 12 auxiliary/office/misc) and need a triage pass before effort estimates harden.
  • 8-item playbook + 2 new items from session 58 = stable inheritance cost per service (~60-90 min for pure-Rust services).

Inventory: 34 hero_skills service modules

✓ Phase 2 wired (14):    proc, router, proxy, db, indexer, aibroker, osis,
                          books, browser, slides, biz, foundry, whiteboard,
                          matrixchat

✗ Phase 2 last gap (1):  editor                          ← session 59 candidate

✗ ONNX-blocked (2):      voice, embedder                 ← unblocked by ONNX-strategy session

✗ Hero infra services (5): agent, code, collab, hero_do, mycelium

✗ Auxiliary (8):         claude, codescalers, compute, core, livekit,
                          logic, mail, shrimp

✗ Office stack (3):      office, onlyoffice, runner_rhai

✗ Misc (1):              os
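The bucket sizes above can be cross-checked, and the same arithmetic yields the targets used in the phases below: 17 after Phase 2, 22/34 after Phase 4, and 12 left for Phase 5. Bucket sizes are copied from the inventory.

```python
# Bucket sizes copied from the inventory above.
BUCKETS = {
    "phase2_wired": 14,
    "phase2_last_gap": 1,  # editor
    "onnx_blocked": 2,     # voice, embedder
    "infra": 5,
    "auxiliary": 8,
    "office": 3,
    "misc": 1,
}

total = sum(BUCKETS.values())                   # 34
not_wired = total - BUCKETS["phase2_wired"]     # 20
after_phase2 = (BUCKETS["phase2_wired"] + BUCKETS["phase2_last_gap"]
                + BUCKETS["onnx_blocked"])      # 17
after_phase4 = after_phase2 + BUCKETS["infra"]  # 22
phase5 = BUCKETS["auxiliary"] + BUCKETS["office"] + BUCKETS["misc"]  # 12
phases_4_5 = BUCKETS["infra"] + phase5          # 17
```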

Releasable units outside service_*.nu: hero_os (Dioxus/WASM shell), hero_archipelagos (native islands), hero_browser_mcp, hero_foundry_ui, hero_indexer_ui, kokoro-micro. Unscoped: hero_auth, hero_cluster, hero_compute_manager, hero_coordinator, hero_launcher, hero_ledger, hero_researcher, hero_lib_rhai, hero_web_template, dist — some likely deprecated, all need triage before effort commitment.


The plan — 9 phases

Phase 2 — finish (editor + voice + embedder, all 3 ONNX-blocked)

Goal: 15/15 killer-demo services + voice + embedder all --from-ci.

Work:

  • Session 59: ONNX-strategy session. Inventory ort/voice_activity_detector/onnxruntime drag across hero_voice + hero_embedder + hero_editor. Investigate prebuilt onnxruntime musl/arm64 distributions. Lock approach in decisions/D-05-onnx-cross-compile.md. Single proof-of-concept tag on one of the three.
  • Sessions 60-62: apply ONNX approach to all three (voice, embedder, editor). Inheritance pattern after the strategy is locked.

Definition of done:

  • All 15 killer-demo services + voice + embedder publish target-triple-named release assets on tag push.
  • 17 service_*.nu modules in hero_skills wire --from-ci.
  • Smoke test green on heroci for all 17.

Estimate: 4 sessions. Risk: ONNX may not have a unified approach across all three. Fallback is a per-service ad-hoc fix: the 3 inheritance sessions become 3 distinct investigations, and total cost rises to ~6-7 sessions.

Blockers: none. Parallel-eligible: docs_hero Phase 1 content (independent track).


Phase 3 — herodemo deploys entirely via --from-ci

Goal: Zero cargo build invocations on the herodemo VM. Every service install path goes through downloaded musl artifacts.

Work:

  • Fix L-05: service_X start currently purges + rebuilds via cargo, defeating --from-ci installs. Need start-side awareness of "already installed from CI".
  • Roll the start-side fix across all 17 wired services.
  • One-shot herodemo redeploy with --from-ci --version v… for every service.
  • Verify the demo at https://herodemo.gent01.grid.tf/ end-to-end with the AI Multi-Session Pipeline UX gate (CLAUDE.md Rule 6).

Definition of done:

  • service_X start --from-ci --version v… works idempotently for all 17 services on a fresh VM.
  • herodemo running entirely from CI artifacts; no cargo on the VM after deploy.
  • Rebuild-from-scratch on a clean TFGrid VM in <10 min wall-clock.

Estimate: 2-3 sessions. Risk: L-05 fix may surface architectural questions about service lifecycle; could expand. Blockers: Phase 2 finish (need wired services to roll across). Parallel-eligible: docs_hero, reliability META design work.


Phase 4 — hero infra services on --from-ci

Goal: AI/coordination layer (agent, code, collab, hero_do, mycelium) on the same release pipeline.

Work: Apply the playbook to each. After ONNX is solved, agent + code are likely pure-Rust and inherit cleanly. mycelium is a special case (separate upstream, may already have its own pipeline). collab has had FD-leak issues — investigate before pipelining.

Definition of done: 22/34 services wired (14 + 3 ONNX + 5 infra).

Estimate: 3-4 sessions. Risk: collab/mycelium may need refactor before pipelining is sensible. Blockers: Phase 2 finish; ONNX strategy locked. Parallel-eligible: Phase 5 inventory.


Phase 5 — auxiliary + office services

Goal: Cover the remaining 12 service modules.

Pre-work (1 session): Triage inventory. Determine which are actively maintained vs deprecated; which have unique build constraints (office stack is not pure-Rust); which are demo-critical. Output: a per-service decision (pipeline / deprecate / defer).

Then per-service:

  • claude, codescalers, compute, core: likely pure-Rust inheritance.
  • livekit, logic, mail, shrimp: scope unknown until triage.
  • office, onlyoffice, runner_rhai: have non-Rust deps, may need bespoke workflow.
  • os: scope unclear.

Definition of done: Every actively-maintained service has either a --from-ci pipeline OR an explicit deprecation note in its README + workspace removal.

Estimate: 5-7 sessions (1 triage + 4-6 inheritance/bespoke). Blockers: Phase 4 (proves pattern at scale). Parallel-eligible: WASM shell pipeline.


Phase 6 — hero_os WASM shell release pipeline

Goal: Reproducible WASM bundle deploys for the Dioxus shell.

Work: Separate workflow shape — wasm-pack/trunk build, content-hashed bundle, deploy to herodemo's ~/hero/share/hero_os/public/ (or equivalent CDN/origin). Distinct from the hero_proc service pattern.
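Content hashing is what makes the bundle safe to cache aggressively: each file's name embeds a digest of its bytes, so changed content always gets a new URL, and only the entry point (index.html) needs a short cache lifetime. Bundlers such as Trunk can typically do this themselves; the sketch below only illustrates the idea, with an 8-hex-character SHA-256 prefix chosen arbitrarily and hypothetical function names.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(name: str, data: bytes, length: int = 8) -> str:
    """pkg/app.wasm + bytes -> pkg/app.<sha256-prefix>.wasm"""
    p = PurePosixPath(name)
    digest = hashlib.sha256(data).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Logical name -> hashed name, e.g. for rewriting references in index.html."""
    return {name: hashed_name(name, data) for name, data in files.items()}
```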

Definition of done: Tag push → WASM bundle uploaded → make install-assets-release equivalent runs on herodemo.

Estimate: 2 sessions. Risk: the WASM build is ~25 min cold; CI runtime budget may force optimizations. Blockers: none from Phase 2-5. Parallel-eligible: Phase 5.


Phase 7 — hero_archipelagos native islands

Goal: Native Dioxus islands (photos, videos, calendar, etc.) on the same release pipeline.

Work: Each island is a binary; same biz canonical pattern but per-island matrix. Likely a single workflow with per-island feature gates. Investigate which shape fits better: one binary containing many islands, or many binaries.

Definition of done: Every active island ships musl/arm64 release artifacts on tag push.
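For the many-binaries shape, the per-island matrix is the cross product of islands and target triples. A sketch that generates it as JSON, assuming the island list is just the examples named above, the targets mirror hero_biz's musl + arm64-gnu pair, and assets are named <island>-<triple>; all three are illustrative, not decided.

```python
import itertools
import json

ISLANDS = ["photos", "videos", "calendar"]  # examples from the text, not the full list
TARGETS = ["x86_64-unknown-linux-musl", "aarch64-unknown-linux-gnu"]  # hero_biz shape


def island_matrix(islands, targets):
    """One build job per (island, target); asset names follow <island>-<triple>."""
    return [
        {"island": i, "target": t, "asset": f"{i}-{t}"}
        for i, t in itertools.product(islands, targets)
    ]


print(json.dumps({"include": island_matrix(ISLANDS, TARGETS)}, indent=2))
```

The one-binary-many-islands shape collapses this back to the per-target axis (2 jobs instead of 6 here), at the price of rebuilding every island whenever any one changes.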

Estimate: 2-3 sessions. Blockers: Phase 2-3 (proves the pattern). Parallel-eligible: Phase 8.


Phase 8 — Reliability META

Goal: Close the architectural gaps that have been accumulating in limitations/.

Targets:

Definition of done: Every L-* limitation either resolved with a linked PR or explicitly accepted with a long-term tracking issue.

Estimate: 3-5 sessions (each is a real refactor). Blockers: none from earlier phases. Parallel-eligible from Phase 3+: can start as soon as deploy is stable.


Phase 9 — Ambient AI vision per hero_demo#52

Goal: The actual product — Hero OS as a sovereign AI-native personal OS.

Work (each is a session+):

  • MCP tool discovery surface (hero_agent#15 Phase 0/1)
  • Wake word + ambient AI widget (hero_agent#16)
  • Conversation mode
  • Per-context everything (read AND write closed loop)
  • AI scratchpad-leakage fix (L-03)
  • STT path verification
  • 24h killer-demo plan from hero_demo#53

Definition of done: hero_demo#52 acceptance criteria met. Demo verifiable at https://herodemo.gent01.grid.tf/ with no human onboarding.

Estimate: 6-10 sessions. This is the real product work; everything before is plumbing. Blockers: Phase 2 (services) + Phase 3 (deploy). Parallel-eligible: docs_hero Phase 1 content (the agent grounds on it).


Best path — critical sequence

┌─ Phase 2 finish (4 sessions: 1 ONNX strategy + 3 inheritance)
│       │
│       └─→ Phase 3 deploy (2-3 sessions)
│              │
│              ├─→ Phase 4 infra services (3-4 sessions)
│              │     │
│              │     └─→ Phase 5 auxiliary (5-7 sessions, after triage)
│              │
│              └─→ Phase 8 reliability META (3-5 sessions, parallel)
│
├─ Phase 6 WASM shell (2 sessions, parallelizable with Phase 4-5)
│
├─ Phase 7 archipelagos (2-3 sessions, parallelizable with Phase 4-5)
│
└─ Phase 9 AI vision (6-10 sessions, starts after Phase 3 stable)
   parallel with Phase 4-8

Why this sequence:

  1. ONNX-strategy first (in Phase 2 finish) — single session unlocks 3 services (voice, embedder, editor). Highest leverage in the entire plan.
  2. Phase 3 immediately after Phase 2 — deploy what we have, validate the pipeline end-to-end, surface deploy-side bugs early. Don't accumulate 22+ wired services before testing the deploy story.
  3. Phase 4-8 are largely parallelizable. Different repos, independent reviewers possible. WASM shell + archipelagos run on a different track from services rollout.
  4. Phase 5 needs triage first — don't pour effort into deprecated services. The 1-session inventory pays for itself if it kills 2+ services.
  5. Phase 9 starts as early as Phase 3 is stable — the AI vision is the actual product; the rest is plumbing. Don't push the product to the end.

Total: 25-40 sessions to "all hero OS green" depending on Phase 5 triage outcome and ONNX strategy success rate.

Roughly 25-80 hours of focused execution time depending on session length. With the multi-session pipeline discipline, this is 1-3 calendar months of part-time work or 2-4 weeks full-time.
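The total can be reproduced from the per-phase estimates above. Summing the ranges gives 27-38 sessions, or 29-41 if the ONNX fallback (Phase 2 at 6-7 instead of 4) kicks in, consistent with the rounded 25-40. The 25-80 hour figure then implies sessions of roughly 1-2 h each.

```python
# (min, max) sessions per phase, copied from the estimates above.
PHASES = {
    "2 finish": (4, 4),
    "3 deploy": (2, 3),
    "4 infra": (3, 4),
    "5 auxiliary": (5, 7),
    "6 WASM shell": (2, 2),
    "7 archipelagos": (2, 3),
    "8 reliability": (3, 5),
    "9 AI vision": (6, 10),
}


def total(phases):
    """Sum the lower and upper bounds separately."""
    lo = sum(a for a, _ in phases.values())
    hi = sum(b for _, b in phases.values())
    return lo, hi


base = total(PHASES)                                   # (27, 38)
onnx_fallback = total({**PHASES, "2 finish": (6, 7)})  # (29, 41)
```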


Risks + open decisions

  1. ONNX strategy may not unify across voice/embedder/editor. Fallback: per-service approaches, Phase 2 finish becomes 6-7 sessions instead of 4.
  2. Phase 5 services may be partly deprecated. Triage pass before committing effort. Possible deletes save sessions.
  3. WASM build CI runtime budget — 25-min cold builds may need cache strategy or dedicated runner.
  4. L-05 fix may expand to architectural service-lifecycle work — could pull effort from Phase 3 into Phase 8.
  5. No estimate for unscoped repos (auth, cluster, compute_manager, coordinator, launcher, ledger, researcher, lib_rhai, web_template, dist) — needs a 1-session inventory before we can call "all hero OS green" honest.
  6. Demo-critical vs nice-to-have — Phases 5 + 7 contain services that may not need to ship for v1. Worth a product-side conversation before committing to 100% coverage.

Session 59 = ONNX-strategy session (Phase 2 finish path A → strategy variant).

Maximum leverage: one investigation session unlocks 3 services. The alternative — direct attempt on hero_editor — has high probability of hitting the ONNX wall in the first 15 min and forcing a retreat anyway, so we'd pay for both the retreat AND the strategy work.

Path: investigate prebuilt onnxruntime musl distribution + ort crate's download-binaries feature, lock approach in decisions/D-05-onnx-cross-compile.md, single proof-of-concept tag on hero_editor (smallest of the three). Sessions 60-62 then apply pattern across voice + embedder + editor.


Roadmap drafted at session 58 close. To revise, comment with proposed changes; locked decisions go to decisions/D-NN-*.md. This comment is the SSOT for the meta-plan until hero_demo#52 absorbs it.

Signed-off-by: mik-tf

## Complete roadmap — `lhumina_code` hero OS to green *Scope: end-to-end from Phase 2 finish through deployable AI-native demo. Sequenced for maximum leverage early. Honest effort estimates; flagged uncertainties as such.* --- ### Where we are after session 58 - **Phase 2 release pipeline rollout: 14/15 killer-demo services E2E** with `--from-ci`. - **Reality check on "15":** the killer-demo set is a subset. `lhumina_code/hero_skills/tools/modules/services/` has **34 service modules total** — 14 wired, **20 not yet wired**. Plus `hero_os` WASM shell, `hero_archipelagos` native islands, and ~7 unscoped repos. - **Active scope creep risk:** Phase 2 was scoped narrow (the demo path). Phases 4-5 below cover the other 19 services and need a triage pass before effort estimates harden. - **8-item playbook** + 2 new items from session 58 = stable inheritance cost per service (~60-90 min for pure-Rust services). ### Inventory: 34 hero_skills service modules ``` ✓ Phase 2 wired (14): proc, router, proxy, db, indexer, aibroker, osis, books, browser, slides, biz, foundry, whiteboard, matrixchat ✗ Phase 2 last gap (1): editor ← session 59 candidate ✗ ONNX-blocked (2): voice, embedder ← unblocked by ONNX-strategy session ✗ Hero infra services (5): agent, code, collab, hero_do, mycelium ✗ Auxiliary (8): claude, codescalers, compute, core, livekit, logic, mail, shrimp ✗ Office stack (3): office, onlyoffice, runner_rhai ✗ Misc (1): os ``` Releasable units outside `service_*.nu`: `hero_os` (Dioxus/WASM shell), `hero_archipelagos` (native islands), `hero_browser_mcp`, `hero_foundry_ui`, `hero_indexer_ui`, `kokoro-micro`. **Unscoped:** `hero_auth`, `hero_cluster`, `hero_compute_manager`, `hero_coordinator`, `hero_launcher`, `hero_ledger`, `hero_researcher`, `hero_lib_rhai`, `hero_web_template`, `dist` — some likely deprecated, all need triage before effort commitment. 
--- ## The plan — 9 phases ### Phase 2 — finish (last 1 + ONNX-blocked 3) **Goal:** 15/15 killer-demo services + voice + embedder all `--from-ci`. **Work:** - Session 59: ONNX-strategy session. Inventory ort/voice_activity_detector/onnxruntime drag across hero_voice + hero_embedder + hero_editor. Investigate prebuilt onnxruntime musl/arm64 distributions. Lock approach in `decisions/D-05-onnx-cross-compile.md`. Single proof-of-concept tag on one of the three. - Sessions 60-62: apply ONNX approach to all three (voice, embedder, editor). Inheritance pattern after the strategy is locked. **Definition of done:** - All 15 killer-demo services + voice + embedder publish target-triple-named release assets on tag push. - 17 `service_*.nu` modules in hero_skills wire `--from-ci`. - Smoke test green on heroci for all 17. **Estimate: 4 sessions.** Risk: ONNX may not have a unified approach across all three — fallback is per-service ad-hoc fix (3 sessions become 3 distinct ones, total cost rises to ~6-7). **Blockers:** none. **Parallel-eligible:** docs_hero Phase 1 content (independent track). --- ### Phase 3 — herodemo deploys entirely via `--from-ci` **Goal:** Zero `cargo build` invocations on the herodemo VM. Every service install path goes through downloaded musl artifacts. **Work:** - Fix [L-05](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/64): `service_X start` currently purges + rebuilds via cargo, defeating `--from-ci` installs. Need start-side awareness of "already installed from CI". - Roll the start-side fix across all 17 wired services. - One-shot herodemo redeploy with `--from-ci --version v…` for every service. - Verify the demo at https://herodemo.gent01.grid.tf/ end-to-end with the AI Multi-Session Pipeline UX gate (CLAUDE.md Rule 6). **Definition of done:** - `service_proc start --from-ci --version v…` works idempotently for all 17 services on a fresh VM. - herodemo running entirely from CI artifacts; no cargo on the VM after deploy. 
- Rebuild-from-scratch on a clean TFGrid VM in <10 min wall-clock.

**Estimate: 2-3 sessions.** Risk: L-05 fix may surface architectural questions about service lifecycle; could expand.

**Blockers:** Phase 2 finish (need wired services to roll across). **Parallel-eligible:** docs_hero, reliability META design work.

---

### Phase 4 — hero infra services on `--from-ci`

**Goal:** AI/coordination layer (agent, code, collab, hero_do, mycelium) on the same release pipeline.

**Work:** Apply the playbook to each. After ONNX is solved, agent + code are likely pure-Rust and inherit cleanly. mycelium is a special case (separate upstream, may already have its own pipeline). collab has had FD-leak issues — investigate before pipelining.

**Definition of done:** 22/34 services wired (14 + 3 ONNX + 5 infra).

**Estimate: 3-4 sessions.** Risk: collab/mycelium may need refactor before pipelining is sensible.

**Blockers:** Phase 2 finish; ONNX strategy locked. **Parallel-eligible:** Phase 5 inventory.

---

### Phase 5 — auxiliary + office services

**Goal:** Cover the remaining 12 service modules.

**Pre-work (1 session):** **Triage inventory.** Determine which are actively maintained vs deprecated; which have unique build constraints (office stack is not pure-Rust); which are demo-critical. Output: a per-service decision (pipeline / deprecate / defer).

**Then per-service:**
- claude, codescalers, compute, core: likely pure-Rust inheritance.
- livekit, logic, mail, shrimp: scope unknown until triage.
- office, onlyoffice, runner_rhai: have non-Rust deps, may need bespoke workflow.
- os: scope unclear.

**Definition of done:** Every actively-maintained service has either a `--from-ci` pipeline OR an explicit deprecation note in its README + workspace removal.

**Estimate: 5-7 sessions** (1 triage + 4-6 inheritance/bespoke).

**Blockers:** Phase 4 (proves pattern at scale). **Parallel-eligible:** WASM shell pipeline.
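
The triage output is described only as a per-service decision (pipeline / deprecate / defer). The sketch below assumes a hypothetical plain-text file with one `service decision` pair per line. Neither that format nor `check_triage` exists in hero_skills today. A check like this could confirm that all 12 Phase 5 modules received a valid decision before any effort is committed:

```bash
#!/usr/bin/env bash
# The 12 Phase 5 modules from the inventory (auxiliary + office + misc).
PHASE5_SERVICES="claude codescalers compute core livekit logic mail shrimp office onlyoffice runner_rhai os"

# Hypothetical triage check. Reads "service decision" lines on stdin,
# rejects anything other than pipeline/deprecate/defer, and fails if a
# Phase 5 module has no decision. On success prints the pipeline count.
check_triage() {
  local -A seen=()
  local svc decision s pipelines=0 missing=0
  while read -r svc decision || [ -n "$svc" ]; do
    [ -z "$svc" ] && continue
    case "$decision" in
      pipeline) pipelines=$(( pipelines + 1 )) ;;
      deprecate|defer) ;;
      *) echo "invalid decision for $svc: '$decision'" >&2; return 1 ;;
    esac
    seen[$svc]=1
  done
  for s in $PHASE5_SERVICES; do
    if [ -z "${seen[$s]:-}" ]; then
      echo "no triage decision: $s" >&2
      missing=$(( missing + 1 ))
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  echo "$pipelines"
}
```

Running `check_triage < triage.txt` (file name hypothetical) prints how many modules were sent to the pipeline. That count determines the size of the 4-6 follow-up sessions.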

---

### Phase 6 — hero_os WASM shell release pipeline

**Goal:** Reproducible WASM bundle deploys for the Dioxus shell.

**Work:** Separate workflow shape — wasm-pack/trunk build, content-hashed bundle, deploy to herodemo's `~/hero/share/hero_os/public/` (or equivalent CDN/origin). Distinct from the hero_proc service pattern.

**Definition of done:** Tag push → WASM bundle uploaded → `make install-assets-release` equivalent runs on herodemo.

**Estimate: 2 sessions.** Risk: the WASM build is ~25 min cold; CI runtime budget may force optimizations.

**Blockers:** none from Phase 2-5. **Parallel-eligible:** Phase 5.

---

### Phase 7 — hero_archipelagos native islands

**Goal:** Native Dioxus islands (photos, videos, calendar, etc.) on the same release pipeline.

**Work:** Each island is a binary; same biz canonical pattern but per-island matrix. Likely a single workflow with per-island feature gates. Investigate whether a one-binary-many-islands or a many-binaries shape fits better.

**Definition of done:** Every active island ships musl/arm64 release artifacts on tag push.

**Estimate: 2-3 sessions.**

**Blockers:** Phase 2-3 (proves the pattern). **Parallel-eligible:** Phase 8.

---

### Phase 8 — Reliability META

**Goal:** Close the architectural gaps that have been accumulating in `limitations/`.

**Targets:**
- [L-01](https://forge.ourworld.tf/lhumina_code/hero_demo/blob/development/limitations/L-01-hero_proc-status-drift.md) hero_proc status reporting drift → [hero_proc#83](https://forge.ourworld.tf/lhumina_code/hero_proc/issues/83) + [hero_proc#84](https://forge.ourworld.tf/lhumina_code/hero_proc/issues/84)
- [L-02](https://forge.ourworld.tf/lhumina_code/hero_demo/blob/development/limitations/L-02-half-broken-oserver-listener.md) half-broken OServer listener → [home#202](https://forge.ourworld.tf/lhumina_code/home/issues/202) + [home#204](https://forge.ourworld.tf/lhumina_code/home/issues/204)
- [L-04](https://forge.ourworld.tf/lhumina_code/hero_demo/blob/development/limitations/L-04-mcp-tool-bruteforce.md) MCP tool brute-force → [hero_agent#15](https://forge.ourworld.tf/lhumina_code/hero_agent/issues/15)
- L-05 (to be closed in Phase 3)

**Definition of done:** Every L-* limitation either resolved with a linked PR or explicitly accepted with a long-term tracking issue.

**Estimate: 3-5 sessions** (each is a real refactor).

**Blockers:** none from earlier phases. **Parallel-eligible from Phase 3+:** can start as soon as deploy is stable.

---

### Phase 9 — Ambient AI vision per [hero_demo#52](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/52)

**Goal:** The actual product — Hero OS as a sovereign AI-native personal OS.

**Work (each is a session+):**
- MCP tool discovery surface ([hero_agent#15](https://forge.ourworld.tf/lhumina_code/hero_agent/issues/15) Phase 0/1)
- Wake word + ambient AI widget ([hero_agent#16](https://forge.ourworld.tf/lhumina_code/hero_agent/issues/16))
- Conversation mode
- Per-context everything (read AND write closed loop)
- AI scratchpad-leakage fix ([L-03](https://forge.ourworld.tf/lhumina_code/hero_demo/blob/development/limitations/L-03-ai-scratchpad-leakage.md))
- STT path verification
- 24h killer-demo plan from [hero_demo#53](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/53)

**Definition of done:** [hero_demo#52](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/52) acceptance criteria met. Demo verifiable at https://herodemo.gent01.grid.tf/ with no human onboarding.

**Estimate: 6-10 sessions.** This is the real product work; everything before is plumbing.

**Blockers:** Phase 2 (services) + Phase 3 (deploy). **Parallel-eligible:** docs_hero Phase 1 content (the agent grounds on it).

---

## Best path — critical sequence

```
┌─ Phase 2 finish (4 sessions: 1 ONNX strategy + 3 inheritance)
│     │
│     └─→ Phase 3 deploy (2-3 sessions)
│           │
│           ├─→ Phase 4 infra services (3-4 sessions)
│           │     │
│           │     └─→ Phase 5 auxiliary (5-7 sessions, after triage)
│           │
│           └─→ Phase 8 reliability META (3-5 sessions, parallel)
│
├─ Phase 6 WASM shell (2 sessions, parallelizable with Phase 4-5)
│
├─ Phase 7 archipelagos (2-3 sessions, parallelizable with Phase 4-5)
│
└─ Phase 9 AI vision (6-10 sessions, starts after Phase 3 stable)
   parallel with Phase 4-8
```

**Why this sequence:**

1. **ONNX-strategy first (in Phase 2 finish)** — single session unlocks 3 services (voice, embedder, editor). Highest leverage in the entire plan.
2. **Phase 3 immediately after Phase 2** — deploy what we have, validate the pipeline end-to-end, surface deploy-side bugs early. Don't accumulate 22+ wired services before testing the deploy story.
3. **Phase 4-8 are largely parallelizable.** Different repos, independent reviewers possible. WASM shell + archipelagos run on a different track from services rollout.
4. **Phase 5 needs triage first** — don't pour effort into deprecated services. The 1-session inventory pays for itself if it kills 2+ services.
5. **Phase 9 starts as soon as Phase 3 is stable** — the AI vision is the actual product; the rest is plumbing. Don't push the product to the end.

**Total: 25-40 sessions** to "all hero OS green" depending on Phase 5 triage outcome and ONNX strategy success rate. Roughly 25-80 hours of focused execution time depending on session length. With the multi-session pipeline discipline, this is 1-3 calendar months of part-time work or 2-4 weeks full-time.

---

## Risks + open decisions

1. **ONNX strategy may not unify across voice/embedder/editor.** Fallback: per-service approaches, Phase 2 finish becomes 6-7 sessions instead of 4.
2. **Phase 5 services may be partly deprecated.** Triage pass before committing effort. Possible deletes save sessions.
3. **WASM build CI runtime budget** — 25-min cold builds may need cache strategy or dedicated runner.
4. **L-05 fix may expand to architectural service-lifecycle work** — could pull effort from Phase 3 into Phase 8.
5. **No estimate for unscoped repos** (auth, cluster, compute_manager, coordinator, launcher, ledger, researcher, lib_rhai, web_template, dist) — needs a 1-session inventory before we can honestly call it "all hero OS green".
6. **Demo-critical vs nice-to-have** — Phases 5 + 7 contain services that may not need to ship for v1. Worth a product-side conversation before committing to 100% coverage.

## Recommended next session

**Session 59 = ONNX-strategy session** (Phase 2 finish path A → strategy variant). Maximum leverage: one investigation session unlocks 3 services. The alternative — direct attempt on hero_editor — has high probability of hitting the ONNX wall in the first 15 min and forcing a retreat anyway, so we'd pay for both the retreat AND the strategy work.

Path: investigate prebuilt onnxruntime musl distribution + `ort` crate's `download-binaries` feature, lock approach in `decisions/D-05-onnx-cross-compile.md`, single proof-of-concept tag on hero_editor (smallest of the three). Sessions 60-62 then apply the pattern across voice + embedder + editor.

---

*Roadmap drafted at session 58 close. To revise, comment with proposed changes; locked decisions go to `decisions/D-NN-*.md`. This comment is the SSOT for the meta-plan until [hero_demo#52](https://forge.ourworld.tf/lhumina_code/hero_demo/issues/52) absorbs it.*

Signed-off-by: mik-tf
Author
Owner

Session 60 — D-05 implementation pilot complete: hero_editor → 15/15 (+1 ONNX service)

Coverage: 14/15 → 15/15 (original Phase 2 set complete) + first ONNX service shipped, paving the way for hero_voice (session 61) and hero_embedder (session 62) to repeat the pattern.

Producer side — hero_editor v0.1.0-rc4

Released at https://forge.ourworld.tf/lhumina_code/hero_editor/releases/tag/v0.1.0-rc4 with 6 assets (~45 MB total):

  • hero_editor_server-{x86_64,aarch64}-unknown-linux-gnu (518 KB / 488 KB)
  • hero_editor_ui-{x86_64,aarch64}-unknown-linux-gnu (1.2 MB / 1.2 MB)
  • libonnxruntime.so.1.25.1-{x86_64,aarch64}-unknown-linux-gnu (22.7 MB / 19.2 MB)
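
The asset names follow a `<binary>-<target-triple>` pattern. The ONNX Runtime library ships once per triple under a versioned name. The bash sketch below enumerates the expected set for a tag. The `release_assets` helper is hypothetical and not part of the hero_editor workflow. Counting its output is a cheap way to confirm a release is complete:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the release assets expected for one tag.
#   $1: space-separated binary names
#   $2: space-separated Rust target triples
#   $3: bundled ONNX Runtime version (omit for non-ONNX services)
release_assets() {
  local bins="$1" triples="$2" onnx_version="${3:-}" t b
  for t in $triples; do
    for b in $bins; do
      echo "${b}-${t}"
    done
    # ONNX services also ship the shared library, one copy per triple.
    if [ -n "$onnx_version" ]; then
      echo "libonnxruntime.so.${onnx_version}-${t}"
    fi
  done
}

TRIPLES="x86_64-unknown-linux-gnu aarch64-unknown-linux-gnu"

# hero_editor v0.1.0-rc4: 2 binaries x 2 triples + 2 libraries = 6 assets.
release_assets "hero_editor_server hero_editor_ui" "$TRIPLES" 1.25.1
```

With five binaries instead of two, the same call yields 12 lines. That matches the asset count later reported for hero_embedder v0.2.0-rc1.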

D-05 fully validated end-to-end on the producer pipeline: load-dynamic ort + matrix swap to gnu + bundled Microsoft libonnxruntime.so all worked.

Consumer side

  • hero_skills #210 wired service_editor install --from-ci with libonnxruntime.so handling.
  • hero_skills #212 tiny fix-forward (svc_verify_elf was unexported; nu degraded the Command-not-found error to External-command-failed inside try/catch, masking the real diagnosis).
  • hero_demo services/hero_editor.toml committed direct to development with ORT_DYLIB_PATH=__HERO_BIN__/libonnxruntime.so.1.25.1 in the UI action env.
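
The manifest value `ORT_DYLIB_PATH=__HERO_BIN__/libonnxruntime.so.1.25.1` relies on the placeholder being replaced by the real install directory. This comment does not show how hero_skills performs that substitution. The bash sketch below uses a hypothetical `expand_hero_bin` helper and assumes a plain textual replacement with a `~/hero/bin` default. It only illustrates the intended result:

```bash
#!/usr/bin/env bash
# Hypothetical: expand the __HERO_BIN__ placeholder in a manifest env value.
#   $1: raw value from services/<name>.toml
#   $2: directory the binaries were installed to (assumed default below)
expand_hero_bin() {
  local raw="$1" bin_dir="${2:-$HOME/hero/bin}"
  # ${var//pattern/repl} replaces every occurrence, not only the first.
  printf '%s\n' "${raw//__HERO_BIN__/$bin_dir}"
}

# With the --root layout used in the heroci smoke:
expand_hero_bin '__HERO_BIN__/libonnxruntime.so.1.25.1' /root/hero/bin
```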

Heroci smoke

  • service_editor install --from-ci --root produced all 3 artifacts at /root/hero/bin/.
  • Both binaries start cleanly. hero_editor_server binds its Unix socket; hero_editor_ui prints all routes and stays alive.
  • Binaries are UPX-packed (which makes file and readelf -d falsely report "statically linked / no dynamic section") — upx -d on a copy reveals the underlying glibc-dynamic linkage (libc.so.6, libm.so.6, libgcc_s.so.1), confirming dlopen() will work for ort's runtime libonnxruntime.so resolution.
  • Voice WebSocket end-to-end smoke deferred to a UX-validation session per D-05 (the proof-of-load-dynamic only shows up when /ws/voice is hit).
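
UPX replaces the program with a self-extracting stub, so `file` and `readelf -d` describe the stub rather than the packed binary. The bash sketch below implements the check described above. It assumes `upx` and `readelf` (GNU binutils) are installed, and both function names are hypothetical. It decompresses a copy and then extracts the `(NEEDED)` entries:

```bash
#!/usr/bin/env bash
# Extract shared-library names from `readelf -d` output on stdin.
# Matches lines like: 0x...0001 (NEEDED)  Shared library: [libc.so.6]
needed_libs() {
  sed -n 's/.*(NEEDED).*Shared library: \[\(.*\)]/\1/p'
}

# Unpack a copy of a (possibly UPX-packed) binary and list its NEEDED libs.
# Requires upx and readelf; the original file is never modified.
real_needed_libs() {
  local bin="$1" tmp
  tmp=$(mktemp) || return 1
  cp "$bin" "$tmp" || { rm -f "$tmp"; return 1; }
  # upx -d exits non-zero on files that are not packed; inspect the copy anyway.
  upx -q -d "$tmp" >/dev/null 2>&1 || true
  readelf -d "$tmp" | needed_libs
  rm -f "$tmp"
}
```

If the binary matches what the smoke test observed, `real_needed_libs /root/hero/bin/hero_editor_ui` should list `libc.so.6`, `libm.so.6` and `libgcc_s.so.1`.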

Producer-side fix-forward chain on rc4

Four small fixes (three PRs plus a token-scope refresh) had to land before rc4 went green; saving the lessons in the playbook:

  • PR #9 — actions/checkout@v4 had the Forgejo auth bug (extraheader + git-fetch don't agree). Editor's build.yaml had documented this and has used a manual git clone since PR #4; the same fix was needed in build-linux.yaml.
  • PR #10 — bun is not preinstalled in the runner image; make bundle-web needs an explicit Setup-bun step (already present in build.yaml).
  • PR #11 — dropped build-macos.yaml (forge.ourworld.tf has no macOS runner; failures were just template carryover).
  • FORGEJO_TOKEN secret on hero_editor needed write:repository scope refresh (mirrors session 57 whiteboard pattern).

Playbook additions

Add to the Phase 2 playbook (originally 8 items, now 16):

  • Item 14: build-linux.yaml must mirror the toolchain/auth conventions of build.yaml — manual clone if build.yaml does, plus any non-default toolchain installs (bun, deno, etc.).
  • Item 15: drop build-macos.yaml if present — forge.ourworld.tf has no macOS runner.
  • Item 16: pre-flight FORGEJO_TOKEN secret has write:repository scope before tagging.
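
Items 14 and 15 are mechanical enough to lint before tagging. The rough bash sketch below assumes the workflows live in `.forgejo/workflows/`; adjust the path to the repo's real layout. It uses the string `setup-bun` only as an example marker for a non-default toolchain step, and `playbook_lint` is a hypothetical name. Item 16 is left out because reading the token's scopes needs a call to the forge:

```bash
#!/usr/bin/env bash
# Hypothetical pre-tag lint for playbook items 14-15.
#   $1: workflow directory (assumed .forgejo/workflows)
playbook_lint() {
  local dir="${1:-.forgejo/workflows}" rc=0
  local ref="$dir/build.yaml" linux="$dir/build-linux.yaml"

  # Item 15: forge.ourworld.tf has no macOS runner.
  if [ -e "$dir/build-macos.yaml" ]; then
    echo "item 15: remove $dir/build-macos.yaml" >&2
    rc=1
  fi

  # Item 14: build-linux.yaml must follow build.yaml's conventions.
  if [ -f "$ref" ] && [ -f "$linux" ]; then
    # build.yaml avoids actions/checkout (manual clone), so linux must too.
    if ! grep -q 'actions/checkout' "$ref" && grep -q 'actions/checkout' "$linux"; then
      echo "item 14: $linux uses actions/checkout; build.yaml clones manually" >&2
      rc=1
    fi
    # Example toolchain marker: a bun setup step present only in build.yaml.
    if grep -q 'setup-bun' "$ref" && ! grep -q 'setup-bun' "$linux"; then
      echo "item 14: $linux is missing the bun setup step" >&2
      rc=1
    fi
  fi
  return "$rc"
}
```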

Next sessions

  • Session 61: hero_voice. Same D-05 overlay (voice already declares ort with load-dynamic, so no Cargo work — only workflow + consumer wiring + matrix swap).
  • Session 62: hero_embedder. Same as voice.

Coverage projection: 15/15 + 2 → 17 services with --from-ci after sessions 61 + 62.

Author
Owner

Session 63 — D-05 hero_embedder pilot, ONNX rollout complete (16/15 → 17/15+)

Third and final application of D-05 (load-dynamic + bundled libonnxruntime.so + matrix musl→gnu), after hero_editor (session 60) and hero_voice (session 61). The D-05 ONNX rollout is done; all 17 first-class services now ship CI-built artifacts.

Producer side

  • hero_embedder PR #36 merged at 2257c36 (squash of 3 commits: fmt+clippy debt cleanup; D-05 workflow port + buildenv pin; hero_embedderd added to BINARIES — caught defect, the only ort-loading binary in the workspace was missing from the release manifest).
  • Cargo invariant verified: cargo tree -e features -p hero_embedder_lib | grep -E 'download-binaries|copy-dylibs' returns empty. ort was already declared with default-features = false, features = ["load-dynamic", "api-24"] on the workspace dep, so step 1 of the D-05 playbook collapsed to verification.
  • Workspace pre-merge gate green: cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo build --workspace --release (2m 33s) all clean.
  • Tag v0.2.0-rc1 shipped 12 release assets (5 binaries × 2 archs + libonnxruntime.so.1.25.1 × 2 archs) on first attempt — zero fix-forwards. Playbook items 14–16 (carried from sessions 57+60+61) prevented a repeat of the editor's 4 fix-forwards.
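
The Cargo invariant is a grep over `cargo tree` output. The sketch below wraps it as a reusable guard. `assert_ort_load_dynamic` is a hypothetical name, and it reads the tree on stdin so it can be tested without a workspace. It fails when either ort feature that fetches or copies prebuilt ONNX Runtime binaries appears in the feature graph:

```bash
#!/usr/bin/env bash
# Fail if ort features that download or copy prebuilt ONNX Runtime binaries
# are enabled. Input: `cargo tree -e features -p <crate>` output on stdin.
assert_ort_load_dynamic() {
  local hits
  # grep exits 1 when nothing matches, which is the passing case here.
  hits=$(grep -E 'download-binaries|copy-dylibs' || true)
  if [ -n "$hits" ]; then
    printf 'forbidden ort features in the graph:\n%s\n' "$hits" >&2
    return 1
  fi
}

# In CI (needs the workspace checkout and a Rust toolchain):
#   cargo tree -e features -p hero_embedder_lib | assert_ort_load_dynamic
```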

Consumer side

  • hero_skills PR #216 merged at 39ab04d (squash of 3 commits):
    • refactor(lib): factor svc_install_onnx_runtime_download into lib.nu (rule of 3) — voice + editor were carrying byte-identical helpers; embedder triggered the rule-of-three. The shared helper takes onnx_version and ci_target as args.
    • feat(service_embedder): add --download / --version — mirrors voice canonical shape. svx_embedderd_action prefers the bundled .so over the system /usr/local/onnxruntime install when on disk. ORT preflight (svx_ort_require) is skipped under --download because the bundled .so is the source of truth, not the system install.
    • fix(dispatcher): forward --download/--version to embedder install/start — closes the dispatcher gap surfaced in session 62.
  • hero_demo f33f8a7 — added download = "..." URLs + ORT_DYLIB_PATH env to services/hero_embedder.toml (mirrors voice manifest pattern).
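
Two behaviours described above are worth pinning down: where the shared helper fetches the `.so` from, and which library `svx_embedderd_action` hands to ort. The real helpers are Nushell functions in hero_skills and are not reproduced here. The bash sketch below is an approximation with two assumptions: the standard Forgejo release-download URL layout (`<server>/<owner>/<repo>/releases/download/<tag>/<asset>`), and a system install at `/usr/local/onnxruntime/lib/libonnxruntime.so` (the exact file path is a guess). Both function names are hypothetical:

```bash
#!/usr/bin/env bash
FORGE="https://forge.ourworld.tf"

# Approximation of the shared download helper's URL: the versioned
# libonnxruntime asset for one CI target, attached to a release tag.
ort_asset_url() {
  local repo="$1" tag="$2" onnx_version="$3" ci_target="$4"
  echo "${FORGE}/${repo}/releases/download/${tag}/libonnxruntime.so.${onnx_version}-${ci_target}"
}

# Pick the library ort should load: the bundled sibling in bin_dir wins
# when present; otherwise fall back to the system install (assumed path).
resolve_ort_dylib() {
  local bin_dir="$1" onnx_version="$2"
  local system_lib="${3:-/usr/local/onnxruntime/lib/libonnxruntime.so}"
  local bundled="${bin_dir}/libonnxruntime.so.${onnx_version}"
  if [ -f "$bundled" ]; then
    echo "$bundled"
  elif [ -f "$system_lib" ]; then
    echo "$system_lib"
  else
    echo "no libonnxruntime found (bundled: $bundled, system: $system_lib)" >&2
    return 1
  fi
}
```

The path that `resolve_ort_dylib` prints is the value `ORT_DYLIB_PATH` would receive. When the bundled copy is present, the system install is never consulted, which is why the ORT preflight can be skipped under `--download`.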

Heroci smoke (post-merge, hero_skills @ 39ab04d)

$ service_embedder install --download --reset
→ hero_embedder: fetching release v0.2.0-rc1 from lhumina_code/hero_embedder...
  ⤓ hero_embedder-x86_64-unknown-linux-gnu       → /root/hero/bin/hero_embedder
  ⤓ hero_embedderd-x86_64-unknown-linux-gnu      → /root/hero/bin/hero_embedderd
  ⤓ hero_embedder_server-x86_64-unknown-linux-gnu → /root/hero/bin/hero_embedder_server
  ⤓ hero_embedder_ui-x86_64-unknown-linux-gnu    → /root/hero/bin/hero_embedder_ui
  ⤓ hero_embedder_proxy-x86_64-unknown-linux-gnu → /root/hero/bin/hero_embedder_proxy
  ⤓ libonnxruntime.so.1.25.1-x86_64-unknown-linux-gnu → /root/hero/bin/libonnxruntime.so.1.25.1
  ✓ hero_embedder install complete (--download v0.2.0-rc1)
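
Each `⤓` line strips the trailing `-<target-triple>` from the asset name to get the installed file name. Below is a bash sketch of that mapping, using a hypothetical `install_name` helper; the real logic lives in the Nushell service module:

```bash
#!/usr/bin/env bash
# Map a release asset name to its installed file name by removing the
# trailing "-<target triple>" suffix. Fails if the asset does not end
# with that triple (e.g. the wrong architecture was downloaded).
install_name() {
  local asset="$1" triple="$2"
  case "$asset" in
    *-"$triple") echo "${asset%-"$triple"}" ;;
    *) echo "asset $asset is not built for $triple" >&2; return 1 ;;
  esac
}
```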

Verification:

  • nm -D /root/hero/bin/libonnxruntime.so.1.25.1 | grep OrtGetApiBase → OrtGetApiBase@@VERS_1.25.1 (correct ABI).
  • Binaries are UPX-packed; upx -d + ldd confirms glibc-dynamic linkage (libc.so.6, libm.so.6, libgcc_s.so.1).
  • hero_embedderd boots cleanly under ORT_DYLIB_PATH=/root/hero/bin/libonnxruntime.so.1.25.1.
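
The `nm -D` check reads the symbol version attached to `OrtGetApiBase`. The bash sketch below turns that check into a pass/fail gate by extracting the version and comparing it with the pinned one. `ort_abi_version` and `check_ort_abi` are hypothetical names. The parser reads `nm -D` output on stdin, so it can be tested without the real library:

```bash
#!/usr/bin/env bash
# Print the VERS_ tag attached to OrtGetApiBase in `nm -D` output on stdin,
# e.g. "0000000000a1b2c3 T OrtGetApiBase@@VERS_1.25.1" -> "1.25.1".
ort_abi_version() {
  sed -n 's/.*[[:space:]]OrtGetApiBase@@VERS_\([0-9.]*\)$/\1/p' | head -n 1
}

# Fail unless the library's exported version matches the expected one.
check_ort_abi() {
  local lib="$1" expected="$2" got
  got=$(nm -D "$lib" | ort_abi_version)
  if [ "$got" != "$expected" ]; then
    echo "$lib: exports VERS_${got:-none}, expected VERS_$expected" >&2
    return 1
  fi
}

# Usage on the heroci host:
#   check_ort_abi /root/hero/bin/libonnxruntime.so.1.25.1 1.25.1
```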

Embedder semantic-search end-to-end is a UX gate per D-05 (same pattern as the voice-WS deferral in session 60 — D-05 only requires the binary starts with the .so resolvable; full feature exercise is a separate session).

Coverage delta

  • Before session 63: 16/15 (Phase 2 set + editor + voice).
  • After session 63: 17/15+ (+ embedder).
  • D-05 ONNX rollout: complete (editor v0.1.0-rc4, voice v0.1.0-rc2, embedder v0.2.0-rc1).

The 14 already-shipping services keep static-musl. The 3 ONNX services ship gnu-glibc binaries with a sibling libonnxruntime.so. Any future ONNX service needs only its own constants + one call into the now-shared svc_install_onnx_runtime_download helper.
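
The musl/gnu split above reduces to a per-service target choice. Builds are static musl by default, and glibc for the three services that load `libonnxruntime.so` through `dlopen()`. Below is a bash sketch of that matrix rule, using a hypothetical `target_for` helper. The ONNX set is the three services named in this comment:

```bash
#!/usr/bin/env bash
# Services that load ONNX Runtime via dlopen() and therefore ship glibc builds.
ONNX_SERVICES="editor voice embedder"

# Print the Rust target triple a service's release should be built for.
#   $1: service short name (as in service_<name>.nu)
#   $2: arch, x86_64 or aarch64
target_for() {
  local svc="$1" arch="$2" s
  case "$arch" in
    x86_64|aarch64) ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
  for s in $ONNX_SERVICES; do
    if [ "$s" = "$svc" ]; then
      echo "${arch}-unknown-linux-gnu"
      return 0
    fi
  done
  echo "${arch}-unknown-linux-musl"
}
```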

Notes

  • Pure-inheritance session. Phase B.5 adversarial review was correctly skipped — third application of an already-proven pattern, no new spec.
  • Single non-trivial defect caught during consumer-side review: BINARIES in hero_embedder/buildenv.sh was missing hero_embedderd, the only binary in the workspace that loads ort dynamically. Without that fix the bundled .so would have shipped with no consumer. Filed as a third producer commit before opening for merge.
  • L-05 (--from-ci/--download not supported on start's purge-and-rebuild path) remains open and out of scope for this session.
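
The BINARIES omission is the kind of defect a pre-tag check can catch. This hypothetical sketch assumes `buildenv.sh` sets `BINARIES` as a space-separated string; its actual format is not shown here, and `require_binaries` is not an existing helper. Sourcing the file in a subshell and asserting that the ort-loading binaries are listed would have flagged the missing `hero_embedderd`:

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail if any required binary is absent from BINARIES.
#   $1: path to buildenv.sh (assumed to set BINARIES="a b c")
#   $@: required binary names, e.g. the ones that load ort
require_binaries() {
  local env_file="$1"; shift
  local binaries req b found missing=0
  # Source in a subshell so buildenv.sh cannot alter the caller's env.
  binaries=$( . "$env_file" >/dev/null 2>&1 || exit 1; printf '%s' "${BINARIES:-}" ) || return 1
  for req in "$@"; do
    found=0
    for b in $binaries; do
      if [ "$b" = "$req" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then
      echo "BINARIES in $env_file is missing $req" >&2
      missing=1
    fi
  done
  return "$missing"
}
```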
Reference
lhumina_code/hero_demo#54