Production container with pre-built service binaries #34

Closed
mik-tf wants to merge 10 commits from development_production_container into development
Owner

Summary

Production container that pre-builds ALL hero service binaries at Docker build time, producing a slim debian:bookworm-slim image (~427MB) with no Rust toolchain or SSH keys needed at runtime.

Closes #30
Depends on #32 (the dev container fixes that this PR builds on)

What's included

  • Dockerfile.prod: Multi-stage build that compiles hero_services + zinit + all 12 service repos
  • docker/build-services.sh: Build script that clones and builds each service (with build_cargo for repos needing custom builds like hero_os)
  • docker/strip-build-sections.sh: Strips [build]/[install]/[download] sections and removes "install" from profile actions
  • build-prod-container.yaml: CI workflow for production image builds (manual dispatch)
  • Code fix: stop_and_clean preserves pre-built binaries when no [build] sections exist
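
As a rough illustration of what the section stripping amounts to, here is a minimal sketch. It is not the contents of `docker/strip-build-sections.sh`; the function name `strip_build_sections` is hypothetical, and it assumes the service TOMLs use plain `[build]`, `[install]` and `[download]` table headers. It does not cover removing "install" from profile actions, since the profile format is not shown here.

```bash
# Hypothetical sketch: read a service TOML on stdin and drop the [build],
# [install] and [download] tables (including dotted sub-tables such as
# [build.env]), leaving every other table untouched.
strip_build_sections() {
  awk '
    # Any table header decides whether the lines that follow are skipped.
    /^[ \t]*\[/ {
      skip = ($0 ~ /^[ \t]*\[(build|install|download)(\.[^]]*)?\][ \t]*$/)
    }
    !skip { print }
  '
}
```

A caller could run it as `strip_build_sections < service.toml > service.prod.toml` for each TOML, where both file names are placeholders.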

Test results (local build)

Build: 12/12 service repos compiled successfully

Runtime: 19/21 services running

| Service | Status |
|---------|--------|
| hero_auth | Running |
| hero_books | Blocked (depends on hero_indexer) |
| hero_embedder_openrpc (server) | Running |
| hero_embedder_http (ui) | Running |
| hero_fossil | Running |
| hero_indexer | Failed (TOML expects `hero_indexer` but repo builds `hero_indexer_openrpc`) |
| hero_indexer_ui | Running |
| hero_inspector_openrpc | Running |
| hero_inspector_http | Running |
| hero_os_openrpc | Running |
| hero_os_http | Running |
| hero_osis_openrpc | Running |
| hero_osis_http | Running |
| hero_proxy_openrpc | Running |
| hero_proxy_http | Running |
| hero_redis_openrpc | Running |
| hero_redis_http | Running |
| hero_voice_openrpc | Running |
| hero_voice_http | Running |
| zinit_openrpc | Running |
| zinit_http | Running |

Remaining issue

The hero_indexer.toml references the binary hero_indexer, but after a restructuring the repo now builds hero_indexer_openrpc and hero_indexer_http instead. Fixing the TOML exec line should unblock hero_books as well and bring the count to 21/21 running services. This is a pre-existing issue that also affects the dev container.

Dev vs Production container

| | Dev (`Dockerfile`) | Prod (`Dockerfile.prod`) |
|---|---|---|
| Runtime base | `rust:slim-bookworm` | `debian:bookworm-slim` |
| Image size | ~1.5 GB | ~427 MB |
| First startup | Clones + builds services (slow) | Instant |
| SSH keys | Required at runtime | Not needed |
| Use case | Development, testing | Deployment |

Run

# Build locally
docker buildx build -f Dockerfile.prod --ssh default -t hero_zero:prod .

# Run (no SSH keys needed!)
docker run --rm -it \
  -p 6666:6666 -p 3388:3388 -p 3875:3875 \
  hero_zero:prod
fix: correct Dockerfile binary names, CI pipeline, and add entrypoint
All checks were successful
Build and Test / build (pull_request) Successful in 6m26s
05b6c3ff98
- Dockerfile: fix binary names (hero_services_openrpc, not hero_zero),
  build zinit workspace, use rust:slim-bookworm runtime with g++ for
  services that need C++ at install time
- CI workflow: manual git clone (actions/checkout fails in alpine DinD),
  explicit dockerd startup, SSH key via env block to prevent multiline
  mangling, StrictHostKeyChecking accept-new
- Entrypoint: start zinit_openrpc, wait for socket, launch
  hero_services_openrpc with user profile. Generic SSH key permission
  fix for any mounted key type.

Co-Authored-By: mik-tf <mik@threefold.io>
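
The "wait for socket" step in the entrypoint can be sketched roughly as below. This is an illustration only, not the actual entrypoint: the function name `wait_for_socket`, the poll interval and the default retry count are all assumptions.

```bash
# Hypothetical sketch: poll until a Unix socket exists at the given path,
# or give up after a fixed number of attempts.
wait_for_socket() {
  sock=$1
  tries=${2:-50}            # number of polls before giving up (assumed default)
  while [ "$tries" -gt 0 ]; do
    [ -S "$sock" ] && return 0   # -S is true only for socket files
    sleep 0.1
    tries=$((tries - 1))
  done
  echo "timed out waiting for $sock" >&2
  return 1
}
```

An entrypoint could call this between starting zinit_openrpc in the background and launching hero_services_openrpc, and abort if it returns non-zero.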
fix: use flock to prevent race condition on shared repo installs
All checks were successful
Build and Test / build (pull_request) Successful in 5m25s
a9b13c11a6
When multiple services share the same git repo (e.g. zinit_openrpc
and zinit_http both use geomind_code/zinit), their install oneshots
race on the same directory. The second install starts milliseconds
after the first finishes and fails with exit 128 (git lock conflict).

Wrap clone_or_update_sh in flock so concurrent installs serialize
their git operations on the same repo directory.

Fixes #33

Co-Authored-By: mik-tf <mik@threefold.io>
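
For readers unfamiliar with `flock(1)`, the serialization described in this commit can be sketched as follows. The wrapper name `with_repo_lock` and the lock-file location next to the checkout are assumptions; the commit only states that clone_or_update_sh is wrapped in flock.

```bash
# Hypothetical sketch: run a command while holding an exclusive lock that is
# keyed on the repo directory, so two installs of the same repo run one after
# the other instead of racing on git's own lock files.
with_repo_lock() {
  repo_dir=$1
  shift
  # flock(1) creates the lock file if it does not exist and blocks until the
  # lock is free; the wrapped command's exit status is passed through.
  flock "${repo_dir%/}.lock" "$@"
}
```

Installs for zinit_openrpc and zinit_http would then both go through `with_repo_lock <repo dir> <git command>`, and whichever starts second simply waits for the first to finish.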
feat: add production container with pre-built service binaries
All checks were successful
Build and Test / build (pull_request) Successful in 6m23s
0f01466789
Production Dockerfile (Dockerfile.prod) compiles ALL hero service
binaries at build time, producing a slim debian:bookworm-slim image
with no Rust toolchain or SSH keys needed at runtime.

- docker/build-services.sh: clones and builds 12 service repos
- docker/strip-build-sections.sh: removes [build]/[install] TOML
  sections so orchestrator starts services without install oneshots
- build-prod-container.yaml: CI workflow for production image builds

Co-Authored-By: mik-tf <mik@threefold.io>
fix: preserve pre-built binaries in production mode
All checks were successful
Build and Test / build (pull_request) Successful in 6m21s
e422ebe918
- stop_and_clean: skip binary deletion when no services have [build]
  sections (production containers with pre-baked binaries)
- build-services.sh: add build_cargo() for repos needing direct cargo
  builds (hero_os: skip WASM/Dioxus frontend), fix status tracking
- strip-build-sections.sh: also remove "install" from profile actions
  to prevent orchestrator from writing install oneshots

Tested: 19/21 services running in production container.
Remaining: hero_indexer (TOML naming mismatch) and hero_books (blocked
by hero_indexer dependency).

Co-Authored-By: mik-tf <mik@threefold.io>
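
The guard added to stop_and_clean can be pictured with the sketch below. It is written in shell purely for illustration; the real fix lives in the project's own code, and the function names, the directory layout and the exact set of files deleted are assumptions.

```bash
# Hypothetical sketch: succeed if any service TOML in the config directory
# still carries a [build] table.
has_build_sections() {
  grep -qs '^[[:space:]]*\[build\]' "$1"/*.toml
}

# Delete binaries only when they can be rebuilt from a [build] section.
clean_binaries() {
  config_dir=$1
  bin_dir=$2
  if has_build_sections "$config_dir"; then
    rm -f "$bin_dir"/*      # dev mode: install oneshots will rebuild these
  else
    echo "no [build] sections found, keeping pre-built binaries" >&2
  fi
}
```

In a production image the TOMLs have already been stripped, so the second branch is taken and the pre-baked binaries survive a stop-and-clean cycle.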
fix: split hero_indexer into openrpc/http to match repo binary names
All checks were successful
Build and Test / build (pull_request) Successful in 6m23s
d2f96f7aed
The hero_indexer repo now builds hero_indexer_openrpc + hero_indexer_http
(not a single hero_indexer binary). Split the TOML accordingly and update
all depends_on references in hero_books, hero_indexer_ui, hero_osis_openrpc.

Refs #29

Co-Authored-By: mik-tf <mik@threefold.io>
mik-tf changed title from WIP: Production container with pre-built service binaries to Production container with pre-built service binaries 2026-02-26 18:25:22 +00:00
fix: disable kill_others in production mode to prevent restart cascades
All checks were successful
Build and Test / build (pull_request) Successful in 5m24s
45ea909e95
When no [build] section exists (production container with pre-built
binaries), kill_others is unnecessary since there are no stale processes.
The flag causes _http services to kill each other's ports on simultaneous
startup, exhausting zinit's retry budget.

Refs #33

Co-Authored-By: mik-tf <mik@threefold.io>
fix: add ONNX Runtime, fix zinit double-start, fix hero_books embedder URL
All checks were successful
Build and Test / build (pull_request) Successful in 6m23s
d6c8ff3eb3
Production container fixes for 20/20 services:
- Download ONNX Runtime 1.23.2 for hero_embedder (uses load-dynamic dlopen)
- Remove zinit TOMLs from production profile (infrastructure, not user service)
- Fix hero_books embedder URL to use Unix socket instead of broken HTTP URL

Co-Authored-By: mik-tf <mik@threefold.io>
docs: add production container section to README
All checks were successful
Build and Test / build (pull_request) Successful in 5m25s
a4e6ee54d5
Document the production container image (pull, run, tags), 20 service
list, startup notes, key ports, CI build process, and architecture.
Rename existing Docker section to "Development Container".

Co-Authored-By: mik-tf <mik@threefold.io>
Add hero_os_ui Dioxus shell and hero_archipelagos standalone island
builds to the production Docker image. This enables the Hero OS desktop
environment with all island apps (settings, books, calendar, contacts,
etc.) to be served by hero_os_http.

Changes:
- Add wasm32-unknown-unknown target, wasm-pack, and dioxus-cli to builder
- New docker/build-wasm.sh script (builds shell + 37 islands)
- Copy WASM assets to runtime stage
- Set HERO_OS_ASSETS/HERO_OS_ISLANDS env vars for hero_os_http
- Pin zinit to 9d21ba5 (workaround rust-version inheritance bug)
- Expose port 8804 (hero_os_http)

Tested locally: 22 services running, shell served at /, 31/37 islands
built successfully.

Closes #37

Co-Authored-By: mik-tf <mik@threefold.io>
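
A result like "31/37 islands built" implies a build loop that keeps going when one target fails and reports a tally at the end. A minimal sketch of that pattern follows; it is not the contents of `docker/build-wasm.sh`, and the `build_all` helper and its calling convention are hypothetical.

```bash
# Hypothetical sketch: run one build command per target, continue past
# failures, and print an "ok/total" tally once every target has been tried.
build_all() {
  build_cmd=$1
  shift
  ok=0
  total=0
  failed=""
  for target in "$@"; do
    total=$((total + 1))
    if "$build_cmd" "$target"; then
      ok=$((ok + 1))
    else
      failed="$failed $target"
    fi
  done
  echo "$ok/$total built"
  [ -z "$failed" ] || echo "failed:$failed" >&2
  # The exit status reports whether everything built; a caller that accepts
  # partial results can ignore it.
  [ "$ok" -eq "$total" ]
}
```

Returning a status instead of exiting leaves the policy to the caller: an image build that tolerates missing islands can append `|| true`, while a stricter pipeline can fail on any miss.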
mik-tf closed this pull request 2026-02-27 14:52:36 +00:00

Superseded by #43 (combined deploy branch with all PRs merged).

All checks were successful
Build and Test / build (pull_request) Successful in 6m23s

Pull request closed


Reference
lhumina_code/hero_services!34