Migrate hero_proc to the one-socket multi-domain model (domains as /api/{domain}/ paths) #148

Open
opened 2026-06-10 11:13:16 +00:00 by timur · 1 comment
Owner

Summary

hero_lib has moved multi-domain services to a one-socket-per-service model (domains as URL path segments), but hero_proc is only half-migrated: its service.toml, its Askama admin, and lab's hero_proc SDK still expect the old per-domain sockets (hero_proc/rpc_jobs.sock, rpc_logs.sock, …). The result: on latest hero_lib, hero_proc_server binds only hero_proc/rpc.sock, so lab service hero_proc fails (socket not found … rpc_jobs.sock) and discovery/admin break.

This was misdiagnosed as a regression — it is an intentional architecture change that needs finishing across consumers.

The new model (authoritative)

hero_lifecycle::ServiceManifest::serve_rpc_domains_with_extra (hero_lib/crates/hero_lifecycle/src/manifest.rs:768) is explicit:

"ONE socket per service. Domains are path segments… NO control-plane socket and NO rpc_<domain>.sock fan-out."

Everything is served on <service>/rpc.sock:

  • GET /api/domains.json → the domain list
  • POST /api/{domain}/rpc → JSON-RPC per domain
  • GET /api/{domain}/openrpc.json → spec per domain
  • GET /heroservice.json, GET /health.json
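
As a minimal sketch of the routes listed above, the helpers below build the single socket path and the per-domain JSON-RPC URL, then send one request with curl. The function names (`hero_socket`, `hero_rpc_url`, `hero_call`) are hypothetical and not part of hero_lib. The sketch assumes `HERO_SOCKET_DIR` points at the socket root and that the endpoint accepts a standard JSON-RPC 2.0 envelope.

```shell
# Hypothetical helpers for illustration; these names are not part of hero_lib.

# Print the single socket path for a service: <HERO_SOCKET_DIR>/<service>/rpc.sock
hero_socket() {
  printf '%s/%s/rpc.sock\n' "${HERO_SOCKET_DIR:?HERO_SOCKET_DIR must be set}" "$1"
}

# Print the JSON-RPC URL for a domain. With --unix-socket, curl connects
# through the socket and uses the host name only for the Host header.
hero_rpc_url() {
  printf 'http://localhost/api/%s/rpc\n' "$1"
}

# Send one JSON-RPC 2.0 request without params: hero_call <service> <domain> <method>
# The method name is interpolated as-is, so it must not contain quotes.
hero_call() {
  curl --silent --show-error --unix-socket "$(hero_socket "$1")" \
    --header 'Content-Type: application/json' \
    --data "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"$3\"}" \
    "$(hero_rpc_url "$2")"
}
```

For example, `hero_call hero_proc jobs rpc.discover` would ask the jobs domain for its spec over `hero_proc/rpc.sock`.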

The macros that generate this: herolib_macros::openrpc_from_oschema! (multi-domain) and openrpc_server! (single-domain). The model landed via hero_lib dev commits 537e2b9b (multi-domain proxy routing), edb04a99 (100%-compliant web routes), ee95c88d (dispatch+manifest), 277299dc (simplify openrpc_proxy+manifest).

What is stale and must be migrated

  1. hero_proc/crates/hero_proc_server/service.toml — still declares rpc_jobs.sock, rpc_logs.sock, rpc_secrets.sock, rpc_system.sock. Should declare a single hero_proc/rpc.sock (domains live under /api/{domain}/… on it). Verify the banner/--info matches what serve_rpc_domains_with_extra actually binds.
  2. lab's hero_proc SDK (hero_skills/crates/lab, via hero_proc_sdk) — the multi-domain client still resolves $HERO_SOCKET_DIR/hero_proc/rpc_<domain>.sock. It must call <service>/rpc.sock with /api/{domain}/rpc. This is why lab service hero_proc / status fails.
  3. hero_proc's own SDK (hero_proc/crates/hero_proc_sdk) — same per-domain-socket assumption.
  4. hero_router discovery — confirm it discovers a multi-domain service via /heroservice.json + /api/domains.json and caches each /api/{domain}/openrpc.json (today router.services reports methods=0, spec=null for hero_proc's domains — see the related rpc.discover fix hero_lib@599e2c37, but the durable path is domains.json).
  5. Existing Askama hero_proc_admin: keep it working (Timur wants it as the 1:1 reference for the Dioxus admin). Update its socket/path expectations to the new model rather than removing it.
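
The change each stale consumer needs amounts to one mapping: an old per-domain socket path becomes the shared socket plus a URL path. The sketch below shows that mapping with plain parameter expansion. `migrate_target` is a hypothetical name used only for illustration, and the real SDK change lives in Rust rather than shell.

```shell
# Hypothetical illustration of the old -> new addressing rule.
# Input:  <dir>/rpc_<domain>.sock         (old model, one socket per domain)
# Output: <dir>/rpc.sock /api/<domain>/rpc (new model, one socket per service)
migrate_target() {
  old=$1
  dir=${old%/*}        # strip the file name, keep the service directory
  file=${old##*/}      # e.g. rpc_jobs.sock

  # Reject anything that is not a per-domain socket name.
  case $file in
    rpc_?*.sock) ;;
    *) echo "not a per-domain socket: $old" >&2; return 1 ;;
  esac

  domain=${file#rpc_}      # e.g. jobs.sock
  domain=${domain%.sock}   # e.g. jobs
  printf '%s/rpc.sock /api/%s/rpc\n' "$dir" "$domain"
}
```

Calling `migrate_target "$HERO_SOCKET_DIR/hero_proc/rpc_jobs.sock"` prints the new socket path followed by `/api/jobs/rpc`.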

Plan

  • Migrate hero_proc_server/service.toml to one rpc.sock; confirm serve_domains_with binds exactly that + serves /api/domains.json + per-domain routes.
  • Update hero_proc_sdk + lab's hero_proc client to the <svc>/rpc.sock + /api/{domain}/rpc contract.
  • Keep the Askama hero_proc_admin running against the new model (reference UI).
  • Verify hero_router discovers + caches per-domain specs via domains.json.
  • lab service hero_proc --build --start clean; router.services shows hero_proc's domain methods > 0.

Validation / repro

PATH_CODE=<lhumina_code> lab service hero_proc --build --start   # was failing on rpc_jobs.sock
ls $HERO_SOCKET_DIR/hero_proc/                                   # latest build: only rpc.sock (correct, new model)
curl --unix-socket .../hero_proc/rpc.sock http://localhost/api/domains.json
curl --unix-socket .../hero_proc/rpc.sock http://localhost/api/jobs/openrpc.json

Note: the older hero_lib@0b06c634 (pre-refactor) + current hero_proc both bind the flat rpc_*.sock and work — that is the old model and is the thing being migrated away from.
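
The repro commands above can be extended into a small post-migration check, sketched here under a few assumptions: the domain list is taken from the four sockets named in this issue (jobs, logs, secrets, system), and the function names `stale_sockets` and `check_specs` are hypothetical. The first function flags leftover old-model sockets. The second fetches each per-domain spec from the single socket.

```shell
# Hypothetical post-migration check; names are illustrative only.
# Domain names are taken from the per-domain sockets listed in this issue.
DOMAINS="jobs logs secrets system"

# Print any leftover per-domain sockets (old model) in a service directory.
# An unmatched glob stays literal in sh/bash, so the -e test filters it out.
stale_sockets() {   # $1 = service directory, e.g. "$HERO_SOCKET_DIR/hero_proc"
  for f in "$1"/rpc_*.sock; do
    if [ -e "$f" ]; then printf '%s\n' "$f"; fi
  done
}

# Fetch each per-domain spec; --fail makes curl exit non-zero on HTTP errors.
check_specs() {     # $1 = path to <service>/rpc.sock
  for d in $DOMAINS; do
    if curl --silent --fail --output /dev/null --unix-socket "$1" \
        "http://localhost/api/$d/openrpc.json"; then
      echo "ok   $d"
    else
      echo "FAIL $d"
    fi
  done
}
```

On a fully migrated build, `stale_sockets "$HERO_SOCKET_DIR/hero_proc"` should print nothing, and `check_specs "$HERO_SOCKET_DIR/hero_proc/rpc.sock"` should report every domain as reachable.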

Related

  • Dioxus admin issue (consumer of /api/domains.json): see companion issue in hero_website_framework.
  • rpc.discover macro fix (single-domain discovery): hero_lib@599e2c37.

Filed by Claude (owner-mode work for Timur).

Author
Owner

The new one-socket model works end-to-end on the serve + admin side — the gap is purely stale consumers:

  • Built + ran new-model hero_proc (single hero_proc/rpc.sock). It correctly serves /api/domains.json, per-domain /api/<domain>/openrpc.json, and /api/<domain>/rpc.
  • The new Dioxus admin (hero_website_framework@5e9724f) consumes it live (domains jobs/logs/secrets/system; 88 methods in the jobs domain).
  • Confirmed the rpc.discover macro fix (hero_lib@599e2c37): each per-domain /api/<domain>/rpc answers rpc.discover with its spec, so hero_router can introspect over RPC.

Still stale / to migrate (this issue):

  • hero_proc_server/service.toml still declares rpc_jobs.sock / rpc_logs.sock / rpc_secrets.sock / rpc_system.sock.
  • The Askama hero_proc_admin + lab's hero_proc SDK still resolve rpc_<domain>.sock. lab service hero_proc fails its readiness check on rpc_jobs.sock — that is the migration target (keep the Askama admin working as the Dioxus reference).
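
For the readiness failure described above, a new-model probe would wait on the single `rpc.sock` rather than on `rpc_jobs.sock`. The sketch below is an assumption about what such a probe could look like. lab's actual readiness check is not shown in this issue, and `wait_ready` is a hypothetical name. It polls until the socket exists and `/health.json` answers successfully.

```shell
# Hypothetical readiness probe for the one-socket model.
# lab's real check is not shown in this issue; this is only a sketch.
wait_ready() {   # $1 = path to <service>/rpc.sock, $2 = max attempts (default 10)
  attempts=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -S is true only if the path exists and is a socket.
    if [ -S "$1" ] && curl --silent --fail --output /dev/null \
         --unix-socket "$1" http://localhost/health.json; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A caller could run `wait_ready "$HERO_SOCKET_DIR/hero_proc/rpc.sock" 30` and treat a non-zero exit status as "service not ready".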
Reference
lhumina_code/hero_proc#148