Adapt hero_proc, hero_inspector, and hero_proxy to the hero_sdk architecture (hero_rpc#13) #34

Closed
opened 2026-03-31 12:02:36 +00:00 by timur · 2 comments
Owner

Context

hero_rpc is being restructured into hero_sdk and hero_osis into hero_core (see hero_rpc#13). The key architectural changes that affect the infrastructure services (hero_proc, hero_inspector, hero_proxy) are:

  1. Single binary per service — no more separate _server and _ui binaries. One binary runs everything.
  2. Three socket convention per service:
    ~/hero/var/sockets/{service}.sock                              # Service info, context management
    ~/hero/var/sockets/{service}_ui.sock                            # Admin UI (HTTP)
    ~/hero/var/sockets/{service}_server/{context}/{domain}.sock     # Per-context, per-domain OpenRPC
    
  3. HeroServer builder — services declare their domains and UI in code, the framework handles socket binding.
  4. Shared domain models — hero_sdk_models provides feature-gated domains (identity, communication, calendar, etc.) that any service can import.
  5. Context-based multi-tenancy — each service can serve multiple contexts, each with isolated domain data.
  6. Cross-domain communication — services talk to each other via domain sockets (RPC clients), not in-process wiring.

This issue explores what each infrastructure service needs to change.
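The three-socket convention above can be pinned down as a handful of path helpers. This is only a sketch to make the layout concrete — the function names and the `base` parameter are illustrative, not an existing hero_sdk API (`base` would normally be the expanded `~/hero/var/sockets`):

```rust
use std::path::{Path, PathBuf};

// {base}/{service}.sock — service info, context management
fn service_socket(base: &Path, service: &str) -> PathBuf {
    base.join(format!("{service}.sock"))
}

// {base}/{service}_ui.sock — admin UI (HTTP)
fn ui_socket(base: &Path, service: &str) -> PathBuf {
    base.join(format!("{service}_ui.sock"))
}

// {base}/{service}_server/{context}/{domain}.sock — per-context, per-domain OpenRPC
fn domain_socket(base: &Path, service: &str, context: &str, domain: &str) -> PathBuf {
    base.join(format!("{service}_server"))
        .join(context)
        .join(format!("{domain}.sock"))
}

fn main() {
    // Tilde shown literally for illustration; real code would expand it.
    let base = Path::new("~/hero/var/sockets");
    println!("{}", service_socket(base, "hero_food").display());
    println!("{}", ui_socket(base, "hero_food").display());
    println!("{}", domain_socket(base, "hero_food", "root", "delivery").display());
}
```

Having all three services build paths through one shared helper (section 4a below argues for a `hero_sdk_discovery` crate) is what keeps the convention from drifting.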


1. hero_proc — Process Supervisor

Current state

  • Manages services as individual processes
  • Each logical service has 2 registrations: hero_*_server and hero_*_ui (two binaries, two PIDs)
  • Rhai scripts (install_and_run.rhai) register and start two services per project
  • quick_service_set_full(name, cmd, "exec", "") for each binary

What changes

a) Single service registration instead of two

With the unified binary, hero_proc registers one service per project:

hero_food → one process, one PID

Instead of:

hero_food_server → PID 1
hero_food_ui → PID 2

The single binary creates all three socket types on startup. hero_proc only needs to manage one process.

b) Service health should check all three socket layers

Currently hero_proc pings one socket per service. With the new architecture:

  • Service socket ({service}.sock) — basic health, context management
  • UI socket ({service}_ui.sock) — HTTP health check
  • Domain sockets ({service}_server/{context}/{domain}.sock) — per-domain health

hero_proc's health model should report:

hero_food: running (pid: 12345)
  service:  healthy
  ui:       healthy
  domains:
    root/delivery:    healthy
    root/restaurant:  healthy
    root/identity:    healthy
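The layered report above could be modeled roughly as follows. Type and field names are illustrative, not an existing hero_proc API; the point is that the overall status is the conjunction of all three socket layers:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq)]
enum Health {
    Healthy,
    Unhealthy,
}

// Sketch of a multi-layer health report for one supervised service.
struct ServiceHealth {
    pid: u32,
    service: Health,                   // ping on {service}.sock
    ui: Health,                        // HTTP check on {service}_ui.sock
    domains: BTreeMap<String, Health>, // "context/domain" -> domain socket ping
}

impl ServiceHealth {
    // The service is fully healthy only if every layer is.
    fn all_healthy(&self) -> bool {
        self.service == Health::Healthy
            && self.ui == Health::Healthy
            && self.domains.values().all(|h| *h == Health::Healthy)
    }
}

fn main() {
    let mut domains = BTreeMap::new();
    domains.insert("root/delivery".to_string(), Health::Healthy);
    domains.insert("root/restaurant".to_string(), Health::Healthy);
    domains.insert("root/identity".to_string(), Health::Healthy);
    let report = ServiceHealth { pid: 12345, service: Health::Healthy, ui: Health::Healthy, domains };
    println!("hero_food: running (pid: {}), all_healthy={}", report.pid, report.all_healthy());
}
```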

c) Rhai scripts simplify

Current pattern (two services):

proc.quick_service_set_full("hero_food_server", cmd_server, "exec", "");
proc.quick_service_set_full("hero_food_ui", cmd_ui, "exec", "");
proc.quick_service_start("hero_food_server");
proc.quick_service_start("hero_food_ui");

New pattern (one service):

proc.quick_service_set_full("hero_food", cmd, "exec", "");
proc.quick_service_start("hero_food");

d) Domain-level operations

hero_proc could expose domain-aware operations:

  • service.domains("hero_food") — list active domains and contexts
  • service.domain_health("hero_food", "root", "delivery") — health check a specific domain
  • These are informational; hero_proc doesn't manage individual domains (the binary does)

e) Context lifecycle integration

hero_core manages context lifecycle (create/delete). hero_proc should be aware of context changes so it can:

  • Update its domain socket tracking when a new context is added
  • Report context-level health accurately
  • Possibly expose context operations in the dashboard

2. hero_inspector — Service Discovery & Documentation

Current state

  • Scans ~/hero/var/sockets/ for .sock files
  • Each socket = one entry in the autodiscovery list
  • Shows OpenRPC spec, methods, health per socket
  • hero_food_server and hero_food_ui are separate entries

What changes

a) Hierarchical service discovery

Instead of flat socket scanning, inspector should understand the service → socket hierarchy:

hero_food (service)
  ├── hero_food.sock          (service info)
  ├── hero_food_ui.sock       (admin UI)
  └── hero_food_server/       (domain sockets)
      ├── root/
      │   ├── delivery.sock
      │   ├── restaurant.sock
      │   └── identity.sock
      └── org_acme/
          ├── delivery.sock
          └── restaurant.sock

The autodiscovery list shows one entry per service, with sub-navigation for domains.

This aligns with the already-requested changes in hero_inspector#10 — unified entries, embedded UI tab, etc.
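The grouping step is mostly string logic over the socket layout. A sketch, assuming paths relative to ~/hero/var/sockets/ (a real implementation would walk the directory tree rather than take a path list, and the function name is hypothetical):

```rust
use std::collections::BTreeMap;

// Group flat socket paths into one autodiscovery entry per service.
fn group_sockets(paths: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut tree: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for p in paths {
        // Domain sockets live under "{service}_server/{context}/{domain}.sock";
        // the other two forms are flat files in the base directory.
        let service = if let Some((dir, _rest)) = p.split_once('/') {
            dir.trim_end_matches("_server").to_string()
        } else if let Some(s) = p.strip_suffix("_ui.sock") {
            s.to_string()
        } else if let Some(s) = p.strip_suffix(".sock") {
            s.to_string()
        } else {
            continue; // not a socket path
        };
        tree.entry(service).or_default().push(p.to_string());
    }
    tree
}

fn main() {
    let tree = group_sockets(&[
        "hero_food.sock",
        "hero_food_ui.sock",
        "hero_food_server/root/delivery.sock",
        "hero_food_server/org_acme/delivery.sock",
    ]);
    for (service, sockets) in &tree {
        println!("{service}: {} sockets", sockets.len());
    }
}
```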

b) Aggregate OpenRPC per service

Each domain socket has its own OpenRPC spec. Inspector should:

  • Aggregate all domain specs into a composite service view
  • Show a "Domains" tab listing all active domains with their contexts
  • Allow drilling into individual domain specs
  • Show which domains come from hero_sdk_models (shared) vs custom

c) Context-aware views

Inspector should show:

  • Which contexts are active for each service
  • Per-context domain availability
  • Data isolation boundaries

d) Service info socket

The {service}.sock provides metadata about the service:

  • Version, description
  • Registered domains and contexts
  • hero_sdk version

Inspector should read this to populate the service header.


3. hero_proxy — HTTP Reverse Proxy

Current state

  • Routes /{service_name} → ~/hero/var/sockets/{service_name}.sock
  • Flat mapping: each socket name = one URL path
  • http://127.0.0.1:9998/hero_food_ui → hero_food_ui.sock
  • http://127.0.0.1:9998/hero_food_server → hero_food_server.sock

What changes

a) Hierarchical URL routing

With three socket types, the proxy should support structured URL patterns:

GET  /{service}/                          → {service}_ui.sock       (admin UI)
POST /{service}/rpc                       → {service}.sock /rpc     (service info RPC)
POST /{service}/{context}/{domain}/rpc    → {service}_server/{context}/{domain}.sock /rpc
GET  /{service}/{context}/{domain}/openrpc.json → domain OpenRPC spec
GET  /{service}/{context}/{domain}/health  → domain health check

Examples:

GET  /hero_food/                           → hero_food admin dashboard
POST /hero_food/root/delivery/rpc          → delivery domain RPC (root context)
POST /hero_food/org_acme/restaurant/rpc    → restaurant domain RPC (org_acme context)
GET  /hero_food/root/delivery/openrpc.json → delivery OpenRPC spec
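The routing table above reduces to matching path segments. A minimal sketch of the resolution step, mapping a request path to (socket name, inner HTTP path) — illustrative only; a real hero_proxy would also validate context/domain against discovered sockets and handle UI asset paths:

```rust
// Resolve a hierarchical proxy path to (socket relative to the socket dir,
// path forwarded over that socket). Returns None for unroutable paths.
fn resolve(path: &str) -> Option<(String, String)> {
    let parts: Vec<&str> = path
        .trim_matches('/')
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    match parts.as_slice() {
        // GET /{service}/ -> admin UI socket
        [service] => Some((format!("{service}_ui.sock"), "/".to_string())),
        // POST /{service}/rpc -> service info socket
        [service, "rpc"] => Some((format!("{service}.sock"), "/rpc".to_string())),
        // /{service}/{context}/{domain}/{rpc|openrpc.json|health} -> domain socket
        [service, context, domain, rest] => Some((
            format!("{service}_server/{context}/{domain}.sock"),
            format!("/{rest}"),
        )),
        _ => None,
    }
}

fn main() {
    println!("{:?}", resolve("/hero_food/root/delivery/rpc"));
}
```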

b) Backward compatibility

The current flat routing should continue to work during migration:

/{service}_ui     → {service}_ui.sock    (legacy)
/{service}_server → {service}.sock       (legacy)

c) Per-domain access control

The hierarchical socket model enables fine-grained access:

  • Give an AI agent access to only /{service}/{context}/delivery/rpc
  • Block access to identity domain from external clients
  • hero_proxy can enforce this with simple path-based rules

This is particularly powerful for the agent use case mentioned in hero_rpc#13: "give AI agent only the domain socket it needs."
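The "simple path-based rules" could be as small as a prefix allowlist plus a denylist per client. A sketch — the rule format and `Acl` type are assumptions, not an existing hero_proxy feature:

```rust
// Per-client access rules evaluated against the request path.
struct Acl {
    allow_prefixes: Vec<String>,
    deny_substrings: Vec<String>,
}

impl Acl {
    // Deny rules win; otherwise the path must match an allowed prefix.
    fn permits(&self, path: &str) -> bool {
        if self.deny_substrings.iter().any(|d| path.contains(d.as_str())) {
            return false;
        }
        self.allow_prefixes.iter().any(|p| path.starts_with(p.as_str()))
    }
}

fn main() {
    // An AI agent scoped to the delivery domain in the root context,
    // with the identity domain blocked outright.
    let agent = Acl {
        allow_prefixes: vec!["/hero_food/root/delivery/".to_string()],
        deny_substrings: vec!["/identity/".to_string()],
    };
    println!("{}", agent.permits("/hero_food/root/delivery/rpc"));
}
```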

d) Context routing

Proxy needs to understand that {context} is a variable path segment, not a static service name. Discovery of valid contexts could come from:

  • Scanning ~/hero/var/sockets/{service}_server/ subdirectories
  • Querying the service socket for registered contexts

4. Cross-cutting concerns

a) Socket discovery convention

All three infrastructure services need a shared understanding of the socket layout. Consider a hero_sdk_discovery crate or convention that:

  • Defines the socket path convention as constants/functions
  • Provides a discover_services() function that returns the full service tree
  • Is used by hero_proc, hero_inspector, and hero_proxy consistently

b) Service manifest

Each service's {service}.sock should serve a standardized discovery manifest at GET /.well-known/heroservice.json (this already exists in hero_rpc). It should include:

  • Service name, version, description
  • List of domains (shared + custom) with their schemas
  • List of active contexts
  • Socket paths for all sub-sockets
  • hero_sdk version

This manifest is the single source of truth that hero_proc, hero_inspector, and hero_proxy all consume.
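The fields above might serialize to something like the following. This is a sketch to anchor discussion — the key names and values are illustrative; the authoritative shape is whatever hero_rpc already serves at /.well-known/heroservice.json:

```json
{
  "name": "hero_food",
  "version": "0.3.0",
  "description": "Example food service (illustrative)",
  "hero_sdk_version": "0.1.0",
  "domains": [
    { "name": "delivery", "source": "custom" },
    { "name": "identity", "source": "hero_sdk_models" }
  ],
  "contexts": ["root", "org_acme"],
  "sockets": {
    "service": "hero_food.sock",
    "ui": "hero_food_ui.sock",
    "domains": "hero_food_server/{context}/{domain}.sock"
  }
}
```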

c) Transition plan

During migration, both architectures will coexist:

  • Old services: two binaries, two flat sockets (_server + _ui)
  • New services: one binary, three socket layers

All infrastructure services must handle both patterns gracefully. The service manifest (or its absence) can be used to distinguish old vs new services.


Summary

Service          Key changes
hero_proc        Single service registration, multi-layer health checks, simplified Rhai scripts, domain awareness
hero_inspector   Hierarchical discovery, aggregate OpenRPC, context-aware views, service info socket
hero_proxy       Hierarchical URL routing, per-domain/context paths, access control, backward compat
All              Shared socket discovery convention, service manifest as source of truth, transition support

References

  • hero_rpc#13 — Rethinking hero RPC, OSIS and backend architecture (https://forge.ourworld.tf/lhumina_code/hero_rpc/issues/13)
  • hero_inspector#10 — Redesign service view (https://forge.ourworld.tf/lhumina_code/hero_inspector/issues/10)
  • hero_proc#32 — UI improvements tracker (https://forge.ourworld.tf/lhumina_code/hero_proc/issues/32)
  • hero_proc#29 — Rhai lifecycle scripts (https://forge.ourworld.tf/lhumina_code/hero_proc/issues/29)
Author
Owner

Tradeoff analysis: single binary vs separate server/UI processes

What's clearly better with single binary

  • Operational simplicity — One PID to manage, one thing to restart, one log stream. Currently hero_proc registers two services per project and the Rhai scripts are twice as long as they need to be.
  • Startup ordering eliminated — No more "server must start before UI" race conditions. The binary controls its own sequencing internally.
  • Resource efficiency — Shared memory, no duplicate runtime overhead across 20+ services.
  • Developer experience — HeroServer::new("hero_food").with_ui(router()).run() is ~15 lines vs the current 5-crate boilerplate.

What we lose

  • Independent restarts — Today you can restart hero_food_ui without touching hero_food_server. Useful when iterating on the dashboard without disrupting running services or connected clients. With a single binary, a UI fix requires full service restart.
  • Fault isolation — If the UI panics or leaks memory, the server keeps running. Single binary means a UI panic takes everything down.
  • Granular resource monitoring — hero_proc can currently see CPU/memory per component. With one binary it's all one number.

Assessment

The losses are real but minor for our context. Everything runs on a single machine, services are small, and the operational pain of managing two processes per service across 20+ services is the dominant cost. The fault isolation argument is the strongest counterpoint, but Rust services rarely panic, and the UI is just an Axum router — not a complex separate system.

The per-domain per-context sockets are the real architectural win here. That's a much more meaningful boundary than server-vs-UI. It gives proper multi-tenancy isolation, agent-scoped access, and focused OpenRPC specs — things the current architecture can't do.

Importantly, even with a single binary, the three socket types mean hero_proc can still do socket-level health checks independently. If the UI socket stops responding but domain sockets are healthy, hero_proc can report that. We don't need separate processes to get separate health signals — we just need separate endpoints, which the new architecture provides.

Verdict: Better architecture, minor tradeoffs, and the things we lose can mostly be recovered through socket-level monitoring rather than process-level separation.

Owner

not sure I agree about one process,
we don't want to have _ui in memory if not used
this is only needed when a UI process needs it
and over time, many of them will become webassembly only ones
this also makes sure we have to go over openrpc & sdk

there is no duplicate runtime, the opposite, the UI is only there if needed
otherwise you load all the assets, ...

Reference
lhumina_code/hero_proc#34