[hero_codescalers] Fix service registration name — UI stuck in starting #17
Problem
The hero_codescalers UI is stuck in the "starting" state. Root cause identified: the service is registered under the wrong name (hero_codescalers_server_ui instead of the correct naming convention used by hero_router).
Steps to fix
- crates/hero_codescalers_server/ and crates/hero_codescalers_ui/
- $HERO_SOCKET_DIR/hero_codescalers_server<N>/rpc.sock
Acceptance Criteria
Implementation Spec for Issue #17
Objective
Eliminate every remaining reference to the legacy hero_codescalers_server-prefixed socket directory naming so that hero_router reliably discovers hero_codescalers under the canonical Hero socket convention ($HERO_SOCKET_DIR/hero_codescalers/{rpc,ui}.sock) and the UI stops being stuck in "starting".
Root Cause
The Rust source has already been migrated to the canonical convention (SERVICE_NAME = "hero_codescalers" is used as the default --sockname in both binaries and as the sock_dir_name(0) value in the CLI), per the previous fix in commit 7edb4e3 ("fix(cli): align CLI socket dir default with the server's", PR #16). However, stale documentation and a wrong env-var default in README.md still describe the OLD layout. That leads operators (and freshly built environments where HERO_CODESCALERS_SOCK_NAME=hero_codescalers_server was exported from a previous setup) to bind to a directory that hero_router does not look in. Specifically:
- README.md:61-63: the "Unix socket directories" table still says instance 0 lives in hero_codescalers_server/ (and instances N in hero_codescalers_serverN/), which contradicts the code (crates/hero_codescalers/src/main.rs:42-51, which produces hero_codescalers/ and hero_codescalers_{N}/).
- README.md:243: the example command points at ~/hero/var/sockets/hero_codescalers_server1/rpc.sock, a path that nothing in the code now produces.
- README.md:265: the env var table claims HERO_CODESCALERS_SOCK_NAME defaults to hero_codescalers_server. The actual default, set via clap's default_value = SERVICE_NAME in crates/hero_codescalers_server/src/main.rs:43 and crates/hero_codescalers_ui/src/main.rs:59, is hero_codescalers.
- hero_codescalers_server_ui from the issue title does not appear in source; it is a description of the symptom. When the UI binary inherited a stale HERO_CODESCALERS_SOCK_NAME=hero_codescalers_server from the environment, both binaries wrote into hero_codescalers_server/. hero_router (which scans for <service>/{rpc,ui}.sock and reads the name field from .well-known/heroservice.json) then saw a directory whose name did not match the manifest's name: "hero_codescalers" and could not stitch the UI behind the RPC service, leaving the UI listed but stuck in "starting".
The code itself is consistent: heroservice.json for both crates declares "name": "hero_codescalers"; both binaries default --sockname to SERVICE_NAME = "hero_codescalers"; the CLI's sock_dir_name(instance) returns hero_codescalers for instance 0; both server and UI are launched with HERO_CODESCALERS_SOCK_NAME = sock_dir_name(instance) from the CLI (crates/hero_codescalers/src/main.rs:458,500). The bug is purely that the README still documents the legacy default and that there is no defensive guard against a stale exported env var overriding the CLI-supplied value.
The issue body's "expected naming", $HERO_SOCKET_DIR/hero_codescalers_server<N>/rpc.sock, itself uses the legacy form. The canonical Hero socket convention (per the hero_sockets skill, sections 2 and 12) is unambiguous: the directory name MUST equal the service name, MUST NOT carry _server/_ui/_sdk suffixes, and <service>/rpc.sock is the canonical layout. The author of the issue likely typed the legacy name from memory; we should align the fix on the canonical hero_codescalers/ convention that the manifests and the actual Rust code already use.
Requirements
- hero_codescalers/, hero_codescalers_{N}/).
- HERO_CODESCALERS_SOCK_NAME exported in the operator's environment MUST NOT silently override the CLI's instance-derived directory name and MUST NOT cause the UI's socket to land in a different directory from the server's.
- heroservice.json for both crates MUST keep declaring "name": "hero_codescalers".
- $HERO_SOCKET_DIR MUST find hero_codescalers/rpc.sock AND hero_codescalers/ui.sock in the same directory, both manifest-named hero_codescalers, and present hero_codescalers as Healthy with a clickable UI link.
- hero_codescalers/ (verified: Makefile:13-14, crates/hero_codescalers_ui/scripts/test-users-ui.sh:74).
Files to Modify/Create
- README.md: fix the "Unix socket directories" table (lines 61-63), the CLI example path (line 243), and the env-var defaults table (line 265) to match the code's actual behaviour.
- crates/hero_codescalers/src/main.rs: make self_start immune to a stale exported HERO_CODESCALERS_SOCK_NAME by adding a startup guard that warns if the external HERO_CODESCALERS_SOCK_NAME differs from the instance-derived value.
Implementation Plan
Step 1: Fix the README documentation drift
Files: README.md
Changes:
- hero_codescalers/, hero_codescalers_1/, hero_codescalers_N/ (matching sock_dir_name() in crates/hero_codescalers/src/main.rs:42-51).
- ~/hero/var/sockets/hero_codescalers_server1/rpc.sock to ~/hero/var/sockets/hero_codescalers_1/rpc.sock.
- HERO_CODESCALERS_SOCK_NAME reads hero_codescalers (matching SERVICE_NAME in both _server and _ui main.rs).
Dependencies: none
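The directory naming that the README has to mirror can be sketched as follows. This is a minimal reconstruction, assuming sock_dir_name behaves as the spec describes (instance 0 maps to hero_codescalers, instance N to hero_codescalers_N). The u32 instance type and the rpc_socket_path helper are hypothetical and only illustrate the resulting paths; the real helper in crates/hero_codescalers/src/main.rs may differ.

```rust
use std::path::{Path, PathBuf};

/// Service name shared by the manifests and both binaries (value taken from the spec).
const SERVICE_NAME: &str = "hero_codescalers";

/// Assumed shape of the CLI helper: instance 0 uses the bare service name,
/// any other instance appends `_<N>`. No `_server` or `_ui` segment is added.
fn sock_dir_name(instance: u32) -> String {
    if instance == 0 {
        SERVICE_NAME.to_string()
    } else {
        format!("{SERVICE_NAME}_{instance}")
    }
}

/// Hypothetical helper: full path of the RPC socket for a given instance
/// under a socket root such as `$HERO_SOCKET_DIR`.
fn rpc_socket_path(socket_root: &Path, instance: u32) -> PathBuf {
    socket_root.join(sock_dir_name(instance)).join("rpc.sock")
}
```

Under these assumptions, instance 1 resolves to hero_codescalers_1/rpc.sock, which is the path the corrected README example should show.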
Step 2: Add a defensive startup guard in the CLI against a stale exported HERO_CODESCALERS_SOCK_NAME
Files: crates/hero_codescalers/src/main.rs
Changes:
- In build_service_definition (lines 419-539), after computing let dir = sock_dir_name(instance); (line 433), check if the current process already has HERO_CODESCALERS_SOCK_NAME exported and it differs from dir. If so, emit a single eprintln! warning telling the operator that the exported value is being overridden by the instance-derived value for the spawned action's environment.
- // ── Server action ──):
Dependencies: none (independent of Step 1)
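A guard of the kind described in Step 2 might look like the sketch below. It is not the code from the patch: the function names stale_sock_name_warning and warn_if_stale_sock_name and the warning wording are hypothetical. The comparison is split from the environment read so it can be exercised without touching the process environment.

```rust
use std::env;

/// Name of the environment variable the spec is concerned with.
const SOCK_NAME_VAR: &str = "HERO_CODESCALERS_SOCK_NAME";

/// Hypothetical pure check: returns a warning message when an exported value
/// exists and differs from the instance-derived directory name.
fn stale_sock_name_warning(exported: Option<&str>, expected_dir: &str) -> Option<String> {
    match exported {
        Some(value) if value != expected_dir => Some(format!(
            "warning: {SOCK_NAME_VAR}={value} is exported, but the spawned actions \
             will use the instance-derived value {expected_dir}"
        )),
        // Unset, or already equal to the expected directory: nothing to report.
        _ => None,
    }
}

/// Hypothetical wrapper: reads the variable from the current process and
/// prints the warning once to stderr. It never aborts startup.
fn warn_if_stale_sock_name(expected_dir: &str) {
    let exported = env::var(SOCK_NAME_VAR).ok();
    if let Some(message) = stale_sock_name_warning(exported.as_deref(), expected_dir) {
        eprintln!("{message}");
    }
}
```

Calling warn_if_stale_sock_name with the value of dir right after it is computed would match the placement the step describes. The guard is informational only, since the spec states the CLI already passes the correct value to the spawned actions.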
Step 3: Verify hero_router picks up the service end-to-end
Files: none (verification only)
Changes:
- make install && hero_codescalers --start.
- ~/hero/var/sockets/hero_codescalers/rpc.sock and ~/hero/var/sockets/hero_codescalers/ui.sock both exist.
- curl --unix-socket ~/hero/var/sockets/hero_codescalers/rpc.sock http://localhost/.well-known/heroservice.json returns "name":"hero_codescalers","socket":"rpc".
- curl --unix-socket ~/hero/var/sockets/hero_codescalers/ui.sock http://localhost/.well-known/heroservice.json returns "name":"hero_codescalers","socket":"ui".
- http://127.0.0.1:9988/): hero_codescalers MUST appear as Healthy and clickable; clicking the UI link MUST land on the dashboard via http://127.0.0.1:9988/hero_codescalers/ui/, no longer "starting".
Dependencies: Step 1 and Step 2 merged.
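The file-existence part of this verification could be automated with a small check like the one below. It is a sketch only: missing_sockets is a hypothetical helper, and it covers just the "both sockets live in one directory" condition. It does not query heroservice.json or the hero_router dashboard, which still need the curl and browser steps above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the socket files that are absent from
/// `<socket_root>/<service>/`. An empty result means both `rpc.sock` and
/// `ui.sock` are present in the same directory.
fn missing_sockets(socket_root: &Path, service: &str) -> Vec<PathBuf> {
    ["rpc.sock", "ui.sock"]
        .iter()
        .map(|name| socket_root.join(service).join(name))
        .filter(|path| !path.exists())
        .collect()
}
```

For the layout in this spec, the call would use ~/hero/var/sockets (expanded by the caller) as socket_root and hero_codescalers as service.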
Acceptance Criteria
- README.md no longer contains the strings hero_codescalers_server/, hero_codescalers_server1/, hero_codescalers_serverN/, or the env-var default hero_codescalers_server.
- crates/hero_codescalers/src/main.rs warns (non-fatally) when a stale exported HERO_CODESCALERS_SOCK_NAME would have differed from the instance-derived directory.
- hero_codescalers --start, ~/hero/var/sockets/hero_codescalers/rpc.sock AND ~/hero/var/sockets/hero_codescalers/ui.sock both exist (single shared directory, no _server or _ui directory siblings created).
- hero_router dashboard lists hero_codescalers as Healthy / connected / clickable and the UI link no longer reports "starting".
- cargo build --workspace --release, cargo fmt --all -- --check, and cargo clippy --workspace --all-targets -- -D warnings all pass.
- make test-server or make test-ui (both already point at ~/hero/var/sockets/hero_codescalers/{rpc,ui}.sock).
Notes
- $HERO_SOCKET_DIR/hero_codescalers_server<N>/rpc.sock form is the legacy layout. The canonical Hero convention (hero_sockets skill sections 2 and 12) is $HERO_SOCKET_DIR/<service>/<type>.sock with NO _server/_ui suffix on the directory. The Rust code already follows the canonical form; this fix only catches up the README and adds a defence against a stale env var. We are aligning with the canonical convention, not the issue body's example.
- $HERO_SOCKET_DIR periodically (default HERO_ROUTER_REFRESH=30 seconds); after the next stop/start of hero_codescalers, hero_router will re-probe and pick up the (still canonical) hero_codescalers/ directory. If a prior run left a stale hero_codescalers_server/ directory on disk, operators should rm -rf ~/hero/var/sockets/hero_codescalers_server* once.
- ~/hero/var/hero_codescalers/db) are untouched.
- heroservice.json already correct: both crates/hero_codescalers_server/heroservice.json and crates/hero_codescalers_ui/heroservice.json declare "name": "hero_codescalers" (verified). hero_router uses this value as the authoritative service name when grouping rpc.sock and ui.sock from the same directory under one logical service.
- hero_codescalers_{N} (e.g. hero_codescalers_1/). The README table fix preserves this: only the legacy _server segment is removed.
Test Results
cargo fmt --all -- --check
fail (pre-existing, unrelated to issue #17 changes)
- cargo fmt --check on HEAD (commit a5d0f5c): the same 122 diffs are present, so this is a baseline issue in the repo, not a regression from this change.
- README.md and crates/hero_codescalers/src/main.rs, adding the stale-env-var warning) do not introduce any new fmt diffs in our edited region. The main.rs diffs are in unrelated lines (e.g. struct-init formatting in command handlers around lines 173, 251, 277, etc.).
Representative excerpt (unrelated to our edit):
cargo clippy --workspace --all-targets -- -D warnings
fail (pre-existing, unrelated to issue #17 changes)
- HEAD: 32 errors are reported in hero_codescalers_server (e.g. collapsible_if lint in crates/hero_codescalers_server/src/main.rs:272).
- dead_code on IndexTemplate.version in crates/hero_codescalers_ui/src/main.rs:113, also a file we did not touch.
- -D warnings is currently red on the development baseline; our change does not add any new lints.
Last warnings (with patch applied):
cargo test --workspace
Breakdown:
- hero_codescalers (bin unittests): 2 passed (tests::duration_units, tests::duration_errors)
- hero_codescalers_sdk (lib unittests): 0 tests
- hero_codescalers_server (bin unittests): 5 passed (jobs cleanup tests)
- hero_codescalers_ui (bin unittests): 0 tests
- nu_exec (lib unittests): 0 tests
- hero_codescalers_sdk doc-tests: 1 passed
- nu_exec doc-tests: 2 passed
All workspace tests pass. The server bin emits 13 dead_code warnings during the test build (pre-existing; the same warnings appear on HEAD without the patch).
Notes
- make test-server and make test-ui were not run because they require a live hero_proc/hero_router stack; that is part of the manual verification in Step 3 of the spec.
- development baseline and are not introduced by this issue's changes; fixing them is out of scope for issue #17.
- hero_codescalers/ form) and crates/hero_codescalers/src/main.rs (added a non-fatal eprintln! warning in build_service_definition when a stale HERO_CODESCALERS_SOCK_NAME env var differs from the instance-derived directory).
Implementation Summary
The Rust source already binds to the canonical hero_codescalers/ socket directory (see commit 7edb4e3 / PR #16); the "stuck in starting" symptom was caused by stale README documentation pointing at the legacy hero_codescalers_server/ layout, plus the absence of any guard against an operator carrying a stale HERO_CODESCALERS_SOCK_NAME export from a previous setup.
Changes
- README.md: replaced legacy socket-directory references with the canonical names:
  - hero_codescalers_server/ → hero_codescalers/
  - hero_codescalers_server1/ → hero_codescalers_1/
  - hero_codescalers_serverN/ → hero_codescalers_N/
  - ~/hero/var/sockets/hero_codescalers_server1/rpc.sock → ~/hero/var/sockets/hero_codescalers_1/rpc.sock
  - HERO_CODESCALERS_SOCK_NAME default: hero_codescalers_server → hero_codescalers
  The remaining hero_codescalers_server matches in the README are deliberately kept: they refer to the binary name (line 15) and to hero_proc action names (lines 69-71), which are distinct from the socket directory.
- crates/hero_codescalers/src/main.rs: added a non-fatal startup guard inside build_service_definition. When the caller's environment already exports HERO_CODESCALERS_SOCK_NAME and the value differs from the instance-derived directory, an eprintln! warning is emitted before the actions are spawned. The CLI was already passing the correct value via ActionBuilder::env(...) at lines 458 and 500, so this is purely informational and removes the silent foot-gun.
Test Results
- cargo test --workspace: pass (10/10: 2 hero_codescalers + 5 hero_codescalers_server + 1 sdk doctest + 2 nu_exec doctests; zero failures)
- cargo fmt --all -- --check: fail (122 diffs across 18 files; verified to be a pre-existing baseline on development HEAD a5d0f5c, not introduced by this patch)
- cargo clippy --workspace --all-targets -- -D warnings: fail (32 errors on baseline HEAD; same set with our patch applied, no new lints from the changes here)
The fmt and clippy failures are pre-existing on the development branch and outside the scope of issue #17. They should be tracked in a separate ticket.
Acceptance Criteria
- README.md no longer contains socket-directory references to hero_codescalers_server/, hero_codescalers_server1/, hero_codescalers_serverN/, or the env-var default hero_codescalers_server.
- crates/hero_codescalers/src/main.rs warns (non-fatally) when a stale exported HERO_CODESCALERS_SOCK_NAME would have differed from the instance-derived directory.
- cargo check -p hero_codescalers clean; workspace tests pass.
- hero_codescalers --start, both ~/hero/var/sockets/hero_codescalers/rpc.sock and ~/hero/var/sockets/hero_codescalers/ui.sock exist (manual verification, Step 3).
- hero_codescalers as Healthy / connected / clickable; the UI link no longer reports "starting" (manual verification, Step 3).
Manual Verification Required (Step 3 of spec)
This patch does not run the full stack. Before closing the issue, please:
- make install and hero_codescalers --start from a freshly-checked-out copy of this branch.
- ~/hero/var/sockets/hero_codescalers/rpc.sock and ~/hero/var/sockets/hero_codescalers/ui.sock both exist in the same directory.
- curl --unix-socket ~/hero/var/sockets/hero_codescalers/rpc.sock http://localhost/.well-known/heroservice.json returns "name":"hero_codescalers","socket":"rpc".
- ui.sock, returning "socket":"ui".
- http://127.0.0.1:9988/) and confirm hero_codescalers appears Healthy and the UI link works.
- ~/hero/var/sockets/hero_codescalers_server*/ directory on disk, run rm -rf ~/hero/var/sockets/hero_codescalers_server* once.
Branch
Changes committed locally on development_fix_codescalers_service_name (no push, no PR; opening a PR is out of scope for this skill).