hero_osis crash-loops under hero_proc — health-check probes a socket the binary never creates #178
Reference
lhumina_code/hero_skills#178
Symptom
- `service_osis start` reports success.
- `proc service status hero_osis` shows `state: running`, `restarts: N>0`, growing.
- `pgrep -au $USER -af hero_osis` shows only `hero_osis_ui`, never `hero_osis` (the backend).
- Requests to `/hero_osis_base/rpc` (and every other domain) fail with `Socket 'rpc.sock' not found for 'hero_osis_base'`.
- The error `Failed to fetch contexts after 5 retries.` is reported.
- The log shows repeated `received SIGTERM` / `starting — socket: ...` cycles.
Cause
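The unified `hero_osis` backend binds per-domain sockets, such as `hero_osis_base/rpc.sock`. It never creates a singular `hero_osis/rpc.sock`, but the action spec in `service_osis.nu` references that non-existent path in two fields.
With `svc_server_health_policy` (`start_period_ms: 3000`, `timeout_ms: 5000`, `retries: 3`), hero_proc probes a socket that never appears, so the health check keeps failing and the backend is restarted in a loop. This matches the `received SIGTERM` / `starting` cycles and the growing restart count.
Confirmation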
When run standalone (without hero_proc), `~/hero/bin/hero_osis` stays up cleanly and binds all 16 per-domain sockets immediately. `restarts: 0`. So the binary is fine; the action spec is the bug.
Fix
Swap both fields to `hero_osis_base/rpc.sock`. `base` is the root domain and is registered first by the unified server, and registration is atomic across all 16 domains, so a healthy `base` socket is a sufficient liveness signal. The `hero_osis_ui` path (`hero_osis/ui.sock`) is correct because `hero_osis_ui` actually creates that singular socket, so leave it untouched.
PR: #177
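
To reproduce the difference between the two paths outside hero_proc, a liveness probe can be sketched roughly as follows. This is not hero_proc's implementation: `probe_unix_socket`, `is_healthy`, and `interval_s` are hypothetical names introduced here, and the sketch assumes the health check amounts to "connect to the Unix socket within a timeout, retry a few times". The defaults mirror the `start_period_ms`, `timeout_ms`, and `retries` values quoted above. The demo uses a temporary directory as a stand-in for the real socket root, which this issue does not state.

```python
import os
import socket
import tempfile
import time


def probe_unix_socket(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if a Unix stream socket at `path` accepts a connection."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout_s)
        try:
            s.connect(path)
        except OSError:
            # Covers a missing path, a refused connection, and a timeout.
            return False
    return True


def is_healthy(path: str, start_period_s: float = 3.0, timeout_s: float = 5.0,
               retries: int = 3, interval_s: float = 1.0) -> bool:
    """Wait out the start period, then probe up to `retries` times.

    `interval_s` is an assumption; the policy quoted in the issue does not
    say how long the supervisor waits between probes.
    """
    time.sleep(start_period_s)
    for attempt in range(retries):
        if probe_unix_socket(path, timeout_s):
            return True
        if attempt < retries - 1:
            time.sleep(interval_s)
    return False


def demo() -> tuple:
    """Bind only the per-domain socket, then probe both candidate paths."""
    root = tempfile.mkdtemp()  # stand-in for the real socket root
    os.makedirs(os.path.join(root, "hero_osis_base"))
    per_domain = os.path.join(root, "hero_osis_base", "rpc.sock")
    singular = os.path.join(root, "hero_osis", "rpc.sock")  # never created

    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(per_domain)
    listener.listen(8)
    try:
        return (
            is_healthy(per_domain, start_period_s=0, timeout_s=1, interval_s=0),
            is_healthy(singular, start_period_s=0, timeout_s=1, interval_s=0),
        )
    finally:
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Pointing `is_healthy` at the real `hero_osis_base/rpc.sock` and `hero_osis/rpc.sock` paths while the backend runs standalone should show the same split: the per-domain path answers, the singular path never does.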
fixed