hero_shrimp binds sockets under $XDG_RUNTIME_DIR, diverging from the Hero socket-dir convention (breaks lab smoke tests) #87

Closed
opened 2026-06-04 10:54:14 +00:00 by rawan · 3 comments
Member

Summary

hero_shrimp resolves its Unix-socket directory differently from every other Hero service. It inserts an XDG_RUNTIME_DIR step above the Hero-standard $HOME/hero/var/sockets default. Because XDG_RUNTIME_DIR=/run/user/<uid> is normally set in a login/desktop session, the daemon binds its sockets at /run/user/<uid>/hero_shrimp/{rpc,web}.sock, while lab (and the rest of the ecosystem) expects them under $PATH_SOCKET/hero_shrimp/, where $PATH_SOCKET defaults to $HOME/hero/var/sockets.

Result: lab build registers the services with hero_proc successfully, but every socket-probe smoke test fails because the sockets are not where lab looks:

FAILED: hero_shrimp_server:debug:x86_64-unknown-linux-musl
'hero_shrimp_server' registered with hero_proc but 4 smoke test(s) failed.
FAILED: hero_shrimp_web:debug:x86_64-unknown-linux-musl
'hero_shrimp_web' registered with hero_proc but 2 smoke test(s) failed.

Evidence

Resolution order in crates/hero_shrimp_types/src/paths.rs (default_socket_dir):

  1. HERO_SHRIMP_SOCKET_DIR
  2. HERO_SOCKET_DIR
  3. XDG_RUNTIME_DIR ← outranks the Hero-standard default
  4. $HOME/hero/var/sockets
  5. /tmp

Live verification on a lab-launched stack:

  • Processes are running and registered with hero_proc.
  • ss -xl shows them bound at /run/user/1000/hero_shrimp/rpc.sock and .../web.sock.
  • $HOME/hero/var/sockets/hero_shrimp/ is empty -> all smoke probes fail.
  • Decisive env on the process: XDG_RUNTIME_DIR=/run/user/1000, HERO_SOCKET_DIR unset.

Sibling services do not have the XDG step — they use HERO_SOCKET_DIR -> $HOME/hero/var/sockets -> /tmp:

  • hero_aibroker (crates/hero_aibroker_server/src/socket.rs)
  • hero_proc (crates/hero_proc_sdk/src/socket.rs)
  • hero_router (crates/hero_router/src/config.rs)

hero_shrimp is the only outlier.
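The divergence can be modeled outside the daemon by walking both chains over the same environment. The following is a minimal sketch, not code from `paths.rs`: the `Step` enum, the `resolve` function, the two chain constants, and the `HashMap` standing in for the process environment are all illustrative names, and the `HOME=/home/op` value is borrowed from the acceptance criteria further down the thread.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// One step of a socket-directory resolution chain (illustrative type).
enum Step {
    /// Use the value of this environment variable if it is set.
    Env(&'static str),
    /// Use `$HOME/hero/var/sockets` if `HOME` is set.
    HeroStandard,
    /// Last resort: `/tmp`.
    Tmp,
}

/// Pre-fix hero_shrimp chain, in the order listed under "Evidence".
const SHRIMP_PRE_FIX: &[Step] = &[
    Step::Env("HERO_SHRIMP_SOCKET_DIR"),
    Step::Env("HERO_SOCKET_DIR"),
    Step::Env("XDG_RUNTIME_DIR"),
    Step::HeroStandard,
    Step::Tmp,
];

/// Chain used by the sibling services: no XDG step.
const SIBLING: &[Step] = &[Step::Env("HERO_SOCKET_DIR"), Step::HeroStandard, Step::Tmp];

/// Walk `chain` in order and return the first base directory that resolves.
fn resolve(chain: &[Step], env: &HashMap<&'static str, &'static str>) -> PathBuf {
    chain
        .iter()
        .find_map(|step| match step {
            Step::Env(name) => env.get(name).map(|v| PathBuf::from(*v)),
            Step::HeroStandard => env
                .get("HOME")
                .map(|home| PathBuf::from(*home).join("hero/var/sockets")),
            Step::Tmp => Some(PathBuf::from("/tmp")),
        })
        .unwrap_or_else(|| PathBuf::from("/tmp"))
}

/// Environment matching the live verification: XDG set, HERO_SOCKET_DIR unset.
fn desktop_env() -> HashMap<&'static str, &'static str> {
    HashMap::from([("HOME", "/home/op"), ("XDG_RUNTIME_DIR", "/run/user/1000")])
}
```

With `desktop_env()`, `resolve(SHRIMP_PRE_FIX, ..)` stops at the XDG step and yields `/run/user/1000`, whereas `resolve(SIBLING, ..)` falls through to `/home/op/hero/var/sockets`. Appending the `hero_shrimp` component to each gives the two locations that the daemon and the smoke probes disagree on.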

Proposed fix (primary)

Remove the XDG_RUNTIME_DIR branch from default_socket_dir() (or demote it below the $HOME/hero/var/sockets default) so hero_shrimp matches the other Hero services. Add a regression test asserting that, with HERO_SOCKET_DIR unset and XDG_RUNTIME_DIR set, the resolved dir is $HOME/hero/var/sockets/hero_shrimp.

Secondary / ecosystem gap

lab user init emits PATH_SOCKET but never HERO_SOCKET_DIR. Services key off HERO_SOCKET_DIR, so today they only agree with lab by coincidence — every service's hardcoded default equals $PATH_SOCKET. If anyone sets a non-default PATH_SOCKET, every service silently ignores it. lab should also export HERO_SOCKET_DIR="$PATH_SOCKET" so the two are actually wired together. (Tracking here for visibility; may warrant a separate issue against the lab repo.)
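To make the gap concrete, here is a hedged sketch of the wiring that paragraph asks for, written against a plain map rather than lab's real code (which is not shown in this issue). `wire_socket_env` and `service_base` are hypothetical helpers, and `/srv/hero/sockets` is an invented non-default `PATH_SOCKET` value used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: copy PATH_SOCKET into HERO_SOCKET_DIR for child
/// processes, unless the caller already set HERO_SOCKET_DIR explicitly.
fn wire_socket_env(env: &mut HashMap<String, String>) {
    if env.contains_key("HERO_SOCKET_DIR") {
        return; // an explicit override always wins
    }
    if let Some(path_socket) = env.get("PATH_SOCKET").cloned() {
        env.insert("HERO_SOCKET_DIR".to_string(), path_socket);
    }
}

/// Model of the sibling-service chain described in this issue:
/// HERO_SOCKET_DIR -> $HOME/hero/var/sockets -> /tmp.
/// Note that it never reads PATH_SOCKET.
fn service_base(env: &HashMap<String, String>) -> String {
    if let Some(dir) = env.get("HERO_SOCKET_DIR") {
        return dir.clone();
    }
    match env.get("HOME") {
        Some(home) => format!("{home}/hero/var/sockets"),
        None => "/tmp".to_string(),
    }
}
```

In this model, an environment with `HOME=/home/op` and `PATH_SOCKET=/srv/hero/sockets` resolves to `/home/op/hero/var/sockets` before `wire_socket_env` runs (the custom `PATH_SOCKET` is silently ignored) and to `/srv/hero/sockets` afterwards.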

Workaround

Export HERO_SOCKET_DIR="$PATH_SOCKET" in the environment that launches the stack (must be present in the hero_proc daemon's env, since services inherit it), then relaunch. This is operational-only and does not fix the convention divergence.

Repro

  1. Ensure XDG_RUNTIME_DIR is set (default on any desktop session) and HERO_SOCKET_DIR is unset.
  2. lab build --restart in hero_shrimp.
  3. Observe smoke-test failures; ss -xl | grep shrimp shows sockets under /run/user/<uid>/ instead of $HOME/hero/var/sockets/.
Author
Member

Implementation Spec for Issue #87

Objective

Make hero_shrimp resolve its Unix-socket directory using the same chain as every other Hero service by removing the XDG_RUNTIME_DIR branch from default_socket_dir(). After the fix, with HERO_SHRIMP_SOCKET_DIR and HERO_SOCKET_DIR unset and XDG_RUNTIME_DIR set, the resolved directory must be $HOME/hero/var/sockets/hero_shrimp (not /run/user/<uid>/hero_shrimp), restoring lab smoke-test compatibility.

Requirements

  • Remove the XDG_RUNTIME_DIR resolution branch from default_socket_dir().
  • New resolution order: HERO_SHRIMP_SOCKET_DIR -> HERO_SOCKET_DIR -> $HOME/hero/var/sockets/hero_shrimp -> /tmp/hero_shrimp.
  • Update the module-level doc comment to drop the XDG step and renumber.
  • Add/convert a regression test asserting that with HERO_SHRIMP_SOCKET_DIR and HERO_SOCKET_DIR unset but XDG_RUNTIME_DIR set, the resolved dir is $HOME/hero/var/sockets/hero_shrimp.
  • Keep the change minimal and surgical; no behavioral changes to other functions (socket_path, default_rpc_socket, etc.).

Files to Modify/Create

  • Modify: crates/hero_shrimp_types/src/paths.rs (only file touched — both source and tests live here)

Implementation Plan

Step 1: Remove the XDG branch from default_socket_dir()

Files: crates/hero_shrimp_types/src/paths.rs

  • Delete the // XDG runtime dir comment plus the if let Ok(xdg) = std::env::var("XDG_RUNTIME_DIR") ... block that returns PathBuf::from(xdg).join(COMPONENT).
  • Leave the HERO_SOCKET_DIR block immediately followed by the HOME Hero-standard block.

Dependencies: none
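For orientation, the function after Step 1 might look roughly like the sketch below. It is a reconstruction from the details given in this spec (the `PathBuf::from(..).join(COMPONENT)` shape, the four-step order, and the `rpc.sock` file name), not a copy of `paths.rs`; in particular the value of `COMPONENT` and the exact handling of each variable are assumptions.

```rust
use std::env;
use std::path::PathBuf;

/// Per-service subdirectory appended to whichever base directory wins
/// (value assumed from the paths quoted in this issue).
const COMPONENT: &str = "hero_shrimp";

/// Post-fix resolution: service override, ecosystem override,
/// Hero-standard default, then /tmp.
pub fn default_socket_dir() -> PathBuf {
    // 1. Service-specific override.
    if let Ok(dir) = env::var("HERO_SHRIMP_SOCKET_DIR") {
        return PathBuf::from(dir).join(COMPONENT);
    }
    // 2. Ecosystem-wide override shared with the sibling services.
    if let Ok(dir) = env::var("HERO_SOCKET_DIR") {
        return PathBuf::from(dir).join(COMPONENT);
    }
    // The XDG_RUNTIME_DIR branch used to sit here; Step 1 deletes it.
    // 3. Hero-standard default under the user's home directory.
    if let Ok(home) = env::var("HOME") {
        return PathBuf::from(home).join("hero/var/sockets").join(COMPONENT);
    }
    // 4. Last resort.
    PathBuf::from("/tmp").join(COMPONENT)
}

/// RPC socket path derived from the resolved directory.
pub fn default_rpc_socket() -> PathBuf {
    default_socket_dir().join("rpc.sock")
}
```

Because nothing in this version reads `XDG_RUNTIME_DIR`, setting it to `/run/user/1000` has no effect on the result, which is the behavior the regression test in Step 3 is meant to pin down.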

Step 2: Update the module doc comment to reflect the new chain

Files: crates/hero_shrimp_types/src/paths.rs

  • Remove the //! 3. $XDG_RUNTIME_DIR/hero_shrimp doc line and renumber so $HOME/hero/var/sockets/hero_shrimp becomes step 3 and /tmp/hero_shrimp becomes step 4.

Dependencies: Step 1

Step 3: Convert the obsolete XDG test into the regression test

Files: crates/hero_shrimp_types/src/paths.rs

  • The existing test xdg_runtime_dir_used_when_socket_dir_unset currently asserts the resolved path is under /run/user/<uid>/; this assertion will fail once the XDG branch is removed in Step 1.
  • Rename it (e.g. xdg_runtime_dir_ignored_falls_back_to_hero_standard) and change the expectation to assert default_socket_dir() returns $HOME/hero/var/sockets/hero_shrimp while XDG_RUNTIME_DIR is set and HERO_SOCKET_DIR unset.
  • Reuse the existing with_env helper and ENV_LOCK mutex pattern exactly (no new crate, no #[serial] attribute).
  • Keep all other tests unchanged.

Dependencies: Step 1

Acceptance Criteria

  • default_socket_dir() contains no reference to XDG_RUNTIME_DIR.
  • With HERO_SHRIMP_SOCKET_DIR and HERO_SOCKET_DIR unset, XDG_RUNTIME_DIR=/run/user/1000, HOME=/home/op: default_socket_dir() == /home/op/hero/var/sockets/hero_shrimp and default_rpc_socket() == /home/op/hero/var/sockets/hero_shrimp/rpc.sock.
  • Resolution order matches siblings apart from the service-specific override: HERO_SHRIMP_SOCKET_DIR -> HERO_SOCKET_DIR -> $HOME/hero/var/sockets -> /tmp.
  • cargo test -p hero_shrimp_types passes (no remaining test asserts an XDG-based path).
  • cargo build / cargo clippy clean; module doc comment matches actual behavior.
  • No other files changed; no new dependencies added.

Notes

  • Environment variables are process-global, so tests that mutate them are not thread-safe. The file already handles this with a module-static ENV_LOCK: Mutex<()> and a with_env helper that snapshots and restores prior values. Continue using that pattern; do not introduce the serial_test crate. std::env::set_var/remove_var are unsafe functions in the Rust 2024 edition, so the calls are wrapped in unsafe blocks, which are already present in the helper.
  • The fix is confined to one crate: XDG_RUNTIME_DIR is referenced only in crates/hero_shrimp_types/src/paths.rs. No callers elsewhere depend on the XDG behavior.
  • Sibling convention confirmed: hero_proc_sdk/hero_aibroker resolve via HERO_SOCKET_DIR -> $HOME/hero/var/sockets -> /tmp with no XDG step. This change aligns hero_shrimp with that convention (it does not adopt a shared helper, which would be a larger refactor out of scope here).
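The lock-and-restore pattern mentioned in the first note can be sketched as follows. The real `with_env` helper is not shown in this issue, so the signature below (a slice of name/value pairs, with `None` meaning "unset") is an assumption made for illustration; only the `ENV_LOCK: Mutex<()>` static and the snapshot-then-restore behavior come from the note itself.

```rust
use std::env;
use std::sync::Mutex;

/// Serializes every test that touches the process environment.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Assumed helper shape: apply `vars` (None = unset), run `f`,
/// then restore whatever values were there before.
fn with_env<T>(vars: &[(&str, Option<&str>)], f: impl FnOnce() -> T) -> T {
    // A poisoned lock only means an earlier test panicked; keep going.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());

    // Snapshot prior values so they can be restored afterwards.
    let saved: Vec<(&str, Option<String>)> =
        vars.iter().map(|(k, _)| (*k, env::var(k).ok())).collect();

    for (k, v) in vars {
        // SAFETY (assumed): ENV_LOCK is held, so no other code that also
        // takes the lock is touching the environment concurrently.
        unsafe {
            match v {
                Some(val) => env::set_var(k, val),
                None => env::remove_var(k),
            }
        }
    }

    let out = f();

    for (k, old) in saved {
        // SAFETY (assumed): same reasoning as above; the lock is still held.
        unsafe {
            match old {
                Some(val) => env::set_var(k, val),
                None => env::remove_var(k),
            }
        }
    }
    out
}
```

A regression test built on this shape would call something like `with_env(&[("HERO_SHRIMP_SOCKET_DIR", None), ("HERO_SOCKET_DIR", None), ("XDG_RUNTIME_DIR", Some("/run/user/1000")), ("HOME", Some("/home/op"))], || default_socket_dir())` and compare the result with the Hero-standard path. Note that this simple version does not restore the environment if `f` panics; a drop guard would be needed for that.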
Author
Member

Test Results

cargo test -p hero_shrimp_types

  • Total: 17
  • Passed: 17
  • Failed: 0

All tests pass. cargo build -p hero_shrimp_types compiles clean.

The regression test xdg_runtime_dir_ignored_falls_back_to_hero_standard passes.

Doc-tests: 6 passed, 0 failed.

Author
Member

Implementation Summary

The socket-directory divergence is fixed. hero_shrimp now resolves its Unix-socket directory using the same chain as every other Hero service.

Changes

File modified: crates/hero_shrimp_types/src/paths.rs

  • Removed the XDG_RUNTIME_DIR branch from default_socket_dir(). The resolution order is now:
    1. $HERO_SHRIMP_SOCKET_DIR/hero_shrimp
    2. $HERO_SOCKET_DIR/hero_shrimp
    3. $HOME/hero/var/sockets/hero_shrimp
    4. /tmp/hero_shrimp
  • Updated the module-level doc comment to drop the XDG step and renumber the chain.
  • Converted the obsolete xdg_runtime_dir_used_when_socket_dir_unset test into a regression test xdg_runtime_dir_ignored_falls_back_to_hero_standard, which asserts that with HERO_SHRIMP_SOCKET_DIR and HERO_SOCKET_DIR unset but XDG_RUNTIME_DIR=/run/user/1000 set and HOME=/home/op, both default_socket_dir() and default_rpc_socket() resolve under $HOME/hero/var/sockets/hero_shrimp.

Test Results

cargo test -p hero_shrimp_types: 17 unit tests + 6 doc-tests, all passing, 0 failures. cargo build -p hero_shrimp_types clean.

Notes

  • Change is confined to a single file; XDG_RUNTIME_DIR is no longer referenced anywhere in default_socket_dir(). No new dependencies added.
  • hero_shrimp now matches the sibling-service convention (HERO_SOCKET_DIR -> $HOME/hero/var/sockets -> /tmp), restoring lab smoke-test compatibility.
  • The secondary ecosystem gap noted in the issue (lab user init not exporting HERO_SOCKET_DIR="$PATH_SOCKET") is out of scope for this repo and is tracked separately against the lab repo.
rawan closed this issue 2026-06-04 14:07:44 +00:00
Reference
lhumina_code/hero_shrimp#87