feat(collab+livekit): service_livekit module + service_collab dev-auth auto-chain #115

Merged
sameh-farouk merged 24 commits from feat/service-livekit-and-collab-dev-defaults into development 2026-04-22 18:17:17 +00:00

Summary

Makes service_collab start produce a working dev stack out of the box on a multi-user Hero host (provisioned via multi_user_add), by:

  1. Adding service_livekit.nu — new module that wraps hero_livekit_server + hero_livekit_ui as hero_proc-supervised actions, following the same pattern as service_os / service_collab.
  2. Updating service_collab.nu — adds --auth-mode, --seed-dev-users, --livekit-* flags, detects dev-box context, and defaults to --auth-mode=dev --seed-dev-users plus auto-chaining service_livekit when credentials are missing.
  3. Updating hero_loader.nu — reads ~/hero/var/hero_livekit/runtime.json on shell startup and populates $env.LIVEKIT_{URL,API_KEY,API_SECRET}, so every nu shell sees consistent LiveKit credentials.

Problem this solves

On a standard multi-user dev host today:

  • service_collab start with no flags defaults to auth_mode=proxy (Mahmoud's documented safe-prod default).
  • Proxy mode expects hero_proxy in front, injecting X-User-* identity headers after OAuth/bearer auth.
  • The dev setup uses hero_router, not hero_proxy — grep confirms no auth-header injection anywhere in hero_skills.
  • Result: collab starts, then rejects every request because the auth headers never arrive. Broken end-to-end for every dev.
  • Separately, running LiveKit previously meant cloning hero_collab/deploy/, editing .env, docker-compose up, mirroring creds into secrets.toml — all manual, easy to skip or misconfigure.

Shape

service_livekit.nu (new, +253)

Standard install/start/stop/status surface mirroring service_os.nu. A soft preflight warns (does not hard-fail) when Redis at 127.0.0.1:6379 is unreachable, since hero_livekit needs it for room/participant state.

service_collab.nu (+156 −21)

New flags on start:

--auth-mode <dev|proxy>      Default: auto-detect (dev on a provisioned dev box, proxy elsewhere)
--no-seed                    Skip --seed-dev-users even in dev mode
--no-livekit                 Disable LiveKit entirely
--livekit-url <url>          External LiveKit; skips auto-chain. Fallback: $env.LIVEKIT_URL
--livekit-api-key <k>        Fallback: $env.LIVEKIT_API_KEY
--livekit-api-secret <s>     Fallback: $env.LIVEKIT_API_SECRET

Dev-box detection (svx_is_dev_box): presence of ~/hero/cfg/hero_cfg.toml or /etc/hero-users/<user>.env. These are the same signals hero_loader.nu already uses to populate MYCELIUM_IP, so the convention is consistent.

Credential-resolution order for LiveKit (per field): explicit flag → $env.LIVEKIT_* → auto-start via service_livekit and read runtime.json.
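The per-field resolution can be sketched in plain shell; resolve_api_key, the envkey/flagkey values, and the sed extraction are illustrative stand-ins for the module's nu logic, not code from this PR:

```shell
# Hypothetical sketch of the per-field resolution: flag, then env, then runtime.json.
RUNTIME="$HOME/hero/var/hero_livekit/runtime.json"
resolve_api_key() {
  flag=$1
  if [ -n "$flag" ]; then
    printf '%s\n' "$flag"                      # 1. explicit flag wins
  elif [ -n "$LIVEKIT_API_KEY" ]; then
    printf '%s\n' "$LIVEKIT_API_KEY"           # 2. then the environment
  else
    # 3. finally the generated runtime.json (sed stands in for real JSON parsing)
    sed -n 's/.*"api_key":"\([^"]*\)".*/\1/p' "$RUNTIME"
  fi
}
LIVEKIT_API_KEY=envkey
resolve_api_key ''        # → envkey  (no flag, env wins)
resolve_api_key flagkey   # → flagkey (explicit flag wins)
```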

hero_loader.nu (+29)

Additional try-block after the mycelium env block. Reads runtime.json if present, sets $env.LIVEKIT_* only when not already set (so secrets source / explicit secrets.toml overrides keep winning).
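A shell analogue of the only-when-unset population (field names follow the runtime.json described above; the sed extraction and the temp file are illustrative, not the loader's nu code):

```shell
# Populate LIVEKIT_* only when not already set, so explicit overrides keep winning.
RUNTIME=$(mktemp)
printf '{"url":"ws://127.0.0.1:7880","api_key":"k1","api_secret":"s1"}' > "$RUNTIME"
LIVEKIT_API_KEY=preset                       # pretend secrets.toml already set this
if [ -f "$RUNTIME" ]; then
  if [ -z "${LIVEKIT_URL:-}" ]; then         # unset: take the runtime.json value
    LIVEKIT_URL=$(sed -n 's/.*"url":"\([^"]*\)".*/\1/p' "$RUNTIME")
  fi
  if [ -z "${LIVEKIT_API_KEY:-}" ]; then     # already set: leave it alone
    LIVEKIT_API_KEY=$(sed -n 's/.*"api_key":"\([^"]*\)".*/\1/p' "$RUNTIME")
  fi
fi
echo "$LIVEKIT_URL $LIVEKIT_API_KEY"         # → ws://127.0.0.1:7880 preset
rm -f "$RUNTIME"
```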

Security considerations

LiveKit credentials do not land on argv. They go into the hero_proc action spec's env: map only, which reaches hero_collab_server via /proc/<pid>/environ — owner-only readable (0400). Only --auth-mode and --seed-dev-users are on argv, and neither is sensitive.

This is deliberate: argv leaks via /proc/<pid>/cmdline are world-readable, and we just closed exactly that class of bug for service_proc start in PR #114. Not repeating the mistake.
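The permission difference is easy to confirm on any Linux box (GNU stat syntax):

```shell
# argv is world-readable; the environment is readable only by the process owner.
stat -c '%a' /proc/self/cmdline   # → 444
stat -c '%a' /proc/self/environ   # → 400
```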

Behavior matrix

| Environment | `service_collab start` (no flags) | Notes |
|---|---|---|
| Multi-user dev box, hero_livekit installed | Auto-chains livekit, reads creds, dev auth + seed | Intended happy path |
| Multi-user dev box, hero_livekit NOT installed | `service_livekit install` auto-runs (cargo build), then starts | First-build cost, then cached |
| Multi-user dev box, Redis not running | Warn, continue | Runtime crashes until Redis up; diagnosable |
| Prod host (no `hero_cfg.toml`, no `/etc/hero-users/*`) | `--auth-mode=proxy`, LiveKit disabled | Original behaviour preserved |
| Anywhere with explicit `--auth-mode proxy` or `--no-livekit` | Those flags win | Opt-out path |

Testing

Static syntax check on all four files via nu -c "use <file> *" — clean.

Runtime verification on a fresh dev box would be:

  1. service_proc start — verify hero_proc up.
  2. service_collab start — verify it auto-installs hero_livekit, starts both, prints auth mode : dev, livekit : ws://....
  3. Open collab UI via hero_router; confirm requests succeed (no proxy-auth 401s).
  4. ps aux | grep -E 'hero_collab|hero_livekit' — verify LiveKit creds never appear on any argv line.
  5. service_collab start --auth-mode proxy --no-livekit on same box — verify it skips livekit auto-chain and passes proxy mode to collab (exercises opt-out path).
  6. service_collab stop && service_livekit stop — verify clean teardown.

Notes for review

@Mahmoud-Emad — this changes the default for service_collab start on dev boxes. Wanted to flag it directly since the previous behavior was a documented design decision (your comment in the original module). Rationale:

  • auth_mode=proxy without hero_proxy in the path is functionally broken; every dev has been editing action specs or not using collab.
  • Dev-box detection is gated on the multi_user_add footprint, so prod hosts that have never seen that provisioning flow keep their old default.
  • Anyone who explicitly wants proxy mode on a dev box can pass --auth-mode proxy.

Happy to adjust the detection heuristic or the defaults if you'd prefer a different policy — this PR is easy to amend.

Commits

Three logical units:

5f1c744  feat(service_livekit): new module wrapping hero_livekit
1f65c10  feat(service_collab): dev-auth + livekit auto-chain, default on dev boxes
2c6a8ac  feat(loader): populate $env.LIVEKIT_* from hero_livekit runtime.json

Squash at merge if preferred.

Follow-ups (out of scope)

  • hero_collab source-side: verify hero_collab_server actually reads LiveKit creds from env (expected via clap env = "LIVEKIT_..."). If not, a small clap config change is needed there for this PR to reach its full effect.
  • Retroactive migration of existing users to hero_cfg.toml (covered in earlier convo; separate PR).

Adds `service_livekit install|start|stop|status` following the pattern
of service_collab / service_os. Registers hero_livekit_server and
hero_livekit_ui as hero_proc actions on

  ~/hero/var/sockets/hero_livekit/{rpc,ui}.sock

Soft preflight warns when Redis is not reachable on 127.0.0.1:6379,
since hero_livekit uses it for room/participant state and crashes on
first request without it. Warn-only so operators running Redis on a
non-default setup are not blocked.

First `start` generates ~/hero/var/hero_livekit/runtime.json with a
fresh api_key/api_secret pair. hero_loader.nu picks those up into
$env.LIVEKIT_* on the next nu shell (follow-up commit).

Also registers the module in services/mod.nu so it is importable via
the standard `use services *` loader path.

service_collab start previously passed no args to hero_collab_server,
which selected production defaults (auth_mode=proxy, LiveKit off).
On a typical multi-user dev box provisioned via multi_user_add, there
is no hero_proxy in the path to authenticate users, so proxy mode was
effectively broken end-to-end: collab started but rejected requests.

Detect the dev-box context (presence of ~/hero/cfg/hero_cfg.toml or
/etc/hero-users/<user>.env) and default to --auth-mode=dev +
--seed-dev-users on dev boxes. Production hosts (neither file) keep
the previous --auth-mode=proxy default.

New flags on `start`, all opt-out:

  --auth-mode <dev|proxy>      Override the auto-detected default.
  --no-seed                    Skip --seed-dev-users even in dev mode.
  --no-livekit                 Disable livekit entirely.
  --livekit-url <url>          Use external livekit; skips auto-chain.
  --livekit-api-key <k>        Falls back to $env.LIVEKIT_API_KEY.
  --livekit-api-secret <s>     Falls back to $env.LIVEKIT_API_SECRET.

When livekit is enabled and credentials are missing from flags + env,
and we are on a dev box, auto-chain service_livekit start and read
~/hero/var/hero_livekit/runtime.json for the freshly generated creds.

Security: LiveKit credentials propagate via the action spec's `env:`
map (readable only by the process owner via /proc/<pid>/environ), not
via argv. Only --auth-mode and --seed-dev-users land on the process
argv, and those are not sensitive.

hero_livekit_server writes its generated API key/secret pair to
~/hero/var/hero_livekit/runtime.json on first start. Read that file
in hero_loader.nu and populate $env.LIVEKIT_URL / LIVEKIT_API_KEY /
LIVEKIT_API_SECRET so downstream consumers (service_collab in dev
auto-chain mode, developer shell invocations, etc.) see the creds
without hand-copying into secrets.toml.

Only populates fields that are not already set, so an explicit
secrets.toml override via `secrets source` still wins over the
auto-read. Silent no-op when the runtime file does not exist
(hero_livekit not installed / not started).

Mirrors the same pattern as the mycelium env block above it.

Nushell parses a line starting with `and` as a command invocation,
not a continuation of the previous boolean expression. The syntax

    let x = (not $foo)
        and ($bar | is-not-empty)

fails with `Command 'and' not found` at runtime.

Fold the livekit_enabled check into an if/else that short-circuits
the `$no_livekit` case and keeps the conjunction on a single line.

hero_livekit_server does NOT auto-generate runtime.json on startup —
it waits for an explicit LiveKitService.configure RPC call before
writing the file. Without this, downstream consumers see no
credentials: hero_loader.nu finds no runtime.json to read, and
service_collab's dev auto-chain can't resolve livekit creds and
falls back to starting collab without livekit.

Add svx_configure_livekit helper that:

  - Polls the rpc.sock for up to 3s in case the supervisor is still
    binding (defensive; the hero_proc health-check above should
    already block on the socket).
  - POSTs LiveKitService.configure via curl --unix-socket with
    node_ip set from $env.MYCELIUM_IP when available, otherwise
    letting hero_livekit default to 127.0.0.1.
  - Verifies runtime.json appears before returning OK.
  - Idempotent — calling again with the same params after
    runtime.json exists is a no-op from the supervisor's perspective.

Wire it into `start` right after `proc service start` completes.
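The poll step can be illustrated with a self-contained sketch (a plain file stands in for rpc.sock here; the real helper tests the actual socket):

```shell
TARGET=$(mktemp -u)                    # path that does not exist yet
( sleep 0.3; : > "$TARGET" ) &         # simulate the supervisor binding late
i=0
while [ $i -lt 30 ] && [ ! -e "$TARGET" ]; do   # ~3s budget in 100ms steps
  sleep 0.1
  i=$((i + 1))
done
if [ -e "$TARGET" ]; then STATUS=ok; else STATUS=timeout; fi
echo "$STATUS"                         # → ok
rm -f "$TARGET"
```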

Passing the JSON payload through `-d $body` triggered a nushell
external-command word-split on the user's dev box that resulted in
curl receiving a mangled URL and erroring:

    curl: (6) Could not resolve host: exit

Feed the body via `$body | ^curl ... --data-binary @-` instead.
Side benefit: the JSON payload (which may contain future sensitive
fields) stays off /proc/<pid>/cmdline.

Two issues surfaced when dogfooding on a multi-user dev box:

1. LiveKitService.configure rejects empty params with
   "Missing required parameter: node_ip". The previous code sent
   `{}` when $env.MYCELIUM_IP was unset, which fails on any host
   without a mycelium bridge. Always pass node_ip: use
   $env.MYCELIUM_IP when present, fall back to 127.0.0.1.

2. The `$body | ^curl ... --data-binary @-` stdin-pipe form still
   triggered a nushell external-command word-split on the dev box,
   producing "curl: (6) Could not resolve host: exit". Nushell
   appears to be splitting the command in an unexpected way when
   external args intermix pipes and strings.

   Switch to writing the JSON payload to a mktemp file and passing
   it via `--data-binary @file`. That's unambiguous curl syntax,
   survives whatever nushell is doing to the invocation, and keeps
   the payload off /proc/<pid>/cmdline.

Cleans up the tmpfile after the call.
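The @file delivery looks roughly like this (socket path and payload fields are illustrative; the curl line is commented out because it needs a live rpc.sock):

```shell
BODY='{"jsonrpc":"2.0","id":1,"method":"livekitservice.configure","params":{"node_ip":"127.0.0.1"}}'
TMP=$(mktemp)
printf '%s' "$BODY" > "$TMP"           # payload goes to disk, never argv
# curl -s --unix-socket "$SOCK" --data-binary @"$TMP" http://localhost/
OUT=$(cat "$TMP")                      # what curl would read from the file
echo "$OUT"
rm -f "$TMP"
```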

Two dev-box dogfood findings:

1. LiveKitService.configure errors with "Missing required parameter:
   api_key" when that field is omitted, despite the configuration docs
   describing it as having a default. Send the full set of fields
   (node_ip, api_key, domain, livekit_port, backend_port,
   redis_address) with empty-string / zero values where we want the
   supervisor's defaults — empty api_key specifically triggers the
   first-run auto-generation of a random api_secret.

2. Running this exact curl invocation interactively in nu succeeds,
   but the same invocation inside a `def` body (wrapped in
   `do { ... } | complete`) consistently fails with
   "curl: (6) Could not resolve host: exit" on the dev box. Nushell's
   external-command argv handling appears to mangle the call in
   module scope in ways I couldn't reproduce in REPL.

   Work around by passing the whole curl command as a single string
   to `bash -c`. Bash owns the parsing; nu just invokes bash with one
   argument. Reliable.

   The tmpfile approach still keeps the JSON payload off argv.

hero_livekit's LiveKitService.configure also requires api_secret as
an explicit field in params, alongside node_ip and api_key. Missing
it results in "Missing required parameter: api_secret" at runtime.

Empty string triggers the supervisor's first-run auto-generation
(per docs: a 64-char hex secret is generated from /dev/urandom when
the stored value is the initial placeholder).
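Putting the two findings together, the params record ends up shaped like this (field list per the commit texts above; exact default handling is the supervisor's):

```shell
NODE_IP="${MYCELIUM_IP:-127.0.0.1}"    # always present, mycelium IP when available
# Empty strings / zeros request supervisor defaults; empty api_key/api_secret
# trigger the first-run auto-generation described above.
PARAMS=$(printf '{"node_ip":"%s","api_key":"","api_secret":"","domain":"","livekit_port":0,"backend_port":0,"redis_address":""}' "$NODE_IP")
echo "$PARAMS"
```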

Temporary — remove once the 'Could not resolve host: exit' mystery
is pinned down on the affected dev host. Prints the exact string
handed to bash so we can confirm the curl invocation is well-formed.

Two cleanups after end-to-end success on a multi-user dev box:

1. Bracket IPv6 node_ip in the constructed LiveKit URL. Plain
   `ws://543:66c5:6430:8f31:1::1:7880` is syntactically ambiguous —
   browsers and most HTTP clients cannot distinguish the address
   from the port. Wrap IPv6 hosts in square brackets:
   `ws://[543:66c5:6430:8f31:1::1]:7880`. Applied in both
   service_collab.nu (svx_resolve_livekit_creds) and hero_loader.nu
   (env population block).

2. Remove the temporary `[debug] bash -c:` print from
   svx_configure_livekit — the dev-box flakiness was stale-code
   (nu module cache + unpulled commits), not a real bash-c issue.
   Now that the working path is confirmed, the diagnostic is noise.
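The bracketing rule from cleanup 1 can be sketched as a tiny helper (mk_ws_url is an illustrative name, not the PR's nu function):

```shell
mk_ws_url() {
  host=$1; port=$2
  case "$host" in
    *:*) printf 'ws://[%s]:%s\n' "$host" "$port" ;;   # IPv6: bracket the host
    *)   printf 'ws://%s:%s\n'   "$host" "$port" ;;   # IPv4 / hostname unchanged
  esac
}
mk_ws_url 543:66c5:6430:8f31:1::1 7880   # → ws://[543:66c5:6430:8f31:1::1]:7880
mk_ws_url 127.0.0.1 7880                 # → ws://127.0.0.1:7880
```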

End-to-end testing on a multi-user Hero dev box showed the direct
action-spec approach leaves hero_collab_ui running in proxy mode
regardless of --auth-mode: ui reads COLLAB_AUTH_MODE from env only
(hero_collab_ui/routes.rs:384), and a sibling hero_proc action spec
does not inherit env from its peer. hero_collab's CLI handles this
at main.rs:250 (.env("COLLAB_AUTH_MODE", &flags.auth_mode) before
spawning ui). Bypassing the CLI loses that bridge.

Hand over all hero_proc registration, argv/env layout, and
--livekit-api-secret-file file-based secret delivery to
`hero_collab --start` (the canonical operator entrypoint, per
Mahmoud's original module comment). Module's role becomes:

  - Preflight (sudo check, hero_proc health, binary installed).
  - Resolve effective flags (dev-box default for --auth-mode, livekit
    creds from flag/env/auto-chain).
  - Materialise the LiveKit API secret into a 0600 file at
    ~/hero/cfg/livekit.secret so it passes via path, not argv or env.
  - Forward the final flag set to `hero_collab --start` / `--stop`.

Net: ~80 fewer lines in service_collab.nu, UI auth now propagates
correctly, no drift risk on future hero_collab CLI changes.

Also adjust hero_loader.nu: stop populating $env.LIVEKIT_API_SECRET
(env inheritance is broad), and instead populate
$env.LIVEKIT_API_SECRET_FILE pointing at ~/hero/cfg/livekit.secret
when the file exists.

Same nushell gotcha as the earlier multi-line `and` fix: a line
starting with `or` is parsed as a command invocation. Fold the
three-clause need_chain check onto one line.

Nushell's `$"..."` parses unescaped `(...)` as a subexpression, so
`(0600, owner-only)` was tokenised as a call to command `0600,`.
Use `\(` and `\)` to emit literal parens — same pattern already in
use elsewhere in this file (`\(dev-box detected: ...\)`).

Previously the module only called LiveKitService.configure (writes
runtime.json), leaving hero_livekit_server as an idle supervisor —
livekit-server and lk-backend binaries were never spawned, nothing
bound on port 7880, and browsers attempting to join huddles hit
ERR_CONNECTION_REFUSED at ws://[node_ip]:7880/rtc/v1.

Replace svx_configure_livekit with svx_bootstrap_livekit which drives
the three supervisor RPCs in order:

  1. LiveKitService.install    — download upstream binaries (idempotent
                                 no-op when already present).
  2. LiveKitService.configure  — write runtime.json (as before).
  3. LiveKitService.start      — spawn livekit-server and lk-backend;
                                 this is the step that opens port 7880.

Extract the single-RPC curl-via-bash-c machinery into svx_lk_rpc so
all three calls reuse the same transport (and the earlier nushell-
external-arg workaround).

Install failure is non-fatal (binaries may already be present from a
previous run). Configure and start failures abort the bootstrap with
a clear message; collab will still come up without livekit (huddle
button grays out).
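The ordering and failure policy, sketched with a stand-in transport (svx_lk_rpc here just echoes; the real one is the curl machinery described above):

```shell
svx_lk_rpc() { echo "rpc $1"; }          # stand-in, not the real transport
svx_lk_rpc LiveKitService.install \
  || echo "install failed (non-fatal: binaries may already be present)"
svx_lk_rpc LiveKitService.configure \
  || { echo "configure failed: aborting bootstrap"; exit 1; }
svx_lk_rpc LiveKitService.start \
  || { echo "start failed: aborting bootstrap"; exit 1; }
```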

After a `service_livekit stop`, runtime.json stays on disk, hero_loader
still populates $env.LIVEKIT_URL/KEY/SECRET at shell startup, and the
next `service_collab start` saw "creds present" → skipped the chain →
handed hero_collab flags pointing at a non-listening livekit-server.
Result: chat works, but huddles fail with ERR_CONNECTION_REFUSED to
ws://[node-ip]:7880.

"creds in env" does not imply "server reachable". On a dev box,
unconditionally chain `service_livekit start` unless the caller passed
an explicit external URL (or secret-file). service_livekit start is
idempotent — no-op when the supervisor + livekit-server are already
running, full bootstrap (install → configure → start) when they are
not.

A caller that DID pass --livekit-url or --livekit-api-secret-file is
presumed to want an external instance; we don't try to stand up a
local one behind their back.
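The decision reduces to (variable names illustrative):

```shell
livekit_url_flag=""        # would hold --livekit-url if the caller passed it
secret_file_flag=""        # would hold --livekit-api-secret-file
need_chain=true
if [ -n "$livekit_url_flag" ] || [ -n "$secret_file_flag" ]; then
  need_chain=false         # explicit external instance: do not chain
fi
echo "need_chain=$need_chain"   # → need_chain=true on a plain dev-box start
```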

LiveKitService.install expects the `livekit_version` and
`backend_version` fields to be PRESENT in the params record, not
just valid. Empty-string values are the supervisor's "resolve latest
upstream GitHub release" signal (per hero_livekit/docs/api.md).

Previously we sent `params: {}`, which failed the same "Missing
required parameter" check that configure hits. Cascade: install
errored → binaries not downloaded → start errored with
"binaries not installed — call install first".

Temporary — remove once the 'curl: (7) localhost:80' regression is
tracked down. Prints the rpc_sock value, its path-exists status, the
method being called, and the exact bash -c string handed to bash.

On the dev box the `nu → ^bash -c $curl_cmd` path was somehow
losing the --unix-socket flag en route to curl. Debug print showed
the string nu built was perfect; running the exact same string from
an interactive bash shell worked. But inside the nu def, curl fell
back to TCP localhost:80 (exit 7).

Write the curl invocation to a shell script file and execute it
with `bash <file>`. Bash reads the file verbatim — no shell-arg
parsing in the middle for nu to mangle. Same approach used by many
nushell scripts that need to drive complex external pipelines.

The supervisor registers methods as `livekitservice.install`,
`livekitservice.configure`, `livekitservice.start`, etc.
(lowercase), not `LiveKitService.install` as the hero_livekit
docs/api.md suggested.

Confirmed by strings-extract on the `hero_livekit_server` binary:

    livekitservice_rpc_install
    livekitservice_rpc_configure
    livekitservice_rpc_start
    livekitservice_rpc_stop
    livekitservice_rpc_restart
    livekitservice_rpc_status
    livekitservice.install
    livekitservice.configure
    livekitservice.start
    ...

The PascalCase form returns "Method not found" for install, and
while some PascalCase calls appeared to work earlier, that was
almost certainly confirmation bias reading error messages.
Aligning to what the binary actually dispatches on.

Prints every file involved + raw contents to the terminal before
running curl. Reverting once we've pinned the port-80 regression.

Both bash wrappers (`bash -c $cmd` and `bash <script>`) were
consistently losing the --unix-socket flag somewhere between nu
and curl on the dev box, despite identical strings working in a
plain bash shell. A direct `^curl` invocation with argv-level
flags (and --data-binary @file for the body) works around the
issue entirely and keeps the secret payload off argv.

The hero_livekit_backend crate in the hero_livekit workspace produces
a binary pinned to the name "lk-backend". The supervisor spawns this
alongside upstream livekit-server on start, and guards the start
operation behind a binary-presence check:

    {code: -32000, message: "Invalid input: binaries not installed — call install first"}

SVX_BINARIES drove cargo's --bin selection and the install verification,
but only listed hero_livekit_server and hero_livekit_ui. lk-backend was
never built, never copied to ~/hero/bin, and the supervisor refused to
start. Adding it to the list fixes the build and the supervisor's
presence check in one step.

(This is a separate concern from the missing livekitservice.install RPC
method — that method is not registered in the dispatcher for some
reason, but it turns out we don't need it: cargo can build everything
directly, and livekit-server itself is already downloaded at
~/hero/bin/livekit-server.)
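The fix amounts to one more entry in the list that drives both the build and the presence check (the loop body is a sketch of the described flow, not the module's code):

```shell
SVX_BINARIES="hero_livekit_server hero_livekit_ui lk-backend"   # lk-backend now included
for b in $SVX_BINARIES; do
  echo "cargo build --release --bin $b"    # then copy target/release/$b to ~/hero/bin
done
```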

sameh-farouk merged commit c55e6afdce into development 2026-04-22 18:17:17 +00:00
Reference
lhumina_code/hero_skills!115