Production Hardening: Auth Security, UX, Tests, Docs #44

Open
opened 2026-03-02 17:25:09 +00:00 by mik-tf · 0 comments
Owner

Overview

Tracking issue for all gaps identified after the initial login flow was unblocked (reliable17). Items are grouped by priority tranche and repo. Work is added to existing open PRs — no merge until reviewers sign off.

Current State

  • reliable17 deployed at heroos.gent02.grid.tf — 19/19 services Running, gateway 200, /login 200
  • Fixes shipped: -32601 dispatch bug, admin seed data, proxy default route, port mapping fix
  • Login works but system is not production-safe (Tranche 1 items below)

Open PRs

| Repo | Branch | PR | Tranche work to add |
|------|--------|----|---------------------|
| hero_services | development_combined_deploy | #43 | Tranche 2 smoke tests |
| hero_osis | development_fix_missing_domains | #11 | Tranche 1 + 3 auth items |
| hero_os | development_consolidated | #19 | Tranche 2 UX + Playwright |
| hero_rpc | new branch needed | new PR | Tranche 1 typed error codes |

Tranche 1 — Security Critical (cannot ship to real users)

hero_osis → PR #11

  • Replace DefaultHasher in random_hex() with CSPRNG (OsRng) — session tokens and challenges are currently predictable
  • Replace SHA-256 password storage with Argon2/bcrypt — no salt, no iterations currently
  • Fix login() username binding — currently matches any user with matching password_hash, not the specific user whose challenge was requested
  • Auth integration tests:
    • Challenge-response happy path (get_challenge → login → session)
    • Wrong password returns -32000 not -32601 (regression guard)
    • Expired challenge rejected
    • Used challenge cannot be replayed
    • validate_session happy path
    • logout revokes session
  • Rate limiting on get_challenge + login endpoints
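
The username-binding, expiry, and replay items above could be covered by one small store that remembers which user each challenge was issued to. The following is only a sketch under assumptions: `ChallengeStore`, `ChallengeError`, and the method names are hypothetical and not taken from hero_osis, and generating the challenge value from a CSPRNG and verifying the password response are left out.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Why a challenge was rejected (hypothetical names, not from hero_osis).
#[derive(Debug, PartialEq)]
pub enum ChallengeError {
    Unknown,   // never issued, or already consumed (replay)
    Expired,   // issued longer ago than the TTL
    WrongUser, // issued to a different username
}

/// In-memory map of outstanding challenges: challenge -> (username, issued_at).
pub struct ChallengeStore {
    ttl: Duration,
    pending: HashMap<String, (String, Instant)>,
}

impl ChallengeStore {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, pending: HashMap::new() }
    }

    /// Record a challenge for `username`. The caller supplies the challenge
    /// value; it should come from a CSPRNG.
    pub fn issue(&mut self, username: &str, challenge: &str, now: Instant) {
        self.pending
            .insert(challenge.to_string(), (username.to_string(), now));
    }

    /// Consume a challenge. It is removed on the first attempt, whatever the
    /// outcome, so it can never be used twice.
    pub fn consume(
        &mut self,
        username: &str,
        challenge: &str,
        now: Instant,
    ) -> Result<(), ChallengeError> {
        let (owner, issued_at) = self
            .pending
            .remove(challenge)
            .ok_or(ChallengeError::Unknown)?;
        if now.duration_since(issued_at) > self.ttl {
            return Err(ChallengeError::Expired);
        }
        if owner != username {
            return Err(ChallengeError::WrongUser);
        }
        Ok(())
    }
}
```

With this shape, `login()` would call `consume` first and then check the password hash of that one username only, instead of scanning all users for a matching hash. Removing the challenge even on a failed attempt is a design choice: it keeps replay handling simple, at the cost of forcing a fresh `get_challenge` after any failure.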

hero_rpc → new branch development_typed_error_codes

  • Typed RPC error code constants so UI can distinguish auth-failed vs session-expired vs server-error without string matching
  • Add data: Option<Value> field to RpcError in client struct (currently silently dropped)
  • Dispatch error code regression test
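
As a rough illustration of the typed-constants item, the client could classify errors by numeric code rather than by message text. This is a sketch, not the hero_rpc API: `RpcErrorKind` and `kind()` are hypothetical, `SESSION_EXPIRED = -32001` is an invented placeholder because the issue does not name a code for it, and `data` uses `Option<String>` as a stand-in so the example needs no JSON crate.

```rust
/// JSON-RPC 2.0 reserves -32601 for "Method not found".
pub const METHOD_NOT_FOUND: i64 = -32601;
/// The issue expects a wrong password to return -32000.
pub const AUTH_FAILED: i64 = -32000;
/// Hypothetical placeholder: the issue does not specify this code.
pub const SESSION_EXPIRED: i64 = -32001;

#[derive(Debug, Clone, PartialEq)]
pub struct RpcError {
    pub code: i64,
    pub message: String,
    /// Stand-in for `Option<Value>`; kept instead of being silently dropped.
    pub data: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RpcErrorKind {
    AuthFailed,
    SessionExpired,
    MethodNotFound,
    Server, // internal error, or the rest of the -32099..=-32000 server range
    Other,
}

impl RpcError {
    /// Classify by numeric code only, never by matching on `message`.
    pub fn kind(&self) -> RpcErrorKind {
        match self.code {
            AUTH_FAILED => RpcErrorKind::AuthFailed,
            SESSION_EXPIRED => RpcErrorKind::SessionExpired,
            METHOD_NOT_FOUND => RpcErrorKind::MethodNotFound,
            -32603 | -32099..=-32002 => RpcErrorKind::Server,
            _ => RpcErrorKind::Other,
        }
    }
}
```

A dispatch regression test then reduces to asserting that a wrong-password response classifies as `AuthFailed` and not as `MethodNotFound`.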

Tranche 2 — UX & Reliability

hero_os → PR #19

  • Session token persistence in localStorage — currently lost on page refresh
  • Session restoration on app init — check for existing valid token before showing login screen
  • Map error codes to friendly messages in auth_service.rs:
    • Wrong password → "Incorrect username or password"
    • Server unreachable → "Cannot connect to server"
    • Session expired → redirect to login
  • Fix URL doubling (/hero_os_http/hero_os_http) — SPA routing / base path issue in Dioxus router
  • Playwright E2E login tests:
    • Login success → redirect to main screen
    • Wrong password → shows friendly error message
    • Session persists across page reload
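
The error-to-message mapping listed above for `auth_service.rs` might look roughly like this. The two message strings are the ones given in this issue; the `AuthFailure` and `UiAction` types and the `ui_action` function are hypothetical names used only for illustration.

```rust
/// Failure categories the login screen needs to tell apart (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthFailure {
    WrongPassword,
    ServerUnreachable,
    SessionExpired,
}

/// What the UI should do in response (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UiAction {
    ShowMessage(&'static str),
    RedirectToLogin,
}

/// Map a failure category to a user-facing action.
pub fn ui_action(failure: AuthFailure) -> UiAction {
    match failure {
        // Same text for unknown user and wrong password, so the message
        // does not reveal which usernames exist.
        AuthFailure::WrongPassword => UiAction::ShowMessage("Incorrect username or password"),
        AuthFailure::ServerUnreachable => UiAction::ShowMessage("Cannot connect to server"),
        AuthFailure::SessionExpired => UiAction::RedirectToLogin,
    }
}
```

Keeping the mapping in one exhaustive `match` means a newly added failure category fails to compile until it is given a message or action, which the Playwright "friendly error message" test can then assert against.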

hero_services → PR #43

  • Smoke test: GET /login via gateway returns 200 (proxy default route)
  • Smoke test: GET /hero_os_http/ returns 200
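
The two smoke checks above amount to "send a GET, read the status code". A minimal standard-library sketch follows, assuming the gateway is reachable over plain HTTP (for example on a locally mapped port); it does not speak TLS, and `get_status` and `parse_status` are hypothetical helper names.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Extract the status code from the first line of an HTTP/1.x response,
/// e.g. "HTTP/1.1 200 OK" -> Some(200).
pub fn parse_status(head: &str) -> Option<u16> {
    let line = head.lines().next()?;
    let mut parts = line.split_whitespace();
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Send a plain-HTTP GET to `addr` ("host:port") and return the status code.
/// Only the first chunk of the response is read, which is enough for the
/// status line in practice but is not a full HTTP client.
pub fn get_status(addr: &str, host: &str, path: &str) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    write!(
        stream,
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    )?;
    let mut buf = [0u8; 512];
    let n = stream.read(&mut buf)?;
    Ok(parse_status(&String::from_utf8_lossy(&buf[..n])))
}
```

A smoke test would then assert that `get_status` returns `Some(200)` for both `/login` and `/hero_os_http/`; the address to pass in depends on the deployment and is not specified here.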

Tranche 3 — Polish & Docs

hero_osis → PR #11

  • Session refresh endpoint (extend TTL without full re-auth)
  • Audit logging for auth events (login success/failure, logout, token revocation)
  • Token revocation on password change
  • O(n) session lookup: consider indexed token field
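
Three of the items above (refresh, revocation on password change, and the O(n) lookup) touch the same data structure, so they can be sketched together. This assumes sessions live in memory and are keyed by token; `SessionStore` and its methods are hypothetical names and not the hero_osis storage layer.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct Session {
    pub username: String,
    pub expires_at: Instant,
}

/// Sessions indexed by token, so validation is a hash lookup rather than a scan.
pub struct SessionStore {
    ttl: Duration,
    by_token: HashMap<String, Session>,
}

impl SessionStore {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, by_token: HashMap::new() }
    }

    pub fn insert(&mut self, token: &str, username: &str, now: Instant) {
        let session = Session {
            username: username.to_string(),
            expires_at: now + self.ttl,
        };
        self.by_token.insert(token.to_string(), session);
    }

    /// Return the username if the token exists and has not expired.
    pub fn validate(&self, token: &str, now: Instant) -> Option<&str> {
        self.by_token
            .get(token)
            .filter(|s| now < s.expires_at)
            .map(|s| s.username.as_str())
    }

    /// Extend a live session by one TTL from `now`; expired or unknown
    /// tokens are not revived.
    pub fn refresh(&mut self, token: &str, now: Instant) -> bool {
        match self.by_token.get_mut(token) {
            Some(s) if now < s.expires_at => {
                s.expires_at = now + self.ttl;
                true
            }
            _ => false,
        }
    }

    /// Drop every session for `username` (e.g. on password change) and
    /// return how many were removed.
    pub fn revoke_user(&mut self, username: &str) -> usize {
        let before = self.by_token.len();
        self.by_token.retain(|_, s| s.username != username);
        before - self.by_token.len()
    }
}
```

Note that `revoke_user` is still a linear scan in this sketch; if password changes become frequent, a second index from username to tokens would remove that as well.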

hero_os → PR #19

  • Admin UI: change password from UI
  • Password visibility toggle on login form

Documentation (hero_services or dedicated docs repo)

  • Default admin credentials (admin / admin) — document and warn to change on first deploy
  • Challenge-response protocol spec for client implementers
  • Password format (current MVP: SHA-256 of plaintext)
  • Session token format, TTL, expiry policy
  • Error codes per endpoint
  • Seed data directory format and how to add users
  • How to rotate admin password in production

Notes

  • public_key field in auth protocol is actually a username — rename when Ed25519 is implemented
  • Challenge map is in-memory (lost on restart) — acceptable for single-instance, document the limitation
  • Docker port mapping: host APP_PORT (8805) → container PROXY_PORT (6666) — fixed in PR #43
Reference
lhumina_code/hero_services#44