osis dispatch flattens typed handler errors → all surface as -32603 "Redis operation error" (validation errors should be -32602) #83

Closed
opened 2026-05-20 07:16:32 +00:00 by mahmoud · 2 comments
Owner

Summary

Every business/validation error returned by an OSIS service handler surfaces to JSON-RPC clients as -32603 Internal error with a misleading data string prefixed Redis operation error:, regardless of the actual cause. Typed error variants (e.g. a service's InvalidInput vs Internal) are discarded by the generated dispatch glue, so clients can't distinguish a caller mistake (should be -32602 Invalid params) from a real server fault (-32603).

Seen across every osis-based service (hero_db, hero_proc, hero_compute, …). Concrete example from hero_compute's ComputeService.deploy_vm, which returns ComputeServiceError::InvalidInput("insufficient capacity …"):

code:    -32603
message: Internal error
data:    Redis operation error: Invalid input: insufficient capacity on tfnode-1774 …

The service did the right thing (typed InvalidInput), but the wire response says Internal error / Redis operation error, neither of which is true.

Mechanism (where the type info is lost)

  1. The handler returns its typed error, e.g. ComputeServiceError::InvalidInput(msg) (Display: "Invalid input: <msg>").
  2. The generated server glue (*_server_generated.rs, handle_service_call / handle_rpc_call) flattens it to a string and re-wraps it:
    Err(hero_rpc_osis::rpc::RpcError::Operation(response.error))
    
    The typed variant is gone here — only the Display string remains.
  3. RpcError::Operation's Display is hardcoded (crates/osis/src/rpc/error.rs:13):
    #[error("Redis operation error: {0}")]
    Operation(String),
    
    → the misleading Redis operation error: prefix, even when nothing Redis-related happened.
  4. The dispatch (crates/osis/src/rpc/dispatch.rs) maps the failed call to a fixed JSON-RPC code:
    • type-handler path → JsonRpcError::internal_error(&rpc_resp.data)-32603
    • service-call fallback → JsonRpcError::application_error(...)-32000
      There is no branch that produces invalid_params (-32602) for caller/validation errors.

Impact

  • Clients/UIs can't programmatically tell "you sent bad params / it won't fit" from "the server broke." Both are -32603.
  • The Redis operation error: prefix is actively misleading during debugging (suggests a storage fault that didn't occur).
  • Affects every OSIS service uniformly.

Proposed fix

Proper (preferred): preserve the handler error category through the generated glue so dispatch can map it:

  • Give the per-service error enum (e.g. ComputeServiceError) a small category() (or a shared RpcErrorKind: Invalid, NotFound, PermissionDenied, Internal).
  • Have the generator carry that category into the RpcError it constructs (add variants like RpcError::Invalid(String) / RpcError::NotFound(String) / RpcError::PermissionDenied(String) instead of funneling everything through Operation).
  • Map in dispatch.rs: Invalid → invalid_params(-32602), NotFound → -32602 (or app error -32000), PermissionDenied → app error, Internal/Operation → internal_error(-32603).
  • Rename/repurpose RpcError::Operation's message so non-Redis failures don't claim Redis operation error.

Pragmatic interim (smaller, fragile): in dispatch.rs, when building the JsonRpcError from a failed response, classify by the message prefix produced by the standard error Displays — "Invalid input:" → invalid_params(-32602), "Not found:" → -32602, "Permission denied:" → application_error. Also fix the Operation Display prefix. String-matching is brittle and should be a stopgap only.

Acceptance

  • A handler returning a validation/InvalidInput error surfaces as JSON-RPC -32602 with a clean message (no Redis operation error: / Internal error wording).
  • Real internal faults still surface as -32603.
  • Behavior consistent across all osis services after regeneration.

Context

Found while adding a capacity precheck to hero_compute deploy_vm (lhumina_code/hero_compute PR #113). The precheck correctly returns InvalidInput, but it reaches the client as -32603 Internal error because of the above. Filing here because the fix is framework + codegen, not per-service.

## Summary Every business/validation error returned by an OSIS service handler surfaces to JSON-RPC clients as `-32603 Internal error` with a misleading `data` string prefixed `Redis operation error:`, regardless of the actual cause. Typed error variants (e.g. a service's `InvalidInput` vs `Internal`) are discarded by the generated dispatch glue, so clients can't distinguish a caller mistake (should be `-32602 Invalid params`) from a real server fault (`-32603`). Seen across every osis-based service (hero_db, hero_proc, hero_compute, …). Concrete example from `hero_compute`'s `ComputeService.deploy_vm`, which returns `ComputeServiceError::InvalidInput("insufficient capacity …")`: ``` code: -32603 message: Internal error data: Redis operation error: Invalid input: insufficient capacity on tfnode-1774 … ``` The service did the right thing (typed `InvalidInput`), but the wire response says `Internal error` / `Redis operation error`, neither of which is true. ## Mechanism (where the type info is lost) 1. The handler returns its typed error, e.g. `ComputeServiceError::InvalidInput(msg)` (Display: `"Invalid input: <msg>"`). 2. The **generated** server glue (`*_server_generated.rs`, `handle_service_call` / `handle_rpc_call`) flattens it to a string and re-wraps it: ```rust Err(hero_rpc_osis::rpc::RpcError::Operation(response.error)) ``` The typed variant is gone here — only the Display string remains. 3. `RpcError::Operation`'s `Display` is hardcoded (`crates/osis/src/rpc/error.rs:13`): ```rust #[error("Redis operation error: {0}")] Operation(String), ``` → the misleading `Redis operation error:` prefix, even when nothing Redis-related happened. 4. The dispatch (`crates/osis/src/rpc/dispatch.rs`) maps the failed call to a fixed JSON-RPC code: - type-handler path → `JsonRpcError::internal_error(&rpc_resp.data)` → `-32603` - service-call fallback → `JsonRpcError::application_error(...)` → `-32000` There is no branch that produces `invalid_params` (`-32602`) for caller/validation errors. ## Impact - Clients/UIs can't programmatically tell "you sent bad params / it won't fit" from "the server broke." Both are `-32603`. - The `Redis operation error:` prefix is actively misleading during debugging (suggests a storage fault that didn't occur). - Affects every OSIS service uniformly. ## Proposed fix **Proper (preferred):** preserve the handler error category through the generated glue so dispatch can map it: - Give the per-service error enum (e.g. `ComputeServiceError`) a small `category()` (or a shared `RpcErrorKind`: `Invalid`, `NotFound`, `PermissionDenied`, `Internal`). - Have the generator carry that category into the `RpcError` it constructs (add variants like `RpcError::Invalid(String)` / `RpcError::NotFound(String)` / `RpcError::PermissionDenied(String)` instead of funneling everything through `Operation`). - Map in `dispatch.rs`: `Invalid → invalid_params(-32602)`, `NotFound → -32602` (or app error -32000), `PermissionDenied → app error`, `Internal/Operation → internal_error(-32603)`. - Rename/repurpose `RpcError::Operation`'s message so non-Redis failures don't claim `Redis operation error`. **Pragmatic interim (smaller, fragile):** in `dispatch.rs`, when building the `JsonRpcError` from a failed response, classify by the message prefix produced by the standard error Displays — `"Invalid input:" → invalid_params(-32602)`, `"Not found:" → -32602`, `"Permission denied:" → application_error`. Also fix the `Operation` Display prefix. String-matching is brittle and should be a stopgap only. ## Acceptance - A handler returning a validation/`InvalidInput` error surfaces as JSON-RPC `-32602` with a clean message (no `Redis operation error:` / `Internal error` wording). - Real internal faults still surface as `-32603`. - Behavior consistent across all osis services after regeneration. ## Context Found while adding a capacity precheck to `hero_compute` deploy_vm (lhumina_code/hero_compute PR #113). The precheck correctly returns `InvalidInput`, but it reaches the client as `-32603 Internal error` because of the above. Filing here because the fix is framework + codegen, not per-service.
Owner

Working on this in issue-83-error-categories. First commit (7d37514) lands the framework + codegen changes:

  • RpcErrorKind { Invalid, NotFound, PermissionDenied, Internal } added in crates/osis/src/rpc/error.rs, plus matching RpcError variants. Renamed RpcError::Operation Display prefix away from Redis operation error:.
  • protocol::RpcResponse and the per-domain generated RpcResponse now carry error_kind. handle_request, handle_rpc_call, and handle_service_call propagate the category instead of collapsing to RpcError::Operation(String).
  • Generator (crates/generator/src/rust/rust_rpc.rs) emits category() on every <Service>Error enum; the OSIS emit (crates/generator/src/rust/rust_osis.rs) wires it into the service-method RPC wrappers and into handle_rpc_call / handle_service_call.
  • Dispatch layers (crates/osis/src/rpc/dispatch.rs, crates/server/src/server/{domain,unified}_server.rs) now map by typed variant: Invalid and NotFound-32602, PermissionDenied-32000, Internal-32603.

Next: regenerate example/recipe_server, then add a unit/integration test exercising InvalidInput-32602.

Working on this in `issue-83-error-categories`. First commit (7d37514) lands the framework + codegen changes: - `RpcErrorKind { Invalid, NotFound, PermissionDenied, Internal }` added in `crates/osis/src/rpc/error.rs`, plus matching `RpcError` variants. Renamed `RpcError::Operation` Display prefix away from `Redis operation error:`. - `protocol::RpcResponse` and the per-domain generated `RpcResponse` now carry `error_kind`. `handle_request`, `handle_rpc_call`, and `handle_service_call` propagate the category instead of collapsing to `RpcError::Operation(String)`. - Generator (`crates/generator/src/rust/rust_rpc.rs`) emits `category()` on every `<Service>Error` enum; the OSIS emit (`crates/generator/src/rust/rust_osis.rs`) wires it into the service-method RPC wrappers and into `handle_rpc_call` / `handle_service_call`. - Dispatch layers (`crates/osis/src/rpc/dispatch.rs`, `crates/server/src/server/{domain,unified}_server.rs`) now map by typed variant: `Invalid` and `NotFound` → `-32602`, `PermissionDenied` → `-32000`, `Internal` → `-32603`. Next: regenerate `example/recipe_server`, then add a unit/integration test exercising `InvalidInput` → `-32602`.
Owner

All three pieces landed on issue-83-error-categories (4 commits, last is 46601e1). Summary of how the proposed-fix design plays out:

Shared category: RpcErrorKind { Invalid, NotFound, PermissionDenied, Internal } in crates/osis/src/rpc/error.rs, plus matching RpcError::Invalid / NotFound / PermissionDenied variants. RpcError::Operation Display fixed (no more Redis operation error: for non-Redis failures).

Codegen carries it through: every generated <Service>Error enum now has category() returning an RpcErrorKind. The OSIS server glue (*_server_generated.rs) tags RpcResponse.error_kind from e.category() for service-method errors and from RpcErrorKind::Invalid for missing-parameter errors; handle_rpc_call / handle_service_call reconstruct the typed RpcError variant from error_kind instead of funnelling through RpcError::Operation.

Dispatch maps by variant:

  • RpcError::Invalid and NotFoundJsonRpcError::invalid_params(...) (-32602)
  • RpcError::PermissionDeniedJsonRpcError::application_error(...) (-32000)
  • RpcError::Operation and other Internal-kind variants → JsonRpcError::internal_error(...) (-32603)

Applied at crates/osis/src/rpc/dispatch.rs and at crates/server/src/server/{domain,unified}_server.rs. The old string-matching classification (msg.contains("Unknown method"), msg.contains("not found"), …) is gone in favor of typed RpcError matching.

Acceptance verification: regenerated example/recipe_server shows the new glue (verified by grep on osis_server_generated.rs). Three integration tests in tests_error_category.rs drive OsisRecipes::handle_service_call and assert:

  • missing param → RpcError::Invalid(...), message contains no Redis operation error: prefix
  • handler returning RecipeServiceError::NotFoundRpcError::NotFound(...), same Redis-prefix guard
  • unknown service method still carries the Unknown method: marker dispatch keys on for -32601

Unit tests in error.rs (3) and dispatch.rs (6) pin the kind→code mapping. All 77 hero_rpc_osis lib tests + 6 recipe_server tests pass. The branch is ready for review at https://forge.ourworld.tf/lhumina_code/hero_rpc/compare/development...issue-83-error-categories

All three pieces landed on `issue-83-error-categories` (4 commits, last is 46601e1). Summary of how the proposed-fix design plays out: **Shared category:** `RpcErrorKind { Invalid, NotFound, PermissionDenied, Internal }` in `crates/osis/src/rpc/error.rs`, plus matching `RpcError::Invalid` / `NotFound` / `PermissionDenied` variants. `RpcError::Operation` Display fixed (no more `Redis operation error:` for non-Redis failures). **Codegen carries it through:** every generated `<Service>Error` enum now has `category()` returning an `RpcErrorKind`. The OSIS server glue (`*_server_generated.rs`) tags `RpcResponse.error_kind` from `e.category()` for service-method errors and from `RpcErrorKind::Invalid` for missing-parameter errors; `handle_rpc_call` / `handle_service_call` reconstruct the typed `RpcError` variant from `error_kind` instead of funnelling through `RpcError::Operation`. **Dispatch maps by variant:** - `RpcError::Invalid` and `NotFound` → `JsonRpcError::invalid_params(...)` (-32602) - `RpcError::PermissionDenied` → `JsonRpcError::application_error(...)` (-32000) - `RpcError::Operation` and other Internal-kind variants → `JsonRpcError::internal_error(...)` (-32603) Applied at `crates/osis/src/rpc/dispatch.rs` and at `crates/server/src/server/{domain,unified}_server.rs`. The old string-matching classification (`msg.contains("Unknown method")`, `msg.contains("not found")`, …) is gone in favor of typed `RpcError` matching. **Acceptance verification:** regenerated `example/recipe_server` shows the new glue (verified by grep on `osis_server_generated.rs`). Three integration tests in `tests_error_category.rs` drive `OsisRecipes::handle_service_call` and assert: - missing param → `RpcError::Invalid(...)`, message contains no `Redis operation error:` prefix - handler returning `RecipeServiceError::NotFound` → `RpcError::NotFound(...)`, same Redis-prefix guard - unknown service method still carries the `Unknown method:` marker dispatch keys on for `-32601` Unit tests in `error.rs` (3) and `dispatch.rs` (6) pin the kind→code mapping. All 77 `hero_rpc_osis` lib tests + 6 recipe_server tests pass. The branch is ready for review at https://forge.ourworld.tf/lhumina_code/hero_rpc/compare/development...issue-83-error-categories
timur closed this issue 2026-05-20 08:16:43 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_rpc#83
No description provided.