rpc.discover can return a stale OpenRPC spec — regenerated clients then drift #32
Reference
lhumina_code/hero_rpc#32
Problem
rpc.discover is implemented by reading a cached OpenRPC document that the server holds in memory. In practice, this document can fall out of sync with the live binary: if the server registers a service after caching, if the cache is populated from a static include_str!'d file that pre-dates the last regen, or if the service registry mutates between cache-build and request time.
Observed: a running hero_logic binary (built Apr 20) serves an rpc.discover response that omits LogicService.play_start and siblings, even though those methods are absolutely present in the compiled code and the .oschema. Every downstream tool that regenerates from rpc.discover then inherits the miss.

Expected
rpc.discover should return the exact same bytes as the OpenRPC spec compiled into the running binary. No caching layer between "what the code says" and "what the discover endpoint returns." If caching is needed for performance, it should invalidate on server start, not persist across a restart.

Proposed fix
Option A: Serve rpc.discover directly from the include_str!'d spec constant. No runtime rebuild, no cache. O(1), always correct.
Option B: Keep the current cache, but rebuild it on every server start (clear on startup, first call triggers rebuild from schema). Slightly more work on first call, safe across restarts.
Either is fine. Option A is simpler.
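
For discussion, here is a minimal sketch of how the two options could look. Every name in it (OPENRPC_SPEC, rpc_discover, rpc_discover_cached, build_spec_from_schema) is hypothetical and not taken from the hero_rpc codebase. The inline string literal stands in for the include_str!'d file so the snippet compiles on its own.

```rust
use std::sync::OnceLock;

// Assumption: in the real server this would be something like
// `include_str!("openrpc.json")`; a literal is used here so the sketch is
// self-contained. The spec content is illustrative only.
const OPENRPC_SPEC: &str = r#"{"openrpc":"1.2.6","info":{"title":"example","version":"0.0.0"},"methods":[{"name":"LogicService.play_start","params":[]}]}"#;

/// Option A (sketch): return the embedded spec as-is.
/// No runtime state, so nothing can drift from the compiled-in bytes.
pub fn rpc_discover() -> &'static str {
    OPENRPC_SPEC
}

// Option B (sketch): a process-lifetime cache. A `static OnceLock` starts
// empty in every new process, so a restart always forces a rebuild.
static DISCOVER_CACHE: OnceLock<String> = OnceLock::new();

/// Hypothetical rebuild step. A real implementation would regenerate the
/// document from the schema; this placeholder just copies the constant.
fn build_spec_from_schema() -> String {
    OPENRPC_SPEC.to_string()
}

/// Option B entry point: the first call builds the document, later calls
/// reuse it until the process exits.
pub fn rpc_discover_cached() -> &'static str {
    DISCOVER_CACHE.get_or_init(build_spec_from_schema).as_str()
}
```

One caveat with the Option B sketch: it only protects against staleness across restarts. If services can register after the first rpc.discover call, the cached document would still miss them, which is part of why Option A looks simpler.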

Related
rpc.discover isn't the canonical source of truth anymore)
This issue is specifically about the discover endpoint's correctness, regardless of what consumes it.

Promoting from "can land any time" to prerequisite for hero_logic#13.
#13's flow library + Service Agent v3 work depends on LogicService.flow_library_search returning matches based on the current set of methods, and the agent then generating Python that calls the current generated client. If rpc.discover on hero_logic serves a stale embedded spec, the router's cached ~/.hero/var/router/python/hero_logic_client.py won't match what the running service actually accepts: the LLM would generate calls against methods that no longer exist (or miss methods that just landed), and #13 will hit non-deterministic failures that look like agent bugs but are actually cache-drift.
Proposed scope unchanged from this issue's original body: serve rpc.discover directly from the include_str!'d constant, with no in-memory cache between the embedded spec and the response. Optionally: invalidate the router's per-service hash on service-process-restart so a rebuilt service triggers regeneration without waiting for the next scanner pass.
Ready to pick this one up next.
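
The optional router-side piece could be sketched roughly as follows. This is an assumption about how a per-service hash might work, not a description of the actual router. RouterCache, needs_regen, and on_service_restart are hypothetical names, and the standard library's DefaultHasher is used only to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the raw rpc.discover response. DefaultHasher's algorithm is not
/// guaranteed stable across Rust releases, so a hash persisted to disk
/// would be better served by a fixed algorithm (assumption: this sketch
/// keeps the hashes in memory only).
fn spec_hash(spec: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    spec.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical per-service record of the last spec hash the router saw.
#[derive(Default)]
pub struct RouterCache {
    hashes: HashMap<String, u64>,
}

impl RouterCache {
    /// Record the current spec hash and report whether the generated
    /// client should be rebuilt (unknown service, or the spec changed).
    pub fn needs_regen(&mut self, service: &str, spec: &str) -> bool {
        let new_hash = spec_hash(spec);
        match self.hashes.insert(service.to_string(), new_hash) {
            Some(old_hash) => old_hash != new_hash,
            None => true,
        }
    }

    /// Drop the stored hash when the service process restarts, so the
    /// next check regenerates without waiting for a scanner pass.
    pub fn on_service_restart(&mut self, service: &str) {
        self.hashes.remove(service);
    }
}
```

With this shape, a restart notification for hero_logic clears its entry, and the next needs_regen call returns true even if the spec bytes are unchanged. That trades one redundant regeneration for never serving a client built against a previous binary.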
timur referenced this issue 2026-05-05 11:25:56 +00:00