Phase 4: Request-based version router — pick_version_flow #8
Context
After Phases 1-3 (#5, #6, #7), workflows have versions and each version has benchmark data (success rate, avg duration, cost estimate). This issue wires that into request-based selection: given a user request and preferences (cost/speed/accuracy weights), pick the version that best matches.
Example use case: the user asks "what's the weather?" — a simple request that any version handles fast. The router picks the cheapest, fastest version. The user asks "audit my calendar for conflicts across 3 timezones and propose a rebalancing" — a complex request that needs the accurate-but-expensive version.
Goal
A `pick_version_flow` template that:
Also: a convenience RPC `logicservice.pick_version(workflow_sid, user_request, preferences?) -> {version_sid, confidence, reasoning}` that runs this flow and returns the pick synchronously.
Depends on
Schema changes (small)
The `Benchmark` record (from #7) already has `difficulty_rating: f32`, set to 0.5 by default. This issue populates it properly: the AI-generated mock inputs in `benchmark_flow` are rated for difficulty when the benchmark runs. This requires a minor addition to `benchmark_flow` (not to the schema):
- `benchmark_flow`'s `compute_stats_and_save` node also asks an AI node to rate the average difficulty of the mock inputs (0.0 = trivial, 1.0 = very complex), stored in `Benchmark.difficulty_rating`
- `pick_version_flow` can compute it at query time
New flow template
File: `crates/hero_logic/templates/pick_version_flow.json`
A 3-node flow:
Node 1: `rate_request` (AI)
Input: `user_request`.
- `{difficulty: 0.0-1.0, rationale: "..."}`.
- `{difficulty, rationale}`
Node 2: `score_versions` (Python)
Input: difficulty from node 1, `target_workflow_sid`, `preferences`.
- `benchmark_list_for_workflow(target_workflow_sid)`
- `workflow_version_sid`, keep latest per version
- `norm_cost[v] = cost[v] / max_cost` (0.0 = cheapest, 1.0 = most expensive)
- `norm_duration[v] = duration[v] / max_duration`
- `norm_reliability[v] = success_rate[v] / 100.0`
- `effective_accuracy_weight = accuracy_weight * (1 + difficulty)`, boost cheap-fast for easy ones. Re-normalize weights.
- `{scored: [{version_sid, score, normalized_metrics, raw_metrics}], difficulty: float}`
Node 3: `return_pick` (Python)
Input: scored versions from node 2.
- `current_version_sid` with `confidence: 0.0` and a note
- `{version_sid, confidence: float, reasoning: "..."}`
- `confidence` = normalized score gap between winner and runner-up (1.0 = runner-up is 50%+ worse, 0.0 = tied)
Flow inputs:
RPC convenience method
File: `crates/hero_logic/src/logic/server/rpc.rs`
This is a composition, not a new flow type. It wraps the `pick_version_flow` execution in a simple synchronous interface.
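The metric normalization described for node 2 and the confidence rule described for node 3 could look roughly like the Python sketch below. It is illustrative only. The dictionary keys (`cost`, `duration`, `success_rate`), the function names, and the linear mapping of the relative score gap onto 0..1 (saturating at a 50% gap) are assumptions, not something this issue specifies.

```python
from typing import Dict, List


def normalize_metrics(benchmarks: List[Dict]) -> List[Dict]:
    """Scale raw benchmark metrics into 0..1 (hypothetical record shape)."""
    # Guard against an all-zero column so the division below is safe.
    max_cost = max(b["cost"] for b in benchmarks) or 1.0
    max_duration = max(b["duration"] for b in benchmarks) or 1.0
    return [
        {
            "version_sid": b["version_sid"],
            "norm_cost": b["cost"] / max_cost,
            "norm_duration": b["duration"] / max_duration,
            # success_rate is assumed to be a percentage (0..100).
            "norm_reliability": b["success_rate"] / 100.0,
        }
        for b in benchmarks
    ]


def confidence_from_scores(winner: float, runner_up: float) -> float:
    """Map the relative score gap to 0..1: tied -> 0.0, 50%+ worse -> 1.0."""
    if winner <= 0.0:
        return 0.0
    gap = (winner - runner_up) / winner
    # Assumed linear ramp that saturates once the gap reaches 50%.
    return max(0.0, min(1.0, gap / 0.5))
```

Under these assumptions a runner-up scoring 25% below the winner yields a confidence of 0.5, and anything 50% or more below yields 1.0.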
UI changes
File: `crates/hero_logic_ui/templates/workflow_editor.html` + JS
- `pick_version` RPC and shows the chosen version_sid + reasoning without starting a play
- `play_start` with the picked version
Integration with service_agent
Service agent could optionally route: `router.agent.start` accepts a `route: true` flag. If set:
- `pick_version(service_agent_workflow_sid, prompt)` first
- `version_sid` for the play
This is a pattern, not a strict requirement for this issue.
Acceptance criteria
- `pick_version_flow` template exists and loads
- `current_version_sid` with confidence 0.0 when no benchmarks exist
- `logicservice.pick_version` RPC convenience method works
- `pick_version` with "simple ping" picks the cheap version and "complex multi-step task" picks the accurate one
Out of scope
Future work
Backend landed: `05a3ef7`
Phase 4 is functional. For brevity I implemented `pick_version` as a server method directly rather than a separate flow template: the scoring math is deterministic and easier to read and test in Rust than in a 3-node DAG. A `pick_version_flow` template with an AI-based difficulty rater can be added later as a layered improvement (my current difficulty is a length-based heuristic).
What's live
RPC method: `pick_version(workflow_sid, user_request, preferences_json) -> str`, which returns a JSON string with `{version_sid, confidence, difficulty, reasoning, considered: [...]}`
Algorithm:
- Weights (`cost_weight` / `speed_weight` / `accuracy_weight`) with equal-third defaults; re-normalize to sum to 1.0
- `adj_acc_w = acc_w * (1 + difficulty)`, then re-normalize
- Score: `(1 - norm_cost) * cost_w + (1 - norm_dur) * speed_w + norm_success_rate * adj_acc_w`
- Confidence: `1 - runner_up_score / winner_score` (gap-based)
Fallback: no benchmarks → `current_version_sid` with confidence 0.0
Verified
Remaining
- `pick_version` RPC picks best version from benchmarks + weights
- `pick_version_flow` template with AI-based difficulty node, which would replace the current heuristic. Recommend as a follow-up once Phase 3 has richer benchmark data to differentiate versions.
Summary of all four phases
All four backends are live:
- `d5b3c95`
- `671d026`
- `98831aa`
- `05a3ef7`
All four are functional end-to-end from the RPC surface. UI wiring is the remaining common thread: it is a bigger chunk of work that fits better as its own issue covering all phases' UI needs cohesively.
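For readers who want to see the Phase 4 scoring math in one place, the algorithm listed in this comment can be approximated in Python as below. The landed code is a Rust server method, so this is only a sketch of the formulas, not the implementation. Several details are assumptions: the benchmark record shape (`cost`, `duration`, `success_rate` as a percentage), max-based normalization borrowed from the original node 2 description, `difficulty` being passed in rather than derived from the request, the wording of the `reasoning` strings, and a confidence of 1.0 when only one version has benchmarks.

```python
from typing import Dict, List, Optional

WEIGHT_KEYS = ("cost_weight", "speed_weight", "accuracy_weight")


def pick_version(
    benchmarks: List[Dict],
    current_version_sid: str,
    difficulty: float,
    preferences: Optional[Dict[str, float]] = None,
) -> Dict:
    """Pick the best-scoring version; fall back to the current one."""
    if not benchmarks:
        # Fallback: nothing to compare, so report zero confidence.
        return {
            "version_sid": current_version_sid,
            "confidence": 0.0,
            "difficulty": difficulty,
            "reasoning": "no benchmarked versions; using current version",
            "considered": [],
        }

    # Equal-third defaults, then re-normalize so the weights sum to 1.0.
    prefs = preferences or {}
    raw = [float(prefs.get(key, 1.0 / 3.0)) for key in WEIGHT_KEYS]
    total = sum(raw) or 1.0
    cost_w, speed_w, acc_w = (w / total for w in raw)

    # Harder requests boost the accuracy weight, then re-normalize again.
    adj_acc_w = acc_w * (1.0 + difficulty)
    total = (cost_w + speed_w + adj_acc_w) or 1.0
    cost_w, speed_w, adj_acc_w = cost_w / total, speed_w / total, adj_acc_w / total

    # Assumed normalization: divide by the column maximum.
    max_cost = max(b["cost"] for b in benchmarks) or 1.0
    max_dur = max(b["duration"] for b in benchmarks) or 1.0

    scored = []
    for b in benchmarks:
        score = (
            (1.0 - b["cost"] / max_cost) * cost_w
            + (1.0 - b["duration"] / max_dur) * speed_w
            + (b["success_rate"] / 100.0) * adj_acc_w
        )
        scored.append((score, b["version_sid"]))
    scored.sort(reverse=True)

    winner_score, winner_sid = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0.0
    # Gap-based confidence: 1 - runner_up_score / winner_score.
    confidence = 1.0 - runner_up_score / winner_score if winner_score > 0 else 0.0

    return {
        "version_sid": winner_sid,
        "confidence": confidence,
        "difficulty": difficulty,
        "reasoning": f"highest weighted score ({winner_score:.3f})",
        "considered": [sid for _, sid in scored],
    }
```

With a cheap, fast version and a slower, more reliable one, equal weights at low difficulty favor the cheap version, while an accuracy-heavy preference at high difficulty moves the pick to the reliable one.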
Phase 4 UI landed (commit `c9bfd09`)
`Route` button + preferences modal are live in the workflow editor:
- `Route` button opens a modal with a request `<textarea>` and three weight sliders (cost / speed / accuracy). Sliders auto-normalize to sum to 1.0 at submit, so users can leave them at any scale (defaults 33/33/34).
- `logicservice.pick_version(workflow_sid, user_request, preferences_json)`; result panel renders:
  - `version_label (version_sid)`
  - `confidence %` (rounded)
  - `difficulty` heuristic (2dp)
  - `reasoning` string
  - `considered` version sids
- `Start play on this version` button on the result:
  - `workflow_set_current_version` if the picked version differs from current (so the play runs against the recommended version)
  - `prompt` input, pre-fills it with the user's original request, so the router's choice flows end-to-end without the user retyping.
Verified:
- `pick_version` on workflow `00dz` (no benchmarks yet) returns `{version_sid: current, confidence: 0.0, reasoning: "No benchmarked versions — falling back to current_version_sid"}` → the UI correctly shows 0% confidence and reasoning.
- The modal (`hl-route-modal`, `openRouteModal`, weight sliders) renders on `/workflows/00dz`.
Still open for this issue:
- `pick_version_flow.json` replacing the current heuristic, which gives the router actual semantic grounding.
- `estimated_cost_usd` and `success_rate`; currently cost is 0 until hero_proc surfaces per-job token counts to the benchmark executor.