feat: UI + model revamp — Logic everywhere, recursive composition, single logic view #38

Closed
opened 2026-05-14 12:23:23 +00:00 by timur · 1 comment
Owner

feat: UI + model revamp — "Logic" everywhere, recursive composition, single logic view

TL;DR

Reframe the system around logics instead of workflows. A logic is a unit of typed I/O backed by Python code. Every @logic-decorated function inside a logic's source is itself a logic — composable all the way down. Sub-logic navigation breadcrumbs into the child's view; back out to the parent. Stop at primitive Python (imported clients, raw lines, built-ins).

The UI collapses to two views only: a dashboard listing logics, and the logic view itself. Within the logic view:

  • Left sidebar = info: title · description · inputs · outputs · versions · breadcrumbs
  • Middle = flow / code (the visual flow of sub-logics, and Monaco source)
  • Right sidebar = stats: benchmark stats for the current version when idle, play stats when a play is active
  • Bottom play bar = inputs (values) + examples + plays-history (left col) | logs + steps + pause-forms (mid col) | output (right col) + ▶ Run

Everything else — separate /examples, separate /plays list, the dedicated /plays/{sid} detail page — goes away.


1. The new mental model

A Logic has:

  • name (identifier, unique-ish)
  • description
  • inputs: [FlowField]
  • outputs: [FlowField]
  • versions: [LogicVersion] — each carries its own python_source
  • current_version_sid

A LogicVersion's python_source is the Python code. Every @logic-decorated function in that source is a sub-logic. Calls to other top-level logics via logic.invoke("name", ...) resolve to other Logic records. Either way, when the runtime executes, every @logic call opens a span — same as today's @flow. The flow view is the visualisation of those spans (live during a play, or pinned to the latest play when idle).

Stopping rule: the recursion stops at non-@logic code:

  • imported client method calls (HeroAibrokerClient().chat(...))
  • standard library functions (re.sub(...), json.loads(...))
  • inline expressions, loops, conditionals — the structure of these is what the flow view eventually visualises, but they aren't themselves clickable sub-logics

So "is this a sub-logic?" = "is this a @logic-decorated function defined inside the parent's source, or a logic.invoke("name", ...) call to a named Logic record?"
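The span-per-call rule and the stopping rule can be sketched as follows. This is a minimal illustration, not the real hero_tracing implementation: the bare `@logic` form, the `SPANS` list, the `_current_path` context variable, the `is_logic` marker and the "/"-joined path format are all assumptions made for the example.

```python
import contextvars
import functools
import time

# Hypothetical stand-in for the hero_tracing export; all names are illustrative.
_current_path = contextvars.ContextVar("current_path", default=())
SPANS = []  # recorded spans, appended when each call finishes


def logic(fn):
    """Wrap fn so every call records a span whose path includes its ancestors."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        path = _current_path.get() + (fn.__name__,)
        token = _current_path.set(path)
        started = time.perf_counter()
        status = "failed"
        try:
            result = fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            _current_path.reset(token)
            SPANS.append({
                "path": "/".join(path),
                "status": status,
                "duration_s": time.perf_counter() - started,
            })
    wrapper.is_logic = True  # marker used by is_sub_logic below
    return wrapper


@logic
def fetch_catalog():
    return ["hero_osis_calendar"]


@logic
def summarize(services):
    # len() is a built-in: primitive code, so no span is opened for it.
    return f"{len(services)} service(s) found"


@logic
def service_agent():
    return summarize(fetch_catalog())


def is_sub_logic(obj):
    """True only for callables that went through the decorator."""
    return getattr(obj, "is_logic", False)
```

Calling `service_agent()` here records three spans (two children, then the parent), while `len` leaves no trace, which is the behaviour the stopping rule describes.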


2. Layout

2.1 The logic view (single page, fixed regions)

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│  LEFT (info)        │   MIDDLE (flow / code)   │  RIGHT (stats)      │
│                                                                      │
│  ↶ Breadcrumb       │                          │  ┌────────────────┐ │
│   parent / sub /    │                          │  │ Benchmark v3   │ │
│   current           │                          │  │ success: 75%   │ │
│                     │                          │  │ p50 dur: 7.2s  │ │
│  Title              │   ▼ Flow view            │  │ avg cost: …    │ │
│  Description        │     fetch_catalog →      │  │ runs: 24       │ │
│                     │     select_services →    │  └────────────────┘ │
│  Inputs             │     attempt 1 →          │                     │
│   prompt: string    │       service_code_gen → │  (or, when a play   │
│   model: string     │         model_call →     │   is selected:      │
│                     │       script_execution → │   stats of THAT     │
│  Outputs            │     summarize            │   play replace the  │
│   summary: string   │                          │   benchmark card)   │
│                     │   [graph / code toggle]  │                     │
│  Versions           │                          │                     │
│   v1 v2 v3 (current)│                          │                     │
│                                                                      │
├──────────────────────────────────────────────────────────────────────┤
│  PLAY BAR (always visible, three columns)                            │
│                                                                      │
│  ◀ Inputs + Examples     ┃ Live trace + Pause forms ┃ Output         │
│                          ┃                          ┃                │
│  prompt:    [_________]  ┃ 09:14:22  fetch_catalog  ┃ {              │
│  model:     [_________]  ┃           ok      400ms  ┃   "summary":   │
│                          ┃ 09:14:23  select_serv…   ┃     "…"        │
│  Examples ▾              ┃           ok    2.6s     ┃ }              │
│   • Calendar event       ┃ 09:14:25  ⏸ ask_user    ┃                │
│   • Find a contact       ┃   ◉ keep   ○ replace    ┃                │
│   • Marketplace tokens   ┃   ○ cancel               ┃                │
│                          ┃   [Submit]               ┃                │
│  Plays ▾                 ┃                          ┃                │
│   02xy success  3m ago   ┃                          ┃                │
│   02xx failed   8m ago   ┃                          ┃                │
│                          ┃                          ┃                │
│  [▶ Run]                 ┃                          ┃                │
└──────────────────────────────────────────────────────────────────────┘

Regions are persistent. The play bar is always there. The flow column always renders the current overlay: the latest play when the page first loads, the play the user picked from the Plays list in the play bar, or the live play once ▶ Run is hit.

2.2 Dashboard

A single page at /. Lists every Logic by name + description + last run status + last benchmark success rate. Click a logic → open /logics/{sid}. That's it.

2.3 Routes that go away

  • /workflows (list) — superseded by dashboard
  • /workflows/{sid} — renamed to /logics/{sid}; old route 301s
  • /workflows/new — replaced by a "+ New logic" button on the dashboard
  • /examples — examples are inline on each logic's play bar; no global page
  • /plays (list) — plays are inline on each logic's play bar
  • /plays/{sid} — was the dedicated detail page from #32; finishes its removal (already a redirect today)
  • The top toolbar / header on the logic view — run controls move to the play bar

3. Conceptual rename (data + code)

Workflow and friends become Logic. Done as a hard rename in code with serde aliases on storage so existing OTOML records keep loading.

Old | New | Notes
--- | --- | ---
Workflow rootobject | Logic | #[serde(alias = "Workflow")] on the type tag if/when OTOML serializes it. Field names like workflow_sid → logic_sid get #[serde(alias = "workflow_sid")] aliases.
WorkflowVersion | LogicVersion | Same alias treatment.
Workflow.current_version_sid | unchanged label | The field name "current_version_sid" still makes sense.
Play.workflow_sid / Play.workflow_version_sid | Play.logic_sid / Play.logic_version_sid | Plus #[serde(alias = "workflow_sid")].
Example.workflow_sid / Example.workflow_version_sid | Example.logic_sid / Example.logic_version_sid | Same.
Benchmark.workflow_sid / Benchmark.workflow_version_sid | Benchmark.logic_sid / Benchmark.logic_version_sid | Same.
LogicService (service name) | unchanged | Already says "Logic".
LogicService.workflow_* RPC methods | LogicService.logic_* | Old method names registered as aliases at the dispatch layer to keep generated clients working through a deprecation cycle.
@flow(...) decorator | @logic(...) | flow stays exported from hero_tracing as an alias of logic for one release.
flow.invoke(name, ...) | logic.invoke(name, ...) | Same alias rule.
flow.pause(...) / ask_user.* | logic.pause(...) / ask_user.* | Same.
hero_tracing module name | unchanged for now | The exports rename; the module stays so stored sources keep importing.

The @flow → @logic rename is purely an alias: both names refer to the same decorator. Stored python_source that says from hero_tracing import flow keeps working; new sources say from hero_tracing import logic.
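Both alias mechanisms on the Python side can be sketched in a few lines. The handler names (`logic_get`, `logic_list`) and the `dispatch` function are hypothetical, chosen only to show old `workflow_*` names resolving to the renamed handlers; the real dispatch layer may look quite different.

```python
# Hypothetical handlers; the real RPC method set is not specified in this issue.
def logic_get(sid):
    return {"sid": sid, "kind": "Logic"}


def logic_list():
    return []


HANDLERS = {"logic_get": logic_get, "logic_list": logic_list}

# Deprecated name -> canonical name. Both resolve to the same handler object.
ALIASES = {name.replace("logic_", "workflow_", 1): name for name in HANDLERS}


def dispatch(method, *args):
    """Resolve a possibly deprecated method name and call its handler."""
    canonical = ALIASES.get(method, method)
    try:
        handler = HANDLERS[canonical]
    except KeyError:
        raise LookupError(f"unknown method: {method}") from None
    return handler(*args)


# The decorator alias works the same way: one object, two names.
def logic(fn):
    return fn


flow = logic
```

Because the alias table is derived from the handler table, dropping the deprecated names after the deprecation cycle would be a one-line change in this sketch.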


4. The flow view (middle column)

4.1 Idle (no overlay)

Parse the current version's python_source for @logic-decorated functions and logic.invoke(...) calls. Render a static graph of declared sub-logics in source-order:

fetch_catalog → select_services → attempt 1 (loop) → summarize
                                  └─ service_code_gen → model_call
                                  └─ script_execution

Each node is clickable. Click a sub-logic node → breadcrumb-navigate into that sub-logic's view. The breadcrumb lives at the top of the left sidebar:

↶ service_agent / service_code_gen

Click "service_agent" in the breadcrumb → back out to the parent. Stop at primitive calls (imported clients, stdlib): these render as leaf nodes with a "primitive" badge and are not clickable.
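One way to do the idle-state parse is with Python's standard `ast` module, sketched below. The sample `SOURCE` and the function name `find_sub_logics` are made up for illustration, and the sketch only recognises the plain `@logic` / `@logic(...)` and `logic.invoke("name", ...)` spellings; loops, conditionals and aliased imports are not handled.

```python
import ast

# Made-up sample source; it is only parsed here, never executed.
SOURCE = '''
from hero_tracing import logic

@logic
def fetch_catalog():
    return []

@logic()
def summarize(items):
    return logic.invoke("formatter", items=items)

def helper():
    return 1
'''


def _decorator_name(node):
    """Return the bare decorator name: `logic` for both @logic and @logic(...)."""
    if isinstance(node, ast.Call):
        node = node.func
    return node.id if isinstance(node, ast.Name) else None


def find_sub_logics(source, decorator="logic"):
    """Return (declared function names, invoked logic names) in source order."""
    declared, invoked = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(_decorator_name(d) == decorator for d in node.decorator_list):
                declared.append((node.lineno, node.name))
        elif isinstance(node, ast.Call):
            func = node.func
            if (isinstance(func, ast.Attribute) and func.attr == "invoke"
                    and isinstance(func.value, ast.Name)
                    and func.value.id == decorator
                    and node.args
                    and isinstance(node.args[0], ast.Constant)
                    and isinstance(node.args[0].value, str)):
                invoked.append((node.lineno, node.args[0].value))
    # ast.walk is breadth-first, so sort by line number to get source order.
    return [n for _, n in sorted(declared)], [n for _, n in sorted(invoked)]
```

Passing `decorator="flow"` would cover stored sources that still use the old name; undecorated functions such as `helper` are ignored, matching the stopping rule.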

4.2 Active (a play is selected)

Render the actual span tree of the play. Same node shapes, but now they carry status + duration + the recorded inputs/outputs. Replayed spans dashed; failed spans red; in-progress pulse. Clicking a span node = same drill-in behaviour as idle: breadcrumb into that sub-logic, except now the sub-logic's flow view shows ITS spans inside the parent play (filtered to spans whose path descends from the clicked node). Breadcrumb back out, parent re-renders.
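The descendant filter can be sketched as a prefix match on span paths. The span shape (a dict with a "/"-joined `path`) and the sample data are assumptions for the example, not the recorded span format.

```python
def descendants_of(spans, clicked_path):
    """Keep the clicked span and every span nested below it."""
    # The trailing "/" stops a sibling with a longer name from matching.
    prefix = clicked_path + "/"
    return [
        s for s in spans
        if s["path"] == clicked_path or s["path"].startswith(prefix)
    ]


# Illustrative spans; "service_code_gen_v2" is a made-up sibling name.
PLAY_SPANS = [
    {"path": "service_agent", "status": "running"},
    {"path": "service_agent/fetch_catalog", "status": "ok"},
    {"path": "service_agent/attempt_1/service_code_gen", "status": "running"},
    {"path": "service_agent/attempt_1/service_code_gen/model_call", "status": "running"},
    {"path": "service_agent/attempt_1/service_code_gen_v2", "status": "ok"},
]

scoped = descendants_of(PLAY_SPANS, "service_agent/attempt_1/service_code_gen")
```

Here `scoped` holds the clicked span and its `model_call` child only; the made-up sibling is excluded because the match requires a path separator after the clicked path.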

4.3 Code view

Toggle in the middle column header: [Graph] [Code] [Split]. Code is the Monaco editor bound to LogicVersion.python_source. Clicking a graph node highlights the corresponding source lines.

4.4 Future direction (out of scope, called out)

The flow view eventually becomes a visual code editor:

  • Drag logic.invoke("...") blocks from a palette of saved logics on the right.
  • Drag imported client method calls as primitive nodes.
  • Visualise loops as repeating blocks, conditionals as branch points, asyncio.gather as parallel forks.
  • Show data flow between sub-logics as connecting lines: where does each input come from?
  • Two-way binding: editing the graph updates the source; editing the source re-renders the graph.

Not in this issue's scope, but the data model (@logic + named sub-logics + typed I/O) is designed to stay compatible with this direction.


5. The play bar (bottom, three columns)

Always visible. Heights persist via localStorage. The three columns:

5.1 Left column — Inputs + Examples + Plays history

  • Inputs: one labeled field per declared input. Type-appropriate widget (text / number / boolean / json textarea). Type comes from Logic.inputs[i].field_type.
  • Examples ▾: collapsible list of saved Example records for this logic. Click one → populate the input fields. "Save as example" button writes the current values back as a new Example.
  • Plays ▾: collapsible list of the most recent N plays. Click one → load it as the current overlay on the flow view + populate input fields with its input_data. Right sidebar switches to that play's stats.
  • ▶ Run button: validates inputs against declared types, calls logic.play_start, the new play becomes the overlay.
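The validation step behind ▶ Run might look like the sketch below. The `FlowField` dataclass is a simplified stand-in, and the `field_type` values ("string", "number", "boolean", "json") are assumed from the widget list above rather than taken from the real schema.

```python
import json
from dataclasses import dataclass


@dataclass
class FlowField:
    name: str
    field_type: str  # assumed values: "string", "number", "boolean", "json"
    required: bool = False


def _coerce(field_type, raw):
    """Convert the raw form text into a typed value, or raise ValueError."""
    if field_type == "string":
        return raw
    if field_type == "number":
        return float(raw)
    if field_type == "boolean":
        if raw.lower() in ("true", "false"):
            return raw.lower() == "true"
        raise ValueError(f"expected true/false, got {raw!r}")
    if field_type == "json":
        return json.loads(raw)  # JSONDecodeError is a ValueError subclass
    raise ValueError(f"unknown field_type {field_type!r}")


def validate_inputs(fields, raw_values):
    """Return (values, errors); raw_values maps field name to form text."""
    values, errors = {}, {}
    for field in fields:
        raw = raw_values.get(field.name, "")
        if raw == "":
            if field.required:
                errors[field.name] = "required"
            continue
        try:
            values[field.name] = _coerce(field.field_type, raw)
        except ValueError as exc:
            errors[field.name] = str(exc)
    return values, errors
```

In this sketch the play would only be started when `errors` is empty; otherwise each message could be shown next to its input field.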

5.2 Middle column — Live trace + Pause forms

  • Live span events scroll here as they arrive (the JSONL feed the SDK emits).
  • When the active play hits awaiting_resume, the pause form is rendered at the top of the middle column as a banner (always visible until answered, regardless of how the user has scrolled the log feed below it).
  • Pause forms render per ResumeRequest.ui.kind: text / number / choice / multi_choice / confirm. Submit posts play_resume.
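Before posting play_resume, the submitted answer could be checked against the form kind. The sketch below assumes a `ui` dict with a `kind` key and, for the choice kinds, an `options` list; the actual ResumeRequest shape is not spelled out in this issue.

```python
def validate_answer(ui, answer):
    """Return True if answer is acceptable for the pause form described by ui."""
    kind = ui["kind"]
    if kind == "text":
        return isinstance(answer, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(answer, (int, float)) and not isinstance(answer, bool)
    if kind == "confirm":
        return isinstance(answer, bool)
    if kind == "choice":
        return answer in ui["options"]
    if kind == "multi_choice":
        return isinstance(answer, list) and all(a in ui["options"] for a in answer)
    raise ValueError(f"unsupported ui.kind: {kind!r}")


# Assumed form payload, modelled on the ask_user.choice(...) call in section 7.4.
form = {"kind": "choice", "options": ["Event", "RecurringEvent", "Reminder", "Cancel"]}
```

An unknown kind raises instead of returning False, so a form type added later fails loudly rather than silently rejecting every answer.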

5.3 Right column — Output

  • Renders the play's output_data as it accumulates.
  • If Logic.outputs is declared, render one labeled card per output field; else render raw JSON.
  • For a paused play, output is empty (the flow hasn't returned yet). For a successful play, output is final. For a failed play, output is empty + an error banner is shown in the middle column.

5.4 Pause UX nit

When a play pauses, the play bar visually emphasizes the pause: middle column shifts the pause form to the top + adds an accent border. The user shouldn't have to find the pause form — the play bar makes it the most prominent thing.


6. The right stats sidebar

Two modes, switched by whether a play is currently selected:

6.1 No play selected → version benchmark stats

Shows the latest Benchmark for Logic.current_version_sid:

  • success rate (0-100%)
  • p50 / p95 duration
  • avg tokens (prompt + completion)
  • estimated cost USD
  • difficulty rating
  • "Run benchmark" button → opens a small dialog to configure num_runs + which example set, then kicks off a benchmark play set
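The benchmark card's numbers can be derived from per-run records roughly as follows. The record keys (`status`, `duration_s`, `prompt_tokens`, `completion_tokens`) are hypothetical, and nearest-rank is only one of several common percentile definitions; the real Benchmark record may use another.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_benchmark(runs):
    """Aggregate per-run records into the figures shown on the card."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    durations = sorted(r["duration_s"] for r in runs)
    return {
        "runs": n,
        "success_rate_pct": 100 * sum(r["status"] == "success" for r in runs) / n,
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
        "avg_prompt_tokens": sum(r["prompt_tokens"] for r in runs) / n,
        "avg_completion_tokens": sum(r["completion_tokens"] for r in runs) / n,
    }


# Made-up sample runs, for illustration only.
SAMPLE_RUNS = [
    {"status": "success", "duration_s": 1.0, "prompt_tokens": 100, "completion_tokens": 10},
    {"status": "success", "duration_s": 2.0, "prompt_tokens": 200, "completion_tokens": 20},
    {"status": "failed", "duration_s": 3.0, "prompt_tokens": 300, "completion_tokens": 30},
    {"status": "success", "duration_s": 4.0, "prompt_tokens": 400, "completion_tokens": 40},
]
```

With small run counts the p95 value is simply the slowest run under this definition, which is worth keeping in mind when reading the card.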

6.2 Play selected → play stats

Shows that play's:

  • status + duration
  • total tokens (prompt + completion)
  • estimated cost USD (if computed)
  • attempts (count, if the flow uses retry loops)
  • error summary (if failed)
  • "Cancel" button (if status is running / awaiting_resume)

Compact. No tabs. No interaction beyond the cancel button — drill-in lives elsewhere (the flow view middle column, the play bar columns).


7. Worked example: service_agent

This is what the new UI looks like for the existing service_agent flow. No code changes to service_agent.py itself — just renames + the UI rendering.

7.1 Dashboard → click service_agent

User lands on /logics/{service_agent_sid}.

7.2 Initial state (no play overlay)

Left sidebar:

↶ service_agent

Title:        service_agent
Description:  Self-contained AI agent: discovers Hero services
              via hero_router, compiles selected service stubs,
              generates a Python script, runs it, and summarizes
              the result.

Inputs:
  prompt:           string  (required)  "user's request in natural language"
  code_gen_model:   string  ""          "override for the code-gen model"

Outputs:
  summary:          string              "natural-language reply to the prompt"

Versions: v1  v2  v3 (current)

Middle (flow view, idle, parsed from source):

service_agent
  ├─ fetch_catalog          (sub-logic)
  ├─ select_services        (sub-logic)
  ├─ compile_stubs          (sub-logic)
  ├─ for attempt in range(3):
  │    ├─ service_code_gen  (sub-logic; contains model_call)
  │    ├─ script_execution  (sub-logic)
  │    └─ debug_feedback    (sub-logic, on failure)
  └─ summarize              (sub-logic)

Each sub-logic is a clickable node. for attempt in range(3) renders as a loop container. The model_call inside service_code_gen only shows when that node is expanded.

Right sidebar (benchmark stats for v3):

Benchmark — v3
  runs:          24
  success rate:  75%
  p50 duration:  6.8s
  p95 duration:  12.4s
  avg tokens:    1284 prompt + 482 completion
  est. cost:     $0.018 / run
  difficulty:    0.42

  [Run new benchmark]

Bottom play bar (idle):

┃ Inputs                       ┃                              ┃ Output            ┃
┃ prompt:    [____________]    ┃   (no play yet — click ▶)    ┃                   ┃
┃ code_gen_model: [________]   ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ Examples ▾                   ┃                              ┃                   ┃
┃  • Calendar event            ┃                              ┃                   ┃
┃  • Find a contact            ┃                              ┃                   ┃
┃  • List healthy routers      ┃                              ┃                   ┃
┃  • Marketplace tokens        ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ Plays ▾                      ┃                              ┃                   ┃
┃  02hq  failed     2 hr ago   ┃                              ┃                   ┃
┃  02ca  success    1 d ago    ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ [▶ Run]                      ┃                              ┃                   ┃

7.3 User clicks "Calendar event" example, hits ▶ Run

Inputs auto-fill: prompt: "Create a calendar event titled X tomorrow at 10am", code_gen_model: "". Run → logic.play_start → new play 02j7 becomes the overlay.

Right sidebar switches to play stats (status: running, started 0:00).

Middle column (flow view, live):

service_agent                    [running]   0.4s
  ├─ fetch_catalog               [ok]       423ms
  ├─ select_services             [ok]       2.6s
  ├─ compile_stubs               [ok]         5ms
  └─ attempt 1                   [running]
       └─ service_code_gen       [running]
            └─ model_call        [running]

Play bar middle column — live log feed:

09:14:22  fetch_catalog              ok      423ms
09:14:23  select_services            ok      2.6s    (picked: hero_osis_calendar)
09:14:25  compile_stubs              ok        5ms
09:14:25  attempt 1
09:14:25  service_code_gen           running
09:14:26    model_call               running ai.chat (groq-strong)

7.4 A step pauses with ask_user.choice(...)

Suppose select_services decides the chosen service's rootobjects don't clearly match the prompt and calls:

ask_user.choice(
    "hero_osis_calendar has these rootobjects matching 'event':",
    options=["Event", "RecurringEvent", "Reminder", "Cancel"],
)

The play exits with awaiting_resume. Right sidebar shows play status awaiting_resume. Play bar middle column shifts the pause form to a banner at the top:

┃                              ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃                   ┃
┃                              ┃  ⏸  ask_user.choice          ┃                   ┃
┃                              ┃  hero_osis_calendar has      ┃                   ┃
┃                              ┃  these rootobjects matching  ┃                   ┃
┃                              ┃  'event':                    ┃                   ┃
┃                              ┃                              ┃                   ┃
┃                              ┃   ◉ Event                    ┃                   ┃
┃                              ┃   ○ RecurringEvent           ┃                   ┃
┃                              ┃   ○ Reminder                 ┃                   ┃
┃                              ┃   ○ Cancel                   ┃                   ┃
┃                              ┃   [Submit]                   ┃                   ┃
┃                              ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃                   ┃
┃ Plays ▾                      ┃                              ┃                   ┃
┃  02j7  awaiting…  0:09       ┃ 09:14:22 fetch_catalog ok …  ┃                   ┃

User picks "Event", submits → play_resume posts the answer → server respawns the subprocess with the answer cached → flow continues from where it paused with fetch_catalog, select_services, compile_stubs replayed (dashed in the flow tree) and service_code_gen executing fresh.

7.5 User clicks service_code_gen node in the flow view (mid-play)

Breadcrumb in left sidebar updates:

↶ service_agent / service_code_gen

The whole logic view rerenders for service_code_gen:

  • Left sidebar: service_code_gen's title, description, inputs (prompt, services), outputs (script), versions.
  • Middle: service_code_gen's flow view, scoped to that part of the current play's span tree — just shows model_call inside attempt 1.
  • Right sidebar: stats for service_code_gen's contribution to this play (its tokens, its duration).
  • Play bar:
    • Left col: service_code_gen's declared inputs, prefilled with the values the parent passed (prompt=..., services=[...]). Examples list is service_code_gen's saved examples. Plays list shows past plays of service_code_gen standalone (when it was invoked as the root logic).
    • Middle col: filtered log feed — only service_code_gen's spans + descendants.
    • Right col: this sub-logic's output for this play.

If the user wants to fork off a standalone play of service_code_gen from here (with the prefilled inputs), they hit ▶ Run — that starts a new top-level play of service_code_gen independent of the parent.

Click "service_agent" in the breadcrumb → zoom back out. Same play, parent view.

7.6 Authoring mid-play

If the user wants to fix something in service_code_gen's source, they toggle Code view in the middle column header. Monaco loads service_code_gen's python_source. Edit. Save → creates a new LogicVersion. The current play is unaffected (it's pinned to the version it started on), but next runs use the new version. Step-memoization cache invalidates globally for that logic (the version_sid is in the step_key).
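Why a new version invalidates memoized steps can be shown with a small sketch. Only the fact that version_sid is part of the step_key comes from this issue; the hashing scheme, the `StepCache` class and its in-memory store are assumptions for illustration.

```python
import hashlib
import json


def step_key(logic_version_sid, step_name, inputs):
    """Deterministic key: same version + step + inputs gives the same key."""
    payload = json.dumps(
        {"version": logic_version_sid, "step": step_name, "inputs": inputs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCache:
    """In-memory memoization keyed by step_key (hypothetical helper)."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def run(self, version_sid, step_name, inputs, fn):
        key = step_key(version_sid, step_name, inputs)
        if key not in self._store:
            self.misses += 1  # cache miss: the step actually executes
            self._store[key] = fn(**inputs)
        return self._store[key]
```

Saving a new LogicVersion changes the version component of every key, so the next run misses the cache without any explicit purge, while a play pinned to the old version keeps hitting its existing entries.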


8. What replaces what

Today | Becomes
--- | ---
/workflows (workflow list) | dashboard / (logic list)
/workflows/{sid}/edit (editor) | /logics/{sid} (logic view)
top toolbar with title + version + run controls + plays dropdown + benchmark widget | left sidebar (title, version) + bottom play bar (run + plays + examples) + right sidebar (benchmark/play stats)
right "inputs editor" sidebar (name + type + value mixed) | left sidebar (just declared inputs, type, description) + play bar left col (input VALUES + Run)
/plays (plays list page) | bottom play bar left column → "Plays ▾"
/plays/{sid} (dedicated detail page) | removed; plays render inside the logic view as an overlay
/examples (examples list page) | bottom play bar left column → "Examples ▾"

9. Implementation phases

  1. Schema + code renames with aliases. Hard-rename Workflow* → Logic* and @flow → @logic. Add #[serde(alias = ...)] on every renamed field and register every old RPC method name as an alias at the dispatch layer. SDK exports both flow and logic. Old data keeps loading, old python_source keeps importing. (Closes 70% of the work; nothing visible to the user yet.)

  2. Dashboard restructure. / becomes the logic list. Remove /workflows, /examples, /plays. 301 the old URLs to / or /logics/{sid} as appropriate.

  3. Logic view layout rebuild. Implement the three-region body (left info / middle flow-or-code / right stats) + the bottom play bar. Move inputs/examples/plays into the play bar. Move benchmark/play stats into the right sidebar. Remove the top toolbar from the editor.

  4. Flow-view static parse. When idle, render the sub-logic graph from a parse of python_source (find @logic-decorated defs and logic.invoke("...") calls). Click → breadcrumb-navigate.

  5. Breadcrumb navigation + sub-logic view. Clicking a sub-logic node loads that sub-logic's view. When a play is overlaid, the sub-logic's flow view filters the parent play's spans to descendants of the clicked node.

  6. Pause-form prominence. Move pause forms to a banner at the top of the play bar's middle column when awaiting_resume.

  7. Stats sidebar. Conditional rendering — benchmark stats when no play overlay, play stats when one is selected.

  8. Cleanup. Delete play_detail.html, examples.html, plays.html, workflows.html. Drop the old routes from main.rs.

Out of scope (called out for the future): visual flow editor with drag/drop, conditional/loop/parallel visualisation, data-flow lines, two-way graph↔code binding.


10. Acceptance

  • / shows a logic list with name + description + last-run status. Click → logic view.
  • The logic view has the three-region body + bottom play bar described in §2.1.
  • A play started from the bottom play bar's ▶ Run streams spans into the play bar's middle column AND renders in the flow view (the page's middle column) simultaneously.
  • A pause from inside the flow shows the pause form as a banner at the top of the play bar's middle column. Submitting it resumes the play.
  • Clicking a sub-logic node in the flow view loads that sub-logic's view with the breadcrumb populated. Clicking the parent in the breadcrumb navigates back.
  • The right sidebar shows benchmark stats when no play is overlaid, and play stats when one is.
  • /workflows/*, /examples, /plays, /plays/{sid} all 301 to the new layout or are removed.
  • Pre-rename data keeps loading (stored Workflow records read fine via serde aliases).
  • Stored python_source with from hero_tracing import flow keeps working (the SDK exports flow as an alias of logic).
# feat: UI + model revamp — "Logic" everywhere, recursive composition, single logic view ## TL;DR Reframe the system around **logics** instead of workflows. A logic is a unit of typed I/O backed by Python code. Every `@logic`-decorated function inside a logic's source is itself a logic — composable all the way down. Sub-logic navigation breadcrumbs into the child's view; back out to the parent. Stop at primitive Python (imported clients, raw lines, built-ins). The UI collapses to **two views only**: a dashboard listing logics, and the logic view itself. Within the logic view: - **Left sidebar** = info: title · description · inputs · outputs · versions · breadcrumbs - **Middle** = flow / code (the visual flow of sub-logics, and Monaco source) - **Right sidebar** = stats: benchmark stats for the current version when idle, play stats when a play is active - **Bottom play bar** = inputs (values) + examples + plays-history (left col) | logs + steps + pause-forms (mid col) | output (right col) + ▶ Run Everything else — separate `/examples`, separate `/plays` list, the dedicated `/plays/{sid}` detail page — goes away. --- ## 1. The new mental model A **Logic** has: - `name` (identifier, unique-ish) - `description` - `inputs: [FlowField]` - `outputs: [FlowField]` - `versions: [LogicVersion]` — each carries its own `python_source` - `current_version_sid` A **LogicVersion**'s `python_source` is the Python code. Every `@logic`-decorated function in that source is a sub-logic. Calls to other top-level logics via `logic.invoke("name", ...)` resolve to other Logic records. Either way, when the runtime executes, every `@logic` call opens a span — same as today's `@flow`. The flow view is the visualisation of those spans (live during a play, or pinned to the latest play when idle). 
**Stopping rule:** the recursion stops at non-`@logic` code: - imported client method calls (`HeroAibrokerClient().chat(...)`) - standard library functions (`re.sub(...)`, `json.loads(...)`) - inline expressions, loops, conditionals — *the structure* of these is what the flow view eventually visualises, but they aren't themselves clickable sub-logics So "is this a sub-logic?" = "is this a `@logic`-decorated function defined inside the parent's source, or a `logic.invoke("name", ...)` call to a named Logic record?" --- ## 2. Layout ### 2.1 The logic view (single page, fixed regions) ``` ┌──────────────────────────────────────────────────────────────────────┐ │ │ │ LEFT (info) │ MIDDLE (flow / code) │ RIGHT (stats) │ │ │ │ ↶ Breadcrumb │ │ ┌────────────────┐ │ │ parent / sub / │ │ │ Benchmark v3 │ │ │ current │ │ │ success: 75% │ │ │ │ │ │ p50 dur: 7.2s │ │ │ Title │ ▼ Flow view │ │ avg cost: … │ │ │ Description │ fetch_catalog → │ │ runs: 24 │ │ │ │ select_services → │ └────────────────┘ │ │ Inputs │ attempt 1 → │ │ │ prompt: string │ service_code_gen → │ (or, when a play │ │ model: string │ model_call → │ is selected: │ │ │ script_execution → │ stats of THAT │ │ Outputs │ summarize │ play replace the │ │ summary: string │ │ benchmark card) │ │ │ [graph / code toggle] │ │ │ Versions │ │ │ │ v1 v2 v3 (current)│ │ │ │ │ ├──────────────────────────────────────────────────────────────────────┤ │ PLAY BAR (always visible, three columns) │ │ │ │ ◀ Inputs + Examples ┃ Live trace + Pause forms ┃ Output │ │ ┃ ┃ │ │ prompt: [_________] ┃ 09:14:22 fetch_catalog ┃ { │ │ model: [_________] ┃ ok 400ms ┃ "summary": │ │ ┃ 09:14:23 select_serv… ┃ "…" │ │ Examples ▾ ┃ ok 2.6s ┃ } │ │ • Calendar event ┃ 09:14:25 ⏸ ask_user ┃ │ │ • Find a contact ┃ ◉ keep ○ replace ┃ │ │ • Marketplace tokens ┃ ○ cancel ┃ │ │ ┃ [Submit] ┃ │ │ Plays ▾ ┃ ┃ │ │ 02xy success 3m ago ┃ ┃ │ │ 02xx failed 8m ago ┃ ┃ │ │ ┃ ┃ │ │ [▶ Run] ┃ ┃ │ 
└──────────────────────────────────────────────────────────────────────┘ ``` Regions are persistent. The play bar is always there. The flow column always renders whichever is the current overlay (latest play when first loaded; whichever the user picked from the Plays list inside the play bar; live one when ▶ Run is hit). ### 2.2 Dashboard A single page at `/`. Lists every Logic by name + description + last run status + last benchmark success rate. Click a logic → open `/logics/{sid}`. That's it. ### 2.3 Routes that go away - `/workflows` (list) — superseded by dashboard - `/workflows/{sid}` — renamed to `/logics/{sid}`; old route 302s - `/workflows/new` — replaced by a "+ New logic" button on the dashboard - `/examples` — examples are inline on each logic's play bar; no global page - `/plays` (list) — plays are inline on each logic's play bar - `/plays/{sid}` — was the dedicated detail page from #32; finishes its removal (already a redirect today) - The top toolbar / header on the logic view — run controls move to the play bar --- ## 3. Conceptual rename (data + code) `Workflow` and friends become `Logic`. Done as a **hard rename in code with serde aliases on storage** so existing OTOML records keep loading. | Old | New | Notes | |---|---|---| | `Workflow` rootobject | `Logic` | `#[serde(alias = "Workflow")]` on the type tag if/when OTOML serializes it. Field names like `workflow_sid` → `logic_sid` get `#[serde(alias = "workflow_sid")]` aliases. | | `WorkflowVersion` | `LogicVersion` | Same alias treatment. | | `Workflow.current_version_sid` | unchanged label | The field name "current_version_sid" still makes sense. | | `Play.workflow_sid` / `Play.workflow_version_sid` | `Play.logic_sid` / `Play.logic_version_sid` | Plus `#[serde(alias = "workflow_sid")]`. | | `Example.workflow_sid` / `Example.workflow_version_sid` | `Example.logic_sid` / `Example.logic_version_sid` | Same. 
| | `Benchmark.workflow_sid` / `Benchmark.workflow_version_sid` | `Benchmark.logic_sid` / `Benchmark.logic_version_sid` | Same. | | `LogicService` (service name) | unchanged | Already says "Logic". | | `LogicService.workflow_*` RPC methods | `LogicService.logic_*` | Old method names registered as aliases at the dispatch layer to keep generated clients working through a deprecation cycle. | | `@flow(...)` decorator | `@logic(...)` | `flow` stays exported from `hero_tracing` as an alias of `logic` for one release. | | `flow.invoke(name, ...)` | `logic.invoke(name, ...)` | Same alias rule. | | `flow.pause(...)` / `ask_user.*` | `logic.pause(...)` / `ask_user.*` | Same. | | `hero_tracing` module name | unchanged for now | The exports rename; the module stays so stored sources keep importing. | The `@flow` → `@logic` rename is purely an alias — both names refer to the same decorator. Stored python_source that says `from hero_tracing import flow` keeps working; new sources say `from hero_tracing import logic`. --- ## 4. The flow view (middle column) ### 4.1 Idle (no overlay) Parse the current version's `python_source` for `@logic`-decorated functions and `logic.invoke(...)` calls. Render a static graph of declared sub-logics in source-order: ``` fetch_catalog → select_services → attempt 1 (loop) → summarize └─ service_code_gen → model_call └─ script_execution ``` Each node is clickable. Click a sub-logic node → breadcrumb-navigate into that sub-logic's view. The breadcrumb lives at the top of the left sidebar: ``` ↶ service_agent / service_code_gen ``` Click "service_agent" in the breadcrumb → back out to the parent. Stop at primitive calls (imported clients, stdlib): these render as leaf nodes with a "primitive" badge and are not clickable. ### 4.2 Active (a play is selected) Render the **actual span tree** of the play. Same node shapes, but now they carry status + duration + the recorded inputs/outputs. Replayed spans dashed; failed spans red; in-progress pulse. 
Clicking a span node = same drill-in behaviour as idle: breadcrumb into that sub-logic, except now the sub-logic's flow view shows ITS spans inside the parent play (filtered to spans whose path descends from the clicked node). Breadcrumb back out, parent re-renders. ### 4.3 Code view Toggle in the middle column header: `[Graph] [Code] [Split]`. Code is the Monaco editor bound to `LogicVersion.python_source`. Clicking a graph node highlights the corresponding source lines. ### 4.4 Future direction (out of scope, called out) The flow view eventually becomes a **visual code editor**: - Drag `logic.invoke("...")` blocks from a palette of saved logics on the right. - Drag imported client method calls as primitive nodes. - Visualise loops as repeating blocks, conditionals as branch points, `asyncio.gather` as parallel forks. - Show data flow between sub-logics as connecting lines: where does each input come from? - Two-way binding: editing the graph updates the source; editing the source re-renders the graph. Not in this issue's scope — but the data model (`@logic` + named sub-logics + typed I/O) is designed so this is a future-compatible direction. --- ## 5. The play bar (bottom, three columns) Always visible. Heights persist via `localStorage`. The three columns: ### 5.1 Left column — Inputs + Examples + Plays history - **Inputs**: one labeled field per declared input. Type-appropriate widget (text / number / boolean / json textarea). Type comes from `Logic.inputs[i].field_type`. - **Examples ▾**: collapsible list of saved `Example` records for this logic. Click one → populate the input fields. "Save as example" button writes the current values back as a new Example. - **Plays ▾**: collapsible list of the most recent N plays. Click one → load it as the current overlay on the flow view + populate input fields with its `input_data`. Right sidebar switches to that play's stats. 
- **▶ Run button**: validates inputs against declared types, calls `logic.play_start`, the new play becomes the overlay.

### 5.2 Middle column — Live trace + Pause forms

- Live span events scroll here as they arrive (the JSONL feed the SDK emits).
- When the active play hits `awaiting_resume`, the pause form is rendered at the **top** of the middle column as a banner (always visible until answered, regardless of how the user has scrolled the log feed below it).
- Pause forms render per `ResumeRequest.ui.kind`: text / number / choice / multi_choice / confirm. Submit posts `play_resume`.

### 5.3 Right column — Output

- Renders the play's `output_data` as it accumulates.
- If `Logic.outputs` is declared, render one labeled card per output field; else render raw JSON.
- For a paused play, output is empty (the flow hasn't returned yet). For a successful play, output is final. For a failed play, output is empty + an error banner is shown in the middle column.

### 5.4 Pause UX nit

When a play pauses, the play bar visually emphasizes the pause: middle column shifts the pause form to the top + adds an accent border. The user shouldn't have to find the pause form — the play bar makes it the most prominent thing.

---

## 6. The right stats sidebar

Two modes, switched by whether a play is currently selected:

### 6.1 No play selected → version benchmark stats

Shows the latest `Benchmark` for `Logic.current_version_sid`:

- success rate (0-100%)
- p50 / p95 duration
- avg tokens (prompt + completion)
- estimated cost USD
- difficulty rating
- "Run benchmark" button → opens a small dialog to configure num_runs + which example set, then kicks off a benchmark play set

### 6.2 Play selected → play stats

Shows that play's:

- status + duration
- total tokens (prompt + completion)
- estimated cost USD (if computed)
- attempts (count, if the flow uses retry loops)
- error summary (if failed)
- "Cancel" button (if status is running / awaiting_resume)

Compact. No tabs.
No interaction beyond the cancel button — drill-in lives elsewhere (the flow view middle column, the play bar columns).

---

## 7. Worked example: service_agent

This is what the new UI looks like for the existing `service_agent` flow. No code changes to `service_agent.py` itself — just renames + the UI rendering.

### 7.1 Dashboard → click `service_agent`

User lands on `/logics/{service_agent_sid}`.

### 7.2 Initial state (no play overlay)

**Left sidebar:**

```
↶ service_agent

Title: service_agent
Description: Self-contained AI agent: discovers Hero services via
  hero_router, compiles selected service stubs, generates a Python
  script, runs it, and summarizes the result.

Inputs:
  prompt: string (required)   "user's request in natural language"
  code_gen_model: string ""   "override for the code-gen model"

Outputs:
  summary: string             "natural-language reply to the prompt"

Versions:
  v1  v2  v3 (current)
```

**Middle (flow view, idle, parsed from source):**

```
service_agent
├─ fetch_catalog (sub-logic)
├─ select_services (sub-logic)
├─ compile_stubs (sub-logic)
├─ for attempt in range(3):
│  ├─ service_code_gen (sub-logic; contains model_call)
│  ├─ script_execution (sub-logic)
│  └─ debug_feedback (sub-logic, on failure)
└─ summarize (sub-logic)
```

Each sub-logic is a clickable node. `for attempt in range(3)` renders as a loop container. The `model_call` inside `service_code_gen` only shows when that node is expanded.

**Right sidebar (benchmark stats for v3):**

```
Benchmark — v3
runs: 24
success rate: 75%
p50 duration: 6.8s
p95 duration: 12.4s
avg tokens: 1284 prompt + 482 completion
est. cost: $0.018 / run
difficulty: 0.42

[Run new benchmark]
```

**Bottom play bar (idle):**

```
┃ Inputs                 ┃                         ┃ Output ┃
┃ prompt: [____________] ┃ (no play yet — click ▶) ┃        ┃
┃ model: [____________]  ┃                         ┃        ┃
┃                        ┃                         ┃        ┃
┃ Examples ▾             ┃                         ┃        ┃
┃ • Calendar event       ┃                         ┃        ┃
┃ • Find a contact       ┃                         ┃        ┃
┃ • List healthy routers ┃                         ┃        ┃
┃ • Marketplace tokens   ┃                         ┃        ┃
┃                        ┃                         ┃        ┃
┃ Plays ▾                ┃                         ┃        ┃
┃ 02hq failed 2 hr ago   ┃                         ┃        ┃
┃ 02ca success 1 d ago   ┃                         ┃        ┃
┃                        ┃                         ┃        ┃
┃ [▶ Run]                ┃                         ┃        ┃
```

### 7.3 User clicks "Calendar event" example, hits ▶ Run

Inputs auto-fill: `prompt: "Create a calendar event titled X tomorrow at 10am"`, `model: ""`. Run → `logic.play_start` → new play `02j7` becomes the overlay.

**Right sidebar switches to play stats** (status: running, started 0:00).

**Middle column (flow view, live):**

```
service_agent [running] 0.4s
├─ fetch_catalog [ok] 423ms
├─ select_services [ok] 2.6s
├─ compile_stubs [ok] 5ms
└─ attempt 1 [running]
   └─ service_code_gen [running]
      └─ model_call [running]
```

**Play bar middle column** — live log feed:

```
09:14:22 fetch_catalog ok 423ms
09:14:23 select_services ok 2.6s (picked: hero_osis_calendar)
09:14:25 compile_stubs ok 5ms
09:14:25 attempt 1
09:14:25 service_code_gen running
09:14:26 model_call running ai.chat (groq-strong)
```

### 7.4 A step pauses with `ask_user.choice(...)`

Suppose `select_services` decides the chosen service's rootobjects don't clearly match the prompt and calls:

```python
ask_user.choice(
    "hero_osis_calendar has these rootobjects matching 'event':",
    options=["Event", "RecurringEvent", "Reminder", "Cancel"],
)
```

The play exits with `awaiting_resume`. Right sidebar shows play status `awaiting_resume`.
Play bar middle column shifts the pause form to a banner at the top:

```
┃                     ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃  ┃
┃                     ┃ ⏸ ask_user.choice            ┃  ┃
┃                     ┃ hero_osis_calendar has       ┃  ┃
┃                     ┃ these rootobjects matching   ┃  ┃
┃                     ┃ 'event':                     ┃  ┃
┃                     ┃                              ┃  ┃
┃                     ┃ ◉ Event                      ┃  ┃
┃                     ┃ ○ RecurringEvent             ┃  ┃
┃                     ┃ ○ Reminder                   ┃  ┃
┃                     ┃ ○ Cancel                     ┃  ┃
┃                     ┃ [Submit]                     ┃  ┃
┃                     ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃  ┃
┃ Plays ▾             ┃                              ┃  ┃
┃ 02j7 awaiting… 0:09 ┃ 09:14:22 fetch_catalog ok …  ┃  ┃
```

User picks "Event", submits → `play_resume` posts the answer → server respawns the subprocess with the answer cached → flow continues from where it paused with `fetch_catalog`, `select_services`, `compile_stubs` replayed (dashed in the flow tree) and `service_code_gen` executing fresh.

### 7.5 User clicks `service_code_gen` node in the flow view (mid-play)

Breadcrumb in left sidebar updates:

```
↶ service_agent / service_code_gen
```

The whole logic view rerenders for `service_code_gen`:

- Left sidebar: `service_code_gen`'s title, description, inputs (prompt, services), outputs (script), versions.
- Middle: `service_code_gen`'s flow view, scoped to that part of the current play's span tree — just shows `model_call` inside `attempt 1`.
- Right sidebar: stats for `service_code_gen`'s contribution to this play (its tokens, its duration).
- Play bar:
  - Left col: `service_code_gen`'s declared inputs, prefilled with the values the parent passed (`prompt=...`, `services=[...]`). Examples list is `service_code_gen`'s saved examples. Plays list shows past plays of `service_code_gen` standalone (when it was invoked as the root logic).
  - Middle col: filtered log feed — only `service_code_gen`'s spans + descendants.
  - Right col: this sub-logic's output for this play.

If the user wants to fork off a standalone play of `service_code_gen` from here (with the prefilled inputs), they hit ▶ Run — that starts a new top-level play of `service_code_gen` independent of the parent.

Click "service_agent" in the breadcrumb → zoom back out. Same play, parent view.
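
The scoping used in §7.5 (showing only the spans under the clicked node) can be sketched as a path-prefix filter. This assumes each span record carries a `path` of span names from the root down to itself; the issue mentions a span path but does not specify its shape, so the `Span` dataclass and `spans_under` below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    path: tuple[str, ...]  # hypothetical: names from the root span down to this one
    status: str

def spans_under(spans: list[Span], clicked: tuple[str, ...]) -> list[Span]:
    """Keep the clicked span and every span whose path starts with its path."""
    n = len(clicked)
    return [s for s in spans if s.path[:n] == clicked]

# Span tree of the play in section 7.3, flattened to records.
play = [
    Span(("service_agent",), "running"),
    Span(("service_agent", "fetch_catalog"), "ok"),
    Span(("service_agent", "select_services"), "ok"),
    Span(("service_agent", "compile_stubs"), "ok"),
    Span(("service_agent", "attempt 1"), "running"),
    Span(("service_agent", "attempt 1", "service_code_gen"), "running"),
    Span(("service_agent", "attempt 1", "service_code_gen", "model_call"), "running"),
]

scoped = spans_under(play, ("service_agent", "attempt 1", "service_code_gen"))
```

Comparing path tuples element by element, rather than testing a string prefix, keeps a sibling with a longer name that starts the same way (for example a hypothetical `service_code_gen_v2`) out of the result.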
### 7.6 Authoring mid-play

If the user wants to fix something in `service_code_gen`'s source, they toggle Code view in the middle column header. Monaco loads `service_code_gen`'s python_source. Edit. Save → creates a new LogicVersion. The current play is unaffected (it's pinned to the version it started on), but next runs use the new version. Step-memoization cache invalidates globally for that logic (the version_sid is in the step_key).

---

## 8. What replaces what

| Today | Becomes |
|---|---|
| `/workflows` (workflow list) | dashboard `/` (logic list) |
| `/workflows/{sid}/edit` (editor) | `/logics/{sid}` (logic view) |
| top toolbar with title + version + run controls + plays dropdown + benchmark widget | left sidebar (title, version) + bottom play bar (run + plays + examples) + right sidebar (benchmark/play stats) |
| right "inputs editor" sidebar (name + type + value mixed) | left sidebar (just declared inputs, type, description) + play bar left col (input VALUES + Run) |
| `/plays` (plays list page) | bottom play bar left column → "Plays ▾" |
| `/plays/{sid}` (dedicated detail page) | removed; plays render inside the logic view as an overlay |
| `/examples` (examples list page) | bottom play bar left column → "Examples ▾" |

---

## 9. Implementation phases

1. **Schema + code renames with aliases.** Hard-rename `Workflow*` → `Logic*` and `@flow` → `@logic`. Add `#[serde(alias = ...)]` on every field + every old RPC method name. SDK exports both `flow` and `logic`. Old data keeps loading, old python_source keeps importing. (Closes 70% of the work; nothing visible to the user yet.)
2. **Dashboard restructure.** `/` becomes the logic list. Remove `/workflows`, `/examples`, `/plays`. 301 the old URLs to `/` or `/logics/{sid}` as appropriate.
3. **Logic view layout rebuild.** Implement the three-region body (left info / middle flow-or-code / right stats) + the bottom play bar. Move inputs/examples/plays into the play bar.
   Move benchmark/play stats into the right sidebar. Remove the top toolbar from the editor.
4. **Flow-view static parse.** When idle, render the sub-logic graph from a parse of `python_source` (find `@logic`-decorated `def`s and `logic.invoke("...")` calls). Click → breadcrumb-navigate.
5. **Breadcrumb navigation + sub-logic view.** Clicking a sub-logic node loads that sub-logic's view. When a play is overlaid, the sub-logic's flow view filters the parent play's spans to descendants of the clicked node.
6. **Pause-form prominence.** Move pause forms to a banner at the top of the play bar's middle column when `awaiting_resume`.
7. **Stats sidebar.** Conditional rendering — benchmark stats when no play overlay, play stats when one is selected.
8. **Cleanup.** Delete `play_detail.html`, `examples.html`, `plays.html`, `workflows.html`. Drop the old routes from `main.rs`.

Out of scope (called out for the future): visual flow editor with drag/drop, conditional/loop/parallel visualisation, data-flow lines, two-way graph↔code binding.

---

## 10. Acceptance

- `/` shows a logic list with name + description + last-run status. Click → logic view.
- The logic view has the three-region body + bottom play bar described in §2.1.
- A play started from the bottom play bar's ▶ Run streams spans into the play bar's middle column AND renders in the flow view's middle column simultaneously.
- A pause from inside the flow shows the pause form as a banner at the top of the play bar's middle column. Submitting it resumes the play.
- Clicking a sub-logic node in the flow view loads that sub-logic's view with the breadcrumb populated. Clicking the parent in the breadcrumb navigates back.
- The right sidebar shows benchmark stats when no play is overlaid, and play stats when one is.
- `/workflows/*`, `/examples`, `/plays`, `/plays/{sid}` all 301 to the new layout or are removed.
- Pre-rename data keeps loading (stored `Workflow` records read fine via serde aliases).
- Stored python_source with `from hero_tracing import flow` keeps working (the SDK exports `flow` as an alias of `logic`).
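
The alias rule in the last bullet can be illustrated with a toy decorator. This is not the `hero_tracing` implementation, only a sketch under the assumption that the alias is a plain second binding of the same object; `REGISTRY` and the decorator body are hypothetical.

```python
import functools

# Hypothetical registry standing in for whatever the SDK tracks per logic.
REGISTRY: dict[str, object] = {}

def logic(fn=None, *, name=None):
    """Toy stand-in: registers a function; usable bare or with arguments."""
    def wrap(f):
        @functools.wraps(f)
        def inner(*args, **kwargs):
            return f(*args, **kwargs)
        REGISTRY[name or f.__name__] = inner
        return inner
    # Bare `@logic` passes the function directly; `@logic(...)` passes None first.
    return wrap(fn) if fn is not None else wrap

# The alias is the same object, so old sources behave exactly like new ones.
flow = logic

@flow
def old_style(x):
    return x + 1

@logic(name="new")
def new_style(x):
    return x * 2
```

Because `flow is logic`, a stored source that imports `flow` takes the same code path as a new source that imports `logic`, and dropping the alias after the deprecation period would be a one-line change.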

Superseded by #39: the discussion in the chat shrank the design further (no spans, no instrument(), no inline-vs-named distinction, no Benchmark rootobject; every function is a Logic, every invocation is a Play, plays form a tree).

timur closed this issue 2026-05-14 13:10:52 +00:00