lhumina_code/hero_logic

feat: UI + model revamp — Logic everywhere, recursive composition, single logic view #38

Closed

opened 2026-05-14 12:23:23 +00:00 by timur · 1 comment

feat: UI + model revamp — "Logic" everywhere, recursive composition, single logic view

TL;DR

Reframe the system around logics instead of workflows. A logic is a unit of typed I/O backed by Python code. Every @logic-decorated function inside a logic's source is itself a logic, so composition goes all the way down. Clicking a sub-logic navigates into the child's view, and the breadcrumb leads back out to the parent. The recursion stops at primitive Python (imported clients, raw lines, built-ins).

The UI collapses to two views only: a dashboard listing logics, and the logic view itself. Within the logic view:

Left sidebar = info: title · description · inputs · outputs · versions · breadcrumbs
Middle = flow / code (the visual flow of sub-logics, and Monaco source)
Right sidebar = stats: benchmark stats for the current version when idle, play stats when a play is active
Bottom play bar = inputs (values) + examples + plays-history (left col) | logs + steps + pause-forms (mid col) | output (right col) + ▶ Run

Everything else — separate /examples, separate /plays list, the dedicated /plays/{sid} detail page — goes away.

1. The new mental model

A Logic has:

name (identifier, unique-ish)
description
inputs: [FlowField]
outputs: [FlowField]
versions: [LogicVersion] — each carries its own python_source
current_version_sid

A LogicVersion's python_source is the Python code. Every @logic-decorated function in that source is a sub-logic. Calls to other top-level logics via logic.invoke("name", ...) resolve to other Logic records. Either way, when the runtime executes, every @logic call opens a span — same as today's @flow. The flow view is the visualisation of those spans (live during a play, or pinned to the latest play when idle).

Stopping rule: the recursion stops at non-@logic code:

imported client method calls (HeroAibrokerClient().chat(...))
standard library functions (re.sub(...), json.loads(...))
inline expressions, loops, conditionals — the structure of these is what the flow view eventually visualises, but they aren't themselves clickable sub-logics

So "is this a sub-logic?" = "is this a @logic-decorated function defined inside the parent's source, or a logic.invoke("name", ...) call to a named Logic record?"
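To make the span model concrete, here is a minimal sketch of a decorator that opens one span per @logic call and records its path through the call tree. It is a stand-in written for illustration, not the real hero_tracing implementation: the SPANS list, the _STACK helper, the slash-separated path format, and the function bodies are all assumptions.

```python
import functools
import time

# Hypothetical stand-in for the tracing SDK; not the real hero_tracing API.
SPANS = []   # finished spans, in completion order
_STACK = []  # names of the logics currently executing


def logic(fn):
    """Mark fn as a logic: every call records a span with its call path."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        _STACK.append(fn.__name__)
        path = "/".join(_STACK)
        started = time.perf_counter()
        status = "failed"
        try:
            result = fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            SPANS.append({
                "path": path,
                "status": status,
                "duration_s": time.perf_counter() - started,
            })
            _STACK.pop()
    return wrapper


@logic
def fetch_catalog():
    return ["hero_osis_calendar"]


@logic
def select_services(catalog, prompt):
    # Plain Python (built-ins, comprehensions) opens no span: recursion stops here.
    return [service for service in catalog if "calendar" in service]


@logic
def service_agent(prompt):
    catalog = fetch_catalog()
    return select_services(catalog, prompt)


result = service_agent("Create a calendar event")
paths = [span["path"] for span in SPANS]
```

In this sketch a child span finishes before its parent, so the parent's own span is recorded last.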

2. Layout

2.1 The logic view (single page, fixed regions)

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│  LEFT (info)        │   MIDDLE (flow / code)   │  RIGHT (stats)      │
│                                                                      │
│  ↶ Breadcrumb       │                          │  ┌────────────────┐ │
│   parent / sub /    │                          │  │ Benchmark v3   │ │
│   current           │                          │  │ success: 75%   │ │
│                     │                          │  │ p50 dur: 7.2s  │ │
│  Title              │   ▼ Flow view            │  │ avg cost: …    │ │
│  Description        │     fetch_catalog →      │  │ runs: 24       │ │
│                     │     select_services →    │  └────────────────┘ │
│  Inputs             │     attempt 1 →          │                     │
│   prompt: string    │       service_code_gen → │  (or, when a play   │
│   model: string     │         model_call →     │   is selected:      │
│                     │       script_execution → │   stats of THAT     │
│  Outputs            │     summarize            │   play replace the  │
│   summary: string   │                          │   benchmark card)   │
│                     │   [graph / code toggle]  │                     │
│  Versions           │                          │                     │
│   v1 v2 v3 (current)│                          │                     │
│                                                                      │
├──────────────────────────────────────────────────────────────────────┤
│  PLAY BAR (always visible, three columns)                            │
│                                                                      │
│  ◀ Inputs + Examples     ┃ Live trace + Pause forms ┃ Output         │
│                          ┃                          ┃                │
│  prompt:    [_________]  ┃ 09:14:22  fetch_catalog  ┃ {              │
│  model:     [_________]  ┃           ok      400ms  ┃   "summary":   │
│                          ┃ 09:14:23  select_serv…   ┃     "…"        │
│  Examples ▾              ┃           ok    2.6s     ┃ }              │
│   • Calendar event       ┃ 09:14:25  ⏸ ask_user    ┃                │
│   • Find a contact       ┃   ◉ keep   ○ replace    ┃                │
│   • Marketplace tokens   ┃   ○ cancel               ┃                │
│                          ┃   [Submit]               ┃                │
│  Plays ▾                 ┃                          ┃                │
│   02xy success  3m ago   ┃                          ┃                │
│   02xx failed   8m ago   ┃                          ┃                │
│                          ┃                          ┃                │
│  [▶ Run]                 ┃                          ┃                │
└──────────────────────────────────────────────────────────────────────┘

Regions are persistent. The play bar is always there. The flow column always renders whichever is the current overlay (latest play when first loaded; whichever the user picked from the Plays list inside the play bar; live one when ▶ Run is hit).

2.2 Dashboard

A single page at /. Lists every Logic by name + description + last run status + last benchmark success rate. Click a logic → open /logics/{sid}. That's it.

2.3 Routes that go away

/workflows (list) — superseded by dashboard
/workflows/{sid} — renamed to /logics/{sid}; old route 302s
/workflows/new — replaced by a "+ New logic" button on the dashboard
/examples — examples are inline on each logic's play bar; no global page
/plays (list) — plays are inline on each logic's play bar
/plays/{sid} — was the dedicated detail page from #32; finishes its removal (already a redirect today)
The top toolbar / header on the logic view — run controls move to the play bar

3. Conceptual rename (data + code)

Workflow and friends become Logic. Done as a hard rename in code with serde aliases on storage so existing OTOML records keep loading.

| Old | New | Notes |
|---|---|---|
| `Workflow` rootobject | `Logic` | `#[serde(alias = "Workflow")]` on the type tag if/when OTOML serializes it. Field names like `workflow_sid` → `logic_sid` get `#[serde(alias = "workflow_sid")]` aliases. |
| `WorkflowVersion` | `LogicVersion` | Same alias treatment. |
| `Workflow.current_version_sid` | unchanged label | The field name "current_version_sid" still makes sense. |
| `Play.workflow_sid` / `Play.workflow_version_sid` | `Play.logic_sid` / `Play.logic_version_sid` | Plus `#[serde(alias = "workflow_sid")]`. |
| `Example.workflow_sid` / `Example.workflow_version_sid` | `Example.logic_sid` / `Example.logic_version_sid` | Same. |
| `Benchmark.workflow_sid` / `Benchmark.workflow_version_sid` | `Benchmark.logic_sid` / `Benchmark.logic_version_sid` | Same. |
| `LogicService` (service name) | unchanged | Already says "Logic". |
| `LogicService.workflow_*` RPC methods | `LogicService.logic_*` | Old method names registered as aliases at the dispatch layer to keep generated clients working through a deprecation cycle. |
| `@flow(...)` decorator | `@logic(...)` | `flow` stays exported from `hero_tracing` as an alias of `logic` for one release. |
| `flow.invoke(name, ...)` | `logic.invoke(name, ...)` | Same alias rule. |
| `flow.pause(...)` / `ask_user.*` | `logic.pause(...)` / `ask_user.*` | Same. |
| `hero_tracing` module name | unchanged for now | The exports rename; the module stays so stored sources keep importing. |

The @flow → @logic rename is purely an alias — both names refer to the same decorator. Stored python_source that says from hero_tracing import flow keeps working; new sources say from hero_tracing import logic.
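The two alias mechanisms can be sketched in Python as follows. This is only an analogue for illustration: the real storage aliases are serde attributes on the Rust side, and load_record, LEGACY_KEYS, the placeholder decorator, and the sids below are hypothetical.

```python
# Hypothetical sketch; names mirror the rename table, not real SDK code.
LEGACY_KEYS = {
    "workflow_sid": "logic_sid",
    "workflow_version_sid": "logic_version_sid",
}


def load_record(raw):
    """Return a copy of a stored record with legacy keys renamed."""
    return {LEGACY_KEYS.get(key, key): value for key, value in raw.items()}


def logic(fn):
    """Placeholder decorator standing in for the real one."""
    fn.is_logic = True
    return fn


# Deprecated alias: both names refer to the same decorator object.
flow = logic

# Illustrative record written before the rename (sids are made up).
old_play = {"sid": "02ca", "workflow_sid": "01ab", "workflow_version_sid": "01ac"}
new_play = load_record(old_play)
```

Because flow is bound to the same object as logic, a function decorated with either name behaves identically; only the import line differs.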

4. The flow view (middle column)

4.1 Idle (no overlay)

Parse the current version's python_source for @logic-decorated functions and logic.invoke(...) calls. Render a static graph of declared sub-logics in source-order:

fetch_catalog → select_services → attempt 1 (loop) → summarize
                                  └─ service_code_gen → model_call
                                  └─ script_execution

Each node is clickable. Click a sub-logic node → breadcrumb-navigate into that sub-logic's view. The breadcrumb lives at the top of the left sidebar:

↶ service_agent / service_code_gen

Click "service_agent" in the breadcrumb → back out to the parent. Stop at primitive calls (imported clients, stdlib): these render as leaf nodes with a "primitive" badge and are not clickable.
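One way the idle-state parse could work is with Python's standard ast module. The sketch below is an assumption about the approach, not the planned implementation: find_sub_logics and its helpers are hypothetical names, the sample source is invented, and "translate" is a made-up logic name. The source string is only parsed, never executed.

```python
import ast

SOURCE = '''
from hero_tracing import logic

@logic
def fetch_catalog():
    return []

@logic()
def summarize(result):
    return str(result)

def helper(x):
    return x

@logic
def service_agent(prompt):
    catalog = fetch_catalog()
    other = logic.invoke("translate", text=prompt)
    return summarize(helper(other))
'''

LOGIC_NAMES = ("logic", "flow")  # `flow` is the deprecated alias


def _is_logic_decorator(node):
    # Accept both the bare form (@logic) and the call form (@logic(...)).
    target = node.func if isinstance(node, ast.Call) else node
    return isinstance(target, ast.Name) and target.id in LOGIC_NAMES


def _invoked_name(call):
    """Return the name in logic.invoke("name", ...), or None for other calls."""
    func = call.func
    if not (isinstance(func, ast.Attribute) and func.attr == "invoke"):
        return None
    if not (isinstance(func.value, ast.Name) and func.value.id in LOGIC_NAMES):
        return None
    if call.args and isinstance(call.args[0], ast.Constant):
        if isinstance(call.args[0].value, str):
            return call.args[0].value
    return None


def find_sub_logics(source):
    """Return (decorated function names, invoked logic names) in source order."""
    decorated, invoked = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(_is_logic_decorator(d) for d in node.decorator_list):
                decorated.append((node.lineno, node.name))
        elif isinstance(node, ast.Call):
            name = _invoked_name(node)
            if name is not None:
                invoked.append((node.lineno, node.col_offset, name))
    return ([n for _, n in sorted(decorated)], [n for *_, n in sorted(invoked)])


decorated, invoked = find_sub_logics(SOURCE)
```

The undecorated helper is skipped, which matches the stopping rule: it would render as plain code, not as a clickable node.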

4.2 Active (a play is selected)

Render the actual span tree of the play. Same node shapes, but now they carry status + duration + the recorded inputs/outputs. Replayed spans dashed; failed spans red; in-progress pulse. Clicking a span node = same drill-in behaviour as idle: breadcrumb into that sub-logic, except now the sub-logic's flow view shows ITS spans inside the parent play (filtered to spans whose path descends from the clicked node). Breadcrumb back out, parent re-renders.
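The descendant filter could look like the sketch below. It assumes each span carries a slash-separated path string, which is an illustrative choice and not something the span format is known to guarantee; spans_under and the sample spans are hypothetical.

```python
def spans_under(spans, clicked_path):
    """Keep the clicked span and every span whose path descends from it."""
    prefix = clicked_path + "/"
    return [
        span for span in spans
        if span["path"] == clicked_path or span["path"].startswith(prefix)
    ]


play_spans = [
    {"path": "service_agent", "status": "running"},
    {"path": "service_agent/fetch_catalog", "status": "ok"},
    {"path": "service_agent/attempt_1/service_code_gen", "status": "running"},
    {"path": "service_agent/attempt_1/service_code_gen/model_call", "status": "running"},
    {"path": "service_agent/attempt_1/script_execution", "status": "ok"},
]

scoped = spans_under(play_spans, "service_agent/attempt_1/service_code_gen")
scoped_paths = [span["path"] for span in scoped]
```

Matching on the path plus a trailing separator avoids picking up siblings whose names merely share a prefix with the clicked node.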

4.3 Code view

Toggle in the middle column header: [Graph] [Code] [Split]. Code is the Monaco editor bound to LogicVersion.python_source. Clicking a graph node highlights the corresponding source lines.

4.4 Future direction (out of scope, called out)

The flow view eventually becomes a visual code editor:

Drag logic.invoke("...") blocks from a palette of saved logics on the right.
Drag imported client method calls as primitive nodes.
Visualise loops as repeating blocks, conditionals as branch points, asyncio.gather as parallel forks.
Show data flow between sub-logics as connecting lines: where does each input come from?
Two-way binding: editing the graph updates the source; editing the source re-renders the graph.

Not in this issue's scope — but the data model (@logic + named sub-logics + typed I/O) is designed so this is a future-compatible direction.

5. The play bar (bottom, three columns)

Always visible. Heights persist via localStorage. The three columns:

5.1 Left column — Inputs + Examples + Plays history

Inputs: one labeled field per declared input. Type-appropriate widget (text / number / boolean / json textarea). Type comes from Logic.inputs[i].field_type.
Examples ▾: collapsible list of saved Example records for this logic. Click one → populate the input fields. "Save as example" button writes the current values back as a new Example.
Plays ▾: collapsible list of the most recent N plays. Click one → load it as the current overlay on the flow view + populate input fields with its input_data. Right sidebar switches to that play's stats.
▶ Run button: validates inputs against declared types, calls logic.play_start, the new play becomes the overlay.
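The validation step before logic.play_start might be sketched as below. The field_type vocabulary ("string", "number", "boolean", "json"), the dict shape of a field, and the validate_inputs helper are assumptions made for this example; the real FlowField definition may differ.

```python
import json


def _is_json(value):
    """True when the value can be serialized as JSON."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


# Assumed field_type vocabulary; the real FlowField types may differ.
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "json": _is_json,
}


def validate_inputs(fields, values):
    """Return a list of error strings; an empty list means the play may start."""
    errors = []
    for field in fields:
        name = field["name"]
        if name not in values or values[name] in ("", None):
            if field.get("required"):
                errors.append(f"{name}: required")
            continue
        check = CHECKS.get(field["field_type"])
        if check is None:
            errors.append(f"{name}: unknown type {field['field_type']}")
        elif not check(values[name]):
            errors.append(f"{name}: expected {field['field_type']}")
    return errors


fields = [
    {"name": "prompt", "field_type": "string", "required": True},
    {"name": "code_gen_model", "field_type": "string"},
]
ok_errors = validate_inputs(fields, {"prompt": "Create an event", "code_gen_model": ""})
bad_errors = validate_inputs(fields, {"code_gen_model": 3})
```

Returning all errors at once, instead of raising on the first, lets the play bar mark every offending field in one pass.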

5.2 Middle column — Live trace + Pause forms

Live span events scroll here as they arrive (the JSONL feed the SDK emits).
When the active play hits awaiting_resume, the pause form is rendered at the top of the middle column as a banner (always visible until answered, regardless of how the user has scrolled the log feed below it).
Pause forms render per ResumeRequest.ui.kind: text / number / choice / multi_choice / confirm. Submit posts play_resume.
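As a rough sketch of how an answer could be checked against the form kind before posting play_resume: the ui dict shape with an "options" key and the validate_answer helper are hypothetical, and only the five kind names come from the list above.

```python
def validate_answer(ui, answer):
    """Check that an answer fits the pause form kind (assumed ui shape)."""
    kind = ui["kind"]
    if kind == "text":
        return isinstance(answer, str)
    if kind == "number":
        return isinstance(answer, (int, float)) and not isinstance(answer, bool)
    if kind == "confirm":
        return isinstance(answer, bool)
    if kind == "choice":
        return answer in ui["options"]
    if kind == "multi_choice":
        return isinstance(answer, list) and all(a in ui["options"] for a in answer)
    raise ValueError(f"unknown pause form kind: {kind}")


choice_ui = {
    "kind": "choice",
    "options": ["Event", "RecurringEvent", "Reminder", "Cancel"],
}
multi_ui = {"kind": "multi_choice", "options": ["Event", "Reminder"]}
```

For example, validate_answer(choice_ui, "Event") accepts the answer, while a value outside the options list is rejected before anything is posted.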

5.3 Right column — Output

Renders the play's output_data as it accumulates.
If Logic.outputs is declared, render one labeled card per output field; else render raw JSON.
For a paused play, output is empty (the flow hasn't returned yet). For a successful play, output is final. For a failed play, output is empty + an error banner is shown in the middle column.

5.4 Pause UX nit

When a play pauses, the play bar visually emphasizes the pause: middle column shifts the pause form to the top + adds an accent border. The user shouldn't have to find the pause form — the play bar makes it the most prominent thing.

6. The right stats sidebar

Two modes, switched by whether a play is currently selected:

6.1 No play selected → version benchmark stats

Shows the latest Benchmark for Logic.current_version_sid:

success rate (0-100%)
p50 / p95 duration
avg tokens (prompt + completion)
estimated cost USD
difficulty rating
"Run benchmark" button → opens a small dialog to configure num_runs + which example set, then kicks off a benchmark play set

6.2 Play selected → play stats

Shows that play's:

status + duration
total tokens (prompt + completion)
estimated cost USD (if computed)
attempts (count, if the flow uses retry loops)
error summary (if failed)
"Cancel" button (if status is running / awaiting_resume)

Compact. No tabs. No interaction beyond the cancel button — drill-in lives elsewhere (the flow view middle column, the play bar columns).
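The benchmark figures in 6.1 (success rate, p50 / p95 duration) could be derived from a set of runs roughly as follows. The run record shape and the nearest-rank percentile definition are assumptions for this sketch; the issue does not say which percentile method the benchmark uses.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_runs(runs):
    """Aggregate benchmark runs into the numbers shown on the stats card."""
    durations = sorted(run["duration_s"] for run in runs)
    successes = sum(1 for run in runs if run["status"] == "success")
    return {
        "runs": len(runs),
        "success_rate": 100.0 * successes / len(runs),
        "p50_s": percentile(durations, 50),
        "p95_s": percentile(durations, 95),
    }


# Illustrative data, not real benchmark results.
runs = [
    {"status": "success", "duration_s": 5.0},
    {"status": "success", "duration_s": 6.0},
    {"status": "failed", "duration_s": 12.0},
    {"status": "success", "duration_s": 7.0},
]
stats = summarize_runs(runs)
```

With only a handful of runs, p95 collapses to the slowest run under nearest-rank, which is worth keeping in mind when reading the card for small num_runs values.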

7. Worked example: service_agent

This is what the new UI looks like for the existing service_agent flow. No code changes to service_agent.py itself — just renames + the UI rendering.

7.1 Dashboard → click `service_agent`

User lands on /logics/{service_agent_sid}.

7.2 Initial state (no play overlay)

Left sidebar:

↶ service_agent

Title:        service_agent
Description:  Self-contained AI agent: discovers Hero services
              via hero_router, compiles selected service stubs,
              generates a Python script, runs it, and summarizes
              the result.

Inputs:
  prompt:           string  (required)  "user's request in natural language"
  code_gen_model:   string  ""          "override for the code-gen model"

Outputs:
  summary:          string              "natural-language reply to the prompt"

Versions: v1  v2  v3 (current)

Middle (flow view, idle, parsed from source):

service_agent
  ├─ fetch_catalog          (sub-logic)
  ├─ select_services        (sub-logic)
  ├─ compile_stubs          (sub-logic)
  ├─ for attempt in range(3):
  │    ├─ service_code_gen  (sub-logic; contains model_call)
  │    ├─ script_execution  (sub-logic)
  │    └─ debug_feedback    (sub-logic, on failure)
  └─ summarize              (sub-logic)

Each sub-logic is a clickable node. for attempt in range(3) renders as a loop container. The model_call inside service_code_gen only shows when that node is expanded.

Right sidebar (benchmark stats for v3):

Benchmark — v3
  runs:          24
  success rate:  75%
  p50 duration:  6.8s
  p95 duration:  12.4s
  avg tokens:    1284 prompt + 482 completion
  est. cost:     $0.018 / run
  difficulty:    0.42

  [Run new benchmark]

Bottom play bar (idle):

┃ Inputs                       ┃                              ┃ Output            ┃
┃ prompt:    [____________]    ┃   (no play yet — click ▶)    ┃                   ┃
┃ code_gen_model: [_______]    ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ Examples ▾                   ┃                              ┃                   ┃
┃  • Calendar event            ┃                              ┃                   ┃
┃  • Find a contact            ┃                              ┃                   ┃
┃  • List healthy routers      ┃                              ┃                   ┃
┃  • Marketplace tokens        ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ Plays ▾                      ┃                              ┃                   ┃
┃  02hq  failed     2 hr ago   ┃                              ┃                   ┃
┃  02ca  success    1 d ago    ┃                              ┃                   ┃
┃                              ┃                              ┃                   ┃
┃ [▶ Run]                      ┃                              ┃                   ┃

7.3 User clicks "Calendar event" example, hits ▶ Run

Inputs auto-fill: prompt: "Create a calendar event titled X tomorrow at 10am", code_gen_model: "". Run → logic.play_start → new play 02j7 becomes the overlay.

Right sidebar switches to play stats (status: running, started 0:00).

Middle column (flow view, live):

service_agent                    [running]   0.4s
  ├─ fetch_catalog               [ok]       423ms
  ├─ select_services             [ok]       2.6s
  ├─ compile_stubs               [ok]         5ms
  └─ attempt 1                   [running]
       └─ service_code_gen       [running]
            └─ model_call        [running]

Play bar middle column — live log feed:

09:14:22  fetch_catalog              ok      423ms
09:14:23  select_services            ok      2.6s    (picked: hero_osis_calendar)
09:14:25  compile_stubs              ok        5ms
09:14:25  attempt 1
09:14:25  service_code_gen           running
09:14:26    model_call               running ai.chat (groq-strong)

7.4 A step pauses with `ask_user.choice(...)`

Suppose select_services decides the chosen service's rootobjects don't clearly match the prompt and calls:

ask_user.choice(
    "hero_osis_calendar has these rootobjects matching 'event':",
    options=["Event", "RecurringEvent", "Reminder", "Cancel"],
)

The play exits with awaiting_resume. Right sidebar shows play status awaiting_resume. Play bar middle column shifts the pause form to a banner at the top:

┃                              ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃                   ┃
┃                              ┃  ⏸  ask_user.choice          ┃                   ┃
┃                              ┃  hero_osis_calendar has      ┃                   ┃
┃                              ┃  these rootobjects matching  ┃                   ┃
┃                              ┃  'event':                    ┃                   ┃
┃                              ┃                              ┃                   ┃
┃                              ┃   ◉ Event                    ┃                   ┃
┃                              ┃   ○ RecurringEvent           ┃                   ┃
┃                              ┃   ○ Reminder                 ┃                   ┃
┃                              ┃   ○ Cancel                   ┃                   ┃
┃                              ┃   [Submit]                   ┃                   ┃
┃                              ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃                   ┃
┃ Plays ▾                      ┃                              ┃                   ┃
┃  02j7  awaiting…  0:09       ┃ 09:14:22 fetch_catalog ok …  ┃                   ┃

User picks "Event" and submits. play_resume posts the answer, and the server respawns the subprocess with the answer cached. The flow then continues from where it paused: fetch_catalog, select_services, and compile_stubs are replayed (dashed in the flow tree), and service_code_gen executes fresh.

7.5 User clicks `service_code_gen` node in the flow view (mid-play)

Breadcrumb in left sidebar updates:

↶ service_agent / service_code_gen

The whole logic view rerenders for service_code_gen:

Left sidebar: service_code_gen's title, description, inputs (prompt, services), outputs (script), versions.
Middle: service_code_gen's flow view, scoped to that part of the current play's span tree — just shows model_call inside attempt 1.
Right sidebar: stats for service_code_gen's contribution to this play (its tokens, its duration).
Play bar:
- Left col: service_code_gen's declared inputs, prefilled with the values the parent passed (prompt=..., services=[...]). Examples list is service_code_gen's saved examples. Plays list shows past plays of service_code_gen standalone (when it was invoked as the root logic).
- Middle col: filtered log feed — only service_code_gen's spans + descendants.
- Right col: this sub-logic's output for this play.

If the user wants to fork off a standalone play of service_code_gen from here (with the prefilled inputs), they hit ▶ Run — that starts a new top-level play of service_code_gen independent of the parent.

Click "service_agent" in the breadcrumb → zoom back out. Same play, parent view.

7.6 Authoring mid-play

If the user wants to fix something in service_code_gen's source, they toggle Code view in the middle column header. Monaco loads service_code_gen's python_source. Edit. Save → creates a new LogicVersion. The current play is unaffected (it's pinned to the version it started on), but next runs use the new version. Step-memoization cache invalidates globally for that logic (the version_sid is in the step_key).
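To illustrate why saving a new version invalidates the memoization cache, here is one way a step key could include the version sid. The exact key format is not given in this issue, so step_key, its hashing scheme, and the sids below are purely hypothetical.

```python
import hashlib
import json


def step_key(version_sid, step_name, inputs):
    """Derive a cache key that changes whenever the logic version changes."""
    payload = json.dumps(
        {"version": version_sid, "step": step_name, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative sids; same step and inputs, two different versions.
key_v3 = step_key("ver_v3", "service_code_gen", {"prompt": "Create an event"})
key_v3_again = step_key("ver_v3", "service_code_gen", {"prompt": "Create an event"})
key_v4 = step_key("ver_v4", "service_code_gen", {"prompt": "Create an event"})
```

Entries cached under the old version are never looked up again after the save, so no explicit purge is needed in this scheme.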

8. What replaces what

| Today | Becomes |
|---|---|
| `/workflows` (workflow list) | dashboard `/` (logic list) |
| `/workflows/{sid}/edit` (editor) | `/logics/{sid}` (logic view) |
| top toolbar with title + version + run controls + plays dropdown + benchmark widget | left sidebar (title, version) + bottom play bar (run + plays + examples) + right sidebar (benchmark/play stats) |
| right "inputs editor" sidebar (name + type + value mixed) | left sidebar (just declared inputs, type, description) + play bar left col (input VALUES + Run) |
| `/plays` (plays list page) | bottom play bar left column → "Plays ▾" |
| `/plays/{sid}` (dedicated detail page) | removed; plays render inside the logic view as an overlay |
| `/examples` (examples list page) | bottom play bar left column → "Examples ▾" |

9. Implementation phases

Schema + code renames with aliases. Hard-rename Workflow* → Logic* and @flow → @logic. Add #[serde(alias = ...)] on every renamed field, and register every old RPC method name as an alias at the dispatch layer. The SDK exports both flow and logic. Old data keeps loading, and old python_source keeps importing. (Roughly 70% of the work; nothing visible to the user yet.)
Dashboard restructure. / becomes the logic list. Remove /workflows, /examples, /plays. 301 the old URLs to / or /logics/{sid} as appropriate.
Logic view layout rebuild. Implement the three-region body (left info / middle flow-or-code / right stats) + the bottom play bar. Move inputs/examples/plays into the play bar. Move benchmark/play stats into the right sidebar. Remove the top toolbar from the editor.
Flow-view static parse. When idle, render the sub-logic graph from a parse of python_source (find @logic-decorated defs and logic.invoke("...") calls). Click → breadcrumb-navigate.
Breadcrumb navigation + sub-logic view. Clicking a sub-logic node loads that sub-logic's view. When a play is overlaid, the sub-logic's flow view filters the parent play's spans to descendants of the clicked node.
Pause-form prominence. Move pause forms to a banner at the top of the play bar's middle column when awaiting_resume.
Stats sidebar. Conditional rendering — benchmark stats when no play overlay, play stats when one is selected.
Cleanup. Delete play_detail.html, examples.html, plays.html, workflows.html. Drop the old routes from main.rs.
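The redirect rules from the dashboard restructure phase can be summarized as a small lookup, sketched here in Python for readability (the actual routes live in main.rs). redirect_target is a hypothetical helper, and /plays/{sid} is deliberately left out because its target depends on a play-to-logic lookup that this sketch does not model.

```python
import re

# Old list pages all collapse into the dashboard.
STATIC_REDIRECTS = {
    "/workflows": "/",
    "/workflows/new": "/",
    "/examples": "/",
    "/plays": "/",
}

# /workflows/{sid} and /workflows/{sid}/edit both map to the logic view.
_WORKFLOW_ROUTE = re.compile(r"^/workflows/([^/]+)(?:/edit)?$")


def redirect_target(path):
    """Return the new URL for an old route, or None if no rule applies."""
    path = path.rstrip("/") or "/"
    if path in STATIC_REDIRECTS:
        return STATIC_REDIRECTS[path]
    match = _WORKFLOW_ROUTE.match(path)
    if match:
        return f"/logics/{match.group(1)}"
    return None
```

The static table is checked first so that /workflows/new goes to the dashboard and is not mistaken for a workflow whose sid is "new".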

Out of scope (called out for the future): visual flow editor with drag/drop, conditional/loop/parallel visualisation, data-flow lines, two-way graph↔code binding.

10. Acceptance

/ shows a logic list with name + description + last-run status. Click → logic view.
The logic view has the three-region body + bottom play bar described in §2.1.
A play started from the bottom play bar's ▶ Run streams spans into the play bar's middle column and simultaneously renders in the flow view (the middle region of the logic view).
A pause from inside the flow shows the pause form as a banner at the top of the play bar's middle column. Submitting it resumes the play.
Clicking a sub-logic node in the flow view loads that sub-logic's view with the breadcrumb populated. Clicking the parent in the breadcrumb navigates back.
The right sidebar shows benchmark stats when no play is overlaid, and play stats when one is.
/workflows/*, /examples, /plays, /plays/{sid} all 301 to the new layout or are removed.
Pre-rename data keeps loading (stored Workflow records read fine via serde aliases).
Stored python_source with from hero_tracing import flow keeps working (the SDK exports flow as an alias of logic).

# feat: UI + model revamp — "Logic" everywhere, recursive composition, single logic view ## TL;DR Reframe the system around **logics** instead of workflows. A logic is a unit of typed I/O backed by Python code. Every `@logic`-decorated function inside a logic's source is itself a logic — composable all the way down. Sub-logic navigation breadcrumbs into the child's view; back out to the parent. Stop at primitive Python (imported clients, raw lines, built-ins). The UI collapses to **two views only**: a dashboard listing logics, and the logic view itself. Within the logic view: - **Left sidebar** = info: title · description · inputs · outputs · versions · breadcrumbs - **Middle** = flow / code (the visual flow of sub-logics, and Monaco source) - **Right sidebar** = stats: benchmark stats for the current version when idle, play stats when a play is active - **Bottom play bar** = inputs (values) + examples + plays-history (left col) | logs + steps + pause-forms (mid col) | output (right col) + ▶ Run Everything else — separate `/examples`, separate `/plays` list, the dedicated `/plays/{sid}` detail page — goes away. --- ## 1. The new mental model A **Logic** has: - `name` (identifier, unique-ish) - `description` - `inputs: [FlowField]` - `outputs: [FlowField]` - `versions: [LogicVersion]` — each carries its own `python_source` - `current_version_sid` A **LogicVersion**'s `python_source` is the Python code. Every `@logic`-decorated function in that source is a sub-logic. Calls to other top-level logics via `logic.invoke("name", ...)` resolve to other Logic records. Either way, when the runtime executes, every `@logic` call opens a span — same as today's `@flow`. The flow view is the visualisation of those spans (live during a play, or pinned to the latest play when idle). 
**Stopping rule:** the recursion stops at non-`@logic` code: - imported client method calls (`HeroAibrokerClient().chat(...)`) - standard library functions (`re.sub(...)`, `json.loads(...)`) - inline expressions, loops, conditionals — *the structure* of these is what the flow view eventually visualises, but they aren't themselves clickable sub-logics So "is this a sub-logic?" = "is this a `@logic`-decorated function defined inside the parent's source, or a `logic.invoke("name", ...)` call to a named Logic record?" --- ## 2. Layout ### 2.1 The logic view (single page, fixed regions) ``` ┌──────────────────────────────────────────────────────────────────────┐ │ │ │ LEFT (info) │ MIDDLE (flow / code) │ RIGHT (stats) │ │ │ │ ↶ Breadcrumb │ │ ┌────────────────┐ │ │ parent / sub / │ │ │ Benchmark v3 │ │ │ current │ │ │ success: 75% │ │ │ │ │ │ p50 dur: 7.2s │ │ │ Title │ ▼ Flow view │ │ avg cost: … │ │ │ Description │ fetch_catalog → │ │ runs: 24 │ │ │ │ select_services → │ └────────────────┘ │ │ Inputs │ attempt 1 → │ │ │ prompt: string │ service_code_gen → │ (or, when a play │ │ model: string │ model_call → │ is selected: │ │ │ script_execution → │ stats of THAT │ │ Outputs │ summarize │ play replace the │ │ summary: string │ │ benchmark card) │ │ │ [graph / code toggle] │ │ │ Versions │ │ │ │ v1 v2 v3 (current)│ │ │ │ │ ├──────────────────────────────────────────────────────────────────────┤ │ PLAY BAR (always visible, three columns) │ │ │ │ ◀ Inputs + Examples ┃ Live trace + Pause forms ┃ Output │ │ ┃ ┃ │ │ prompt: [_________] ┃ 09:14:22 fetch_catalog ┃ { │ │ model: [_________] ┃ ok 400ms ┃ "summary": │ │ ┃ 09:14:23 select_serv… ┃ "…" │ │ Examples ▾ ┃ ok 2.6s ┃ } │ │ • Calendar event ┃ 09:14:25 ⏸ ask_user ┃ │ │ • Find a contact ┃ ◉ keep ○ replace ┃ │ │ • Marketplace tokens ┃ ○ cancel ┃ │ │ ┃ [Submit] ┃ │ │ Plays ▾ ┃ ┃ │ │ 02xy success 3m ago ┃ ┃ │ │ 02xx failed 8m ago ┃ ┃ │ │ ┃ ┃ │ │ [▶ Run] ┃ ┃ │ 
└──────────────────────────────────────────────────────────────────────┘ ``` Regions are persistent. The play bar is always there. The flow column always renders whichever is the current overlay (latest play when first loaded; whichever the user picked from the Plays list inside the play bar; live one when ▶ Run is hit). ### 2.2 Dashboard A single page at `/`. Lists every Logic by name + description + last run status + last benchmark success rate. Click a logic → open `/logics/{sid}`. That's it. ### 2.3 Routes that go away - `/workflows` (list) — superseded by dashboard - `/workflows/{sid}` — renamed to `/logics/{sid}`; old route 302s - `/workflows/new` — replaced by a "+ New logic" button on the dashboard - `/examples` — examples are inline on each logic's play bar; no global page - `/plays` (list) — plays are inline on each logic's play bar - `/plays/{sid}` — was the dedicated detail page from #32; finishes its removal (already a redirect today) - The top toolbar / header on the logic view — run controls move to the play bar --- ## 3. Conceptual rename (data + code) `Workflow` and friends become `Logic`. Done as a **hard rename in code with serde aliases on storage** so existing OTOML records keep loading. | Old | New | Notes | |---|---|---| | `Workflow` rootobject | `Logic` | `#[serde(alias = "Workflow")]` on the type tag if/when OTOML serializes it. Field names like `workflow_sid` → `logic_sid` get `#[serde(alias = "workflow_sid")]` aliases. | | `WorkflowVersion` | `LogicVersion` | Same alias treatment. | | `Workflow.current_version_sid` | unchanged label | The field name "current_version_sid" still makes sense. | | `Play.workflow_sid` / `Play.workflow_version_sid` | `Play.logic_sid` / `Play.logic_version_sid` | Plus `#[serde(alias = "workflow_sid")]`. | | `Example.workflow_sid` / `Example.workflow_version_sid` | `Example.logic_sid` / `Example.logic_version_sid` | Same. 
| | `Benchmark.workflow_sid` / `Benchmark.workflow_version_sid` | `Benchmark.logic_sid` / `Benchmark.logic_version_sid` | Same. | | `LogicService` (service name) | unchanged | Already says "Logic". | | `LogicService.workflow_*` RPC methods | `LogicService.logic_*` | Old method names registered as aliases at the dispatch layer to keep generated clients working through a deprecation cycle. | | `@flow(...)` decorator | `@logic(...)` | `flow` stays exported from `hero_tracing` as an alias of `logic` for one release. | | `flow.invoke(name, ...)` | `logic.invoke(name, ...)` | Same alias rule. | | `flow.pause(...)` / `ask_user.*` | `logic.pause(...)` / `ask_user.*` | Same. | | `hero_tracing` module name | unchanged for now | The exports rename; the module stays so stored sources keep importing. | The `@flow` → `@logic` rename is purely an alias — both names refer to the same decorator. Stored python_source that says `from hero_tracing import flow` keeps working; new sources say `from hero_tracing import logic`. --- ## 4. The flow view (middle column) ### 4.1 Idle (no overlay) Parse the current version's `python_source` for `@logic`-decorated functions and `logic.invoke(...)` calls. Render a static graph of declared sub-logics in source-order: ``` fetch_catalog → select_services → attempt 1 (loop) → summarize └─ service_code_gen → model_call └─ script_execution ``` Each node is clickable. Click a sub-logic node → breadcrumb-navigate into that sub-logic's view. The breadcrumb lives at the top of the left sidebar: ``` ↶ service_agent / service_code_gen ``` Click "service_agent" in the breadcrumb → back out to the parent. Stop at primitive calls (imported clients, stdlib): these render as leaf nodes with a "primitive" badge and are not clickable. ### 4.2 Active (a play is selected) Render the **actual span tree** of the play. Same node shapes, but now they carry status + duration + the recorded inputs/outputs. Replayed spans dashed; failed spans red; in-progress pulse. 
Clicking a span node = same drill-in behaviour as idle: breadcrumb into that sub-logic, except now the sub-logic's flow view shows ITS spans inside the parent play (filtered to spans whose path descends from the clicked node). Breadcrumb back out, parent re-renders.

### 4.3 Code view

Toggle in the middle column header: `[Graph] [Code] [Split]`. Code is the Monaco editor bound to `LogicVersion.python_source`. Clicking a graph node highlights the corresponding source lines.

### 4.4 Future direction (out of scope, called out)

The flow view eventually becomes a **visual code editor**:

- Drag `logic.invoke("...")` blocks from a palette of saved logics on the right.
- Drag imported client method calls as primitive nodes.
- Visualise loops as repeating blocks, conditionals as branch points, `asyncio.gather` as parallel forks.
- Show data flow between sub-logics as connecting lines: where does each input come from?
- Two-way binding: editing the graph updates the source; editing the source re-renders the graph.

Not in this issue's scope, but the data model (`@logic` + named sub-logics + typed I/O) is designed so this is a future-compatible direction.

---

## 5. The play bar (bottom, three columns)

Always visible. Heights persist via `localStorage`. The three columns:

### 5.1 Left column: Inputs + Examples + Plays history

- **Inputs**: one labeled field per declared input. Type-appropriate widget (text / number / boolean / json textarea). Type comes from `Logic.inputs[i].field_type`.
- **Examples ▾**: collapsible list of saved `Example` records for this logic. Click one → populate the input fields. "Save as example" button writes the current values back as a new Example.
- **Plays ▾**: collapsible list of the most recent N plays. Click one → load it as the current overlay on the flow view + populate input fields with its `input_data`. Right sidebar switches to that play's stats.
- **▶ Run button**: validates inputs against declared types, calls `logic.play_start`, the new play becomes the overlay.

### 5.2 Middle column: Live trace + Pause forms

- Live span events scroll here as they arrive (the JSONL feed the SDK emits).
- When the active play hits `awaiting_resume`, the pause form is rendered at the **top** of the middle column as a banner (always visible until answered, regardless of how the user has scrolled the log feed below it).
- Pause forms render per `ResumeRequest.ui.kind`: text / number / choice / multi_choice / confirm. Submit posts `play_resume`.

### 5.3 Right column: Output

- Renders the play's `output_data` as it accumulates.
- If `Logic.outputs` is declared, render one labeled card per output field; else render raw JSON.
- For a paused play, output is empty (the flow hasn't returned yet). For a successful play, output is final. For a failed play, output is empty + an error banner is shown in the middle column.

### 5.4 Pause UX nit

When a play pauses, the play bar visually emphasizes the pause: middle column shifts the pause form to the top + adds an accent border. The user shouldn't have to find the pause form; the play bar makes it the most prominent thing.

---

## 6. The right stats sidebar

Two modes, switched by whether a play is currently selected:

### 6.1 No play selected → version benchmark stats

Shows the latest `Benchmark` for `Logic.current_version_sid`:

- success rate (0-100%)
- p50 / p95 duration
- avg tokens (prompt + completion)
- estimated cost USD
- difficulty rating
- "Run benchmark" button → opens a small dialog to configure num_runs + which example set, then kicks off a benchmark play set

### 6.2 Play selected → play stats

Shows that play's:

- status + duration
- total tokens (prompt + completion)
- estimated cost USD (if computed)
- attempts (count, if the flow uses retry loops)
- error summary (if failed)
- "Cancel" button (if status is running / awaiting_resume)

Compact. No tabs.
No interaction beyond the cancel button; drill-in lives elsewhere (the flow view middle column, the play bar columns).

---

## 7. Worked example: service_agent

This is what the new UI looks like for the existing `service_agent` flow. No code changes to `service_agent.py` itself, just renames + the UI rendering.

### 7.1 Dashboard → click `service_agent`

User lands on `/logics/{service_agent_sid}`.

### 7.2 Initial state (no play overlay)

**Left sidebar:**

```
↶ service_agent

Title:        service_agent
Description:  Self-contained AI agent: discovers Hero services via
              hero_router, compiles selected service stubs, generates
              a Python script, runs it, and summarizes the result.

Inputs:
  prompt: string (required)   "user's request in natural language"
  code_gen_model: string ""   "override for the code-gen model"

Outputs:
  summary: string             "natural-language reply to the prompt"

Versions:
  v1  v2  v3 (current)
```

**Middle (flow view, idle, parsed from source):**

```
service_agent
├─ fetch_catalog            (sub-logic)
├─ select_services          (sub-logic)
├─ compile_stubs            (sub-logic)
├─ for attempt in range(3):
│   ├─ service_code_gen     (sub-logic; contains model_call)
│   ├─ script_execution     (sub-logic)
│   └─ debug_feedback       (sub-logic, on failure)
└─ summarize                (sub-logic)
```

Each sub-logic is a clickable node. `for attempt in range(3)` renders as a loop container. The `model_call` inside `service_code_gen` only shows when that node is expanded.

**Right sidebar (benchmark stats for v3):**

```
Benchmark: v3
runs:           24
success rate:   75%
p50 duration:   6.8s
p95 duration:   12.4s
avg tokens:     1284 prompt + 482 completion
est. cost:      $0.018 / run
difficulty:     0.42

[Run new benchmark]
```

**Bottom play bar (idle):**

```
┃ Inputs                    ┃                          ┃ Output ┃
┃   prompt: [____________]  ┃ (no play yet, click ▶)   ┃        ┃
┃   model:  [____________]  ┃                          ┃        ┃
┃                           ┃                          ┃        ┃
┃ Examples ▾                ┃                          ┃        ┃
┃   • Calendar event        ┃                          ┃        ┃
┃   • Find a contact        ┃                          ┃        ┃
┃   • List healthy routers  ┃                          ┃        ┃
┃   • Marketplace tokens    ┃                          ┃        ┃
┃                           ┃                          ┃        ┃
┃ Plays ▾                   ┃                          ┃        ┃
┃   02hq failed   2 hr ago  ┃                          ┃        ┃
┃   02ca success  1 d ago   ┃                          ┃        ┃
┃                           ┃                          ┃        ┃
┃ [▶ Run]                   ┃                          ┃        ┃
```

### 7.3 User clicks "Calendar event" example, hits ▶ Run

Inputs auto-fill: `prompt: "Create a calendar event titled X tomorrow at 10am"`, `model: ""`.

Run → `logic.play_start` → new play `02j7` becomes the overlay.

**Right sidebar switches to play stats** (status: running, started 0:00).

**Middle column (flow view, live):**

```
service_agent              [running]  0.4s
├─ fetch_catalog           [ok]       423ms
├─ select_services         [ok]       2.6s
├─ compile_stubs           [ok]       5ms
└─ attempt 1               [running]
    └─ service_code_gen    [running]
        └─ model_call      [running]
```

**Play bar middle column** (live log feed):

```
09:14:22  fetch_catalog     ok       423ms
09:14:23  select_services   ok       2.6s  (picked: hero_osis_calendar)
09:14:25  compile_stubs     ok       5ms
09:14:25  attempt 1
09:14:25  service_code_gen  running
09:14:26  model_call        running  ai.chat (groq-strong)
```

### 7.4 A step pauses with `ask_user.choice(...)`

Suppose `select_services` decides the chosen service's rootobjects don't clearly match the prompt and calls:

```python
ask_user.choice(
    "hero_osis_calendar has these rootobjects matching 'event':",
    options=["Event", "RecurringEvent", "Reminder", "Cancel"],
)
```

The play exits with `awaiting_resume`. Right sidebar shows play status `awaiting_resume`.
Play bar middle column shifts the pause form to a banner at the top:

```
┃                       ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃ ┃
┃                       ┃ ⏸ ask_user.choice            ┃ ┃
┃                       ┃ hero_osis_calendar has       ┃ ┃
┃                       ┃ these rootobjects matching   ┃ ┃
┃                       ┃ 'event':                     ┃ ┃
┃                       ┃                              ┃ ┃
┃                       ┃   ◉ Event                    ┃ ┃
┃                       ┃   ○ RecurringEvent           ┃ ┃
┃                       ┃   ○ Reminder                 ┃ ┃
┃                       ┃   ○ Cancel                   ┃ ┃
┃                       ┃ [Submit]                     ┃ ┃
┃                       ┃ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ┃ ┃
┃ Plays ▾               ┃                              ┃ ┃
┃   02j7 awaiting… 0:09 ┃ 09:14:22 fetch_catalog ok …  ┃ ┃
```

User picks "Event", submits → `play_resume` posts the answer → server respawns the subprocess with the answer cached → flow continues from where it paused with `fetch_catalog`, `select_services`, `compile_stubs` replayed (dashed in the flow tree) and `service_code_gen` executing fresh.

### 7.5 User clicks `service_code_gen` node in the flow view (mid-play)

Breadcrumb in left sidebar updates:

```
↶ service_agent / service_code_gen
```

The whole logic view rerenders for `service_code_gen`:

- Left sidebar: `service_code_gen`'s title, description, inputs (prompt, services), outputs (script), versions.
- Middle: `service_code_gen`'s flow view, scoped to that part of the current play's span tree; it just shows `model_call` inside `attempt 1`.
- Right sidebar: stats for `service_code_gen`'s contribution to this play (its tokens, its duration).
- Play bar:
  - Left col: `service_code_gen`'s declared inputs, prefilled with the values the parent passed (`prompt=...`, `services=[...]`). Examples list is `service_code_gen`'s saved examples. Plays list shows past plays of `service_code_gen` standalone (when it was invoked as the root logic).
  - Middle col: filtered log feed, showing only `service_code_gen`'s spans + descendants.
  - Right col: this sub-logic's output for this play.

If the user wants to fork off a standalone play of `service_code_gen` from here (with the prefilled inputs), they hit ▶ Run; that starts a new top-level play of `service_code_gen` independent of the parent.

Click "service_agent" in the breadcrumb → zoom back out. Same play, parent view.
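The "spans + descendants" scoping used by the drill-in can be sketched as a prefix match on span paths. This is a minimal illustration, not the actual span schema: the `Span` dataclass, its `path` tuple, and the helper name `descendants_of` are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical shape: the path is the chain of node names from the root play.
    path: tuple[str, ...]
    status: str


def descendants_of(spans: list[Span], clicked: tuple[str, ...]) -> list[Span]:
    """Keep the clicked span and every span nested below it."""
    n = len(clicked)
    return [s for s in spans if s.path[:n] == clicked]


play = [
    Span(("service_agent",), "running"),
    Span(("service_agent", "fetch_catalog"), "ok"),
    Span(("service_agent", "attempt 1"), "running"),
    Span(("service_agent", "attempt 1", "service_code_gen"), "running"),
    Span(("service_agent", "attempt 1", "service_code_gen", "model_call"), "running"),
    Span(("service_agent", "attempt 1", "script_execution"), "ok"),
]

clicked = ("service_agent", "attempt 1", "service_code_gen")
for span in descendants_of(play, clicked):
    print(" / ".join(span.path), span.status)
```

Comparing path segments as a tuple rather than a joined string avoids false matches between siblings whose names share a prefix (for example a hypothetical `service_code_gen_v2` next to `service_code_gen`).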
### 7.6 Authoring mid-play

If the user wants to fix something in `service_code_gen`'s source, they toggle Code view in the middle column header. Monaco loads `service_code_gen`'s python_source. Edit. Save → creates a new LogicVersion. The current play is unaffected (it's pinned to the version it started on), but next runs use the new version. Step-memoization cache invalidates globally for that logic (the version_sid is in the step_key).

---

## 8. What replaces what

| Today | Becomes |
|---|---|
| `/workflows` (workflow list) | dashboard `/` (logic list) |
| `/workflows/{sid}/edit` (editor) | `/logics/{sid}` (logic view) |
| top toolbar with title + version + run controls + plays dropdown + benchmark widget | left sidebar (title, version) + bottom play bar (run + plays + examples) + right sidebar (benchmark/play stats) |
| right "inputs editor" sidebar (name + type + value mixed) | left sidebar (just declared inputs, type, description) + play bar left col (input VALUES + Run) |
| `/plays` (plays list page) | bottom play bar left column → "Plays ▾" |
| `/plays/{sid}` (dedicated detail page) | removed; plays render inside the logic view as an overlay |
| `/examples` (examples list page) | bottom play bar left column → "Examples ▾" |

---

## 9. Implementation phases

1. **Schema + code renames with aliases.** Hard-rename `Workflow*` → `Logic*` and `@flow` → `@logic`. Add `#[serde(alias = ...)]` on every renamed field, and register every old RPC method name as an alias at the dispatch layer. SDK exports both `flow` and `logic`. Old data keeps loading, old python_source keeps importing. (Closes 70% of the work; nothing visible to the user yet.)
2. **Dashboard restructure.** `/` becomes the logic list. Remove `/workflows`, `/examples`, `/plays`. 301 the old URLs to `/` or `/logics/{sid}` as appropriate.
3. **Logic view layout rebuild.** Implement the three-region body (left info / middle flow-or-code / right stats) + the bottom play bar. Move inputs/examples/plays into the play bar. Move benchmark/play stats into the right sidebar. Remove the top toolbar from the editor.
4. **Flow-view static parse.** When idle, render the sub-logic graph from a parse of `python_source` (find `@logic`-decorated `def`s and `logic.invoke("...")` calls). Click → breadcrumb-navigate.
5. **Breadcrumb navigation + sub-logic view.** Clicking a sub-logic node loads that sub-logic's view. When a play is overlaid, the sub-logic's flow view filters the parent play's spans to descendants of the clicked node.
6. **Pause-form prominence.** Move pause forms to a banner at the top of the play bar's middle column when `awaiting_resume`.
7. **Stats sidebar.** Conditional rendering: benchmark stats when no play overlay, play stats when one is selected.
8. **Cleanup.** Delete `play_detail.html`, `examples.html`, `plays.html`, `workflows.html`. Drop the old routes from `main.rs`.

Out of scope (called out for the future): visual flow editor with drag/drop, conditional/loop/parallel visualisation, data-flow lines, two-way graph↔code binding.

---

## 10. Acceptance

- `/` shows a logic list with name + description + last-run status. Click → logic view.
- The logic view has the three-region body + bottom play bar described in §2.1.
- A play started from the bottom play bar's ▶ Run streams spans into the play bar's middle column AND renders in the flow view's middle column simultaneously.
- A pause from inside the flow shows the pause form as a banner at the top of the play bar's middle column. Submitting it resumes the play.
- Clicking a sub-logic node in the flow view loads that sub-logic's view with the breadcrumb populated. Clicking the parent in the breadcrumb navigates back.
- The right sidebar shows benchmark stats when no play is overlaid, and play stats when one is.
- `/workflows/*`, `/examples`, `/plays`, `/plays/{sid}` all 301 to the new layout or are removed.
- Pre-rename data keeps loading (stored `Workflow` records read fine via serde aliases).
- Stored python_source with `from hero_tracing import flow` keeps working (the SDK exports `flow` as an alias of `logic`).
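
The last acceptance point (both `flow` and `logic` resolving to the same decorator) could look roughly like this on the SDK side. This is a stand-in sketch, not the real `hero_tracing` code: the real decorator opens a span around each call, which is omitted here, and the `__logic_options__` attribute is a made-up placeholder.

```python
import functools


def logic(fn=None, **options):
    """Stand-in decorator, usable both as `@logic` and as `@logic(...)`."""

    def wrap(f):
        @functools.wraps(f)
        def inner(*args, **kwargs):
            # The real SDK would open a span here; this sketch just calls through.
            return f(*args, **kwargs)

        inner.__logic_options__ = options  # hypothetical attribute, for illustration
        return inner

    # Bare `@logic` passes the function directly; `@logic(...)` passes nothing yet.
    return wrap(fn) if callable(fn) else wrap


# One object, two names: stored sources importing `flow` keep working.
flow = logic


@flow
def old_style(x):
    return x + 1


@logic()
def new_style(x):
    return x * 2


print(flow is logic, old_style(1), new_style(2))
```

Because the alias is a plain name binding rather than a wrapper, anything attached to `logic` (such as `logic.invoke`) would be reachable through `flow` as well, which matches the "same alias rule" rows in the rename table.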

timur referenced this issue

2026-05-14 13:10:41 +00:00

feat: Logic / LogicVersion / Play model + single logic view (supersedes #38) #39

timur commented

2026-05-14 13:10:52 +00:00

Author

Owner

Superseded by #39 — the discussion in the chat shrunk the design further (no spans, no instrument(), no inline-vs-named distinction, no Benchmark rootobject; every function is a Logic, every invocation is a Play, plays form a tree).

timur closed this issue

2026-05-14 13:10:52 +00:00
