#2 - AI pipeline: incoherent scene generation — prompts generated blind, hardcoded count, no narrative arc - lhumina_code/hero_videos

casper-stevens commented

2026-05-19 14:19:34 +00:00

Member

## Problem

The current scene generation pipeline produces scenes that do not form a coherent story. Three distinct root causes combine to produce this:

### 1. Scene count is hardcoded to 5

In `workers/mod.rs`, `generate_scenes_work` calls:

```rust
providers.ai.generate_scenes(&intent, 5).await
```

The `generate_scenes` RPC method takes no count parameter, and the UI shows no control for it.

Users have no way to say "I want a 3-scene short" vs "I want a 12-scene video". All projects start with exactly 5 scenes regardless of intent length, complexity, or target duration.

**Fix needed:** accept a `scene_count` parameter from the client (reasonable range: 3–20); expose it as a number input in the New Project modal and in the Planning step.

### 2. Image prompt and video prompt are generated in a single AI call, before any image exists

In `providers/ai.rs`, `generate_scenes()` sends one request to the LLM that returns both `image_prompt` and `video_prompt` in the same JSON array:

```rust
let system = format!(
    "Generate exactly {count} scenes … \
    Each scene needs an image_prompt … \
    and a video_prompt … \
    Return only JSON: {{\"scenes\": [{{\"image_prompt\": \"...\", \"video_prompt\": \"...\"}}]}}"
);
```

This means the video prompt (camera movement, subject action, atmosphere) is written **before any image exists**. The video prompt cannot account for:

- What the image actually looks like (composition, lighting, subject position)
- Whether the AI chose a wide angle or tight close-up
- The actual colour palette or environment that was generated

When a video model receives a prompt that doesn't match the reference image, the result is visually incoherent motion or ignored prompts.

**Fix needed:** split generation into two separate AI calls:

1. Generate **image prompts only** (one call, all scenes at once for narrative coherence; see section 3)
2. After images are generated and a candidate is selected, generate the **video prompt** per scene, with the selected image URL or description as additional context

This also means the `video_prompt` field on `Scene` should not be populated during scene planning; it should be derived later.

### 3. No narrative coherence: scenes are generated as independent parallel items

The system prompt asks the LLM to "generate N scenes" in a single JSON array. The LLM treats each array entry as an independent creative unit with no enforced relationship to the others. There is no instruction to:

- Maintain a consistent visual style, lighting, or location across scenes
- Follow a narrative arc (setup → tension → resolution, or intro → body → CTA)
- Ensure character or subject continuity between scenes
- Reference what happened in the previous scene

The result is a set of scenes that look like they come from different videos.

**Fix needed:** the scene generation prompt needs explicit narrative structure:

- A story arc instruction (e.g. "scenes should form a coherent [story / product demo / documentary sequence]")
- A visual consistency directive (same environment, same subject, consistent style)
- Either a chain-of-thought pass (generate the story arc first, then the scenes) or few-shot examples of coherent multi-scene outputs

Alternatively: generate a **narrative outline** first (one short sentence per scene describing its role in the story), then generate image prompts grounded in that outline. Because every image prompt is then written against the same outline, this two-pass approach should produce noticeably more coherent results.
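The directives listed above could be folded into the system prompt along the following lines. This is a minimal sketch rather than the project's actual prompt: the function name `build_image_prompt_system` and the `sequence_kind` parameter are hypothetical, and the output-format instruction is left out for brevity.

```rust
/// Hypothetical builder for the image-prompt system prompt.
/// `sequence_kind` is an assumed parameter, e.g. "story" or "product demo".
fn build_image_prompt_system(count: usize, sequence_kind: &str) -> String {
    format!(
        "Generate exactly {count} scenes that form a coherent {sequence_kind}.\n\
         First outline the arc: one short sentence per scene describing its role \
         (setup, tension, resolution).\n\
         Then write one image_prompt per scene, grounded in its outline sentence.\n\
         Keep the same environment, the same subject, and a consistent visual style \
         in every scene."
    )
}
```

Asking for the outline before the image prompts keeps both passes in a single request while still giving each prompt a shared arc to refer back to.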

## Summary of required changes

| Area | Current | Needed |
|---|---|---|
| Scene count | Hardcoded `5` in worker | User-controlled input, passed through RPC |
| Prompt generation | Both prompts in one call, before images | Image prompts first (with narrative), video prompts after image selection |
| Narrative coherence | None (independent parallel items) | Story arc pass, visual consistency directive |
| Video prompt timing | Generated at scene creation | Generated after image is selected, using image as context |

## Related

See issue #1 for the user-facing side of this (field naming, structured brief input). The improvements here are backend/pipeline; they will also require a UI change to remove the video prompt textarea from the Planning step.


casper-stevens referenced this issue

2026-05-19 16:03:19 +00:00

Port hero_videos_web to Dioxus #3

casper-stevens commented

2026-05-19 16:12:35 +00:00

Author

Member

## Implementation Spec: Issue #2

### Objective

Fix three interconnected problems in the AI scene generation pipeline:

1. Hardcoded scene count of 5: replace with a user-supplied value (range 3–20) threaded from the UI modal and Planning step through the RPC layer and worker to the AI provider.
2. Combined single-call generation of both `image_prompt` and `video_prompt`: split into two separate AI calls, image prompts first (all scenes at once), video prompts per scene only after an image is selected.
3. No narrative coherence: scene image prompts are generated as isolated parallel items; add a two-pass generation strategy that first produces a story arc / outline, then grounds each scene's image prompt in that arc.

### Requirements

- The user may specify a `scene_count` between 3 and 20 (inclusive) when creating a project or when triggering `generate_scenes` from the Planning step.
- The `generate_scenes` RPC method accepts an optional `scene_count: u32` parameter (defaults to 5 when omitted for backward compatibility).
- Scene image prompts are generated in a single coherent AI call that first produces a narrative outline, then derives grounded image prompts for each scene.
- The initial scene generation call no longer produces `video_prompt`. Instead, the field in the `Scene` struct is initialized to the arc sentence (a human-readable one-liner describing that scene's narrative role) and is overwritten once a video prompt has been generated.
- A new AI method `generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description)` is called per scene at the point the user selects a candidate image. It uses the image URL/description as context.
- `video_prompt` on a scene is editable and saveable, same as `image_prompt`.
- The OpenRPC spec (`openrpc.json`) is updated to document the new parameter and method.
- All existing downstream consumers of `scene.video_prompt` (the clip generation worker) continue to work unchanged. They already read `scene.video_prompt` from storage, so as long as it is populated before `generate_clip` is called, no changes are needed there.

### Files to Modify

| File | Change |
|---|---|
| `crates/hero_videos_server/src/providers/ai.rs` | Replace `generate_scenes` with `generate_image_prompts` (returns arc sentence + image prompt per scene) and add `generate_video_prompt` (single scene, post-image-selection) |
| `crates/hero_videos_server/src/workers/mod.rs` | Accept `scene_count: u32`, call `generate_image_prompts`, store arc sentence as initial `video_prompt`; add `generate_video_prompt_work` |
| `crates/hero_videos_server/src/rpc/mod.rs` | Add `scene_count: Option<u32>` to `generate_scenes`; add `generate_video_prompt` RPC handler |
| `crates/hero_videos_server/src/main.rs` | Add `--scene-count` CLI arg to `WorkerTask::Scenes`; add `WorkerTask::VideoPrompt` variant |
| `crates/hero_videos_server/openrpc.json` | Add `scene_count` param; add `generate_video_prompt` method |
| `crates/hero_videos_web/templates/app.html` | Add scene count input to Planning step; auto-trigger `generate_video_prompt` after image selection |

### Implementation Plan

#### Step 1: Split AI provider methods in `providers/ai.rs`

**File:** `crates/hero_videos_server/src/providers/ai.rs`

Replace `generate_scenes` with two methods:

**`generate_image_prompts(intent, count) -> Vec<(String, String)>`** (arc_sentence, image_prompt)

Single chain-of-thought AI call. The prompt asks the model to:

1. Generate a `story_arc`: an array of `count` one-sentence scene descriptions forming a narrative arc (setup → development → resolution).
2. For each arc sentence, derive an `image_prompt` grounded in that scene (photorealistic, 16:9, vivid, consistent visual style across all scenes).

JSON schema:

```json
{ "story_arc": ["..."], "scenes": [{"image_prompt": "..."}] }
```
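Once a reply in this shape has been deserialized, the arc sentences and image prompts still have to be paired and length-checked before they can become scenes. The following is a minimal sketch, assuming the JSON has already been parsed into the hypothetical `ImagePromptResponse` struct below; JSON deserialization itself is omitted, and the helper name `pair_arc_with_prompts` is not from the codebase.

```rust
/// Hypothetical, already-deserialized form of the model's JSON reply.
struct ImagePromptResponse {
    /// One sentence per scene, in order.
    story_arc: Vec<String>,
    /// The `image_prompt` of each entry in "scenes", in order.
    image_prompts: Vec<String>,
}

/// Pairs each arc sentence with its image prompt as (arc_sentence, image_prompt).
/// Rejects replies whose arrays do not both contain exactly `count` entries,
/// since a silent mismatch would attach prompts to the wrong scene.
fn pair_arc_with_prompts(
    resp: ImagePromptResponse,
    count: usize,
) -> Result<Vec<(String, String)>, String> {
    if resp.story_arc.len() != count || resp.image_prompts.len() != count {
        return Err(format!(
            "expected {} scenes, got {} arc sentences and {} image prompts",
            count,
            resp.story_arc.len(),
            resp.image_prompts.len()
        ));
    }
    Ok(resp.story_arc.into_iter().zip(resp.image_prompts).collect())
}
```

Failing loudly on a count mismatch is a design choice of this sketch; the actual implementation may prefer to truncate or retry instead.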

System prompt includes:

- "The scenes must form a coherent narrative arc: establish a world, develop tension or progression, resolve."
- "Maintain consistent lighting, color palette, and protagonist appearance across all scenes."

**`generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description) -> String`**

Single AI call. Prompt: "You are a video director. Given a selected still image and scene context, write a single video prompt (2–3 sentences) for AI video generation. Describe camera movement, subject motion, and atmosphere."

Dependencies: none
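To make the inputs of `generate_video_prompt` concrete, the sketch below assembles the two message strings such a call might send. The system text is the prompt quoted in this step; the layout of the user message, the constant name, and the helper name `build_video_prompt_user_message` are assumptions for illustration only.

```rust
/// System prompt as quoted in the spec for this step.
const VIDEO_DIRECTOR_SYSTEM: &str = "You are a video director. Given a selected still image \
and scene context, write a single video prompt (2–3 sentences) for AI video generation. \
Describe camera movement, subject motion, and atmosphere.";

/// Hypothetical helper: lays out the four inputs as labelled lines so the
/// model can tell the scene's narrative role apart from the image details.
fn build_video_prompt_user_message(
    intent: &str,
    arc_sentence: &str,
    image_prompt: &str,
    selected_image_description: &str,
) -> String {
    format!(
        "Project intent: {intent}\n\
         Scene role: {arc_sentence}\n\
         Image prompt: {image_prompt}\n\
         Selected image: {selected_image_description}"
    )
}
```

Passing the selected image's description alongside the original image prompt lets the model notice where the generated image diverged from what was asked for.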

#### Step 2: Update workers in `workers/mod.rs`

**File:** `crates/hero_videos_server/src/workers/mod.rs`

1. Change `generate_scenes_work` to accept `scene_count: u32`; replace the hardcoded `5` and the `generate_scenes` call with `generate_image_prompts(&intent, scene_count as usize)`.
2. When constructing `Scene` objects: `image_prompt` = the returned image prompt, `video_prompt` = the returned arc sentence.
3. Add `generate_video_prompt_work(project_id, scene_id, osis, providers)`: loads project + scene, calls `providers.ai.generate_video_prompt(...)`, writes result back to `scene.video_prompt`.

Dependencies: Step 1
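The two writes to `video_prompt` described in this step can be sketched as follows. The `Scene` struct here is a deliberately reduced, hypothetical stand-in with only the two prompt fields, and both helper names are invented for this example. Keeping the arc sentence when the model returns nothing is an assumption of the sketch, chosen so the field is never left empty before clip generation.

```rust
/// Reduced stand-in for the real `Scene` type (assumed to have more fields).
#[derive(Debug, Clone, PartialEq)]
struct Scene {
    image_prompt: String,
    video_prompt: String,
}

/// Builds scenes from (arc_sentence, image_prompt) pairs.
/// `video_prompt` starts out as the arc sentence.
fn scenes_from_pairs(pairs: Vec<(String, String)>) -> Vec<Scene> {
    pairs
        .into_iter()
        .map(|(arc_sentence, image_prompt)| Scene {
            image_prompt,
            video_prompt: arc_sentence,
        })
        .collect()
}

/// Overwrites `video_prompt` with the generated text.
/// An empty or whitespace-only reply leaves the existing value in place.
fn apply_video_prompt(scene: &mut Scene, generated: &str) {
    let trimmed = generated.trim();
    if !trimmed.is_empty() {
        scene.video_prompt = trimmed.to_string();
    }
}
```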

#### Step 3: Update RPC handlers in `rpc/mod.rs`

**File:** `crates/hero_videos_server/src/rpc/mod.rs`

1. Add `scene_count: Option<u32>` to `generate_scenes`. Validate range 3–20; default to 5.
2. Thread `scene_count` through to the worker CLI args and the tokio::spawn fallback.
3. Add `generate_video_prompt(project_id, scene_id)` handler: validate scene exists and has a selected candidate, launch `WorkerTask::VideoPrompt` job.

Dependencies: Step 2
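The validation rule in item 1 of this step is small enough to sketch in full. The helper name `resolve_scene_count`, the constant names, and the `String` error type are assumptions; the real handler presumably returns whatever error type the RPC layer uses.

```rust
// Bounds and default as stated in the spec.
const MIN_SCENES: u32 = 3;
const MAX_SCENES: u32 = 20;
const DEFAULT_SCENES: u32 = 5;

/// Hypothetical helper: turns the optional RPC parameter into a concrete
/// count, defaulting when omitted and rejecting out-of-range values.
fn resolve_scene_count(requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(DEFAULT_SCENES),
        Some(n) if (MIN_SCENES..=MAX_SCENES).contains(&n) => Ok(n),
        Some(n) => Err(format!(
            "scene_count must be between {} and {}, got {}",
            MIN_SCENES, MAX_SCENES, n
        )),
    }
}
```

Rejecting out-of-range values, rather than clamping them, matches the acceptance criterion that such requests fail with a clear error.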

#### Step 4: Update CLI in `main.rs`

**File:** `crates/hero_videos_server/src/main.rs`

1. Add `scene_count: u32` (default 5) field to `WorkerTask::Scenes`.
2. Pass it to `generate_scenes_work`.
3. Add `WorkerTask::VideoPrompt { project_id, scene_id }` variant dispatching to `generate_video_prompt_work`.

Dependencies: Step 2
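As a rough illustration of the resulting task shape, the sketch below models the two variants and reads `--scene-count` from raw arguments using only the standard library. The thread does not show how `main.rs` actually parses its arguments, so the hand-rolled parsing, the `String` ID types, and the helper names are all assumptions.

```rust
/// Hypothetical mirror of the worker task enum; the real variants may carry
/// other fields, and the ID types are assumed to be strings here.
#[derive(Debug, PartialEq)]
enum WorkerTask {
    Scenes { project_id: String, scene_count: u32 },
    VideoPrompt { project_id: String, scene_id: String },
}

/// Reads `--scene-count <n>` from raw arguments, defaulting to 5 when absent.
fn parse_scene_count(args: &[&str]) -> Result<u32, String> {
    match args.iter().position(|a| *a == "--scene-count") {
        None => Ok(5),
        Some(i) => {
            let value = args
                .get(i + 1)
                .ok_or_else(|| "--scene-count requires a value".to_string())?;
            value
                .parse::<u32>()
                .map_err(|e| format!("invalid --scene-count '{}': {}", value, e))
        }
    }
}

/// Builds the scene-generation task for a project from raw arguments.
fn scenes_task(project_id: &str, args: &[&str]) -> Result<WorkerTask, String> {
    Ok(WorkerTask::Scenes {
        project_id: project_id.to_string(),
        scene_count: parse_scene_count(args)?,
    })
}

/// Builds the per-scene video prompt task.
fn video_prompt_task(project_id: &str, scene_id: &str) -> WorkerTask {
    WorkerTask::VideoPrompt {
        project_id: project_id.to_string(),
        scene_id: scene_id.to_string(),
    }
}
```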

#### Step 5: Update `openrpc.json`

**File:** `crates/hero_videos_server/openrpc.json`

1. Add `scene_count` (optional integer 3–20) to `generate_scenes`.
2. Add `generate_video_prompt` method with `project_id` and `scene_id` params.

Dependencies: none (can run in parallel with Steps 1–4)

#### Step 6: Update Planning step UI in `app.html`

**File:** `crates/hero_videos_web/templates/app.html`

1. Add a scene count number input (min=3, max=20, default=5) next to the "Generate Scenes" button in the Planning section.
2. Update `generateScenes()` to read and pass `scene_count`.
3. After `select_image` succeeds in the Imaging step, fire `rpc('generate_video_prompt', { project_id, scene_id })` in a try/catch (non-blocking).
4. Add `videoPromptGenStartedAt` tracking and extend `schedulePoll` to poll when video prompt generation is active.

Dependencies: Step 3

### Acceptance Criteria

- [ ] `generate_scenes` RPC accepts `scene_count` (3–20); rejects out-of-range values with a clear error
- [ ] `generate_scenes` with no `scene_count` defaults to 5 (backward compatible)
- [ ] Scenes generated by `generate_image_prompts` have coherent `image_prompt` values grounded in a story arc
- [ ] Initial `video_prompt` for each scene is the arc sentence (human-readable)
- [ ] After a user selects a candidate image, `generate_video_prompt` is triggered automatically; `scene.video_prompt` is updated once complete
- [ ] Clip generation (`generate_clip`) works unchanged
- [ ] Planning step UI shows a number input (3–20) next to "Generate Scenes"
- [ ] `hero_proc` worker CLI (`WorkerTask::Scenes`) accepts `--scene-count`
- [ ] OpenRPC spec reflects the new parameter and new method

### Notes

- `scene.video_prompt` type stays `str`; no OSchema regeneration needed.
- The `crates/hero_videos_admin/static/openrpc.json` is a separate copy; check `build.rs` to see if it is auto-synced, otherwise update it manually alongside the server copy.
- `generate_video_prompt` is a fast single-inference call; 60s timeout is sufficient.
- Do not make `video_prompt` optional in the OpenRPC schema; it is always present.


casper-stevens commented

2026-05-19 16:25:08 +00:00

Author

Member

## Test Results

- Total: 4
- Passed: 3
- Failed: 1

### Failures

**Doc-test:** `crates/hero_videos_sdk/src/lib.rs - (line 6)`

The doc-test example references `RunWorkflowInput` and `client.run_workflow()`, which no longer exist in `hero_videos_sdk`. The SDK does not export `RunWorkflowInput` and `HeroVideosClient` has no `run_workflow` method.

```
error[E0432]: unresolved import `hero_videos_sdk::RunWorkflowInput`
error[E0599]: no method named `run_workflow` found for struct `HeroVideosClient`
```

The 3 unit tests in `hero_videos_server` all passed:

- `videos::tests::test_collection_crud`
- `videos::tests::test_project_crud`
- `videos::tests::test_videos_all_objects`


casper-stevens commented

2026-05-19 16:26:33 +00:00

Author

Member

## Implementation Complete

### Changes Made

**`crates/hero_videos_server/src/providers/ai.rs`**

- Removed the combined `generate_scenes` method
- Added `generate_image_prompts(intent, count)`: single chain-of-thought AI call that first produces a `story_arc` (one sentence per scene forming a narrative arc), then derives an `image_prompt` grounded in each arc sentence. System prompt enforces consistent lighting, color palette, and protagonist appearance across all scenes. Returns `Vec<(arc_sentence, image_prompt)>`.
- Added `generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description)`: called after image selection; prompts the AI as a video director to write a 2–3 sentence camera/motion/atmosphere prompt using the selected image as context.

**`crates/hero_videos_server/src/workers/mod.rs`**

- `generate_scenes_work` now accepts `scene_count: u32`; calls `generate_image_prompts` instead of the old single-call method. Each scene's `video_prompt` is initialized to the arc sentence (human-readable narrative one-liner).
- Added `generate_video_prompt_work(project_id, scene_id, osis, providers)`: loads scene, calls `ai.generate_video_prompt`, writes the result back to `scene.video_prompt`.

**`crates/hero_videos_server/src/rpc/mod.rs`**

- `generate_scenes` now accepts optional `scene_count: u32` (range 3–20, default 5); validates and threads the value through to the worker CLI and tokio::spawn fallback.
- Added `generate_video_prompt(project_id, scene_id)` RPC handler: validates scene exists and has a selected image, then launches `generate_video_prompt_work`.

**`crates/hero_videos_server/src/sockets/mod.rs`**

- Wired `generate_video_prompt` into the dispatch table and `rpc.discover` methods list.

**`crates/hero_videos_server/src/main.rs`**

- `WorkerTask::Scenes` now accepts `--scene-count` CLI arg (default 5).
- Added `WorkerTask::VideoPrompt { project_id, scene_id }` variant dispatching to `generate_video_prompt_work`.

**`crates/hero_videos_server/openrpc.json`** and **`crates/hero_videos_admin/static/openrpc.json`**

- Added optional `scene_count` parameter (integer 3–20) to `generate_scenes`.
- Added new `generate_video_prompt` method.

**`crates/hero_videos_web/templates/app.html`**

- Added scene count number input (min=3, max=20, default=5) next to "Generate Scenes" button in the Planning step.
- `generateScenes()` now reads and passes `scene_count` in the RPC call.
- After `select_image` succeeds, `generate_video_prompt` is fired automatically (fire-and-forget).
- Added `videoPromptGenStartedAt` tracking; polling continues while video prompt generation is active for any scene.

**`crates/hero_videos_sdk/src/lib.rs`**

- Fixed stale doc-test that referenced removed `run_workflow` / `RunWorkflowInput`.

### Test Results

- Total: 3
- Passed: 3
- Failed: 0


casper-stevens commented

2026-05-19 16:26:43 +00:00

Author

Member

Implementation Complete

Changes Made

crates/hero_videos_server/src/providers/ai.rs

Removed the combined generate_scenes method
Added generate_image_prompts(intent, count): single chain-of-thought AI call that first produces a story_arc (one sentence per scene forming a narrative arc), then derives an image_prompt grounded in each arc sentence. System prompt enforces consistent lighting, color palette, and protagonist appearance across all scenes. Returns Vec<(arc_sentence, image_prompt)>.
Added generate_video_prompt(intent, arc_sentence, image_prompt, selected_image_description): called after image selection; prompts the AI as a video director to write a 2–3 sentence camera/motion/atmosphere prompt using the selected image as context.

crates/hero_videos_server/src/workers/mod.rs

generate_scenes_work now accepts scene_count: u32; calls generate_image_prompts instead of the old single-call method. Each scene's video_prompt is initialized to the arc sentence (human-readable narrative one-liner).
Added generate_video_prompt_work(project_id, scene_id, osis, providers): loads scene, calls ai.generate_video_prompt, writes the result back to scene.video_prompt.
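The worker behaviour above can be sketched roughly as follows. The `Scene` struct here is a simplified stand-in (the real model's fields are not shown in this thread), and `scenes_from_prompts` and `write_video_prompt` are hypothetical helper names.

```rust
/// Simplified stand-in for the stored scene; field names other than
/// image_prompt and video_prompt are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Scene {
    id: u32,
    image_prompt: String,
    video_prompt: String,
}

/// Turns (arc_sentence, image_prompt) pairs into scenes. The arc sentence
/// is stored as the initial video_prompt, as described for
/// generate_scenes_work.
fn scenes_from_prompts(pairs: Vec<(String, String)>) -> Vec<Scene> {
    pairs
        .into_iter()
        .enumerate()
        .map(|(i, (arc_sentence, image_prompt))| Scene {
            id: i as u32 + 1,
            image_prompt,
            video_prompt: arc_sentence,
        })
        .collect()
}

/// Writes the director-style prompt back to the scene, replacing the
/// arc-sentence placeholder, as generate_video_prompt_work is described
/// to do after calling the AI provider.
fn write_video_prompt(scene: &mut Scene, generated: String) {
    scene.video_prompt = generated;
}
```

Seeding video_prompt with the arc sentence means a scene always has a readable one-liner, even before an image has been selected and the second AI call has run.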

crates/hero_videos_server/src/rpc/mod.rs

generate_scenes now accepts optional scene_count: u32 (range 3–20, default 5); validates and threads the value through to the worker CLI and tokio::spawn fallback.
Added generate_video_prompt(project_id, scene_id) RPC handler: validates scene exists and has a selected image, then launches generate_video_prompt_work.
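A minimal sketch of the scene_count handling described for the RPC handler, assuming out-of-range values are rejected with an error rather than clamped (the thread only says the value is validated). The constant and function names are hypothetical.

```rust
// Bounds and default taken from the description above (range 3-20, default 5).
const MIN_SCENES: u32 = 3;
const MAX_SCENES: u32 = 20;
const DEFAULT_SCENES: u32 = 5;

/// Resolves the optional scene_count parameter.
/// Assumption: out-of-range values are an error, not silently clamped.
fn resolve_scene_count(requested: Option<u32>) -> Result<u32, String> {
    let count = requested.unwrap_or(DEFAULT_SCENES);
    if (MIN_SCENES..=MAX_SCENES).contains(&count) {
        Ok(count)
    } else {
        Err(format!(
            "scene_count must be between {} and {}, got {}",
            MIN_SCENES, MAX_SCENES, count
        ))
    }
}
```

For example, `resolve_scene_count(None)` yields the default of 5, while `resolve_scene_count(Some(2))` returns an error that the handler could surface to the client.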

crates/hero_videos_server/src/sockets/mod.rs

Wired generate_video_prompt into the dispatch table and rpc.discover methods list.

crates/hero_videos_server/src/main.rs

WorkerTask::Scenes now accepts --scene-count CLI arg (default 5).
Added WorkerTask::VideoPrompt { project_id, scene_id } variant dispatching to generate_video_prompt_work.
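To make the CLI shape concrete, here is a std-only sketch of the two variants and how their arguments could be parsed. The subcommand names `scenes` and `video-prompt`, the `String` field types, the `project_id` field on `Scenes`, and the `parse_worker_task` helper are all assumptions; main.rs may well use an argument-parsing crate instead.

```rust
/// Illustrative version of the worker task enum; field types are assumed.
#[derive(Debug, PartialEq)]
enum WorkerTask {
    Scenes { project_id: String, scene_count: u32 },
    VideoPrompt { project_id: String, scene_id: String },
}

/// Parses worker arguments (without the program name).
/// Subcommand names are hypothetical; --scene-count defaults to 5.
fn parse_worker_task(args: &[&str]) -> Result<WorkerTask, String> {
    match args {
        ["scenes", project_id, rest @ ..] => {
            let scene_count = match rest {
                [] => 5,
                ["--scene-count", value] => value
                    .parse::<u32>()
                    .map_err(|e| format!("invalid --scene-count: {e}"))?,
                _ => return Err("unexpected arguments after project id".to_string()),
            };
            Ok(WorkerTask::Scenes {
                project_id: project_id.to_string(),
                scene_count,
            })
        }
        ["video-prompt", project_id, scene_id] => Ok(WorkerTask::VideoPrompt {
            project_id: project_id.to_string(),
            scene_id: scene_id.to_string(),
        }),
        _ => Err("unknown worker task".to_string()),
    }
}
```

Each parsed variant would then dispatch to its worker function, for instance `VideoPrompt` to generate_video_prompt_work.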

crates/hero_videos_server/openrpc.json and crates/hero_videos_admin/static/openrpc.json

Added optional scene_count parameter (integer 3–20) to generate_scenes.
Added new generate_video_prompt method.

crates/hero_videos_web/templates/app.html

Added scene count number input (min=3, max=20, default=5) next to "Generate Scenes" button in the Planning step.
generateScenes() now reads and passes scene_count in the RPC call.
After select_image succeeds, generate_video_prompt is fired automatically (fire-and-forget).
Added videoPromptGenStartedAt tracking; polling continues while video prompt generation is active for any scene.

crates/hero_videos_sdk/src/lib.rs

Fixed stale doc-test that referenced removed run_workflow / RunWorkflowInput.

Test Results

Total: 3
Passed: 3
Failed: 0


casper-stevens referenced this issue from a commit

2026-05-20 08:18:18 +00:00

feat(pipeline): coherent scene generation, video prompt step, settings

casper-stevens closed this issue

2026-05-20 08:18:39 +00:00
