AI pipeline: incoherent scene generation — prompts generated blind, hardcoded count, no narrative arc #2
Problem
The current scene generation pipeline produces scenes that do not form a coherent story. Three distinct root causes combine to produce this:
1. Scene count is hardcoded to 5
In workers/mod.rs, generate_scenes_work calls scene generation with a hardcoded count of 5. The generate_scenes RPC method takes no count parameter, and the UI shows no control for it.
Users have no way to say "I want a 3-scene short" vs "I want a 12-scene video". All projects start with exactly 5 scenes regardless of intent length, complexity, or target duration.
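A minimal sketch of what a client-supplied count could look like on the Rust side. The names GenerateScenesParams and resolve_scene_count are hypothetical and do not come from the codebase; the 3 to 20 bounds and the fallback of 5 are taken from this issue.

```rust
/// Bounds suggested in this issue; the constant names are hypothetical.
const MIN_SCENE_COUNT: u32 = 3;
const MAX_SCENE_COUNT: u32 = 20;
/// Matches today's hardcoded value so clients that send nothing keep working.
const DEFAULT_SCENE_COUNT: u32 = 5;

/// Hypothetical request shape for the generate_scenes RPC method.
#[derive(Debug, Default)]
pub struct GenerateScenesParams {
    /// None means the client did not send a count.
    pub scene_count: Option<u32>,
}

/// Resolve the count to use, rejecting values outside the allowed range
/// instead of silently clamping them.
pub fn resolve_scene_count(params: &GenerateScenesParams) -> Result<u32, String> {
    match params.scene_count {
        None => Ok(DEFAULT_SCENE_COUNT),
        Some(n) if (MIN_SCENE_COUNT..=MAX_SCENE_COUNT).contains(&n) => Ok(n),
        Some(n) => Err(format!(
            "scene_count must be between {} and {}, got {}",
            MIN_SCENE_COUNT, MAX_SCENE_COUNT, n
        )),
    }
}
```

Returning an error rather than clamping is a design choice in this sketch: it keeps the number input in the UI and the worker in agreement about what was actually requested.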
Fix needed: accept a scene_count parameter from the client (reasonable range: 3–20); expose it as a number input in the New Project modal and in the Planning step.
2. Image prompt and video prompt are generated in a single AI call, before any image exists
In providers/ai.rs, generate_scenes() sends one request to the LLM that returns both image_prompt and video_prompt in the same JSON array. This means the video prompt (camera movement, subject action, atmosphere) is written before any image exists, so it cannot account for what the generated image actually shows.
When a video model receives a prompt that doesn't match the reference image, the result is visually incoherent motion or ignored prompts.
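Fix needed: split generation into two separate AI calls: one during planning that produces only the image prompt, and one after the image exists that produces the video prompt.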
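The two-call split could be sketched roughly as follows. Everything here is an assumption made for illustration: the Llm trait stands in for whatever client providers/ai.rs uses, the Scene struct shows only the two prompt fields, the prompt wording is invented, and image_description assumes some captioning step (a multimodal model could receive the image itself instead).

```rust
/// Hypothetical stand-in for the LLM client used in providers/ai.rs.
pub trait Llm {
    fn complete(&self, prompt: &str) -> String;
}

/// Simplified Scene with only the fields discussed in this issue.
#[derive(Debug, Clone, PartialEq)]
pub struct Scene {
    pub image_prompt: String,
    /// Stays None during planning; filled in once the image exists.
    pub video_prompt: Option<String>,
}

/// Call 1 (planning): ask only for image prompts, one per line.
pub fn plan_scenes(llm: &dyn Llm, brief: &str, scene_count: u32) -> Vec<Scene> {
    let prompt = format!(
        "Write {} image prompts, one per line, for this brief: {}",
        scene_count, brief
    );
    let reply = llm.complete(&prompt);
    reply
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| Scene {
            image_prompt: line.to_string(),
            video_prompt: None,
        })
        .collect()
}

/// Call 2 (after image generation): derive the video prompt from a
/// description of the image that was actually produced.
pub fn derive_video_prompt(llm: &dyn Llm, scene: &mut Scene, image_description: &str) {
    let prompt = format!(
        "The image shows: {}\nIt was generated from: {}\nWrite a video prompt (camera movement, subject action, atmosphere) that fits this image.",
        image_description, scene.image_prompt
    );
    scene.video_prompt = Some(llm.complete(&prompt).trim().to_string());
}
```

Making video_prompt an Option keeps "not generated yet" distinct from an empty prompt, so later pipeline stages can tell whether the second call has run.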
This also means the video_prompt field on Scene should not be populated during scene planning; it should be derived later.
3. No narrative coherence: scenes are generated as independent parallel items
The system prompt asks the LLM to "generate N scenes" in a single JSON array. The LLM treats each array entry as an independent creative unit with no enforced relationship to the others. There is no instruction to follow a narrative arc or to keep the scenes consistent with one another.
The result is a set of scenes that look like they come from different videos.
Fix needed: the scene generation prompt needs explicit narrative structure.
Alternatively: generate a narrative outline first (one short sentence per scene describing its role in the story), then generate image prompts grounded in that outline. This two-pass approach produces dramatically more coherent results.
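The two-pass approach could be sketched as a pair of prompt builders plus a small parser. This is an illustration under stated assumptions, not the project's code: the function names, the one-beat-per-line reply format, and the prompt wording are all hypothetical.

```rust
/// Pass 1 prompt: ask for one short sentence per scene describing its
/// role in the story.
pub fn outline_prompt(brief: &str, scene_count: usize) -> String {
    format!(
        "Outline a story in exactly {} scenes for this brief: {}\nReturn one short sentence per line describing each scene's role in the story, with no numbering.",
        scene_count, brief
    )
}

/// Parse the pass 1 reply: one beat per non-empty line, trimmed.
pub fn parse_outline(reply: &str) -> Vec<String> {
    reply
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Pass 2 prompt: request the image prompt for one scene while showing
/// the model the whole outline, so each scene is written in context.
/// Returns None when `index` is outside the outline.
pub fn image_prompt_request(outline: &[String], index: usize) -> Option<String> {
    let beat = outline.get(index)?;
    let numbered: Vec<String> = outline
        .iter()
        .enumerate()
        .map(|(i, b)| format!("{}. {}", i + 1, b))
        .collect();
    Some(format!(
        "Story outline:\n{}\n\nWrite the image prompt for scene {} of {} (\"{}\"). Keep characters, setting and visual style consistent with the other scenes.",
        numbered.join("\n"),
        index + 1,
        outline.len(),
        beat
    ))
}
```

Because every pass 2 request carries the full outline, the scenes can still be generated in parallel without losing the shared story context.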
Summary of required changes
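Scene count: replace the hardcoded 5 in the worker with a client-supplied scene_count (3–20), exposed in the New Project modal and the Planning step.
Prompt generation: split into two AI calls so that video_prompt is derived after the image exists rather than during planning.
Narrative coherence: give the scene generation prompt explicit narrative structure, or generate a narrative outline first and ground the image prompts in it.
Related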
See issue #1 for the user-facing side of this (field naming, structured brief input). The improvements here are backend/pipeline; they will also require a UI change to remove the video prompt textarea from the Planning step.