Research Paper | Researchia:202605.25032

DrawVideo: Generating Long Video from Storyboard Keyframe Sketches

Chuanzhi Xu

Submitted: May 25, 2026
Subjects: Engineering; Biomedical Engineering

Description / Details

Long video generation requires high-fidelity synthesis, coherent narrative structure, and user control over extended time spans. Existing text-to-video methods often rely on a single long prompt, limiting control over pose, composition, layout, and motion. We propose DrawVideo, a sketch-guided, storyboard-driven framework for controllable long-video generation. DrawVideo decomposes long videos into independently controllable shots, each defined by a black-and-white sketch, an appearance prompt, and a motion prompt. The sketch controls pose and layout, the appearance prompt defines identity, scene, and style, and the motion prompt guides temporal dynamics. DrawVideo follows a hierarchical 'global multi-shot, local single-sketch' strategy: it first generates a structure-aligned reference keyframe, then expands the motion prompt into derivative keyframes representing action states, and finally synthesizes clips between adjacent keyframes to build each shot. We also introduce SketchLongVideo, the first dataset for sketch-guided text-to-long-video generation, constructed from animation videos via shot detection, keyframe extraction, vision-language recognition, prompt decomposition, and sketch conversion. Experiments show that DrawVideo achieves strong structural controllability, appearance consistency, visual stability, and coherent long-video generation.
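
The shot decomposition and the three-stage hierarchy described above can be illustrated with a small planning sketch. This is not the authors' implementation: the names `ShotSpec`, `expand_motion_prompt`, and `plan_shot` are hypothetical, the keyframes are stand-in strings rather than images, and splitting the motion prompt on the word "then" is only a placeholder for whatever expansion DrawVideo actually performs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShotSpec:
    """One independently controllable shot (field names are illustrative)."""
    sketch_path: str        # black-and-white sketch: controls pose and layout
    appearance_prompt: str  # defines identity, scene, and style
    motion_prompt: str      # guides temporal dynamics


def expand_motion_prompt(motion_prompt: str) -> List[str]:
    """Split a motion prompt into ordered action states.

    Assumption: a naive split on ' then ' stands in for the real expansion
    step, which the abstract does not specify.
    """
    return [s.strip() for s in motion_prompt.split(" then ") if s.strip()]


def plan_shot(shot: ShotSpec) -> Dict[str, list]:
    """Plan one shot: reference keyframe, derivative keyframes, then clips."""
    # Stage 1: a structure-aligned reference keyframe derived from the sketch.
    reference = f"reference[{shot.sketch_path}]"
    # Stage 2: one derivative keyframe per action state in the motion prompt.
    derivatives = [
        f"derivative[{state}]"
        for state in expand_motion_prompt(shot.motion_prompt)
    ]
    keyframes = [reference] + derivatives
    # Stage 3: one clip between each pair of adjacent keyframes.
    clips = list(zip(keyframes, keyframes[1:]))
    return {"keyframes": keyframes, "clips": clips}


def plan_video(shots: List[ShotSpec]) -> List[Dict[str, list]]:
    """Global multi-shot level: plan every shot independently, in order."""
    return [plan_shot(shot) for shot in shots]
```

Under these assumptions, a shot whose motion prompt contains three action states yields four keyframes (one reference plus three derivatives) and three clips, one per adjacent keyframe pair. Planning each shot independently mirrors the "global multi-shot, local single-sketch" split: editing one sketch or prompt only changes that shot's plan.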


Source: arXiv:2605.23508v1 (http://arxiv.org/abs/2605.23508v1)
PDF: https://arxiv.org/pdf/2605.23508v1
