Seedance 2.0

Seedance 2.0 Multi-modal Video Creation

As the first launch page in BestVid’s product matrix, Seedance 2.0 focuses on multi-modal fusion and controllable generation, spanning workflows from plain text to mixed assets.

Supports four input modalities: text, image, video, and audio
Generation length: 4 to 15 seconds
Mode 1: First-Last Frame
Mode 2: Omni Reference
Supports precise asset references via @asset-name

Input and parameter limits

Item | Limit
Image input | Up to 9 images
Video input | Up to 3 clips, total ≤ 15 seconds
Audio input | Up to 3 tracks, total ≤ 15 seconds
Mixed assets total | Up to 12 assets
Output duration | 4–15 seconds
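The limits above can be sketched as a client-side pre-upload check. The function name and input shapes below are hypothetical, not part of an official BestVid API:

```python
# Hypothetical validation of Seedance 2.0 input limits.
# images: number of images; video_clips / audio_tracks: lists of durations in seconds.
def validate_inputs(images, video_clips, audio_tracks):
    errors = []
    if images > 9:
        errors.append("At most 9 images are allowed.")
    if len(video_clips) > 3 or sum(video_clips) > 15:
        errors.append("At most 3 video clips totaling 15 seconds or less.")
    if len(audio_tracks) > 3 or sum(audio_tracks) > 15:
        errors.append("At most 3 audio tracks totaling 15 seconds or less.")
    if images + len(video_clips) + len(audio_tracks) > 12:
        errors.append("At most 12 assets in total.")
    return errors  # empty list means all limits are satisfied
```

For example, 9 images plus three 5-second clips passes (12 assets, 15 seconds of video), while a tenth image would fail both the image and total-asset limits.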

Prompt sample

Use @character-image, @scene-video, and @background-audio as references: start with 2 seconds of city-night aerial footage, transition to close-up with camera push, and end with a 2-second brand lockup.

Frequently asked questions

What input types does Seedance 2.0 support?

It supports text, image, video, and audio inputs that can be used separately or in combination.

What are the limits for video and audio inputs?

Video supports up to 3 clips with total length up to 15 seconds; audio supports up to 3 tracks with total length up to 15 seconds.

How many images can I upload?

You can upload up to 9 images in one generation task.

What is the supported output duration?

Output duration ranges from 4 to 15 seconds.

When should I use first-last-frame mode?

Use it when you need explicit control over the start and end visual states of a shot.

When should I use omni-reference mode?

Use it for multi-asset style consistency and more complex narrative compositions.

How does @asset-name work?

Write @asset-name directly in your prompt to reference a specific uploaded asset.
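A minimal sketch of how such references could be pulled out of a prompt, assuming asset names use letters, digits, and hyphens (the parsing rule is an assumption, not documented behavior):

```python
import re

# Hypothetical helper: collect @asset-name references from a prompt string.
# Assumes names consist of word characters and hyphens.
def extract_refs(prompt):
    return re.findall(r"@([\w-]+)", prompt)

refs = extract_refs(
    "Use @character-image, @scene-video, and @background-audio as references."
)
# refs == ["character-image", "scene-video", "background-audio"]
```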

Does this page trigger real model generation now?

No. This release is a high-fidelity demo focused on flow and conversion validation.

Related internal links