Seedance 2.0

Seedance 2.0 Multi-modal Video Creation

As the first launch page in BestVid’s product matrix, Seedance 2.0 focuses on multi-modal fusion and controllable generation, spanning workflows from plain text to mixed assets.

Supports four input modalities: text, image, video, and audio
Generation length: 4 to 15 seconds
Mode 1: First-Last Frame
Mode 2: Omni Reference
Supports precise asset references via @asset-name

Input and parameter limits

Item | Limit
Image input | Up to 9 images
Video input | Up to 3 clips, total ≤ 15 seconds
Audio input | Up to 3 tracks, total ≤ 15 seconds
Mixed assets total | Up to 12 assets
Output duration | 4–15 seconds
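The limits above can be sketched as a client-side pre-upload check. The function name and input shapes below are hypothetical, not part of an official BestVid API:

```python
# Hypothetical validation of Seedance 2.0 input limits.
# images: number of images; video_clips / audio_tracks: lists of durations in seconds.
def validate_inputs(images, video_clips, audio_tracks):
    errors = []
    if images > 9:
        errors.append("At most 9 images are allowed.")
    if len(video_clips) > 3 or sum(video_clips) > 15:
        errors.append("At most 3 video clips totaling 15 seconds or less.")
    if len(audio_tracks) > 3 or sum(audio_tracks) > 15:
        errors.append("At most 3 audio tracks totaling 15 seconds or less.")
    if images + len(video_clips) + len(audio_tracks) > 12:
        errors.append("At most 12 assets in total.")
    return errors  # empty list means all limits are satisfied
```

For example, 9 images plus three 5-second clips passes (12 assets, 15 seconds of video), while a tenth image would fail both the image and total-asset limits.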

Prompt sample

Use @character-image, @scene-video, and @background-audio as references: start with 2 seconds of city-night aerial footage, transition to close-up with camera push, and end with a 2-second brand lockup.

Frequently asked questions

What input types does Seedance 2.0 support?

It supports text, image, video, and audio inputs that can be used separately or in combination.

What are the limits for video and audio inputs?

Video supports up to 3 clips with total length up to 15 seconds; audio supports up to 3 tracks with total length up to 15 seconds.

How many images can I upload?

You can upload up to 9 images in one generation task.

What is the supported output duration?

Output duration ranges from 4 to 15 seconds.

When should I use first-last-frame mode?

Use it when you need explicit control over the start and end visual states of a shot.

When should I use omni-reference mode?

Use it for multi-asset style consistency and more complex narrative compositions.

How does @asset-name work?

Write @asset-name directly in your prompt to reference a specific uploaded asset.
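A minimal sketch of how such references could be pulled out of a prompt, assuming asset names use letters, digits, and hyphens (the parsing rule is an assumption, not documented behavior):

```python
import re

# Hypothetical helper: collect @asset-name references from a prompt string.
# Assumes names consist of word characters and hyphens.
def extract_refs(prompt):
    return re.findall(r"@([\w-]+)", prompt)

refs = extract_refs(
    "Use @character-image, @scene-video, and @background-audio as references."
)
# refs == ["character-image", "scene-video", "background-audio"]
```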

Does this page trigger real model generation now?

No. This release is a high-fidelity demo focused on flow and conversion validation.

Related internal links