December 5, 2025 (April 5, 2026)
Table of contents
- Available Video Models
- Quality & Duration Matrix
- Audio Features by Model
- v6 Audio and Multi-Shot
- v5.5/v5.6 Native Audio
- Feature Comparison
- Video Endpoint Compatibility
- Video Endpoint Constraints
- Quality Tier Requirements
- Available Image Models
- Image Aspect Ratios
- Image Endpoints
- Unlimited Image Generation (Relax Mode)
Video Models
Available Video Models
| Model | Description |
|---|---|
| v6 | Latest model with audio, multi-shot, up to 15s at all qualities |
| v5.6 | Audio generation |
| v5.5 | Audio generation |
| v5 | Lip sync TTS, sound effects |
| v5-fast | Fast generation, no audio features |
Quality & Duration Matrix
v6 supports duration 1-15 seconds at all qualities including 1080p.
Pre-v6 models support duration 1-10 seconds, except:
- 1080p is limited to 1-8 seconds
- create-frames endpoint is limited to 1-8 seconds
| Quality | v6 Duration | v5.x Duration | Notes |
|---|---|---|---|
| 360p | 1-15s | 1-10s | All tiers |
| 540p | 1-15s | 1-10s | All tiers |
| 720p (v6 default) | 1-15s | 1-10s | Standard+ |
| 1080p | 1-15s | 1-8s | Pro/Premium |
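The duration rules in the matrix above can be written as a small client-side check. This is only a sketch: the function names `max_duration_seconds` and `is_valid_duration` are hypothetical helpers, not part of the API, and the limits are copied from the table rather than queried from the service.

```python
# Hypothetical client-side helpers; the limits mirror the matrix above.

def max_duration_seconds(model: str, quality: str, endpoint: str = "create") -> int:
    """Return the longest duration (in seconds) the matrix allows."""
    if model == "v6":
        # v6: 1-15s at every quality, including 1080p.
        return 15
    if endpoint == "create-frames" or quality == "1080p":
        # Pre-v6 exceptions: 1080p and create-frames are capped at 8s.
        return 8
    # Pre-v6 default.
    return 10


def is_valid_duration(model: str, quality: str, duration: int,
                      endpoint: str = "create") -> bool:
    """Check that a duration falls inside the 1..max range."""
    return 1 <= duration <= max_duration_seconds(model, quality, endpoint)


if __name__ == "__main__":
    print(max_duration_seconds("v6", "1080p"))
    print(max_duration_seconds("v5.6", "1080p"))
    print(is_valid_duration("v5", "720p", 10))
```

The helper checks durations only. Whether a model is accepted by a given endpoint is a separate question and is not checked here.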
Audio Features by Model
| Model | Audio Feature | Usage |
|---|---|---|
| v6 | Integrated audio + multi-shot | audio: true, multi_shot: true |
| v5.6 | Integrated audio | audio: true |
| v5.5 | Integrated audio | audio: true |
| v5 | Lip sync + Sound effects | lip_sync_tts_prompt + sound_effect_prompt |
| v5-fast | None | No audio support |
v6 Audio and Multi-Shot
Model v6 features integrated audio generation and multi-shot storytelling: the AI generates videos with multiple camera angles and scene cuts, and voice, lip sync, and background music are generated together with the video in a single step.
{
  "model": "v6",
  "audio": true,
  "multi_shot": true,
  "prompt": "A detective says 'Follow the clues' then cut to a suspect running through an alley"
}
v5.5/v5.6 Native Audio
Models v5.5 and v5.6 feature integrated audio generation: voice, lip sync, and background music are generated together with the video in a single step.
{
  "model": "v5.6",
  "audio": true,
  "prompt": "A woman says hello and waves at the camera"
}
| v5.5/v5.6/v6 Audio | v5 Audio |
|---|---|
| Use audio: true | Use lip_sync_tts_prompt + sound_effect_prompt |
| Voice integrated with video | Separate lipsync step available |
| Background music auto-generated | Manual via sound_effect_prompt |
| Lipsync endpoint not supported | Lipsync endpoint supported |
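Because the audio parameters differ by model generation, a client may want to pick the right request fields automatically. The sketch below assumes the field names shown in the tables above (`audio`, `lip_sync_tts_prompt`, `sound_effect_prompt`). The function `audio_fields` and its arguments are hypothetical, and the sketch does not cover v6's `multi_shot` flag.

```python
from typing import Optional

# Models that take the single integrated-audio flag, per the comparison above.
NATIVE_AUDIO_MODELS = {"v6", "v5.6", "v5.5"}


def audio_fields(model: str,
                 speech: Optional[str] = None,
                 sound_effects: Optional[str] = None) -> dict:
    """Return the audio-related request fields for a model (hypothetical helper)."""
    if model in NATIVE_AUDIO_MODELS:
        # Voice, lip sync, and music are generated from the main prompt.
        return {"audio": True}
    if model == "v5":
        fields = {}
        if speech:
            fields["lip_sync_tts_prompt"] = speech
        if sound_effects:
            fields["sound_effect_prompt"] = sound_effects
        return fields
    # v5-fast (and anything unrecognized): no audio fields.
    return {}


if __name__ == "__main__":
    body = {"model": "v5", "prompt": "A woman waves at the camera"}
    body.update(audio_fields("v5", speech="Hello!", sound_effects="birdsong"))
    print(body)
```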
Feature Comparison
| Feature | v6 | v5.6 | v5.5 | v5 | v5-fast |
|---|---|---|---|---|---|
| Native Audio | yes | yes | yes | - | - |
| Multi-Shot | yes | - | - | - | - |
| Lip Sync TTS | - | - | - | yes | - |
| Sound Effects | - | - | - | yes | - |
| Duration 1-15s | yes | - | - | - | - |
| Duration 1-10s | yes | yes | yes | yes | yes |
| Preview Mode | yes | yes | yes | yes | yes |
Video Endpoint Compatibility
| Endpoint | v6 | v5.6 | v5.5 | v5 | v5-fast |
|---|---|---|---|---|---|
| POST videos/create | yes | yes | yes | yes | yes |
| POST videos/create-frames | - | yes | yes | yes | yes |
| POST videos/create-transition (2-frame) | - | yes | yes | yes | - |
| POST videos/create-transition (3+ frame) | - | - | - | yes | - |
| POST videos/extend | yes | - | yes | yes | - |
| POST videos/modify | - | - | yes | - | - |
| POST videos/upscale | yes | yes | yes | yes | yes |
| POST videos/lipsync | - | - | - | yes | - |
| POST videos/create-fusion | - | - | - | yes | - |
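The compatibility table above translates directly into a lookup that a client could consult before sending a request. This is a sketch under stated assumptions: `SUPPORTED_MODELS`, `supports`, and the `frame_count` argument are illustrative names, and the data is a transcription of the table rather than something returned by the API.

```python
ALL_MODELS = {"v6", "v5.6", "v5.5", "v5", "v5-fast"}

# Transcribed from the compatibility table; create-transition lists the 2-frame case.
SUPPORTED_MODELS = {
    "videos/create": ALL_MODELS,
    "videos/create-frames": {"v5.6", "v5.5", "v5", "v5-fast"},
    "videos/create-transition": {"v5.6", "v5.5", "v5"},
    "videos/extend": {"v6", "v5.5", "v5"},
    "videos/modify": {"v5.5"},
    "videos/upscale": ALL_MODELS,
    "videos/lipsync": {"v5"},
    "videos/create-fusion": {"v5"},
}


def supports(endpoint: str, model: str, frame_count: int = 2) -> bool:
    """Return True if the table lists the model as supported by the endpoint."""
    if endpoint == "videos/create-transition" and frame_count >= 3:
        # Transitions with three or more frames are v5 only.
        return model == "v5"
    return model in SUPPORTED_MODELS.get(endpoint, set())


if __name__ == "__main__":
    print(supports("videos/extend", "v5.6"))
    print(supports("videos/create-transition", "v5.5", frame_count=3))
```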
Video Endpoint Constraints
| Endpoint | Supported Models | Max Duration | Max Quality |
|---|---|---|---|
| create | v6, v5.6, v5.5, v5, v5-fast | 15s (v6), 10s/8s@1080p (v5.x) | 1080p |
| create-frames | v5.6, v5.5, v5, v5-fast | 8s | 1080p |
| create-transition (2-frame) | v5.6, v5.5, v5 | 8s | 1080p |
| create-transition (3+ frame) | v5 only | 8s | 1080p |
| extend | v6, v5.5, v5 | 15s (v6), 10s/8s@1080p (v5.x) | 1080p |
| modify | v5.5 only | source video | 720p |
| lipsync | v5 only | source video | source |
| create-fusion | v5 only | 10s (8s@1080p) | 1080p |
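The Max Quality column above can be enforced with an ordered comparison. In this sketch, `QUALITY_ORDER`, `MAX_QUALITY`, and `quality_within_cap` are hypothetical names. The lipsync endpoint is left out because its limit is the source video's quality rather than a fixed value.

```python
# Video qualities from lowest to highest, as listed in this document.
QUALITY_ORDER = ["360p", "540p", "720p", "1080p"]

# Fixed caps transcribed from the constraints table (lipsync omitted: cap is "source").
MAX_QUALITY = {
    "create": "1080p",
    "create-frames": "1080p",
    "create-transition": "1080p",
    "extend": "1080p",
    "modify": "720p",
    "create-fusion": "1080p",
}


def quality_within_cap(endpoint: str, quality: str) -> bool:
    """Return True if the requested quality does not exceed the endpoint's cap."""
    cap = MAX_QUALITY[endpoint]
    return QUALITY_ORDER.index(quality) <= QUALITY_ORDER.index(cap)


if __name__ == "__main__":
    print(quality_within_cap("modify", "1080p"))
    print(quality_within_cap("modify", "720p"))
```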
Quality Tier Requirements
- 360p: All subscription tiers
- 540p: All subscription tiers
- 720p: Standard or higher (v6 default)
- 1080p: Pro/Premium
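The tier requirements above can be checked before a request is sent. The sketch assumes the tiers are ordered Standard < Pro < Premium, which is inferred from "Standard or higher" and "Pro/Premium" and not stated outright. Any tier name outside that list, such as the placeholder "Free" used below, is treated as sitting below Standard. `tier_allows` is a hypothetical helper.

```python
# Assumed ordering, lowest to highest; only these tier names appear in the list above.
TIER_ORDER = ["Standard", "Pro", "Premium"]

# Minimum tier per quality; None means every subscription tier qualifies.
MIN_TIER = {"360p": None, "540p": None, "720p": "Standard", "1080p": "Pro"}


def tier_allows(tier: str, quality: str) -> bool:
    """Return True if the subscription tier may request the given quality."""
    minimum = MIN_TIER[quality]
    if minimum is None:
        return True
    if tier not in TIER_ORDER:
        # Unrecognized tiers are conservatively treated as below Standard.
        return False
    return TIER_ORDER.index(tier) >= TIER_ORDER.index(minimum)


if __name__ == "__main__":
    print(tier_allows("Standard", "1080p"))
    print(tier_allows("Pro", "1080p"))
```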
Image Models
Available Image Models
| Model | Qualities | Max Refs | Est. Time | Default |
|---|---|---|---|---|
| qwen-image | 720p, 1080p | 3 | ~3s | default |
| nano-banana | 1080p | 3 | ~10s | |
| seedream-4.0 | 1080p, 1440p, 2160p | 6 | ~10s | |
| seedream-4.5 | 1440p, 2160p | 6 | ~15s | |
| nano-banana-2 | 512p, 1080p, 1440p, 2160p | 9 | ~30s | |
| seedream-5.0-lite | 1440p, 1800p | 6 | ~30s | |
| nano-banana-pro | 1080p, 1440p, 2160p | 9 | ~60s | |
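To choose a model from the table above, a client can filter by the requested quality and the number of reference images. The data below is transcribed from the table. `IMAGE_MODELS` and `candidate_models` are hypothetical names, and results come back in table order, which happens to run from fastest to slowest estimated time.

```python
# (supported qualities, max reference images), in the order the table lists them.
IMAGE_MODELS = {
    "qwen-image": (("720p", "1080p"), 3),
    "nano-banana": (("1080p",), 3),
    "seedream-4.0": (("1080p", "1440p", "2160p"), 6),
    "seedream-4.5": (("1440p", "2160p"), 6),
    "nano-banana-2": (("512p", "1080p", "1440p", "2160p"), 9),
    "seedream-5.0-lite": (("1440p", "1800p"), 6),
    "nano-banana-pro": (("1080p", "1440p", "2160p"), 9),
}


def candidate_models(quality: str, reference_count: int = 0) -> list:
    """Return models that support the quality and accept that many references."""
    return [
        name
        for name, (qualities, max_refs) in IMAGE_MODELS.items()
        if quality in qualities and reference_count <= max_refs
    ]


if __name__ == "__main__":
    print(candidate_models("2160p", reference_count=7))
```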
Image Aspect Ratios
All image models support: 1:1, 16:9, 9:16, 4:3, 3:4, 5:4, 4:5, 3:2, 2:3, 21:9.
All models except qwen-image also support auto (default). qwen-image defaults to 1:1.
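The aspect-ratio defaults described above can be resolved on the client side before a request is built. This is a minimal sketch: `resolve_aspect_ratio` is a hypothetical helper, and raising `ValueError` for an unsupported value is a design choice of the sketch, not documented API behavior.

```python
from typing import Optional

# Explicit ratios supported by every image model, per the list above.
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "5:4", "4:5", "3:2", "2:3", "21:9"}


def resolve_aspect_ratio(model: str, requested: Optional[str] = None) -> str:
    """Return the aspect ratio to use, applying the per-model default."""
    if requested is None:
        # qwen-image defaults to 1:1; every other model defaults to auto.
        return "1:1" if model == "qwen-image" else "auto"
    if requested == "auto":
        if model == "qwen-image":
            raise ValueError("qwen-image does not support the auto aspect ratio")
        return "auto"
    if requested not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {requested}")
    return requested


if __name__ == "__main__":
    print(resolve_aspect_ratio("qwen-image"))
    print(resolve_aspect_ratio("seedream-4.5"))
```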
Image Endpoints
All image models are supported by the following endpoints:
Unlimited Image Generation (Relax Mode)
Pro and higher subscription plans include unlimited image generation in Relax Mode for select models. Generation takes roughly 3 to 60 seconds depending on the model. The number of models with unlimited access increases with higher tiers:
| Plan | Price | Unlimited Models |
|---|---|---|
| Pro | $30/m | qwen-image |
| Premium | $60/m | qwen-image + selected other models |
| Ultra | $199/m | ALL models |