GET /models returns the current list of available models.
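
A minimal sketch of building that request with Python's standard library. Only the `GET /models` path comes from this page; the base URL, the bearer-token header, and the helper names `build_models_request` and `fetch_models` are assumptions for illustration, and the response shape is deliberately not assumed.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the model list.

    base_url and the Authorization scheme are hypothetical; substitute
    whatever the real API documents.
    """
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_models(base_url: str, api_key: str):
    """Send the request and decode the JSON body as-is."""
    with urllib.request.urlopen(build_models_request(base_url, api_key)) as resp:
        return json.load(resp)
```

Because the tables below are a snapshot, it may be safer to treat the live response as the source of truth and the tables as a guide.
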
Image
| Mode | Description |
|---|---|
| Text to image | Generate from a text prompt |
| Image to image | Edit or transform an existing image |
| Ingredients to image | Combine multiple reference images into a new image |
Available image models
| Model | ID | Modes | Aspect ratios | Max refs | Max variants |
|---|---|---|---|---|---|
| Nano Banana Pro | google-nano-banana-pro | text, image, references | 16:9, 9:16, 21:9, 1:1 | 5 | 1 |
| Nano Banana 2 | google-nano-banana-2 | text, image, references | 16:9, 9:16, 21:9, 1:1 | 14 | 1 |
| Nano Banana | google-nano-banana | text, image, references | 16:9, 9:16, 21:9, 1:1 | 3 | 1 |
| Seedream 5.0 Lite | seedream-5 | text, image, references | 16:9, 9:16, 21:9, 1:1 | 14 | 5 |
| Seedream 4.5 | seedream-4.5 | text, image, references | 16:9, 9:16, 21:9, 1:1 | 10 | 5 |
| Flux 2 Max | flux-2-max | text, image, references | 16:9, 9:16 | 8 | 1 |
| GPT Image 1.5 | gpt-image-1.5 | text, image, references | 16:9, 9:16 | 10 | 5 |
| Grok Imagine Pro | grok-imagine-image-pro | text, image | 16:9, 9:16, 1:1 | — | 1 |
| Riverflow 2.0 Pro | riverflow-2-pro | text, image, references | 16:9, 9:16, 21:9, 1:1 | 10 | 1 |
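
The limits in the table above can be checked client-side before a request is sent. The sketch below copies the table into a lookup and reports any violations. The names `ImageModel`, `IMAGE_MODELS`, and `check_image_request` are hypothetical, and treating the "—" in the Max refs column as zero is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageModel:
    modes: frozenset
    aspect_ratios: frozenset
    max_refs: int      # 0 where the table lists no reference support (assumed)
    max_variants: int


_ALL_MODES = frozenset({"text", "image", "references"})
_FOUR = frozenset({"16:9", "9:16", "21:9", "1:1"})
_TWO = frozenset({"16:9", "9:16"})

# Values transcribed from the "Available image models" table.
IMAGE_MODELS = {
    "google-nano-banana-pro": ImageModel(_ALL_MODES, _FOUR, 5, 1),
    "google-nano-banana-2": ImageModel(_ALL_MODES, _FOUR, 14, 1),
    "google-nano-banana": ImageModel(_ALL_MODES, _FOUR, 3, 1),
    "seedream-5": ImageModel(_ALL_MODES, _FOUR, 14, 5),
    "seedream-4.5": ImageModel(_ALL_MODES, _FOUR, 10, 5),
    "flux-2-max": ImageModel(_ALL_MODES, _TWO, 8, 1),
    "gpt-image-1.5": ImageModel(_ALL_MODES, _TWO, 10, 5),
    "grok-imagine-image-pro": ImageModel(
        frozenset({"text", "image"}), frozenset({"16:9", "9:16", "1:1"}), 0, 1
    ),
    "riverflow-2-pro": ImageModel(_ALL_MODES, _FOUR, 10, 1),
}


def check_image_request(model_id, mode, aspect_ratio, n_refs=0, n_variants=1):
    """Return a list of problems; an empty list means the request fits the table."""
    model = IMAGE_MODELS.get(model_id)
    if model is None:
        return [f"unknown model: {model_id}"]
    problems = []
    if mode not in model.modes:
        problems.append(f"mode {mode} not supported")
    if aspect_ratio not in model.aspect_ratios:
        problems.append(f"aspect ratio {aspect_ratio} not supported")
    if n_refs > model.max_refs:
        problems.append(f"at most {model.max_refs} reference images")
    if n_variants > model.max_variants:
        problems.append(f"at most {model.max_variants} variants")
    return problems
```

For example, `check_image_request("flux-2-max", "text", "1:1")` reports the unsupported aspect ratio, since Flux 2 Max lists only 16:9 and 9:16.
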
Video
| Mode | Description |
|---|---|
| Text to video | Generate from a text prompt |
| Image to video | Generate video from a starting image |
| Ingredients to video | Combine multiple reference images into a video |
| Video to video | Edit or transform an existing video |
| Audio to video | Generate video synced to speech or music |
| Multi-shot | Multiple shots in a single generation, each with its own direction |
| Lip-sync | Generate video synced to a voiceover |
| Start + end frame | Set the first and last frames; the video is generated between them |
Available video models
| Model | ID | Modes | Durations | Audio | Key features |
|---|---|---|---|---|---|
| Kling 3.0 Omni | kling-3.0-omni | image, video, references, multi-shot, end frame | 3–15s | Yes | 7 ref images, 1 ref video, V2V |
| Kling 3.0 | kling-3.0 | image, multi-shot, end frame | 3–15s | Yes | 1080p |
| Kling O1 Edit | kling-o1 | image, video, references, end frame | 3–10s | No | V2V editing, 7 ref images, preserves original sound |
| Kling 2.6 | kling-2.6 | text, image | 5, 10s | Yes | Negative prompt, 1080p |
| Veo 3.1 | google-veo-3.1 | text, image, end frame | 4, 6, 8s | Yes | Up to 1080p |
| Veo 3.1 Fast | google-veo-3.1-fast | text, image, end frame | 4, 6, 8s | Yes | Up to 1080p |
| Seedance 1.5 Pro | seedance-1.5-pro | text, image, end frame | 4–12s | Yes | All aspect ratios, 1080p |
| Wan 2.6 | wan-2.6 | text, image, audio, end frame, lip-sync | 5, 10, 15s | Always on | Audio input sync |
| LTX 2.3 Pro | ltx-2.3-pro | text, image, end frame | 6, 8, 10s | Yes | Camera motion (dolly, jib, tracking, static, focus shift), up to 4K |
| LTX 2.3 Fast | ltx-2.3-fast | text, image, end frame | 6–20s | Yes | Camera motion, up to 4K, longest durations |
| Sora 2 Pro | openai-sora-2-pro | text, image | 4, 8, 12s | Always on | Up to 1024p |
| Sora 2 | openai-sora-2 | text, image | 4, 8, 12s | Always on | 720p |
| Grok Imagine Video | grok-imagine-video | text, image, video | 1–15s | Always on | V2V, flexible durations, 720p |
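
Choosing a video model usually means matching a mode and a duration at the same time. The sketch below transcribes the Modes and Durations columns and returns the model IDs that satisfy both. The names `VIDEO_MODELS` and `models_for` are hypothetical, and reading ranges such as 3–15s as whole seconds only is an assumption.

```python
def _span(low: int, high: int) -> frozenset:
    """Whole-second durations from low to high inclusive (assumed granularity)."""
    return frozenset(range(low, high + 1))


# model ID -> (modes, allowed durations in seconds), from the table above.
VIDEO_MODELS = {
    "kling-3.0-omni": ({"image", "video", "references", "multi-shot", "end frame"}, _span(3, 15)),
    "kling-3.0": ({"image", "multi-shot", "end frame"}, _span(3, 15)),
    "kling-o1": ({"image", "video", "references", "end frame"}, _span(3, 10)),
    "kling-2.6": ({"text", "image"}, frozenset({5, 10})),
    "google-veo-3.1": ({"text", "image", "end frame"}, frozenset({4, 6, 8})),
    "google-veo-3.1-fast": ({"text", "image", "end frame"}, frozenset({4, 6, 8})),
    "seedance-1.5-pro": ({"text", "image", "end frame"}, _span(4, 12)),
    "wan-2.6": ({"text", "image", "audio", "end frame", "lip-sync"}, frozenset({5, 10, 15})),
    "ltx-2.3-pro": ({"text", "image", "end frame"}, frozenset({6, 8, 10})),
    "ltx-2.3-fast": ({"text", "image", "end frame"}, _span(6, 20)),
    "openai-sora-2-pro": ({"text", "image"}, frozenset({4, 8, 12})),
    "openai-sora-2": ({"text", "image"}, frozenset({4, 8, 12})),
    "grok-imagine-video": ({"text", "image", "video"}, _span(1, 15)),
}


def models_for(mode: str, seconds: int) -> list:
    """Return the sorted IDs of models that list both the mode and the duration."""
    return sorted(
        model_id
        for model_id, (modes, durations) in VIDEO_MODELS.items()
        if mode in modes and seconds in durations
    )
```

With this data, a 20-second text-to-video request matches only `ltx-2.3-fast`, which is consistent with its "longest durations" note in the table.
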
Audio
| Type | Description |
|---|---|
| Voiceover | Generate speech across 60+ languages |
| Music | Generate music from text prompts |
| Sound effects | Generate sound effects and ambient audio |
Voiceover
| Model | ID | Key features |
|---|---|---|
| ElevenLabs v3 | elevenlabs | 60+ languages, tone control, voice transform, adjustable speed |
Music
| Model | ID | Duration | Duration control |
|---|---|---|---|
| ElevenLabs Music | elevenlabs | 3s–10 min | Yes |
| MiniMax Music 2.5 | minimax | Varies by lyrics length | No |
| Google Lyria 3 Pro | google | Up to 3 min | No |
Sound effects
| Model | ID | Duration | Key features |
|---|---|---|---|
| ElevenLabs SFX | elevenlabs | 0.5–22s | Looping, prompt influence control |
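
The ID `elevenlabs` appears in the voiceover, music, and sound effects tables, so an ID alone does not identify an audio model. The sketch below keys a lookup by audio type and ID and checks a duration against the listed range. The names `AUDIO_LIMITS` and `duration_in_range` and the type labels `"music"` and `"sfx"` are hypothetical; bounds the tables do not state are left as `None` and are not enforced.

```python
# (audio type, model ID) -> (min_seconds, max_seconds, duration_control)
# None means the table does not state a value.
AUDIO_LIMITS = {
    ("music", "elevenlabs"): (3.0, 600.0, True),   # 3s to 10 min
    ("music", "minimax"): (None, None, False),     # varies by lyrics length
    ("music", "google"): (None, 180.0, False),     # up to 3 min
    ("sfx", "elevenlabs"): (0.5, 22.0, None),      # duration control not stated
}


def duration_in_range(kind: str, model_id: str, seconds: float) -> bool:
    """True if `seconds` lies inside the range listed for this model.

    Unstated bounds are treated as unbounded, so a True result for such a
    model only means the table does not rule the duration out.
    """
    low, high, _ = AUDIO_LIMITS[(kind, model_id)]
    return (low is None or seconds >= low) and (high is None or seconds <= high)
```

Only ElevenLabs Music lists duration control, so for the other music models the range is descriptive rather than something a request can set.
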