Reusable Voices and Characters on Google Flow API
Google Flow API v1 now supports custom voices and reusable characters. Build a voice once, build a character once (1–2 reference images + that voice attached), then drop the character into any video or image generation — identity and voice stay consistent across runs.
| New endpoint | What it does |
|---|---|
| POST /voices | Create a custom voice on top of one of the 30 system voice presets |
| POST /characters | Create a character — 1–2 reference images + an optional attached voice |
| character_1..7 on POST /videos | Use a character in a video generation |
| character_1..7 on POST /images | Use a character in an image generation |
Supporting endpoints round these out: GET /voices, GET /voices/ref, and DELETE /voices/ref, plus the matching trio for characters.
Showcase: build a swaggering pirate captain
Below is the full pipeline. Every audio, image, and video clip was generated through the API.
1. Create the voice
Pick one of the 30 system voice presets as the base, then add your own dialog and voicePerformance to shape the delivery. Here we use Umbriel (a deep, gravelly preset) and direct it toward a drunken pirate growl.
curl — POST /voices
curl --location 'https://api.useapi.net/v1/google-flow/voices' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
"email": "[email protected]",
"voice": "Umbriel",
"displayName": "Drunken Pirate Captain",
"dialog": "Where be me rum?! Speak now, ye worthless bilge rat, or I'\''ll feed yer entrails to the sharks!",
"voicePerformance": "Drunken pirate raging, deep booming voice with explosive growls, ragged breathing between threats."
}'
The response includes a voice reference id and a freshly resolved audioUrl that you can play back immediately.
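The same call can be assembled from Python. The sketch below is a minimal example using only the standard library: build_voice_request is a hypothetical helper name, the base URL and field names are copied from the curl example, and nothing is sent over the network until the returned request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

API_BASE = "https://api.useapi.net/v1/google-flow"  # base URL from the curl example


def build_voice_request(token, email, voice, display_name, dialog, voice_performance):
    """Assemble (but do not send) the POST /voices request shown above.

    Hypothetical helper for illustration; not part of any official client.
    """
    payload = {
        "email": email,
        "voice": voice,  # one of the system voice presets, e.g. "Umbriel"
        "displayName": display_name,
        "dialog": dialog,
        "voicePerformance": voice_performance,
    }
    return urllib.request.Request(
        f"{API_BASE}/voices",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and decode the JSON body. Only audioUrl is named in the text above, so inspect the actual response before relying on any other field name.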
2. Pick the character reference images
A character takes 1–2 reference images. Source them either from POST /images (Imagen / Nano Banana) or upload your own via POST /assets/email — either path returns a mediaGenerationId you can use.
Here we use two angles — one head-and-shoulders, one tighter close-up — so the character extractor has more identity coverage and downstream gens drift less.
3. Create the character
Bundle the two image mediaGenerationId values plus the voice reference-id into a named character with personalityNotes. The character ref is what you’ll pass to every downstream gen.
curl — POST /characters
curl --location 'https://api.useapi.net/v1/google-flow/characters' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
"displayName": "The Pirate",
"personalityNotes": "A swaggering, perpetually half-drunk pirate captain with a sharp wit hiding behind a slurred drawl. Charming when sober (rarely), menacing when crossed, theatrical at all times. Speaks in a deep gravelly bellow that drops to a conspiratorial growl when sharing secrets. Superstitious — kisses his gold tooth before any risky move, refuses to harm a parrot. Always has a half-empty bottle within arm'\''s reach.",
"imageReference_1": "user:1234-email:…-image:7623b296-…",
"imageReference_2": "user:1234-email:…-image:b7165c9e-…",
"voice": "user:1234-email:…-voice:9f001ef8-…-mid:673330b9-…"
}'
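The constraints on this request (one or two reference images, an optional voice) are easy to check client-side before spending a call. A minimal Python sketch, assuming the field names from the curl example; build_character_payload is a hypothetical helper, and treating personalityNotes as always present is an assumption made for illustration:

```python
def build_character_payload(display_name, personality_notes, image_refs, voice=None):
    """Build the JSON body for POST /characters, enforcing the 1-2 image rule.

    Hypothetical helper; field names are taken from the curl example.
    """
    if not 1 <= len(image_refs) <= 2:
        raise ValueError(
            f"a character takes 1 or 2 reference images, got {len(image_refs)}"
        )
    payload = {
        "displayName": display_name,
        "personalityNotes": personality_notes,
    }
    # Map the images onto imageReference_1 and imageReference_2 in order.
    for index, ref in enumerate(image_refs, start=1):
        payload[f"imageReference_{index}"] = ref
    if voice is not None:  # the attached voice is optional
        payload["voice"] = voice
    return payload
```

Serialize the result with json.dumps and post it with the same headers as the curl call.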
The response is a portable character ref with imgs:2 and voice:… in its tail — pass it straight to character_1..7 on POST /videos or POST /images.
{
  "entityId": "60831b31-…",
  "character": "user:1234-email:…-character:60831b31-…-imgs:2-voice:9f001ef8-…",
  "displayName": "The Pirate",
  "imageReferences": [
    { "mediaId": "user:1234-email:…-image:7623b296-…" },
    { "mediaId": "user:1234-email:…-image:b7165c9e-…" }
  ],
  "voice": "user:1234-email:…-voice:9f001ef8-…-mid:673330b9-…"
}
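Because the ref carries its image count and voice marker in its tail, a client can read both without another API call. The Python sketch below assumes the layout seen in the sample ref (a "-character:" segment, then "-imgs:N", then an optional "-voice:" segment); that layout is inferred from this one example rather than from a documented contract, and parse_character_ref is a hypothetical helper.

```python
import re


def parse_character_ref(ref):
    """Read the image count and voice flag from a character ref's tail.

    Assumes the "-character:…-imgs:N[-voice:…]" layout of the sample ref.
    """
    if "-character:" not in ref:
        raise ValueError("not a character ref")
    imgs = re.search(r"-imgs:(\d+)", ref)
    return {
        # None when the ref carries no imgs segment at all
        "image_count": int(imgs.group(1)) if imgs else None,
        "has_voice": "-voice:" in ref,
    }


info = parse_character_ref(
    "user:1234-email:…-character:60831b31-…-imgs:2-voice:9f001ef8-…"
)
```

For the sample ref, info reports two images and an attached voice. Treat the result as a convenience check only; the server remains the authority on what a ref contains.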
4. Generate the video
Use the character ref in character_1. The attached voice rides along automatically, so there is no need to pass referenceAudio_* separately. The request below asks for an eight-second shot that pushes in from wide to close-up, with the captain delivering his catchphrase on the close-up:
curl — POST /videos
curl --location 'https://api.useapi.net/v1/google-flow/videos' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
"model": "veo-3.1-fast",
"aspectRatio": "landscape",
"duration": 8,
"character_1": "user:1234-email:…-character:60831b31-…-imgs:2-voice:9f001ef8-…",
"prompt": "Eight-second continuous shot. Opens on a wide full-figure view of the pirate captain on the sunlit deck of his galleon, leaning casually on the wooden rail with one boot up on a coiled rope, rum bottle dangling from one hand. A colorful parrot is perched on the rail beside him. Brilliant blue sky and turquoise sea visible behind him, sun glinting off the water. The camera slowly pushes in over the first four seconds, smoothly ending tight on his face. The instant the close-up locks, he lifts the rum bottle toward the camera in a mock toast, tilts his head with a sly half-grin, and delivers the line lipsynced clearly: \"I'\''ll be sober when I'\''m dead, and not a minute sooner!\" Bright midday sun on his face when the close-up holds. Smooth single take, no cuts."
}'
In the resulting clip, the character's face stays consistent with the reference images, the Umbriel voice carries through with the drunken-pirate performance direction, and the push-in lands the punchline on the close-up, exactly where the prompt asked.
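Reusing the character for further shots only requires changing the prompt. The Python sketch below builds POST /videos bodies with the field names from the curl example and fills character slots in order, up to the seven slots (character_1..7) listed in the endpoint table. build_video_payload and the two short prompts are hypothetical, and filling slots contiguously from character_1 is an assumption.

```python
MAX_CHARACTERS = 7  # character_1 .. character_7, per the endpoint table


def build_video_payload(prompt, character_refs, model="veo-3.1-fast",
                        aspect_ratio="landscape", duration=8):
    """Build a POST /videos body with characters in slots character_1..7.

    Hypothetical helper; defaults mirror the values in the curl example.
    """
    if not 1 <= len(character_refs) <= MAX_CHARACTERS:
        raise ValueError(
            f"expected 1 to {MAX_CHARACTERS} characters, got {len(character_refs)}"
        )
    payload = {
        "model": model,
        "aspectRatio": aspect_ratio,
        "duration": duration,
        "prompt": prompt,
    }
    for slot, ref in enumerate(character_refs, start=1):
        payload[f"character_{slot}"] = ref
    return payload


# One saved character, several shots: only the prompt changes between runs.
pirate = "user:1234-email:…-character:60831b31-…-imgs:2-voice:9f001ef8-…"
shots = [
    "The captain studies a sea chart by lantern light.",
    "The captain laughs at the helm during a storm.",
]
payloads = [build_video_payload(shot, [pirate]) for shot in shots]
```

Each payload can then be serialized with json.dumps and posted with the same headers as the curl call.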
Why it matters
Without characters, every video generation requires you to pass the same referenceImage_* set and re-describe the look in the prompt. Small deviations compound across runs, and the look slowly mutates. With a saved character:
- One reference id replaces every image + voice you’d otherwise re-attach
- The image-ref budget check happens server-side from the inline image count
- Personality notes travel with the character so the model has consistent flavor across gens
- Voices follow the same lifecycle — create once, reuse anywhere they’d otherwise need to be re-attached
Get started
Google Flow API v1 docs · Setup Google Flow · POST /voices · POST /characters