Table of contents

  1. Demo 1: noir clip with three inline @-mentions
    1. Source images
    2. Create the voice
    3. Create the character
    4. Generate the noir clip
  2. Demo 2: V2V edit — round-trip the noir clip and add two more characters
    1. Two more character sources
    2. Get a referenceVideo_1 for the V2V edit
      1. V2V edit constraints — heads up
    3. V2V edit with inline @character_1 + @character_2
  3. Why inline @-mentions matter
  4. Get started

Inline @-mentions on Google Flow API

Google Flow API v1 now supports inline @-mention markers in the prompt on POST /videos and POST /images. Place your character / image / voice references inside the prompt — @character_1, @referenceImage_2, @referenceAudio_1 — and the API wires them to the corresponding body slot at exactly that position in the sentence.

Endpoint        Supported markers
POST /videos    @character_1..7, @referenceImage_1..7, @referenceAudio_1..5
POST /images    @character_1..7, @reference_1..10

Markers are case-insensitive and opt-in: prompts without markers are unchanged, and plain-text prompts continue to work exactly as before. Each marker used in the prompt must have its matching body slot supplied.
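Since every marker needs its body slot, it can help to check a prompt locally before sending it. The following is a minimal sketch, not part of the API: `list_markers` and `missing_slots` are hypothetical helper names, and the pattern covers only the /videos marker families listed above. It does not check the 1..7 and 1..5 index ranges.

```shell
# Hypothetical helper: print the unique /videos markers found in a prompt,
# normalized to the body slot spelling (markers are case-insensitive).
list_markers() {
  printf '%s\n' "$1" \
    | grep -oiE '@(character|referenceImage|referenceAudio)_[0-9]+' \
    | tr 'A-Z' 'a-z' \
    | sed -e 's/^@//' \
          -e 's/^referenceimage/referenceImage/' \
          -e 's/^referenceaudio/referenceAudio/' \
    | sort -u
}

# Hypothetical helper: print each marker in prompt $1 that has no slot
# among the remaining arguments (the slot names present in the body).
missing_slots() {
  prompt=$1
  shift
  for marker in $(list_markers "$prompt"); do
    found=no
    for slot in "$@"; do
      [ "$slot" = "$marker" ] && found=yes
    done
    [ "$found" = yes ] || printf '%s\n' "$marker"
  done
}

# The body supplies character_1 and referenceImage_1 but not referenceImage_2.
missing_slots '@Character_1 rides @referenceImage_1 past @referenceImage_2' \
  character_1 referenceImage_1
```

Here the call prints `referenceImage_2`, the one marker without a supplied slot. A prompt with no markers prints nothing, which matches the opt-in behavior.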

This post walks two demos end-to-end:

  1. omni-flash R2V with three inline @-refs — a moonlit San Francisco noir clip.
  2. omni-flash V2V edit that round-trips the noir clip back to the API as a source video, then composites two new characters into it via inline @character_1 / @character_2.

All assets, references, voices, characters, and the two generated videos below were produced through the public Google Flow API v1.


Demo 1: noir clip with three inline @-mentions

The shot we want: a self-assured biker character sits on a vintage BMW motorcycle in an empty parking lot at night, with the Golden Gate Bridge in the far distance and a San Francisco neo-noir atmosphere. The character slowly revs the engine, then delivers a single line in a deep, raspy voice.

Source images

Three images uploaded via POST /assets/email — each used in a different role below.

  • Smoker Biker: character source (face reference passed to POST /characters)
  • Vintage BMW motorcycle: referenceImage_1 (@referenceImage_1 in the prompt)
  • Golden Gate Bridge: referenceImage_2 (@referenceImage_2 in the prompt)

Create the voice

A custom voice for the biker — Laomedeia base preset with a smoke-roughened performance direction.

curl: POST /voices
curl --location 'https://api.useapi.net/v1/google-flow/voices' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
    "email": "[email protected]",
    "voice": "Laomedeia",
    "displayName": "Smoker Lady",
    "dialog": "Honey, you have no idea what I'\''ve seen in this town.",
    "voicePerformance": "Deep raspy smoke-roughened, low gravelly pitch, slow drawl with audible exhales, world-weary"
}'

Create the character

Bundle the character image and the custom voice into a single named character.

curl: POST /characters
curl --location 'https://api.useapi.net/v1/google-flow/characters' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
    "displayName": "Smoker Biker",
    "imageReference_1": "user:1234-email:…-image:129960d4-…",
    "voice": "user:1234-email:…-voice:e25b42db-…-mid:fec00c57-…"
}'

Generate the noir clip

Three inline markers do the work. Notice the prompt does not describe what @referenceImage_1 or @referenceImage_2 are — no “motorcycle”, no “bridge”, no “Golden Gate” mentioned in the text. Identity comes purely from the references; the model uses the inline position to anchor each one in the noun phrase.

curl: POST /videos
curl --location 'https://api.useapi.net/v1/google-flow/videos' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
    "model": "omni-flash",
    "aspectRatio": "landscape",
    "duration": 8,
    "character_1":      "user:1234-email:…-character:7bf2bc7f-…-imgs:1-voice:e25b42db-…",
    "referenceImage_1": "user:1234-email:…-image:6139e134-…",
    "referenceImage_2": "user:1234-email:…-image:bb5032e0-…",
    "prompt": "OPEN ON ambient neo-noir synthwave score already playing — low pulsing bass and distant saxophone. @character_1 sitting on @referenceImage_1 in an empty parking lot at night, hands on the controls. Clear sky, sharp moonlight, deep crisp shadows under tall amber street lamps, dry asphalt with faint road markings. @referenceImage_2 visible in the far distance. Iconic neo-noir atmosphere. Static locked-off camera, no zoom, no pan, no movement. @character_1 twists throttle once and revs @referenceImage_1 once — smoke briefly rises from @referenceImage_1 — then eases off, looks at the camera with a knowing half-smile and says slowly \"What a night for a ride.\""
}'

The 8-second result:

Every @-mention grounded to its exact reference: the same specific BMW vintage motorcycle, the same Golden Gate Bridge in the distance, the same character identity from the source image — plus the smoke-roughened custom voice on the spoken line. Music kicks in from frame one.

Demo 2: V2V edit — round-trip the noir clip and add two more characters

For the next shot, we want to take the noir clip above and add two more characters flanking the biker — one on the left dancing, one on the right jumping with arms raised. This exercises omni-flash’s V2V edit path via referenceVideo_1 combined with inline @character_* markers.

Two more character sources

Two more images uploaded via POST /assets/email, then turned into characters via POST /characters (no voice attached this time).

  • Left character source: character_1, LEFT side (@character_1 in the prompt, dances)
  • Right character source: character_2, RIGHT side (@character_2 in the prompt, jumps with arms raised)

Get a referenceVideo_1 for the V2V edit

Two ways to get one:

  • Reuse the generated video’s mediaGenerationId directly from the Demo 1 response (media[0].mediaGenerationId) — no download or re-upload needed.
  • Upload an MP4 from disk via POST /assets/email with Content-Type: video/mp4 — for video sources that aren’t already in your Flow history.

We’ll use the first.
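As a rough sketch of the first option, the id can be pulled out of a saved Demo 1 response and dropped into the V2V body. `first_media_generation_id` is a hypothetical helper and the sample response is made up; only the `media[0].mediaGenerationId` path comes from this post. This is plain text matching, so a real JSON parser such as `jq -r '.media[0].mediaGenerationId'` would be more robust where one is available.

```shell
# Hypothetical helper: print the first mediaGenerationId found in a JSON
# response read from stdin. Text-level matching only, not a JSON parser.
first_media_generation_id() {
  grep -o '"mediaGenerationId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -e 's/^"mediaGenerationId"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Made-up response fragment for illustration; real ids look different.
sample='{"media":[{"mediaGenerationId":"example-video-id-1"},{"mediaGenerationId":"example-video-id-2"}]}'

reference_video=$(printf '%s' "$sample" | first_media_generation_id)

# Print the line that would go into the V2V request body.
printf '"referenceVideo_1": "%s"\n' "$reference_video"
```

The helper takes the first match, which corresponds to `media[0]` when the response lists several media entries in order.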

V2V edit constraints — heads up

V2V edit has tighter slot caps than R2V (5 image refs vs 7, 3 audio vs 5), accepts only omni-flash, and uses endFrameIndex_1 instead of duration to size the output. Full constraints in POST /videos → Model Capabilities.

One thing worth knowing from this demo: Google’s V2V moderation is non-deterministic. The exact same body can land on a content-safety reject one attempt and a clean success the next. A simple resubmit usually clears it — no input changes needed.
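Because a plain resubmit usually clears the rejection, a small retry wrapper is one way to handle it. This is a sketch under assumptions: `retry` is a hypothetical helper, and it relies on the wrapped command exiting nonzero when the request is rejected. How a content-safety reject is reported is not covered in this post, so check the POST /videos docs before relying on an exit code.

```shell
# Hypothetical helper: run a command up to $1 times, stopping at the
# first attempt that exits with status 0.
retry() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Stand-in for a request that succeeds on the first try.
retry 3 true && echo "accepted"
```

With curl, something like `retry 3 curl --fail ...` would resubmit the identical body, assuming the rejection surfaces as an HTTP error status so that `--fail` makes curl exit nonzero. The wrapper sends the same request each time, which matches the observation that no input changes are needed.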

V2V edit with inline @character_1 + @character_2

referenceVideo_1 switches the request into V2V edit mode. The inline @character_* markers place each new character at the specified position in the scene without ever describing what they look like in the text — identity, again, comes entirely from the references.

curl: POST /videos (V2V edit)
curl --location 'https://api.useapi.net/v1/google-flow/videos' \
--header 'Authorization: Bearer user:1234-…' \
--header 'Content-Type: application/json' \
--data '{
    "model": "omni-flash",
    "aspectRatio": "landscape",
    "character_1":      "user:1234-email:…-character:5116516f-…-imgs:1",
    "character_2":      "user:1234-email:…-character:1555335b-…-imgs:1",
    "referenceVideo_1": "user:1234-email:…-video:19ea81d0-…",
    "prompt": "@character_1 on the LEFT side of the frame busts some cool noir-style dance moves — slow confident shoulder rolls, smooth hip sway, runs a hand through her hair, fitting the moody parking-lot vibe. @character_2 on the RIGHT side of the frame jumps up and down enthusiastically with both arms raised, beaming smile. The center character stays where she is. Same noir parking lot at night, same Golden Gate behind, same moonlit atmosphere from the source video. Static locked-off camera, no zoom, no pan, no movement."
}'

The result:

Three distinct characters in one frame, each anchored to a specific reference: the original biker preserved in place from the source video, plus the two new flanking characters in their assigned positions and actions. Moon, bridge, parking lot, and amber street lamps all carried through unchanged from the source.

Why inline @-mentions matter

Without inline markers, a video request with three references tells the model “use these for context” but not where each one belongs, so the model has to guess from word order. With markers:

  • Positional grounding. The model knows @character_1 goes in the noun phrase the marker sits in — not somewhere else.
  • No descriptive cheating needed. You don’t have to describe what the reference is in the prompt ("the chrome motorcycle", "the Golden Gate Bridge") — identity comes from the reference. The prompt stays neutral and the model has to use the ref to render correctly.
  • Multi-mention reuse. Mention the same @character_1 or @referenceImage_1 multiple times across one prompt. The top-level reference array de-duplicates by id; only the positional information stacks.
  • Same wire shape across endpoints. Both /videos (R2V, V2V edit) and /images accept the same markers — pick the right slot name for the endpoint (@reference_N on /images, @referenceImage_N on /videos).
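To illustrate the naming difference in the last point, here is a small sketch that rewrites a /videos-style prompt to the /images marker spelling. `videos_prompt_to_images` is a hypothetical helper, it matches only the canonical camelCase spelling, and it does not touch the request body; the matching body slots would presumably need the same rename, so check the POST /images docs.

```shell
# Hypothetical helper: rewrite @referenceImage_N markers (the /videos
# spelling) to @reference_N (the /images spelling). @character_N is
# left alone because both endpoints use the same name for it.
videos_prompt_to_images() {
  printf '%s\n' "$1" \
    | sed 's/@referenceImage_\([0-9][0-9]*\)/@reference_\1/g'
}

videos_prompt_to_images '@character_1 sitting on @referenceImage_1, @referenceImage_2 in the distance'
```

This prints `@character_1 sitting on @reference_1, @reference_2 in the distance`. Repeated mentions of the same marker are rewritten consistently, so multi-mention prompts keep their positions.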

Get started

Google Flow API v1 docs · Setup Google Flow · POST /videos · POST /images · POST /characters · POST /voices