PixVerse: Text-to-Speech with MiniMax and ElevenLabs

June 25, 2026

The PixVerse API v2 now turns text into speech. POST /speech/create generates spoken audio with five models from two providers: MiniMax speech-2.8-hd (the default) and speech-2.8-turbo, plus ElevenLabs eleven-multilingual-v2, eleven-v3, and eleven-turbo-v2.5. You then poll GET /speech/audio_id for the .mp3, or pass a replyUrl callback.
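The polling step could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the documented client flow: it assumes the audio ID is appended to /speech/ as a path segment, that the JSON response exposes the finished file under a `url` field, and that a hypothetical USEAPI_TOKEN environment variable holds your token. The helper names poll_until and fetch_audio_url are made up for illustration, and jq is used to read the response.

```shell
# poll_until MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds and prints something, then echoes that output.
# Returns 1 if MAX_TRIES attempts pass without a result.
poll_until() {
  max=$1; delay=$2; shift 2
  tries=0
  while [ "$tries" -lt "$max" ]; do
    if out=$("$@") && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 1
}

# ASSUMPTIONS (not confirmed by this post): the audio ID is a path segment
# after /speech/, the response JSON has a `url` field once the audio is
# ready, and USEAPI_TOKEN holds your API token.
fetch_audio_url() {
  curl --silent --fail --location \
    "https://api.useapi.net/v2/pixverse/speech/$1" \
    --header "Authorization: Bearer $USEAPI_TOKEN" |
    jq -r '.url // empty'
}

# Usage (not run here): poll every 3 seconds, up to 20 times.
# poll_until 20 3 fetch_audio_url "$audio_id"
```

Keeping the retry loop separate from the fetch command means the field name or URL shape can be corrected against the API reference without touching the loop.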

A request needs only a model, the text, and a voice_id from GET /speech/voices (300+ MiniMax voices, ~20 ElevenLabs). MiniMax voices take an emotion setting, eleven-v3 takes inline audio tags, and both providers cover 40+ languages. On a Premium plan ($60/month), speech costs $0.04–$0.08 per 1,000 characters. Each sample below has the exact curl that produced it.
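To get a feel for the pricing, here is a rough estimate of what a given text would cost at each end of the quoted range. It assumes billing scales linearly with character count, which this post does not spell out, and it does not say which model sits at which rate, so the rate is a parameter. The helper name estimate_cost is hypothetical.

```shell
# estimate_cost CHARS RATE_PER_1000
# Prints the estimated cost in USD with four decimals.
# ASSUMPTION: cost is linear in characters (no rounding up to whole thousands).
estimate_cost() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.4f\n", n * r / 1000 }'
}

text="May the Force be with you."
chars=${#text}   # character count of the prompt

echo "characters: $chars"
echo "low end:    \$$(estimate_cost "$chars" 0.04)"
echo "high end:   \$$(estimate_cost "$chars" 0.08)"
```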

Audio tags — eleven-v3

eleven-v3 is ElevenLabs’ most expressive model. Instead of an emotion setting, you direct the delivery with inline tags like [whispers], [excited], and [shouts] placed right in the text — they work best on longer, sentence-level prompts.

curl — eleven-v3, a whisper that builds to a shout

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-v3"' \
--form 'voice_id="eleven_roger"' \
--form 'text="[whispers] The Force surrounds us. It binds the galaxy together. [curious] Do you feel it? [excited] Your destiny is calling... [shouts] may the Force be with you!"'

curl — eleven-v3, a different emotional arc

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-v3"' \
--form 'voice_id="eleven_sarah"' \
--form 'text="[sighs] For a thousand generations, the Jedi kept the peace. [sad] And then, we lost everything. [whispers] But hope is not gone. [excited] May the Force be with you!"'

Voice variety — MiniMax Speech 2.8

The same line, “Use API dot net”, first on a female voice with the fast, lower-cost speech-2.8-turbo, then on a male voice with speech-2.8-hd.

curl — speech-2.8-turbo, female voice

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-turbo"' \
--form 'voice_id="minimax_english_radiant_girl"' \
--form 'emotion="happy"' \
--form 'text="Use API dot net."'

curl — speech-2.8-hd, male voice

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-hd"' \
--form 'voice_id="minimax_english_magnetic_voiced_man"' \
--form 'emotion="happy"' \
--form 'text="Use API dot net."'

ElevenLabs Turbo

eleven-turbo-v2.5 is the fastest, lowest-cost ElevenLabs model, with a 40,000-character limit — here on a female voice reading “May the Force be with you”.

curl — eleven-turbo-v2.5, female voice

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-turbo-v2.5"' \
--form 'voice_id="eleven_sarah"' \
--form 'text="May the Force be with you."'

Emotion — MiniMax

MiniMax voices take an emotion setting (happy, sad, angry, calm, and more) that colors the delivery. Here, emotion is set to happy on the Expressive Narrator voice.

curl — speech-2.8-hd, emotion: happy

curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-hd"' \
--form 'voice_id="minimax_english_expressive_narrator"' \
--form 'emotion="happy"' \
--form 'text="I can'\''t believe it. After everything we'\''ve been through... this is finally the moment. May the Force be with you."'
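The '\'' escapes in the command above get harder to read as prompts grow. One alternative, sketched here, is to keep the text in a file and let curl read the field value from it using the `<` syntax of --form. The function name create_speech_from_file and the USEAPI_TOKEN environment variable are hypothetical; the endpoint, model, voice, and emotion are the ones used above.

```shell
# Write the prompt to a temp file. A quoted heredoc ('EOF') needs no
# escaping for apostrophes or other shell-special characters.
prompt_file=$(mktemp)
cat > "$prompt_file" <<'EOF'
I can't believe it. After everything we've been through... this is finally the moment. May the Force be with you.
EOF

# create_speech_from_file MODEL VOICE_ID FILE
# Sends FILE's contents as the `text` form field ("text=<FILE").
# USEAPI_TOKEN is an assumed environment variable holding your API token.
create_speech_from_file() {
  curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
    --header "Authorization: Bearer $USEAPI_TOKEN" \
    --form "model=$1" \
    --form "voice_id=$2" \
    --form 'emotion=happy' \
    --form "text=<$3"
}

# Usage (not run here):
# create_speech_from_file speech-2.8-hd minimax_english_expressive_narrator "$prompt_file"
```

Note that the heredoc leaves a trailing newline in the file, and curl sends it as part of the field; if that matters, write the file with printf '%s' instead.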

See the POST /speech/create reference for the full model matrix, per-voice settings, languages, and a live Try-It console — or browse the voices with GET /speech/voices.