JUN 25, 2026
Text-to-Speech on PixVerse with MiniMax and ElevenLabs
The PixVerse API v2 now turns text into speech. POST /speech/create generates spoken audio with five models from two providers: MiniMax speech-2.8-hd (the default) and speech-2.8-turbo, and ElevenLabs eleven-multilingual-v2, eleven-v3, and eleven-turbo-v2.5. Generation is asynchronous: poll GET /speech/audio_id for the finished .mp3, or pass a replyUrl callback.
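Because the audio arrives asynchronously, a client that does not use replyUrl needs a polling loop. The following is a minimal bash sketch. It assumes that the audio_id path segment is the ID returned by POST /speech/create, and it treats the job as finished once the GET response mentions an .mp3. The exact response fields are documented in the reference and are not shown in this post. The names poll_until, audio_ready, and USEAPI_TOKEN are hypothetical.

```bash
#!/usr/bin/env bash

# Generic polling helper: run a check command until it succeeds,
# waiting DELAY seconds between attempts, for at most MAX attempts.
# Usage: poll_until MAX DELAY CMD [ARGS...]
poll_until() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if (( attempt < max )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical readiness check (assumption): GET /speech/<audio_id> returns a
# body that mentions an .mp3 URL once the audio is ready.
# USEAPI_TOKEN is a hypothetical environment variable holding your API token.
audio_ready() {
  local audio_id=$1
  curl --silent --location "https://api.useapi.net/v2/pixverse/speech/${audio_id}" \
    --header "Authorization: Bearer ${USEAPI_TOKEN}" | grep -q '\.mp3'
}

# Example: check every 5 seconds, for up to 60 attempts.
# poll_until 60 5 audio_ready "$AUDIO_ID"
```

If you pass a replyUrl instead, the callback takes the place of this loop.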
A request needs only a model, the text, and a voice_id from GET /speech/voices (300+ MiniMax voices and about 20 ElevenLabs voices). MiniMax voices accept an emotion setting, eleven-v3 accepts inline audio tags, and both providers cover 40+ languages. On a Premium plan ($60/month), speech costs $0.04 to $0.08 per 1,000 characters. Each sample below comes with the exact curl command that produced it.
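At those rates, the cost of a request is characters ÷ 1,000 × rate. The small bash helper below makes several assumptions. It assumes billing is linear in characters, with no rounding up to the next 1,000. It also assumes every character of the text counts, including any inline tags. The name estimate_cost is hypothetical, and the reference page gives the rate that applies to each model.

```bash
#!/usr/bin/env bash

# Estimate the cost of one request in dollars: chars / 1000 * rate.
# Assumptions: linear per-character billing; ${#text} counts characters
# (in a UTF-8 locale; in the C locale it counts bytes).
# Usage: estimate_cost TEXT RATE_PER_1000_CHARS
estimate_cost() {
  local text=$1 rate=$2
  awk -v n="${#text}" -v r="$rate" 'BEGIN { printf "%.4f\n", n * r / 1000 }'
}

# Example: a 2,500-character script at the low and high ends of the range.
# estimate_cost "$script" 0.04   # 0.1000
# estimate_cost "$script" 0.08   # 0.2000
```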
Audio tags: eleven-v3
eleven-v3 is ElevenLabs' most expressive model. Instead of an emotion setting, you direct the delivery with inline tags such as [whispers], [excited], and [shouts] placed directly in the text. Tags work best on longer, sentence-level prompts.
curl: eleven-v3, a whisper that builds to a shout
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-v3"' \
--form 'voice_id="eleven_roger"' \
--form 'text="[whispers] The Force surrounds us. It binds the galaxy together. [curious] Do you feel it? [excited] Your destiny is calling... [shouts] may the Force be with you!"'
curl: eleven-v3, a different emotional arc
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-v3"' \
--form 'voice_id="eleven_sarah"' \
--form 'text="[sighs] For a thousand generations, the Jedi kept the peace. [sad] And then, we lost everything. [whispers] But hope is not gone. [excited] May the Force be with you!"'
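Because the tags sit inline, it can help to extract them and check the emotional arc before spending characters on a render. The following is a minimal sketch. It assumes a tag is any bracketed run of text without nested brackets. The helper list_audio_tags is hypothetical and is not part of the API.

```bash
#!/usr/bin/env bash

# Print the inline audio tags of an eleven-v3 prompt, one per line, in order.
# Assumption: tags are [bracketed] text with no nested brackets.
list_audio_tags() {
  # grep -o emits each [tag] match on its own line; sed strips the brackets.
  grep -o '\[[^]]*]' <<<"$1" | sed 's/[][]//g'
}

prompt='[sighs] For a thousand generations, the Jedi kept the peace. [sad] And then, we lost everything. [whispers] But hope is not gone. [excited] May the Force be with you!'
list_audio_tags "$prompt"
# sighs
# sad
# whispers
# excited
```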
Voice variety: MiniMax Speech 2.8
The same line, “Use API dot net.”, first on the fast, lower-cost speech-2.8-turbo with a female voice, then on speech-2.8-hd with a male voice.
curl: speech-2.8-turbo, female voice
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-turbo"' \
--form 'voice_id="minimax_english_radiant_girl"' \
--form 'emotion="happy"' \
--form 'text="Use API dot net."'
curl: speech-2.8-hd, male voice
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-hd"' \
--form 'voice_id="minimax_english_magnetic_voiced_man"' \
--form 'emotion="happy"' \
--form 'text="Use API dot net."'
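To audition one line across MiniMax models and voices, you can run the two requests above in a loop. This sketch assumes a hypothetical USEAPI_TOKEN variable. It saves each raw create response to a file, since this post does not show the response format. It passes the text with curl's --form-string, which sends the value literally. The function name compare_minimax_voices is hypothetical.

```bash
#!/usr/bin/env bash

# Send the same text to each model:voice pair from the samples above and save
# each raw POST /speech/create response to its own file.
# USEAPI_TOKEN is a hypothetical environment variable holding your API token.
compare_minimax_voices() {
  local text=$1 pair model voice
  for pair in "speech-2.8-turbo:minimax_english_radiant_girl" \
              "speech-2.8-hd:minimax_english_magnetic_voiced_man"; do
    model=${pair%%:*}   # text before the first colon
    voice=${pair#*:}    # text after the first colon
    curl --silent --location 'https://api.useapi.net/v2/pixverse/speech/create' \
      --header "Authorization: Bearer ${USEAPI_TOKEN}" \
      --form "model=${model}" \
      --form "voice_id=${voice}" \
      --form 'emotion=happy' \
      --form-string "text=${text}" \
      --output "create_${model}.json"
  done
}

# compare_minimax_voices 'Use API dot net.'
```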
ElevenLabs Turbo
eleven-turbo-v2.5 is the fastest, lowest-cost ElevenLabs model, with a 40,000-character limit. Here it reads “May the Force be with you.” in a female voice.
curl: eleven-turbo-v2.5, female voice
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="eleven-turbo-v2.5"' \
--form 'voice_id="eleven_sarah"' \
--form 'text="May the Force be with you."'
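Because the model has a hard character limit, a length check before sending avoids a failed request. This sketch assumes the service counts the 40,000 limit the same way bash's ${#var} counts characters in a UTF-8 locale. The helper check_turbo_length is hypothetical.

```bash
#!/usr/bin/env bash

# eleven-turbo-v2.5 character limit, as stated above.
readonly TURBO_MAX_CHARS=40000

# Return 0 if TEXT fits the limit; otherwise print a message to stderr, return 1.
# Assumption: the service counts characters as ${#text} does in a UTF-8 locale.
check_turbo_length() {
  local text=$1
  if (( ${#text} > TURBO_MAX_CHARS )); then
    echo "text is ${#text} characters; eleven-turbo-v2.5 allows ${TURBO_MAX_CHARS}" >&2
    return 1
  fi
  return 0
}

# check_turbo_length "$script" && echo "ok to send"
```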
Emotion: MiniMax
MiniMax voices accept an emotion setting (happy, sad, angry, calm, and more) that colors the delivery. Here, emotion is set to happy on the Expressive Narrator voice.
curl: speech-2.8-hd, emotion=happy
curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
--header 'Authorization: Bearer user:12345-…' \
--form 'model="speech-2.8-hd"' \
--form 'voice_id="minimax_english_expressive_narrator"' \
--form 'emotion="happy"' \
--form 'text="I can'\''t believe it. After everything we'\''ve been through... this is finally the moment. May the Force be with you."'
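The '\'' sequences in that command are shell quoting for apostrophes inside a single-quoted string. The sketch below shows one alternative, using the same hypothetical USEAPI_TOKEN variable. It keeps the script in a double-quoted variable and passes it with curl's --form-string, which sends the value literally, so a leading @ or < is not treated as a file reference. The wrapper name speak_happy is hypothetical.

```bash
#!/usr/bin/env bash

# Same request as above, but the text comes from a variable, so apostrophes
# need no '\'' escaping. --form-string sends the value as-is.
# USEAPI_TOKEN is a hypothetical environment variable holding your API token.
speak_happy() {
  local text=$1
  curl --location 'https://api.useapi.net/v2/pixverse/speech/create' \
    --header "Authorization: Bearer ${USEAPI_TOKEN}" \
    --form 'model=speech-2.8-hd' \
    --form 'voice_id=minimax_english_expressive_narrator' \
    --form 'emotion=happy' \
    --form-string "text=${text}"
}

line="I can't believe it. After everything we've been through... this is finally the moment. May the Force be with you."
# speak_happy "$line"
```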
See the POST /speech/create reference for the full model matrix, per-voice settings, languages, and a live Try-It console, or browse the voices with GET /speech/voices.
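To pick a voice_id without scrolling through the full catalog, you can filter the GET /speech/voices response by provider prefix. This sketch assumes the response contains each voice ID as a quoted string with the minimax_ or eleven_ prefixes seen in the samples above. The names list_voice_ids and USEAPI_TOKEN are hypothetical.

```bash
#!/usr/bin/env bash

# Print unique voice IDs with the given prefix (e.g. minimax_english_ or eleven_).
# Assumption: IDs appear in the response as quoted strings like "eleven_sarah".
# USEAPI_TOKEN is a hypothetical environment variable holding your API token.
list_voice_ids() {
  local prefix=$1
  curl --silent --location 'https://api.useapi.net/v2/pixverse/speech/voices' \
    --header "Authorization: Bearer ${USEAPI_TOKEN}" \
    | grep -o "\"${prefix}[A-Za-z0-9_.-]*\"" \
    | tr -d '"' \
    | sort -u
}

# list_voice_ids minimax_english_
```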