Generate speech from text
Table of contents
June 25, 2026
- Model Comparison Matrix
- Voices
- Voice settings
- Audio tags (eleven-v3)
- Languages
- Request Headers
- Request Body
- Parameters
- Responses
- Model
- Examples
- Try It
Use a pixverse.ai account to turn text into speech. You pick a model, a voice (from GET speech/voices), and optional per-voice settings, and the API returns spoken audio as an .mp3.
Generation is asynchronous. The call returns an audio_id immediately; poll GET speech/audio_id until audio_status_final is true, or pass a replyUrl to receive the result by callback.
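The polling flow described above can be sketched as follows. This is a minimal sketch, not the service's own client: `fetch_status` is a hypothetical callable that you supply (it should wrap GET speech/audio_id and return the parsed JSON), and the 10-second interval and attempt limit are assumptions rather than documented values.

```python
import time
from typing import Any, Callable, Dict


def wait_for_audio(
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical wrapper around GET speech/audio_id
    audio_id: str,
    interval_s: float = 10.0,   # assumed polling interval
    max_attempts: int = 60,     # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until audio_status_final is true, then return the last status."""
    for attempt in range(max_attempts):
        status = fetch_status(audio_id)
        if status.get("audio_status_final"):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"{audio_id} not final after {max_attempts} polls")


# Demonstration with canned responses instead of real HTTP calls.
_canned = iter([
    {"audio_status_name": "QUEUED", "audio_status_final": False},
    {"audio_status_name": "COMPLETED", "audio_status_final": True},
])
demo_result = wait_for_audio(lambda _id: next(_canned), "demo-id", sleep=lambda s: None)
print(demo_result["audio_status_name"])
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A final status is not necessarily a success, so check audio_status_name on the returned object.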
Model Comparison Matrix
| Model | Voice settings | Max chars | PixVerse Credits / Cost per 1K chars * | Provider cost per 1K chars ** |
|---|---|---|---|---|
| ElevenLabs eleven-multilingual-v2 | stability, similarity_boost, speed, style | 10,000 | 20 cr / $0.08 | $0.10 |
| ElevenLabs eleven-v3 | stability, similarity_boost, speed + inline audio tags | 5,000 | 20 cr / $0.08 | $0.10 |
| ElevenLabs eleven-turbo-v2.5 | stability, similarity_boost, speed | 40,000 | 10 cr / $0.04 | $0.05 |
| MiniMax speech-2.8-hd (default) | speed, volume, pitch, emotion | 10,000 | 20 cr / $0.08 | $0.10 |
| MiniMax speech-2.8-turbo | speed, volume, pitch, emotion | 10,000 | 10 cr / $0.04 | $0.06 |
Billing is per-character, so 500 characters costs half the per-1K price. * PixVerse pricing is for the $60/mo Premium plan ($0.004 per credit); see the cost calculator for other plans. ** Provider cost is the price of going direct to the model's own provider (their public rates, which change).
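As a worked example of the per-character billing, the sketch below estimates credits and dollar cost from the matrix above. The helper name `estimate_cost` is hypothetical. It assumes characters are counted the way Python's `len` counts them and that no rounding is applied, neither of which is stated on this page.

```python
# Credits per 1,000 characters, copied from the Model Comparison Matrix.
CREDITS_PER_1K = {
    "eleven-multilingual-v2": 20,
    "eleven-v3": 20,
    "eleven-turbo-v2.5": 10,
    "speech-2.8-hd": 20,
    "speech-2.8-turbo": 10,
}
USD_PER_CREDIT = 0.004  # $60/mo Premium plan rate; other plans differ


def estimate_cost(model: str, text: str):
    """Return (credits, usd) for text, assuming linear per-character billing."""
    credits = CREDITS_PER_1K[model] * len(text) / 1000
    return credits, credits * USD_PER_CREDIT


credits, usd = estimate_cost("speech-2.8-hd", "a" * 500)
print(f"{credits:g} credits, about ${usd:.2f}")
```

For 500 characters on speech-2.8-hd this gives 20 × 500 / 1000 = 10 credits, or about $0.04 at the Premium rate, which is half the 1K price.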
The two providers take different voice settings: MiniMax voices respond to emotion, ElevenLabs voices to stability / similarity_boost / style. Sending a setting to the wrong family is rejected with 400. See Voice settings below.
Voices
Every request needs a voice_id from GET speech/voices, filtered by model and language; the paired provider_voice_id is filled in for you automatically. The MiniMax catalog carries 300+ voices (male, female, and neutral), ElevenLabs ~20, each with gender, accent, and style_tags to help you choose.
Voice settings
All settings are optional; omit them for the voice's natural delivery. Ranges are enforced per model family.
MiniMax (speech-2.8-hd, speech-2.8-turbo):
| Setting | Range | Default | Notes |
|---|---|---|---|
| speed | 0.5 to 2 | 1 | playback speed |
| volume | 0 to 10 | 1 | loudness |
| pitch | -12 to 12 | 0 | integer semitones |
| emotion | enum | auto | auto, happy, sad, angry, fearful, disgusted, surprised, neutral, calm |
ElevenLabs (eleven-multilingual-v2, eleven-v3, eleven-turbo-v2.5):
| Setting | Range | Default | Notes |
|---|---|---|---|
| stability | 0 to 1 | 0.5 | lower = more expressive, higher = more consistent |
| similarity_boost | 0 to 1 | 0.75 | adherence to the original voice |
| speed | 0.7 to 1.2 | 1 | playback speed |
| style | 0 to 1 | 0 | eleven-multilingual-v2 only, style exaggeration |
| use_speaker_boost | boolean | - | eleven-multilingual-v2 only |
style and use_speaker_boost are accepted only by eleven-multilingual-v2; sending them to another model is rejected with 400.
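To catch a 400 before sending, the rules in the two tables above can be checked client-side. This is a minimal sketch under stated assumptions: `check_voice_settings` is a hypothetical helper, and it treats the range bounds as inclusive, which this page does not state explicitly. The server remains the authority.

```python
from typing import Any, Dict, List

MINIMAX_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
ELEVENLABS_MODELS = {"eleven-multilingual-v2", "eleven-v3", "eleven-turbo-v2.5"}
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful",
            "disgusted", "surprised", "neutral", "calm"}


def check_voice_settings(model: str, settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    if model in MINIMAX_MODELS:
        ranges = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}
        allowed = set(ranges) | {"emotion"}
    elif model in ELEVENLABS_MODELS:
        ranges = {"stability": (0, 1), "similarity_boost": (0, 1), "speed": (0.7, 1.2)}
        allowed = set(ranges)
        if model == "eleven-multilingual-v2":
            ranges["style"] = (0, 1)
            allowed |= {"style", "use_speaker_boost"}
    else:
        return [f"unknown model: {model}"]

    problems = []
    for name, value in settings.items():
        if name not in allowed:
            problems.append(f"{name} is not accepted by {model}")
        elif name == "emotion":
            if value not in EMOTIONS:
                problems.append(f"emotion must be one of {sorted(EMOTIONS)}")
        elif name == "use_speaker_boost":
            if not isinstance(value, bool):
                problems.append("use_speaker_boost must be a boolean")
        else:
            low, high = ranges[name]
            if not low <= value <= high:  # bounds assumed inclusive
                problems.append(f"{name} must be between {low} and {high}")
            if name == "pitch" and not isinstance(value, int):
                problems.append("pitch must be an integer number of semitones")
    return problems


print(check_voice_settings("speech-2.8-hd", {"emotion": "happy", "stability": 0.5}))
```

The call at the end flags stability, because it belongs to the ElevenLabs family while speech-2.8-hd is a MiniMax model.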
Audio tags (eleven-v3)
eleven-v3 is ElevenLabs' most expressive model. Instead of an emotion setting, you direct the performance with inline audio tags placed right in the text, such as [whispers], [excited], [shouts], [sighs], [laughs], [sad], [curious], and [gasps]. Tags work best with longer, sentence-level text that gives the model room to perform.
"[whispers] The Force surrounds us. It binds the galaxy together.
[curious] Do you feel it?
[excited] Your destiny is calling...
[shouts] May the Force be with you!"
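A tagged text like the one above can be assembled from (tag, sentence) pairs and placed in a request body. This is a small sketch: `tagged_text` is a hypothetical helper, and the voice_id value is a placeholder that you would replace with a real ElevenLabs voice from GET speech/voices.

```python
import json


def tagged_text(lines):
    """Join (tag, sentence) pairs into one text with inline audio tags."""
    return "\n".join(f"[{tag}] {sentence}" for tag, sentence in lines)


text = tagged_text([
    ("whispers", "The Force surrounds us. It binds the galaxy together."),
    ("curious", "Do you feel it?"),
    ("excited", "Your destiny is calling..."),
    ("shouts", "May the Force be with you!"),
])

# eleven-v3 accepts at most 5,000 characters and rejects language_code,
# so the body below deliberately leaves language_code out.
assert len(text) <= 5000

body = {
    "model": "eleven-v3",
    "text": text,
    "voice_id": "<voice_id from GET speech/voices>",  # placeholder, not a real id
}
print(json.dumps(body, indent=2))
```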
Languages
language_code is optional and validated per model against the live catalog from GET speech/models. MiniMax and the ElevenLabs v2 / turbo models support 30-40 languages (auto plus ISO codes like en, es, ja, zh). eleven-v3 auto-detects the language and rejects language_code with 400.
To retrieve generated speech, use:
- GET speech/audio_id
- GET speech to list all speech
To cancel a job that is still generating, use DEL scheduler/id.
POST https://api.useapi.net/v2/pixverse/speech/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
"email": "Optional PixVerse API account email",
"model": "speech-2.8-hd",
"text": "Required text to speak",
"voice_id": "minimax_english_radiant_girl",
"language_code": "en",
"emotion": "happy",
"replyUrl": "Place your call back URL here",
"replyRef": "Place your reference id here",
"maxJobs": 3
}
Parameters
- email is optional. If not specified, the API will randomly select an account from the available accounts.
- model is optional. Default: speech-2.8-hd. See Model Comparison Matrix.
- text is required, the words to speak. Maximum length is model-specific; see the matrix (eleven-v3 5,000, eleven-turbo-v2.5 40,000, others 10,000). For eleven-v3 you may embed audio tags.
- voice_id is required. A voice from GET speech/voices. Its paired provider_voice_id is derived automatically; you do not pass it.
- language_code is optional, see Languages. Validated per model and rejected for eleven-v3.
- speed, volume, pitch, emotion (MiniMax) and stability, similarity_boost, speed, style, use_speaker_boost (ElevenLabs) are optional voice settings. Settings from the wrong provider family are rejected with 400.
- replyUrl is optional, place your callback URL here. This is the preferred way to receive results quickly: the API polls every 10 seconds and will call the provided replyUrl once the audio is completed or failed. We recommend using sites like webhook.site to test callback URL functionality. Maximum length 1024 characters. The callback body has the same JSON shape as the GET speech/audio_id response.
- replyRef is optional, place your reference id here. It will be stored and returned along with this speech response / result. Maximum length 1024 characters.
- maxJobs is optional. If not specified, the value from the selected accounts/email will be used. It should not exceed the number of concurrent generations supported by your account subscription plan. Valid range: 1…8
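The limits listed in the parameters above lend themselves to a pre-flight check. The sketch below is illustrative only: `check_request` is a hypothetical helper, it counts characters with Python's `len` (how the service counts them is not stated here), and it covers only the limits named on this page.

```python
from typing import Any, Dict, List

# Maximum text length per model, from the Model Comparison Matrix.
MAX_CHARS = {
    "eleven-multilingual-v2": 10_000,
    "eleven-v3": 5_000,
    "eleven-turbo-v2.5": 40_000,
    "speech-2.8-hd": 10_000,
    "speech-2.8-turbo": 10_000,
}


def check_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a speech/create request body."""
    problems = []
    model = body.get("model", "speech-2.8-hd")  # documented default model
    text = body.get("text")
    if not text:
        problems.append("text is required")
    elif model in MAX_CHARS and len(text) > MAX_CHARS[model]:
        problems.append(f"text exceeds {MAX_CHARS[model]} characters for {model}")
    if not body.get("voice_id"):
        problems.append("voice_id is required")
    if model == "eleven-v3" and "language_code" in body:
        problems.append("language_code is rejected for eleven-v3")
    for key in ("replyUrl", "replyRef"):
        if len(body.get(key, "")) > 1024:
            problems.append(f"{key} exceeds 1024 characters")
    max_jobs = body.get("maxJobs")
    if max_jobs is not None and not 1 <= max_jobs <= 8:
        problems.append("maxJobs must be between 1 and 8")
    return problems


print(check_request({"model": "eleven-v3", "text": "x" * 6000,
                     "voice_id": "demo", "language_code": "en"}))
```

The sample call reports two problems: the text is over the 5,000-character limit for eleven-v3, and language_code is not accepted by that model.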
Responses
- Use the returned audio_id to retrieve status and results using GET speech/audio_id. Check audio_status_name for COMPLETED and url for the generated .mp3 link.
If you specify the optional parameter replyUrl, the API will call the provided replyUrl with progress updates until the audio is complete or fails.
{
  "audio_id": "user:<userid>-pixverse:<email>-speech:<number>",
  "asset_id": 409875792979281,
  "asset_type": 2,
  "asset_source": 1,
  "create_mode": "voice",
  "status": "making",
  "audio_status": 5,
  "credits": 1,
  "audio_status_name": "QUEUED",
  "audio_status_final": false
}
- Returned when a parameter is invalid, for example text longer than the model allows, an emotion on an ElevenLabs voice, stability on a MiniMax voice, or language_code on eleven-v3.
{
  "error": "<Error message>",
  "code": 400
}
- Returned when the API token is not accepted.
{
  "error": "Unauthorized"
}
- Insufficient credits. All Credits have been used up. Please upgrade your membership or purchase credits.
{
  "error": "All Credits have been used up. Please upgrade your membership or purchase credits."
}
- Wait in a loop for at least 10..30 seconds and retry.
The API queue is full and cannot accept new speech/create requests. The size of the queue is defined by the optional maxJobs parameter.
{
  "error":
    "Account <email> is busy executing <maxJobs> tasks."
    "All configured accounts are running at maximum capacity."
}
- 596 Pending mod message
Your PixVerse.ai account has a pending error. Most likely, you changed your account password or your PixVerse.ai account was placed on hold. Once the issue is resolved, update your account to clear the error by executing POST accounts/email before making any new API calls.
{
  "error":
    "Your PixVerse account has pending error."
    "Please address this issue at https://useapi.net/docs/api-pixverse-v2/post-pixverse-accounts-email before making any new API calls."
}
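The "wait and retry" advice for a busy account can be sketched as a small loop. Everything here is an assumption layered on the text above: `send` is a hypothetical callable that posts the body to speech/create and returns the parsed JSON, and the busy check matches on the error wording shown in the sample response, which is a heuristic that may break if the messages change.

```python
import time
from typing import Any, Callable, Dict

# Fragments of the documented "busy" error messages (heuristic match).
BUSY_MARKERS = ("is busy executing", "running at maximum capacity")


def is_busy(result: Dict[str, Any]) -> bool:
    """True when the error text looks like the account-busy response."""
    error = str(result.get("error", ""))
    return any(marker in error for marker in BUSY_MARKERS)


def create_with_retry(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical POST wrapper
    body: Dict[str, Any],
    wait_s: float = 20.0,   # within the suggested 10..30 second window
    max_attempts: int = 5,  # assumed retry budget
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send(body), retrying while the response looks busy."""
    result: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        result = send(body)
        if not is_busy(result):
            return result
        if attempt < max_attempts - 1:
            sleep(wait_s)
    return result  # still busy after the last attempt


# Demonstration with canned responses instead of real HTTP calls.
_replies = iter([
    {"error": "Account user@example.com is busy executing 3 tasks."},
    {"audio_id": "demo", "audio_status_name": "QUEUED", "audio_status_final": False},
])
demo = create_with_retry(lambda b: next(_replies), {"text": "hi"}, sleep=lambda s: None)
print(demo["audio_status_name"])
```

Other errors, such as invalid parameters or insufficient credits, are returned to the caller unchanged, since retrying them would not help.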
Model
{ // TypeScript, all fields are optional
audio_id: string
asset_id: number
asset_type: number
asset_source: number
create_mode: string
status: string
audio_status: number
credits: number
error: string
code: number
// added
audio_status_name: string
audio_status_final: boolean
}
Examples
- Curl
curl -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer …" \
     -X POST "https://api.useapi.net/v2/pixverse/speech/create" \
     -d '{"model": "speech-2.8-hd", "text": "May the Force be with you.", "voice_id": "minimax_english_radiant_girl", "emotion": "happy"}'
- JavaScript
const apiUrl = `https://api.useapi.net/v2/pixverse/speech/create`;
const token = "API token";
const data = {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  }
};
data.body = JSON.stringify({
  model: "speech-2.8-hd",
  text: "May the Force be with you.",
  voice_id: "minimax_english_radiant_girl",
  emotion: "happy"
});
const response = await fetch(apiUrl, data);
const result = await response.json();
console.log("response", {response, result});
- Python
import requests

apiUrl = "https://api.useapi.net/v2/pixverse/speech/create"
token = "API token"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
body = {
    "model": "speech-2.8-hd",
    "text": "May the Force be with you.",
    "voice_id": "minimax_english_radiant_girl",
    "emotion": "happy"
}
response = requests.post(apiUrl, headers=headers, json=body)
print(response, response.json())
Try It
Pick a model first; the form then loads its real voice and language lists. Selecting a voice plays a preview.