Generate speech from text

Table of contents

June 25, 2026

  1. Model Comparison Matrix
  2. Voices
  3. Voice settings
  4. Audio tags (eleven-v3)
  5. Languages
  6. Request Headers
  7. Request Body
  8. Parameters
  9. Responses
  10. Model
  11. Examples
  12. Try It

Use a pixverse.ai account to turn text into speech. You pick a model, a voice (from GET speech/voices), and optional per-voice settings, and the API returns spoken audio as an .mp3.

Generation is asynchronous. The call returns an audio_id immediately; poll GET speech/audio_id until audio_status_final is true, or pass a replyUrl to receive the result by callback.
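The polling loop described above could be sketched roughly as follows. This is a minimal illustration, not the official client: `fetch_status` is a hypothetical callable that you would implement as a wrapper around GET speech/audio_id, and the 10-second interval and 10-minute timeout are assumptions chosen for the example.

```python
import time
from typing import Any, Callable, Dict


def poll_until_final(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 10.0,   # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the job reports audio_status_final == True.

    fetch_status is a hypothetical wrapper around GET speech/audio_id that
    returns the decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status.get("audio_status_final"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("speech job did not reach a final status in time")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. If you can expose a public endpoint, a replyUrl callback avoids the loop entirely.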

Model Comparison Matrix
| Model | Voice settings | Max chars | PixVerse credits / cost per 1K chars * | Provider cost per 1K chars ** |
| --- | --- | --- | --- | --- |
| ElevenLabs eleven-multilingual-v2 | stability, similarity_boost, speed, style | 10,000 | 20 cr / $0.08 | $0.10 |
| ElevenLabs eleven-v3 | stability, similarity_boost, speed + inline audio tags | 5,000 | 20 cr / $0.08 | $0.10 |
| ElevenLabs eleven-turbo-v2.5 | stability, similarity_boost, speed | 40,000 | 10 cr / $0.04 | $0.05 |
| MiniMax speech-2.8-hd (default) | speed, volume, pitch, emotion | 10,000 | 20 cr / $0.08 | $0.10 |
| MiniMax speech-2.8-turbo | speed, volume, pitch, emotion | 10,000 | 10 cr / $0.04 | $0.06 |

Billing is per character, so 500 characters cost half the per-1K price. *PixVerse pricing is for the $60/mo Premium plan ($0.004 per credit); see the cost calculator for other plans. **Provider cost is the price of going direct to the model’s own provider (their public rates, which change).
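As a worked example of the per-character billing, the sketch below estimates credits and dollar cost from the per-1K rates in the matrix and the $0.004 per credit Premium plan rate. It assumes billing is strictly proportional to character count; whether the service rounds to whole credits is not stated here, so the result is left fractional.

```python
# Credits per 1,000 characters, copied from the Model Comparison Matrix.
CREDITS_PER_1K = {
    "eleven-multilingual-v2": 20,
    "eleven-v3": 20,
    "eleven-turbo-v2.5": 10,
    "speech-2.8-hd": 20,
    "speech-2.8-turbo": 10,
}

USD_PER_CREDIT = 0.004  # $60/mo Premium plan rate quoted in the text


def estimate_cost(model: str, n_chars: int):
    """Return (credits, usd) assuming strictly proportional billing.

    Rounding to whole credits is NOT applied; that behaviour is unknown.
    """
    credits = CREDITS_PER_1K[model] * n_chars / 1000
    return credits, round(credits * USD_PER_CREDIT, 4)


credits, usd = estimate_cost("speech-2.8-hd", 500)
print(credits, usd)  # 10.0 0.04
```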

The two providers take different voice settings: MiniMax voices respond to emotion, ElevenLabs voices to stability / similarity_boost / style. A setting sent to the wrong family is rejected with 400. See Voice settings below.

Voices

Every request needs a voice_id from GET speech/voices, filtered by model and language; the paired provider_voice_id is filled in for you automatically. The MiniMax catalog carries 300+ voices (male, female, and neutral) and the ElevenLabs catalog about 20, each with gender, accent, and style_tags to help you choose.
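Once you have the voice list, narrowing it down by the descriptive fields might look like the sketch below. The exact response shape of GET speech/voices is not shown in this section, so the record layout (a list of dicts with `voice_id`, `gender`, `accent`, and a `style_tags` list) is an assumption, and the sample catalog entries are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional


def pick_voices(
    voices: Iterable[Dict[str, Any]],
    gender: Optional[str] = None,
    accent: Optional[str] = None,
    style_tag: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Filter voice records by gender, accent, and a single style tag.

    Assumes each record is a dict with 'gender', 'accent', and
    'style_tags' (list of strings); this shape is hypothetical.
    """
    matches = []
    for voice in voices:
        if gender is not None and voice.get("gender") != gender:
            continue
        if accent is not None and voice.get("accent") != accent:
            continue
        if style_tag is not None and style_tag not in voice.get("style_tags", []):
            continue
        matches.append(voice)
    return matches


# Hypothetical sample data; real values come from GET speech/voices.
sample_catalog = [
    {"voice_id": "minimax_english_radiant_girl", "gender": "female",
     "accent": "american", "style_tags": ["bright", "young"]},
    {"voice_id": "example_voice_b", "gender": "male",
     "accent": "british", "style_tags": ["calm"]},
]

print([v["voice_id"] for v in pick_voices(sample_catalog, gender="female")])
```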

Voice settings

All settings are optional; omit them for the voice’s natural delivery. Ranges are enforced per model family.

MiniMax (speech-2.8-hd, speech-2.8-turbo):

| Setting | Range | Default | Notes |
| --- | --- | --- | --- |
| speed | 0.5 – 2 | 1 | playback speed |
| volume | 0 – 10 | 1 | loudness |
| pitch | -12 – 12 | 0 | integer semitones |
| emotion | enum | auto | auto, happy, sad, angry, fearful, disgusted, surprised, neutral, calm |

ElevenLabs (eleven-multilingual-v2, eleven-v3, eleven-turbo-v2.5):

| Setting | Range | Default | Notes |
| --- | --- | --- | --- |
| stability | 0 – 1 | 0.5 | lower = more expressive, higher = more consistent |
| similarity_boost | 0 – 1 | 0.75 | adherence to the original voice |
| speed | 0.7 – 1.2 | 1 | playback speed |
| style | 0 – 1 | 0 | eleven-multilingual-v2 only; style exaggeration |
| use_speaker_boost | boolean | - | eleven-multilingual-v2 only |

style and use_speaker_boost are accepted only by eleven-multilingual-v2; a request that sends them to another model is rejected with 400.
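To catch a 400 before sending a request, the ranges and family rules above can be checked client-side. The following is a rough sketch built only from the two settings tables; the function name `validate_settings` and its error wording are invented for illustration, and the server remains the authority on what is accepted.

```python
from typing import Any, Dict, List, Tuple

MINIMAX_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
ELEVENLABS_MODELS = {"eleven-multilingual-v2", "eleven-v3", "eleven-turbo-v2.5"}

MINIMAX_RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}
MINIMAX_EMOTIONS = {"auto", "happy", "sad", "angry", "fearful",
                    "disgusted", "surprised", "neutral", "calm"}
ELEVENLABS_RANGES = {"stability": (0, 1), "similarity_boost": (0, 1),
                     "speed": (0.7, 1.2), "style": (0, 1)}
V2_ONLY = {"style", "use_speaker_boost"}


def _check_range(name: str, value: Any, bounds: Tuple[float, float],
                 errors: List[str]) -> None:
    low, high = bounds
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not low <= value <= high:
        errors.append(f"{name} must be a number in [{low}, {high}]")


def validate_settings(model: str, settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    errors: List[str] = []
    if model in MINIMAX_MODELS:
        for name, value in settings.items():
            if name == "emotion":
                if value not in MINIMAX_EMOTIONS:
                    errors.append(f"emotion {value!r} is not a known value")
            elif name in MINIMAX_RANGES:
                _check_range(name, value, MINIMAX_RANGES[name], errors)
                if name == "pitch" and not isinstance(value, int):
                    errors.append("pitch must be an integer number of semitones")
            else:
                errors.append(f"{name} is not a MiniMax setting")
    elif model in ELEVENLABS_MODELS:
        for name, value in settings.items():
            if name in V2_ONLY and model != "eleven-multilingual-v2":
                errors.append(f"{name} is accepted only by eleven-multilingual-v2")
            elif name == "use_speaker_boost":
                if not isinstance(value, bool):
                    errors.append("use_speaker_boost must be a boolean")
            elif name in ELEVENLABS_RANGES:
                _check_range(name, value, ELEVENLABS_RANGES[name], errors)
            else:
                errors.append(f"{name} is not an ElevenLabs setting")
    else:
        errors.append(f"unknown model {model!r}")
    return errors
```

For example, `validate_settings("eleven-v3", {"style": 0.3})` reports that style is accepted only by eleven-multilingual-v2, while `validate_settings("speech-2.8-hd", {"emotion": "happy"})` returns an empty list.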

Audio tags (eleven-v3)

eleven-v3 is ElevenLabs’ most expressive model. Instead of an emotion setting, you direct the performance with inline audio tags placed right in the text, such as [whispers], [excited], [shouts], [sighs], [laughs], [sad], [curious], and [gasps]. Tags work best with longer, sentence-level text that gives the model room to perform.

"[whispers] The Force surrounds us. It binds the galaxy together.
[curious] Do you feel it?
[excited] Your destiny is calling...
[shouts] May the Force be with you!"
Languages

language_code is optional and validated per model against the live catalog from GET speech/models. MiniMax and the ElevenLabs v2 / turbo models support 30 to 40 languages (auto plus ISO codes like en, es, ja, zh). eleven-v3 auto-detects the language and rejects language_code with 400.
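One way to respect these per-model rules when assembling a request body is sketched below. The helper name `build_body` is hypothetical, silently dropping language_code for eleven-v3 is a design choice made for this example (you may prefer to raise instead), and counting characters with Python's `len` is an assumption about how the service measures text length.

```python
from typing import Any, Dict, Optional

# Maximum text length per model, copied from the Model Comparison Matrix.
MAX_CHARS = {
    "eleven-multilingual-v2": 10_000,
    "eleven-v3": 5_000,
    "eleven-turbo-v2.5": 40_000,
    "speech-2.8-hd": 10_000,
    "speech-2.8-turbo": 10_000,
}


def build_body(model: str, text: str, voice_id: str,
               language_code: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a speech/create body, applying the per-model rules locally."""
    limit = MAX_CHARS[model]
    if len(text) > limit:  # assumes the API counts characters like len()
        raise ValueError(f"text is {len(text)} chars; {model} allows {limit}")
    body: Dict[str, Any] = {"model": model, "text": text, "voice_id": voice_id}
    # eleven-v3 auto-detects the language and rejects language_code with 400,
    # so this sketch simply leaves the field out for that model.
    if language_code is not None and model != "eleven-v3":
        body["language_code"] = language_code
    return body
```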

To retrieve generated speech, use GET speech/audio_id.

To cancel a job that is still generating, use DEL scheduler/id.

https://api.useapi.net/v2/pixverse/speech/create

Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
Request Body
{
    "email": "Optional PixVerse API account email",
    "model": "speech-2.8-hd",
    "text": "Required text to speak",
    "voice_id": "minimax_english_radiant_girl",
    "language_code": "en",
    "emotion": "happy",
    "replyUrl": "Place your call back URL here",
    "replyRef": "Place your reference id here",
    "maxJobs": 3
}
Parameters
  • email is optional. If not specified, the API randomly selects an account from the available accounts.

  • model is optional. Default: speech-2.8-hd. See Model Comparison Matrix.

  • text is required: the words to speak. Maximum length is model-specific; see the matrix (eleven-v3 5,000, eleven-turbo-v2.5 40,000, others 10,000). For eleven-v3 you may embed audio tags.

  • voice_id is required. A voice from GET speech/voices. Its paired provider_voice_id is derived automatically; you do not pass it.

  • language_code is optional, see Languages. Validated per model and rejected for eleven-v3.

  • speed, volume, pitch, emotion (MiniMax) and stability, similarity_boost, speed, style, use_speaker_boost (ElevenLabs) are optional voice settings. Settings from the wrong provider family are rejected with 400.

  • replyUrl is optional: your callback URL. This is the preferred and fastest way to receive results. The API polls every 10 seconds and calls the provided replyUrl once the audio has completed or failed. We recommend using sites like webhook.site to test callback URL functionality. Maximum length 1024 characters. The callback body has the same JSON shape as the GET speech/audio_id response.

  • replyRef is optional: your reference id, which is stored and returned along with this speech response / result. Maximum length 1024 characters.

  • maxJobs is optional. If not specified, the value from the selected accounts/email will be used. It should not exceed the number of concurrent generations supported by your account subscription plan. Valid range: 1…8
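If you use replyUrl, your endpoint receives a JSON body shaped like the GET speech/audio_id response. A handler might interpret it along the lines of the sketch below. The function name is invented, and treating every final status other than COMPLETED as a failure is an assumption, as is the presence of `replyRef` and `error` as top-level keys in the callback body.

```python
import json
from typing import Any, Dict


def summarize_callback(raw_body: bytes) -> Dict[str, Any]:
    """Reduce a callback body to a small state summary.

    Assumes the body carries audio_status_final, audio_status_name, url,
    and (hypothetically) replyRef and error as top-level keys.
    """
    payload = json.loads(raw_body)
    summary: Dict[str, Any] = {
        "audio_id": payload.get("audio_id"),
        "replyRef": payload.get("replyRef"),
    }
    if not payload.get("audio_status_final"):
        summary["state"] = "pending"
    elif payload.get("audio_status_name") == "COMPLETED":
        summary["state"] = "completed"
        summary["url"] = payload.get("url")  # link to the generated .mp3
    else:
        # Any other final status is treated as a failure in this sketch.
        summary["state"] = "failed"
        summary["error"] = payload.get("error")
    return summary
```

Keeping the parsing in a plain function like this makes it easy to plug into whichever web framework serves your callback URL.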

Responses
  • 200 OK

    Use the returned audio_id to retrieve status and results using GET speech/audio_id. Check audio_status_name for COMPLETED and url for the generated .mp3 link.

    If you specify the optional parameter replyUrl, the API will call the provided replyUrl with progress updates until the audio is complete or fails.

    {
        "audio_id": "user:<userid>-pixverse:<email>-speech:<number>",
        "asset_id": 409875792979281,
        "asset_type": 2,
        "asset_source": 1,
        "create_mode": "voice",
        "status": "making",
        "audio_status": 5,
        "credits": 1,
        "audio_status_name": "QUEUED",
        "audio_status_final": false
    }
    
  • 400 Bad Request

    Returned when a parameter is invalid, for example text longer than the model allows, an emotion on an ElevenLabs voice, stability on a MiniMax voice, or language_code on eleven-v3.

    {
      "error": "<Error message>",
      "code": 400
    }
    
  • 401 Unauthorized

    {
      "error": "Unauthorized"
    }
    
  • 412 Insufficient credits

    All credits have been used up. Upgrade your membership or purchase credits.

    {
      "error": "All Credits have been used up. Please upgrade your membership or purchase credits."
    }
    
  • 429 Too Many Requests

    Wait 10 to 30 seconds, then retry.

    The API queue is full and cannot accept new speech/create requests. The size of the queue is defined by the optional maxJobs parameter.

    {
        "error":
          "Account <email> is busy executing <maxJobs> tasks."
          "All configured accounts are running at maximum capacity."
    }
    
  • 596 Pending mod message

    Your PixVerse.ai account has a pending error. Most likely, you changed your account password or your PixVerse.ai account was placed on hold. Once the issue is resolved, update your account to clear the error by executing POST accounts/email before making any new API calls.

    {
      "error":
        "Your PixVerse account has pending error."
        "Please address this issue at https://useapi.net/docs/api-pixverse-v2/post-pixverse-accounts-email before making any new API calls."
    }
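The 429 guidance above (wait 10 to 30 seconds, then retry) can be wrapped in a small helper. This is a sketch under assumptions: `send` is a hypothetical callable that performs the POST to speech/create and returns a `(status_code, json_body)` tuple, and the cap of five attempts is arbitrary.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]


def create_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,  # arbitrary cap chosen for this example
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry only on 429, waiting a random 10 to 30 seconds between tries.

    Any other status (200, 400, 401, 412, 596) is returned immediately,
    since retrying will not change the outcome without user action.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(random.uniform(10, 30))
    return status, body
```

With the `requests` example in the Examples section, `send` could be a lambda that posts the body and returns `(response.status_code, response.json())`.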
    
Model
{ // TypeScript, all fields are optional
    audio_id: string
    asset_id: number
    asset_type: number
    asset_source: number
    create_mode: string
    status: string
    audio_status: number
    credits: number
    error: string
    code: number
    // added
    audio_status_name: string
    audio_status_final: boolean
}
Examples
  • curl -H "Accept: application/json" \
         -H "Content-Type: application/json" \
         -H "Authorization: Bearer …" \
         -X POST "https://api.useapi.net/v2/pixverse/speech/create" \
         -d '{"model": "speech-2.8-hd", "text": "May the Force be with you.", "voice_id": "minimax_english_radiant_girl", "emotion": "happy"}'
    
  • const apiUrl = `https://api.useapi.net/v2/pixverse/speech/create`;
    const token = "API token";
    const data = {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json' }
    };
    data.body = JSON.stringify({
      model: "speech-2.8-hd",
      text: "May the Force be with you.",
      voice_id: "minimax_english_radiant_girl",
      emotion: "happy"
    });
    const response = await fetch(apiUrl, data);
    const result = await response.json();
    console.log("response", {response, result});
    
  • import requests
    apiUrl = "https://api.useapi.net/v2/pixverse/speech/create"
    token = "API token"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}"
    }
    body = {
        "model": "speech-2.8-hd",
        "text": "May the Force be with you.",
        "voice_id": "minimax_english_radiant_girl",
        "emotion": "happy"
    }
    response = requests.post(apiUrl, headers=headers, json=body)
    print(response, response.json())
    
Try It

Pick a model first; the form then loads its real voice and language lists. Selecting a voice plays a preview.