POST speech/create

Generate speech from text

June 25, 2026

Use a pixverse.ai account to turn text into speech. You pick a model, a voice (from GET speech/voices), and optional per-voice settings, and the API returns spoken audio as an .mp3.

Generation is asynchronous. The call returns an audio_id immediately. Either poll GET speech/audio_id until audio_status_final is true, or pass a replyUrl to receive the result by callback.
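The polling flow can be sketched as below. This is a minimal illustration, not the service's client: `fetch_status` is a hypothetical callable you supply that performs GET speech/audio_id and returns the decoded JSON, and the 10-second interval and 10-minute timeout are assumed defaults rather than documented values.

```python
import time
from typing import Callable


def poll_until_final(
    fetch_status: Callable[[str], dict],
    audio_id: str,
    interval_s: float = 10.0,   # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status(audio_id) until audio_status_final is true.

    fetch_status is a hypothetical helper: it should perform
    GET speech/audio_id and return the decoded JSON body as a dict.
    """
    waited = 0.0
    while True:
        status = fetch_status(audio_id)
        if status.get("audio_status_final"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"{audio_id} not final after {timeout_s} seconds")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. A final status is not necessarily a success, so check audio_status_name on the returned dict.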

Model Comparison Matrix

Model	Voice settings	Max chars	PixVerse Credits / Cost per 1K chars *	Provider cost per 1K chars **
ElevenLabs `eleven-multilingual-v2`	stability, similarity_boost, speed, style	10,000	20 cr / $0.08	$0.10
ElevenLabs `eleven-v3`	stability, similarity_boost, speed + inline audio tags	5,000	20 cr / $0.08	$0.10
ElevenLabs `eleven-turbo-v2.5`	stability, similarity_boost, speed	40,000	10 cr / $0.04	$0.05
MiniMax `speech-2.8-hd` (default)	speed, volume, pitch, emotion	10,000	20 cr / $0.08	$0.10
MiniMax `speech-2.8-turbo`	speed, volume, pitch, emotion	10,000	10 cr / $0.04	$0.06

Billing is per character, so 500 characters cost half the per-1K price. *PixVerse pricing is for the $60/mo Premium plan ($0.004 per credit); see the cost calculator for other plans. **Provider cost is the price of going directly to the model’s own provider (their public rates, which change).
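As a rough illustration of per-character billing, the sketch below prorates the credit prices from the matrix. It assumes strictly linear billing with no rounding or minimum charge, and that characters are counted the way Python's `len` counts them; neither is confirmed here. The helper name `estimate_cost` is hypothetical.

```python
# Credits per 1,000 characters, copied from the matrix above.
CREDITS_PER_1K = {
    "eleven-multilingual-v2": 20,
    "eleven-v3": 20,
    "eleven-turbo-v2.5": 10,
    "speech-2.8-hd": 20,
    "speech-2.8-turbo": 10,
}
USD_PER_CREDIT = 0.004  # $60/mo Premium plan rate quoted above


def estimate_cost(model: str, text: str) -> tuple[float, float]:
    """Return (credits, usd), assuming strictly linear per-character billing."""
    credits = CREDITS_PER_1K[model] * len(text) / 1000
    return credits, credits * USD_PER_CREDIT
```

For example, 500 characters on speech-2.8-hd come out to 10 credits, or about $0.04 on the Premium plan, under these assumptions.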

The two providers take different voice settings — MiniMax voices respond to emotion, ElevenLabs voices to stability / similarity_boost / style. Sending a setting to the wrong family is rejected with 400. See Voice settings below.

Voices

Every request needs a voice_id from GET speech/voices, filtered by model and language — the paired provider_voice_id is filled in for you automatically. The MiniMax catalog carries 300+ voices (male, female, and neutral), ElevenLabs ~20, each with gender, accent, and style_tags to help you choose.

Voice settings

All settings are optional — omit them for the voice’s natural delivery. Ranges are enforced per model family.

MiniMax (speech-2.8-hd, speech-2.8-turbo):

Setting	Range	Default	Notes
`speed`	0.5 – 2	1	playback speed
`volume`	0 – 10	1	loudness
`pitch`	-12 – 12	0	integer semitones
`emotion`	enum	`auto`	`auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `neutral`, `calm`

ElevenLabs (eleven-multilingual-v2, eleven-v3, eleven-turbo-v2.5):

Setting	Range	Default	Notes
`stability`	0 – 1	0.5	lower = more expressive, higher = more consistent
`similarity_boost`	0 – 1	0.75	adherence to the original voice
`speed`	0.7 – 1.2	1	playback speed
`style`	0 – 1	0	`eleven-multilingual-v2` only — style exaggeration
`use_speaker_boost`	boolean	—	`eleven-multilingual-v2` only

style and use_speaker_boost are accepted only by eleven-multilingual-v2 — sending them to another model is rejected with 400.
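Because settings sent to the wrong family are rejected with 400, it can help to check them before sending. The sketch below encodes the ranges from the two tables above; `check_voice_settings` is a hypothetical client-side helper, and the server remains the authority on what is accepted.

```python
MINIMAX_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
ELEVENLABS_MODELS = {"eleven-multilingual-v2", "eleven-v3", "eleven-turbo-v2.5"}
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful",
            "disgusted", "surprised", "neutral", "calm"}

# Numeric ranges copied from the tables above.
MINIMAX_RANGES = {"speed": (0.5, 2), "volume": (0, 10), "pitch": (-12, 12)}
ELEVENLABS_RANGES = {"stability": (0, 1), "similarity_boost": (0, 1),
                     "speed": (0.7, 1.2), "style": (0, 1)}
V2_ONLY = {"style", "use_speaker_boost"}


def check_voice_settings(model: str, settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    if model in MINIMAX_MODELS:
        ranges, extra = MINIMAX_RANGES, {"emotion"}
    elif model in ELEVENLABS_MODELS:
        ranges, extra = ELEVENLABS_RANGES, {"use_speaker_boost"}
    else:
        return [f"unknown model: {model}"]

    problems = []
    for name, value in settings.items():
        if name not in ranges and name not in extra:
            problems.append(f"{name} is not a setting for {model}")
        elif name in V2_ONLY and model in ELEVENLABS_MODELS and model != "eleven-multilingual-v2":
            problems.append(f"{name} is accepted only by eleven-multilingual-v2")
        elif name == "emotion":
            if value not in EMOTIONS:
                problems.append(f"emotion must be one of {sorted(EMOTIONS)}")
        elif name == "use_speaker_boost":
            if not isinstance(value, bool):
                problems.append("use_speaker_boost must be a boolean")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        else:
            low, high = ranges[name]
            if not low <= value <= high:
                problems.append(f"{name} must be between {low} and {high}")
            elif name == "pitch" and not isinstance(value, int):
                problems.append("pitch must be an integer")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.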

Audio tags (eleven-v3)

eleven-v3 is ElevenLabs’ most expressive model — instead of an emotion setting, you direct the performance with inline audio tags placed right in the text, such as [whispers], [excited], [shouts], [sighs], [laughs], [sad], [curious], and [gasps]. Tags work best with longer, sentence-level text that gives the model room to perform.

"[whispers] The Force surrounds us. It binds the galaxy together.
[curious] Do you feel it?
[excited] Your destiny is calling...
[shouts] May the Force be with you!"
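Tagged text like the sample above can be assembled programmatically. The sketch below is one way to do it; `build_tagged_text` is a hypothetical helper, and it assumes that the tags themselves count toward the 5,000-character eleven-v3 limit, which is not stated here.

```python
V3_MAX_CHARS = 5000  # eleven-v3 limit from the matrix


def build_tagged_text(lines: list[tuple[str, str]]) -> str:
    """Join (tag, sentence) pairs into eleven-v3 text, one tagged line per pair."""
    text = "\n".join(f"[{tag}] {sentence}" for tag, sentence in lines)
    # Assumption: tags are counted as part of the text length.
    if len(text) > V3_MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; eleven-v3 allows {V3_MAX_CHARS}")
    return text
```

Calling `build_tagged_text([("whispers", "The Force surrounds us."), ("curious", "Do you feel it?")])` yields two lines, each starting with its tag in square brackets.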

Languages

language_code is optional and validated per model against the live catalog from GET speech/models — MiniMax and the ElevenLabs v2 / turbo models support 30-40 languages (auto plus ISO codes like en, es, ja, zh). eleven-v3 auto-detects the language and rejects language_code with 400.

To retrieve generated speech, use:

GET speech/audio_id
GET speech to list all speech

To cancel a job that is still generating, use DEL scheduler/id.

https://api.useapi.net/v2/pixverse/speech/create

Request Headers

Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data

API token is required, see Setup useapi.net for details.

Request Body

{
    "email": "Optional PixVerse API account email",
    "model": "speech-2.8-hd",
    "text": "Required text to speak",
    "voice_id": "minimax_english_radiant_girl",
    "language_code": "en",
    "emotion": "happy",
    "replyUrl": "Place your call back URL here",
    "replyRef": "Place your reference id here",
    "maxJobs": 3
}

Parameters

email is optional. If not specified, the API randomly selects an account from the available accounts.
model is optional. Default: speech-2.8-hd. See Model Comparison Matrix.
text is required, the words to speak. Maximum length is model-specific — see the matrix (eleven-v3 5,000, eleven-turbo-v2.5 40,000, others 10,000). For eleven-v3 you may embed audio tags.
voice_id is required. A voice from GET speech/voices. Its paired provider_voice_id is derived automatically — you do not pass it.
language_code is optional, see Languages. Validated per model — rejected for eleven-v3.
speed, volume, pitch, emotion (MiniMax) and stability, similarity_boost, speed, style, use_speaker_boost (ElevenLabs) are optional voice settings. Settings from the wrong provider family are rejected with 400.
replyUrl is optional. Place your callback URL here. This is the preferred way to receive results quickly: the API polls every 10 seconds and calls the provided replyUrl once the audio has completed or failed. We recommend using sites like webhook.site to test callback URL functionality. Maximum length 1024 characters. The callback body has the same JSON shape as the GET speech/audio_id response.
replyRef is optional, place here your reference id which will be stored and returned along with this speech response / result. Maximum length 1024 characters.
maxJobs is optional. If not specified, the value from the selected accounts/email is used. It should not exceed the number of concurrent generations supported by your account subscription plan. Valid range: 1…8
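The parameter rules above can be collected into a small request builder. This is a sketch under stated assumptions: `build_request` is a hypothetical helper, the limits are copied from this page, and character counting follows Python's `len`. It only mirrors the documented constraints; the API performs its own validation.

```python
# Maximum text length per model, copied from the matrix.
MAX_CHARS = {
    "eleven-multilingual-v2": 10_000,
    "eleven-v3": 5_000,
    "eleven-turbo-v2.5": 40_000,
    "speech-2.8-hd": 10_000,
    "speech-2.8-turbo": 10_000,
}


def build_request(text, voice_id, model="speech-2.8-hd", language_code=None,
                  reply_url=None, reply_ref=None, max_jobs=None, **voice_settings):
    """Return a JSON-ready dict for POST speech/create, or raise ValueError."""
    if model not in MAX_CHARS:
        raise ValueError(f"unknown model: {model}")
    if not text or len(text) > MAX_CHARS[model]:
        raise ValueError(f"text must be 1 to {MAX_CHARS[model]} characters for {model}")
    if not voice_id:
        raise ValueError("voice_id is required")
    if language_code is not None and model == "eleven-v3":
        raise ValueError("eleven-v3 auto-detects the language; omit language_code")
    for name, value in (("replyUrl", reply_url), ("replyRef", reply_ref)):
        if value is not None and len(value) > 1024:
            raise ValueError(f"{name} is limited to 1024 characters")
    if max_jobs is not None and not 1 <= max_jobs <= 8:
        raise ValueError("maxJobs must be between 1 and 8")

    # Only include optional fields that were actually set.
    body = {"model": model, "text": text, "voice_id": voice_id, **voice_settings}
    optional = {"language_code": language_code, "replyUrl": reply_url,
                "replyRef": reply_ref, "maxJobs": max_jobs}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

The returned dict can be passed as the `json=` argument of `requests.post`, as in the Examples section.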

Responses

200 OK

Use the returned audio_id to retrieve status and results using GET speech/audio_id. Check audio_status_name for COMPLETED and url for the generated .mp3 link.

If you specify the optional parameter replyUrl, the API will call the provided replyUrl with progress updates until the audio is complete or fails.

{
    "audio_id": "user:<userid>-pixverse:<email>-speech:<number>",
    "asset_id": 409875792979281,
    "asset_type": 2,
    "asset_source": 1,
    "create_mode": "voice",
    "status": "making",
    "audio_status": 5,
    "credits": 1,
    "audio_status_name": "QUEUED",
    "audio_status_final": false
}
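If you use replyUrl, your endpoint receives a JSON body shaped like the GET speech/audio_id response. The sketch below pulls out the fields named on this page; `summarize_callback` is a hypothetical helper, and any field not listed in the Model section (such as url) is read defensively because its presence is an assumption.

```python
import json


def summarize_callback(raw_body: bytes) -> dict:
    """Decode a callback body and pull out the fields named in this document."""
    payload = json.loads(raw_body)
    return {
        "audio_id": payload.get("audio_id"),
        "final": bool(payload.get("audio_status_final")),
        "completed": payload.get("audio_status_name") == "COMPLETED",
        "url": payload.get("url"),      # expected once the audio is COMPLETED
        "error": payload.get("error"),  # may be set when generation failed
    }
```

A handler would typically ignore callbacks where `final` is false and act only once `completed` is true or `error` is set.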

400 Bad Request

Returned when a parameter is invalid, for example text longer than the model allows, an emotion on an ElevenLabs voice, stability on a MiniMax voice, or language_code on eleven-v3.
```
{
  "error": "<Error message>",
  "code": 400
}
```
401 Unauthorized
```
{
  "error": "Unauthorized"
}
```
412 Insufficient credits

Insufficient credits. All Credits have been used up. Please upgrade your membership or purchase credits.
```
{
  "error": "All Credits have been used up. Please upgrade your membership or purchase credits."
}
```
429 Too Many Requests

Wait at least 10 to 30 seconds, then retry, repeating in a loop until the request is accepted.

The API queue is full and cannot accept new speech/create requests. The size of the queue is defined by the optional maxJobs parameter.
```
{
    "error":
      "Account <email> is busy executing <maxJobs> tasks."
      "All configured accounts are running at maximum capacity."
}
```
596 Pending mod message

Your PixVerse.ai account has a pending error. Most likely, you changed your account password or your PixVerse.ai account was placed on hold. Once the issue is resolved, update your account to clear the error by executing POST accounts/email before making any new API calls.
```
{
  "error":
    "Your PixVerse account has pending error."
    "Please address this issue at https://useapi.net/docs/api-pixverse-v2/post-pixverse-accounts-email before making any new API calls."
}
```
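The 429 guidance above (wait 10 to 30 seconds and retry) can be sketched as a small wrapper. `send` is a hypothetical zero-argument callable that performs the POST and returns `(status_code, json_body)`; the attempt limit of 5 is an assumption, not a documented value.

```python
import random
import time
from typing import Callable, Tuple


def create_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,                # assumed cap on attempts
    wait_range: Tuple[float, float] = (10.0, 30.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() and retry only on 429, waiting 10 to 30 seconds between tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(random.uniform(*wait_range))
    # Still busy after the last attempt: hand the 429 back to the caller.
    return status, body
```

Other statuses (400, 401, 412, 596) are returned immediately, since retrying does not resolve them.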

Model

{ // TypeScript, all fields are optional
    audio_id: string
    asset_id: number
    asset_type: number
    asset_source: number
    create_mode: string
    status: string
    audio_status: number
    credits: number
    error: string
    code: number
    // added
    audio_status_name: string
    audio_status_final: boolean
}

Examples

curl -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer …" \
     -X POST "https://api.useapi.net/v2/pixverse/speech/create" \
     -d '{"model": "speech-2.8-hd", "text": "May the Force be with you.", "voice_id": "minimax_english_radiant_girl", "emotion": "happy"}'

const apiUrl = `https://api.useapi.net/v2/pixverse/speech/create`;
const token = "API token";
const data = {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json' }
};
data.body = JSON.stringify({
  model: "speech-2.8-hd",
  text: "May the Force be with you.",
  voice_id: "minimax_english_radiant_girl",
  emotion: "happy"
});
const response = await fetch(apiUrl, data);
const result = await response.json();
console.log("response", {response, result});

import requests
apiUrl = "https://api.useapi.net/v2/pixverse/speech/create"
token = "API token"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
body = {
    "model": "speech-2.8-hd",
    "text": "May the Force be with you.",
    "voice_id": "minimax_english_radiant_girl",
    "emotion": "happy"
}
response = requests.post(apiUrl, headers=headers, json=body)
print(response, response.json())

Try It

Pick a model first — the form then loads its real voice and language lists. Selecting a voice plays a preview.