POST audio/create-stream

Create text-to-speech audio stream token and payload

December 23, 2024 (April 4, 2025)


Please configure at least one www.minimax.io/audio account for this endpoint; see Setup MiniMax for details.

This endpoint creates a near real-time audio stream from the provided text.

Average response time is 3 seconds.
Up to 20 parallel jobs per account are supported.
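
One way to stay under the per-account limit above is to cap concurrency on the client side. The following is only a sketch: `run_jobs` and its `create_stream` callable are hypothetical names, and it assumes that each call counts as one "job", which this page does not spell out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Documented limit: up to 20 parallel jobs per account.
MAX_PARALLEL_JOBS = 20


def run_jobs(texts: Iterable[str], create_stream: Callable[[str], object]) -> List[object]:
    """Run create_stream for every text, never more than 20 at a time.

    create_stream is a hypothetical callable supplied by the caller,
    for example a function that POSTs one text to audio/create-stream.
    """
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_JOBS) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(create_stream, texts))
```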

Over 300 pre-built voices are available through GET audio/voices, covering the following:

Languages: English (US, UK, Australia, India), Chinese (Mandarin and Cantonese), Japanese, Korean, French, German, Spanish, Portuguese (including Brazilian), Italian, Arabic, Russian, Turkish, Dutch, Ukrainian, Vietnamese, and Indonesian.
The list is constantly updated to include more languages!
Emotions: happy, sad, angry, fearful, disgusted, surprised, neutral
Accents: US (General), English, Indian
Ages: Young Adult, Adult, Middle-Aged, Senior
Genders: Male, Female

The token and payload returned by this endpoint are used by the WebSocket endpoint WSS audio/wss.

https://api.useapi.net/v1/minimax/audio/create-stream

Request Headers

Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data

An API token is required; see Setup useapi.net for details.
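
As an illustration of the headers above, here is a minimal sketch that prepares the POST request with Python's standard library. The `build_request` helper and the `YOUR_API_TOKEN` placeholder are assumptions made for this example; the block only builds the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://api.useapi.net/v1/minimax/audio/create-stream"


def build_request(api_token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with the documented headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder token and voice id, for illustration only.
req = build_request("YOUR_API_TOKEN", {"text": "Hello", "voice_id": "123456789"})
# To actually send it: urllib.request.urlopen(req)
```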

Request Body

{
    "account": "Optional MiniMax www.minimax.io/audio API account",
    "text": "Required text",
    "voice_id": "Required voice id"
}

account is optional when only one account is configured. If you have multiple accounts configured, this parameter is required.
text is required. Insert <#0.5#> to add a 0.5s pause between sentences. Adjust the duration as needed.
Maximum length: 5000 characters.
voice_id is required. Use GET audio/voices to get the list of all available voices.
model is optional.
Supported values: speech-02-hd (default), speech-01-hd, speech-02-turbo, speech-01-turbo.
language_boost is optional. Use a tag_name from the voice_tag_language array of GET audio/config.
Default value: Auto.
emotion is optional. Use a value from the t2a_emotion array of GET audio/config.
Default value: Auto.
vol is optional.
Default value: 1.
speed is optional.
Valid range: 0.5…2, default 1.
pitch is optional.
Valid range: -12…12, default 0.
deepen_lighten is optional.
Valid range: -100…100, default 0.
stronger_softer is optional.
Valid range: -100…100, default 0.
nasal_crisp is optional.
Valid range: -100…100, default 0.
spacious_echo is optional.
Supported values: true, false (default).
lofi_telephone is optional.
Supported values: true, false (default).
robotic is optional.
Supported values: true, false (default).
auditorium_echo is optional.
Supported values: true, false (default).
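
The parameter rules above can be checked on the client before calling the endpoint. The sketch below is one possible approach, not part of the API: `build_body` and `with_pause` are hypothetical helpers, the 5000-character limit is assumed to count Python string characters, and the pause marker is assumed to sit directly between sentences with no extra whitespace.

```python
from typing import Any, Dict, Iterable

MAX_TEXT_LENGTH = 5000
MODELS = {"speech-02-hd", "speech-01-hd", "speech-02-turbo", "speech-01-turbo"}
# Documented numeric ranges (inclusive). vol has no documented range here.
RANGES = {
    "speed": (0.5, 2),
    "pitch": (-12, 12),
    "deepen_lighten": (-100, 100),
    "stronger_softer": (-100, 100),
    "nasal_crisp": (-100, 100),
}
FLAGS = {"spacious_echo", "lofi_telephone", "robotic", "auditorium_echo"}


def with_pause(sentences: Iterable[str], seconds: float = 0.5) -> str:
    """Join sentences with the documented pause marker, e.g. <#0.5#>."""
    return f"<#{seconds}#>".join(sentences)


def build_body(text: str, voice_id: str, **options: Any) -> Dict[str, Any]:
    """Validate the documented constraints and return the request body."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    model = options.get("model")
    if model is not None and model not in MODELS:
        raise ValueError(f"unsupported model: {model}")
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    for name in FLAGS:
        if name in options and not isinstance(options[name], bool):
            raise ValueError(f"{name} must be true or false")
    return {"text": text, "voice_id": voice_id, **options}
```

For example, `build_body(with_pause(["Hello.", "Goodbye."]), "123456789", speed=1.5)` returns a body whose text contains the `<#0.5#>` marker, while `speed=3` raises `ValueError`.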

Responses

200 OK

The values of the token and payload fields are used by the WebSocket endpoint WSS audio/wss.

The token contains WebSocket authorization information and will expire in 24 hours.
The payload contains a properly formed and validated payload built from user-provided input. You should send this payload over the WebSocket to generate a real-time sound stream and optionally retrieve the generated MP3 file.

{
    "token": "token for WSS WebSocket endpoint",
    "payload": {
        "msg_id": "1e53d-a593-e40b9-54e20-f29c84-40ae3",
        "payload": {
            "model": "",
            "text": "Text for TTS generation",
            "voice_setting": {
                "speed": 1,
                "vol": 1,
                "pitch": 0,
                "voice_id": "123456789",
                "emotion": "happy"
            },
            "audio_setting": {},
            "effects": {
                "deepen_lighten": 0,
                "stronger_softer": 0,
                "nasal_crisp": 0,
                "spacious_echo": false,
                "lofi_telephone": false,
                "robotic": false,
                "auditorium_echo": false
            },
            "er_weights": [],
            "language_boost": "German",
            "stream": true
        }
    }
}
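
To show how the two fields of the 200 response might be separated for the WebSocket step, here is a small sketch. `parse_create_stream_response` is a hypothetical helper; sending the payload as JSON text is an assumption, and how the token is passed to WSS audio/wss is not covered on this page.

```python
import json
from typing import Tuple


def parse_create_stream_response(raw: str) -> Tuple[str, str]:
    """Split a 200 OK body into (token, message).

    message is the payload object re-serialized as JSON text, on the
    assumption that the WebSocket expects it in that form.
    """
    data = json.loads(raw)
    token = data["token"]      # WebSocket authorization, expires in 24 hours
    payload = data["payload"]  # validated payload built from the request
    return token, json.dumps(payload)
```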

400 Bad Request
```
{
  "error": "<Error message>"
}
```
401 Unauthorized
```
{
  "error": "Unauthorized"
}
```
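
Both error responses share the same `error` field, so a client can handle them uniformly. The following sketch uses a hypothetical `CreateStreamError` exception and `check_response` helper, and treats any status other than 200 as a failure; only 400 and 401 are documented here.

```python
import json


class CreateStreamError(Exception):
    """Raised when the endpoint returns a non-200 status."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(status: int, body: str) -> dict:
    """Return the parsed body on 200, otherwise raise CreateStreamError."""
    data = json.loads(body) if body else {}
    if status == 200:
        return data
    # 400 and 401 carry {"error": "..."}; fall back if the field is missing.
    raise CreateStreamError(status, data.get("error", "Unknown error"))
```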

Examples

The code provided at WSS audio/wss is used on this page when you use the Try It feature.

Model