Create text-to-speech audio stream token and payload
December 23, 2024 (January 15, 2025)
Please configure at least one www.hailuo.ai account for this endpoint; see Setup MiniMax for details.
This endpoint creates a near real-time audio stream from the provided text.
- Average response time is 3 seconds.
- Up to 20 parallel jobs per account are supported.
- Currently, this service is offered free of charge.
Over 300 pre-built voices are available via GET audio/voices, supporting the following:
- Languages: English (US, UK, Australia, India), Chinese (Mandarin and Cantonese), Japanese, Korean, French, German, Spanish, Portuguese (including Brazilian), Italian, Arabic, Russian, Turkish, Dutch, Ukrainian, Vietnamese, and Indonesian.
  The list is constantly updated to include more languages!
- Emotions: happy, sad, angry, fearful, disgusted, surprised, neutral
- Accents: US (General), English, Indian
- Ages: Young Adult, Adult, Middle-Aged, Senior
- Genders: Male, Female
The token and payload returned by this endpoint are used by WebSocket WSS audio/wss.
https://api.useapi.net/v1/minimax/audio/create-stream
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required; see Setup useapi.net for details.
Request Body
{
"account": "Optional MiniMax www.hailuo.ai API account",
"text": "Required text",
"voice_id": "Required voice id"
}
- account is optional when only one www.hailuo.ai account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- text is required. Insert <#0.5#> to add a 0.5s pause between sentences. Adjust the duration as needed.
  Maximum length: 5000 characters.
- voice_id is required. Use GET audio/voices to get the list of all available voices.
- model is optional.
  Supported values: speech-01-hd (default), speech-01-turbo.
- language_boost is optional. Use tag_name from the array voice_tag_language of GET audio/config.
  Default value: Auto.
- emotion is optional. Use value from the array t2a_emotion of GET audio/config.
  Default value: Auto.
- vol is optional.
  Default 1.
- speed is optional.
  Valid range: 0.5…2, default 1.
- pitch is optional.
  Valid range: -12…12, default 0.
- deepen_lighten is optional.
  Valid range: -100…100, default 0.
- stronger_softer is optional.
  Valid range: -100…100, default 0.
- nasal_crisp is optional.
  Valid range: -100…100, default 0.
- spacious_echo is optional.
  Supported values: true, false (default).
- lofi_telephone is optional.
  Supported values: true, false (default).
- robotic is optional.
  Supported values: true, false (default).
- auditorium_echo is optional.
  Supported values: true, false (default).
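The constraints in the parameter list can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: the names `CreateStreamParams`, `joinWithPause`, and `validateParams` are hypothetical, the numeric ranges are assumed to be inclusive, and the 5000-character limit is assumed to count JavaScript string length (including any pause tags).

```typescript
// Hypothetical client-side types; field names follow the parameter list above.
interface CreateStreamParams {
  account?: string;
  text: string;
  voice_id: string;
  model?: "speech-01-hd" | "speech-01-turbo";
  language_boost?: string;
  emotion?: string;
  vol?: number;
  speed?: number;
  pitch?: number;
  deepen_lighten?: number;
  stronger_softer?: number;
  nasal_crisp?: number;
  spacious_echo?: boolean;
  lofi_telephone?: boolean;
  robotic?: boolean;
  auditorium_echo?: boolean;
}

const MAX_TEXT_LENGTH = 5000;

// Documented ranges; treating both bounds as inclusive is an assumption.
const RANGES: Array<[keyof CreateStreamParams, number, number]> = [
  ["speed", 0.5, 2],
  ["pitch", -12, 12],
  ["deepen_lighten", -100, 100],
  ["stronger_softer", -100, 100],
  ["nasal_crisp", -100, 100],
];

// Joins sentences with a pause tag such as <#0.5#>.
function joinWithPause(sentences: string[], seconds: number = 0.5): string {
  return sentences.join(`<#${seconds}#>`);
}

// Returns a list of problems; an empty list means the checks passed.
function validateParams(p: CreateStreamParams): string[] {
  const errors: string[] = [];
  if (!p.text) {
    errors.push("text is required");
  } else if (p.text.length > MAX_TEXT_LENGTH) {
    errors.push(`text exceeds ${MAX_TEXT_LENGTH} characters`);
  }
  if (!p.voice_id) {
    errors.push("voice_id is required");
  }
  for (const [name, min, max] of RANGES) {
    const value = p[name];
    if (typeof value === "number" && (value < min || value > max)) {
      errors.push(`${name} must be between ${min} and ${max}`);
    }
  }
  return errors;
}
```

For example, `joinWithPause(["Hello.", "World."])` produces `Hello.<#0.5#>World.`, which can then be passed as `text`. The server remains the authority on validation; these checks only catch obvious mistakes early.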
Responses
- Field token and payload values are used by WebSocket WSS audio/wss.
  - The token contains WebSocket authorization information and will expire in 24 hours.
  - The payload contains a properly formed and validated payload built from user-provided input. You should send this payload over the WebSocket to generate a real-time sound stream and optionally retrieve the generated MP3 file.

  {
    "token": "token for WSS WebSocket endpoint",
    "payload": {
      "msg_id": "1e53d-a593-e40b9-54e20-f29c84-40ae3",
      "model": "",
      "text": "Text for TTS generation",
      "voice_setting": {
        "speed": 1,
        "vol": 1,
        "pitch": 0,
        "voice_id": "123456789",
        "emotion": "happy"
      },
      "audio_setting": {},
      "effects": {
        "deepen_lighten": 0,
        "stronger_softer": 0,
        "nasal_crisp": 0,
        "spacious_echo": false,
        "lofi_telephone": false,
        "robotic": false,
        "auditorium_echo": false
      },
      "er_weights": [],
      "language_boost": "German",
      "stream": true
    }
  }
- { "error": "<Error message>" }
- { "error": "Unauthorized" }
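The response shapes above can be told apart with a small parser. This is a sketch under stated assumptions: the names `CreateStreamResult` and `parseCreateStreamResponse` are hypothetical, and the 24-hour token lifetime is assumed to start when the response is received.

```typescript
// Hypothetical result type; only token, payload, and error come from the responses above.
type CreateStreamResult =
  | { ok: true; token: string; payload: Record<string, unknown>; tokenExpiresAt: Date }
  | { ok: false; error: string };

// The token is documented to expire in 24 hours.
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000;

function parseCreateStreamResponse(
  body: unknown,
  receivedAt: Date = new Date()
): CreateStreamResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Unexpected response body" };
  }
  const fields = body as Record<string, unknown>;
  // Error responses carry a single "error" string.
  if (typeof fields.error === "string") {
    return { ok: false, error: fields.error };
  }
  // Success responses carry both token and payload.
  if (
    typeof fields.token === "string" &&
    typeof fields.payload === "object" &&
    fields.payload !== null
  ) {
    return {
      ok: true,
      token: fields.token,
      payload: fields.payload as Record<string, unknown>,
      // Assumption: the expiry clock starts at the time of receipt.
      tokenExpiresAt: new Date(receivedAt.getTime() + TOKEN_TTL_MS),
    };
  }
  return { ok: false, error: "Response is missing token or payload" };
}
```

Keeping `tokenExpiresAt` next to the token makes it easy to request a fresh token before opening the WebSocket if the old one may have expired.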
Examples
The code provided at WSS audio/wss is used on this page when you use the Try It feature.
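For calling this endpoint outside the Try It feature, a request could be assembled roughly as follows. This is a minimal sketch rather than official client code: it assumes the endpoint accepts POST with a JSON body (the HTTP method is not stated on this page), and `FetchLike`, `buildCreateStreamRequest`, and `createStream` are hypothetical names. The fetch implementation is passed in so the sketch does not depend on a particular runtime.

```typescript
const CREATE_STREAM_URL = "https://api.useapi.net/v1/minimax/audio/create-stream";

// Minimal fetch-compatible signature; the global fetch of Node 18+ or a browser fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface CreateStreamBody {
  account?: string;
  text: string;
  voice_id: string;
  [option: string]: unknown; // other optional parameters from Request Body
}

interface CreateStreamRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL, headers, and JSON body without sending anything.
function buildCreateStreamRequest(apiToken: string, body: CreateStreamBody): CreateStreamRequest {
  return {
    url: CREATE_STREAM_URL,
    init: {
      method: "POST", // assumption: not stated explicitly on this page
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the HTTP status with the parsed JSON body.
async function createStream(
  fetchFn: FetchLike,
  apiToken: string,
  body: CreateStreamBody
): Promise<{ status: number; data: unknown }> {
  const { url, init } = buildCreateStreamRequest(apiToken, body);
  const response = await fetchFn(url, init);
  return { status: response.status, data: await response.json() };
}
```

A call might look like `createStream(fetch, apiToken, { text: "Hello", voice_id: "123456789" })`, after which the returned token and payload are handed to the WebSocket described at WSS audio/wss.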
Model
{ // TypeScript, all fields are optional
  token: string
  payload: {}
}