MiniMax API Text-to-Speech (TTS): Voices & Samples

3 min read • December 30, 2024 (June 15, 2026)

Introduction
Cloned Voices Samples
Standard Voices Samples
Current API: Mureka speech
Conclusion

MiniMax’s text-to-speech endpoints have since been retired. The voices and samples below remain as a showcase of what the original MiniMax TTS model produced. For current text-to-speech and voice cloning, use the Mureka API — see POST speech, GET speech/voices and POST speech/voice (voice cloning).

Introduction

The MiniMax API v1 was a third-party API for the MiniMax speech AI model, deployed at www.minimax.io/audio.

At the time these samples were produced, the MiniMax TTS API provided the following features:

Up to 20 parallel TTS jobs per www.minimax.io/audio account.
Any number of accounts could be connected.
Average response time for live streaming was 3 seconds.
Average time to create an MP3 from text was under 10 seconds.
Over 300 pre-built voices available.
Ability to clone voices.
Supported Languages: English (US, UK, Australia, India), Chinese (Mandarin and Cantonese), Japanese, Korean, French, German, Spanish, Portuguese (including Brazilian), Italian, Arabic, Russian, Turkish, Dutch, Ukrainian, Vietnamese, and Indonesian.
The list was updated regularly to add more languages.
Supported Emotions: happy, sad, angry, fearful, disgusted, surprised, and neutral.
Supported Accents: US (General), British English, and Indian English.
Supported Ages: Young Adult, Adult, Middle-Aged, and Senior.
Supported Genders: Male and Female.
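
The per-account limit of 20 parallel jobs, combined with the option to connect several accounts, suggests a simple client-side scheduling pattern. The sketch below is only an illustration of that idea, not the original API client: `AccountPool`, `fake_synthesize`, and the account names are hypothetical, and no network request is made. It assigns jobs to accounts round-robin and uses one semaphore per account so that no account ever runs more than 20 jobs at once.

```python
import asyncio
from collections import defaultdict

MAX_JOBS_PER_ACCOUNT = 20  # per-account limit quoted in the feature list


class AccountPool:
    """Spreads jobs over accounts without exceeding the per-account cap."""

    def __init__(self, accounts, limit=MAX_JOBS_PER_ACCOUNT):
        self.accounts = list(accounts)
        self.semaphores = {a: asyncio.Semaphore(limit) for a in self.accounts}
        self.active = defaultdict(int)  # jobs currently running per account
        self.peak = defaultdict(int)    # highest concurrency seen per account

    async def run(self, index, text, synthesize):
        # Round-robin assignment: job i goes to account i mod N.
        account = self.accounts[index % len(self.accounts)]
        async with self.semaphores[account]:
            self.active[account] += 1
            self.peak[account] = max(self.peak[account], self.active[account])
            try:
                return await synthesize(account, text)
            finally:
                self.active[account] -= 1


async def fake_synthesize(account, text):
    # Hypothetical stand-in for a TTS request; it only sleeps briefly.
    await asyncio.sleep(0.01)
    return f"{account}:{len(text)} chars"


async def main(texts, accounts):
    pool = AccountPool(accounts)
    jobs = [pool.run(i, t, fake_synthesize) for i, t in enumerate(texts)]
    results = await asyncio.gather(*jobs)
    return results, dict(pool.peak)


if __name__ == "__main__":
    results, peak = asyncio.run(
        main(["Hello world"] * 100, ["account-a", "account-b"])
    )
    print(len(results), peak)
```

With two accounts, the pool can keep up to 40 jobs in flight in total. Replacing `fake_synthesize` with a real request function would be the only change needed to reuse the pattern against a live service.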