MiniMax API for TTS (text-to-speech) AI model
3 min read • December 30, 2024
The MiniMax API by useapi.net offers a fixed $10/month subscription with near-unlimited text-to-speech generations and voice cloning capabilities.
Introduction
The MiniMax API v1 is a third-party API for the MiniMax speech-01-turbo AI model, which is deployed at www.hailuo.ai/audio. To use this API, you will need a free www.hailuo.ai account and a flat monthly subscription.
The MiniMax API v1 provides the following features:
- Up to 20 parallel TTS jobs per single www.hailuo.ai account. You can connect as many accounts as you need.
- Average response time for live streaming is 3 seconds.
- Average time to create an MP3 from text is under 20 seconds.
- Over 300 pre-built voices available.
- Ability to clone an unlimited number of voices.
- Supported Languages: English, Chinese (Mandarin), Spanish, French, Russian, Portuguese, Indonesian, German, Japanese, Korean, Italian, and Cantonese.
- Supported Emotions: happy, sad, angry, fearful, disgusted, surprised, and neutral.
- Supported Accents: US (General), British English, and Indian English.
- Supported Ages: Young Adult, Adult, Middle-Aged, and Senior.
- Supported Genders: Male and Female.
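The limit of 20 parallel jobs per account is easiest to respect on the client side with a bounded worker pool. The sketch below is a minimal illustration, not part of the API: `run_tts_jobs` and the `synthesize` callable are hypothetical names, and `synthesize` stands in for whatever function you write to send one text to the API and return the audio bytes. Only the limit of 20 comes from the list above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# From the feature list: up to 20 parallel TTS jobs per www.hailuo.ai account.
MAX_PARALLEL_JOBS_PER_ACCOUNT = 20


def run_tts_jobs(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],
    max_parallel: int = MAX_PARALLEL_JOBS_PER_ACCOUNT,
) -> List[bytes]:
    """Run `synthesize` over `texts` with at most `max_parallel` calls in flight.

    `synthesize` is a hypothetical placeholder for your own function that
    sends one text to the API and returns the resulting audio bytes.
    Results come back in the same order as the input texts.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Executor.map preserves input order and re-raises worker exceptions.
        return list(pool.map(synthesize, texts))
```

If you connect several accounts, one pool per account keeps each of them under its own limit.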
Cloned Voice Samples
The examples below were created with the MiniMax API endpoint POST audio/create-mp3, using voices cloned via POST audio/clone-voice:
- Donald Trump on AI moderation issues (audio clip used for voice cloning)
- Scarlett Johansson (audio clip used for voice cloning)
- Arnold Schwarzenegger as T1000 (audio clip used for voice cloning)
- Morgan Freeman (audio clip used for voice cloning)
Standard Voice Samples
The examples below were created with the MiniMax API endpoint POST audio/create-mp3:
- May the Force be with you (reference)
- There’s no place like home (reference)
- Yippee-Ki-Yay (reference)
- Dr. Evil: Sharks with laser beams attached to their heads (reference)
API Overview
To create an MP3 audio file from text (maximum 1000 characters) in under 20 seconds, use the POST audio/create-mp3 endpoint.
To create a near real-time audio stream from text (maximum 3000 characters), use the POST audio/create-stream endpoint.
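Text longer than these limits has to be split before it is sent. The following is a minimal sketch of one way to do that, assuming you are happy to split on sentence boundaries. The `split_text` helper and the `MAX_CHARS` table are hypothetical names used for illustration only; the 1000 and 3000 character limits are the ones stated above.

```python
import re
from typing import List

# Character limits stated in the API overview.
MAX_CHARS = {
    "audio/create-mp3": 1000,
    "audio/create-stream": 3000,
}


def split_text(text: str, endpoint: str) -> List[str]:
    """Split `text` into chunks that fit the endpoint's character limit.

    Splits after sentence-ending punctuation followed by whitespace where
    possible, and falls back to a hard cut for any single sentence that is
    longer than the limit.
    """
    limit = MAX_CHARS[endpoint]
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut sentences that cannot fit into one request on their own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk first.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The regular expression assumes sentences are separated by whitespace, so text in languages that do not put spaces after punctuation (for example Chinese or Japanese) will mostly go through the hard-cut path.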
To clone voices from up to 10 audio samples (each 10 to 60 seconds long), use the POST audio/clone-voice endpoint.
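It can save a round trip to check the sample limits locally before uploading. The sketch below is an illustration under stated assumptions: `validate_clone_samples` is a hypothetical helper, it expects you to have measured each sample's duration yourself, and the requirement of at least one sample is an assumption rather than something stated above. The 10-sample and 10 to 60 second limits are taken from the text.

```python
from typing import Sequence

# Limits stated in the API overview for POST audio/clone-voice.
MAX_SAMPLES = 10
MIN_SAMPLE_SECONDS = 10.0
MAX_SAMPLE_SECONDS = 60.0


def validate_clone_samples(durations_seconds: Sequence[float]) -> None:
    """Raise ValueError if the sample set violates the documented limits.

    `durations_seconds` holds the length of each audio sample in seconds;
    how you measure it (for example with an audio library) is up to you.
    """
    # Assumption: cloning needs at least one sample.
    if len(durations_seconds) == 0:
        raise ValueError("at least one audio sample is required")
    if len(durations_seconds) > MAX_SAMPLES:
        raise ValueError(
            f"too many samples: {len(durations_seconds)} (maximum {MAX_SAMPLES})"
        )
    for index, seconds in enumerate(durations_seconds):
        if not MIN_SAMPLE_SECONDS <= seconds <= MAX_SAMPLE_SECONDS:
            raise ValueError(
                f"sample {index} is {seconds} s long; expected "
                f"{MIN_SAMPLE_SECONDS} to {MAX_SAMPLE_SECONDS} s"
            )
```

Calling `validate_clone_samples([12.5, 30.0, 58.0])` returns without error, while a 5-second clip or an eleventh sample raises `ValueError`.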
Conclusion
Visit our Discord Server or Telegram Channel for any support questions and concerns.
We regularly post guides and tutorials on the YouTube Channel.
Check our GitHub repo for code examples.