MiniMax API for TTS (text-to-speech) AI model

December 30, 2024

Table of contents

  1. Introduction
  2. Cloned Voices Samples
  3. Standard Voices Samples
  4. API Overview
  5. Conclusion

The MiniMax API by useapi.net offers a fixed $10/month subscription with near-unlimited text-to-speech generations and voice cloning capabilities.

Introduction

The MiniMax API v1 is a third-party API for the MiniMax speech-01-turbo AI model, which is deployed at www.hailuo.ai/audio. To use this API, you will need a free www.hailuo.ai account and a flat monthly subscription.

The MiniMax API v1 provides the following features:

  • Up to 20 parallel TTS jobs per www.hailuo.ai account.
    You can connect as many accounts as you need.
  • Average response time for live streaming is 3 seconds.
  • Average time to create an MP3 from text is under 20 seconds.
  • Over 300 pre-built voices available.
  • Ability to clone an unlimited number of voices.
  • Supported Languages: English, Chinese (Mandarin), Spanish, French, Russian, Portuguese, Indonesian, German, Japanese, Korean, Italian, and Cantonese.
  • Supported Emotions: happy, sad, angry, fearful, disgusted, surprised, and neutral.
  • Supported Accents: US (General), British English, and Indian English.
  • Supported Ages: Young Adult, Adult, Middle-Aged, and Senior.
  • Supported Genders: Male and Female.
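
The limit of 20 parallel TTS jobs per account listed above can be respected on the client side with a bounded worker pool. The following is a minimal sketch, not part of the API: `run_tts_jobs` and the `synthesize` callable (which stands in for whatever code performs one TTS request) are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_PARALLEL_JOBS = 20  # per-account limit stated in the feature list


def run_tts_jobs(
    texts: Iterable[str],
    synthesize: Callable[[str], bytes],
    max_parallel: int = MAX_PARALLEL_JOBS,
) -> List[bytes]:
    """Run `synthesize` for each text with at most `max_parallel` jobs in flight.

    `synthesize` is a hypothetical callable that performs one TTS request
    and returns the resulting audio bytes.
    """
    if not 1 <= max_parallel <= MAX_PARALLEL_JOBS:
        raise ValueError(f"max_parallel must be between 1 and {MAX_PARALLEL_JOBS}")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map() returns results in the same order as the input texts.
        return list(pool.map(synthesize, texts))
```

With several connected accounts, one such pool per account would keep each account within its own limit.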

Cloned Voices Samples

The examples below were created with the MiniMax API endpoint POST audio/create-mp3, using voices cloned via POST audio/clone-voice:

  • Donald Trump on AI moderation issues (audio clip used for voice cloning)
  • Scarlett Johansson (audio clip used for voice cloning)
  • Arnold Schwarzenegger as T1000 (audio clip used for voice cloning)
  • Morgan Freeman (audio clip used for voice cloning)

Standard Voices Samples

The examples below were created with the MiniMax API endpoint POST audio/create-mp3:

  • Dr. Evil: Sharks with laser beams attached to their heads reference

API Overview

To create an MP3 audio file from text (maximum 1000 characters), use the POST audio/create-mp3 endpoint. Generation takes under 20 seconds on average.
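
As a rough illustration of calling this endpoint, the sketch below builds (but does not send) a POST request and enforces the 1000-character limit before any network traffic occurs. The base URL, the Bearer-token header, and the JSON field names `text` and `voice_id` are assumptions made for this example; consult the API documentation for the actual request format.

```python
import json
import urllib.request

API_BASE = "https://api.useapi.net/v1/minimax"  # assumed base URL
MAX_MP3_TEXT_CHARS = 1000  # limit stated for POST audio/create-mp3


def build_create_mp3_request(token: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build a POST audio/create-mp3 request without sending it."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_MP3_TEXT_CHARS:
        raise ValueError(
            f"text has {len(text)} characters; the limit is {MAX_MP3_TEXT_CHARS}"
        )
    # Field names below are assumptions, not confirmed parameter names.
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/audio/create-mp3",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; how the response delivers the MP3 is not covered here.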

To create a near real-time audio stream from text (maximum 3000 characters), use the POST audio/create-stream endpoint.
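
Text longer than the 3000-character streaming limit has to be split before it is submitted. A minimal sketch of one way to do that is shown below. The helper name `split_for_stream` is hypothetical, and the sketch assumes the limit is counted the same way as Python's `len()`.

```python
from typing import List

MAX_STREAM_TEXT_CHARS = 3000  # limit stated for POST audio/create-stream


def split_for_stream(text: str, limit: int = MAX_STREAM_TEXT_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at the last space inside the window when one exists,
    otherwise the text is cut at exactly `limit` characters.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Look for a space at index <= limit so the chunk stays within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own streaming request, in order.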

To clone voices from up to 10 audio samples (each 10 to 60 seconds long), use the POST audio/clone-voice endpoint.
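
The sample constraints for voice cloning can be checked locally before anything is uploaded. The sketch below assumes the caller already knows each clip's duration in seconds and treats the 10 and 60 second bounds as inclusive; `check_clone_samples` is a hypothetical helper, not part of the API.

```python
from typing import Sequence

MAX_CLONE_SAMPLES = 10     # "up to 10 audio samples"
MIN_SAMPLE_SECONDS = 10.0  # shortest allowed clip
MAX_SAMPLE_SECONDS = 60.0  # longest allowed clip


def check_clone_samples(durations_seconds: Sequence[float]) -> None:
    """Raise ValueError if the samples violate the stated cloning limits."""
    if not durations_seconds:
        raise ValueError("at least one audio sample is required")
    if len(durations_seconds) > MAX_CLONE_SAMPLES:
        raise ValueError(
            f"{len(durations_seconds)} samples given; the limit is {MAX_CLONE_SAMPLES}"
        )
    for index, seconds in enumerate(durations_seconds):
        # Bounds are assumed to be inclusive.
        if not MIN_SAMPLE_SECONDS <= seconds <= MAX_SAMPLE_SECONDS:
            raise ValueError(
                f"sample {index} is {seconds} s long; expected "
                f"{MIN_SAMPLE_SECONDS} to {MAX_SAMPLE_SECONDS} s"
            )
```

For example, `check_clone_samples([12.5, 45.0])` returns without error, while a list containing a 5-second clip raises `ValueError`.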

Conclusion

Visit our Discord Server or Telegram Channel with any support questions or concerns.

We regularly post guides and tutorials on our YouTube Channel.

Check our GitHub repo for code examples.