Generate speech from text
July 4, 2025 (updated August 21, 2025)
The HeyGen API has been decommissioned. We currently have no plans to re-release it and recommend switching to the Mureka API for speech and music generation.
This endpoint generates speech from text using HeyGen's text-to-speech (TTS) engines.
You can execute up to 100 jobs in parallel, but we strongly recommend keeping concurrent jobs per account under 10 to avoid having your account banned by HeyGen.
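Below is a minimal sketch of client-side throttling in TypeScript. The limiter is generic and not part of the HeyGen API; the limit of 10 simply mirrors the recommendation above.

// Minimal concurrency limiter: runs at most `limit` async tasks at a time.
// A sketch only; in production you may prefer a library such as p-limit.
async function runLimited<T>(tasks: Array<() => Promise<T>>, limit = 10): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

Each task would wrap one POST to /tts/create.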
When generating very long text, some models tend to produce speech that gets gradually quieter. To avoid that, we suggest slicing the prompt into chunks of roughly 1,000 characters, as in the sketch below.
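One way to do that is to split on sentence boundaries and pack sentences into chunks of at most ~1,000 characters; the exact chunk size and splitting rule here are our assumptions, not API requirements.

// Split text into chunks of at most ~maxLen characters, preferring
// sentence boundaries so speech never cuts off mid-sentence.
// (A single sentence longer than maxLen still becomes its own chunk.)
function chunkPrompt(text: string, maxLen = 1000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}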
The `engine` parameter is optional and allows you to specify which voice engine to use when generating speech. Supported engines include: `auto`, `aws`, `azure`, `elevenLabs`, `elevenLabsV3`, `fish`, `google`, `openai`, `openaiEmo`, `panda`, and `starfish`.
The chosen engine must be supported for the selected voice (see each voice's `voice_engines` array or the `default_voice_engine` returned by GET /tts/voices).
If you do not specify an engine, the system defaults to `elevenLabs`. If the selected voice does not support `elevenLabs`, it will fall back to that voice's default engine.
For advanced emotional context and control, the `elevenLabsV3` engine supports audio tags.
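Audio tags are embedded inline in the prompt. The specific tag names below ([whispers], [laughs]) are common examples from ElevenLabs' v3 documentation; verify against their current tag list before relying on any particular tag.

// Hypothetical request body using elevenLabsV3 audio tags inline in the prompt.
const body = {
  voice_id: "en-US-AriaNeural",
  engine: "elevenLabsV3",
  prompt: "[whispers] I have a secret to tell you. [laughs] Just kidding!",
};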
Each unique generation is internally cached by HeyGen; if you execute the exact same parameters, the cached result is returned. If you need to trigger re-generation, change one of the parameters, for example by adding spaces to your prompt.
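For instance, appending whitespace leaves the spoken text unchanged while producing a distinct cache key (a client-side workaround, not an official API feature):

// attempt = 1, 2, 3, … gives a different parameter set each call,
// so HeyGen's cache treats it as a new generation.
function bustCache(prompt: string, attempt: number): string {
  return prompt + " ".repeat(attempt);
}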
https://api.useapi.net/v1/heygen/tts/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
  "email": "[email protected]",
  "voice_id": "en-US-AriaNeural",
  "prompt": "Text to be converted to speech",
  "speed": 100,
  "pitch": 0,
  "volume": 100,
  "language_code": "en-US",
  "emotion": "happy"
}
- `email` is optional when only one account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- `voice_id` is required, a valid voice_id from GET /tts/voices.
- `prompt` is required, the text to be converted to speech (maximum 5000 characters).
- `speed` is optional, range from `50` to `150`. Default is `100` (normal speed).
- `pitch` is optional, range from `-100` to `100`. Default is `0` (normal pitch).
- `volume` is optional, range from `0` to `100`. Default is `100` (full volume).
- `language_code` is optional, must be one of the supported language codes from GET /tts/languages.
- `emotion` is optional, must be one of the supported emotion names for the selected voice. Use a valid `voice.settings.clone_emotions.name` from GET /tts/voices/?voice_id=voice_id.
- `engine` is optional and specifies the voice engine to use for text-to-speech generation. It must be one of the supported engines for the selected voice, from the `voice.voice_engines` array or the `default_voice_engine` field in GET /tts/voices. Possible values: `auto`, `aws`, `azure`, `elevenLabs`, `elevenLabsV3`, `fish`, `google`, `openai`, `openaiEmo`, `panda`, `starfish`. If not specified, it defaults to `elevenLabs`; if `elevenLabs` is not supported by the voice, it falls back to `default_voice_engine`. The `elevenLabsV3` engine supports audio tags to express emotional context in speech (see the sketch after this list for one way to validate the engine client-side).
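A sketch of that client-side engine validation, in TypeScript. The function name is hypothetical, and the response shape of GET /tts/voices is inferred from the field names above (voice_id, voice_engines, default_voice_engine); it may differ in practice.

// Pick a supported engine for a voice, falling back to its default engine.
async function pickEngine(token: string, voiceId: string, preferred = "elevenLabs"): Promise<string> {
  const res = await fetch("https://api.useapi.net/v1/heygen/tts/voices", {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`GET /tts/voices failed: ${res.status}`);
  // Assumed shape: an array of voices, each carrying voice_engines: string[]
  // and default_voice_engine: string.
  const voices: any[] = await res.json();
  const voice = voices.find(v => v.voice_id === voiceId);
  if (!voice) throw new Error(`Unknown voice_id: ${voiceId}`);
  return voice.voice_engines?.includes(preferred) ? preferred : voice.default_voice_engine;
}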
Responses
Success:

{
  "audio_url": "https://heygen-media.s3.amazonaws.com/audio/abc123.mp3",
  "duration": 5.2,
  "is_pass": true,
  "job_id": null,
  "word_timestamps": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "world", "start": 0.6, "end": 1.1 }
  ]
}

Error examples:

{ "error": "Invalid emotion angry for voice en-US-AriaNeural, supported values: happy, sad, neutral" }

{ "error": "Invalid engine elevenLabs for voice en-US-AriaNeural, supported values: auto,aws,azure" }

{ "error": "Voice en-US-AriaNeural does not support emotions" }

{ "error": "Unauthorized", "code": 401 }

Daily voice duration limit reached:

{ "error": "Daily voice duration limit reached, please upgrade or try tomorrow." }
The `audio_url` field contains the URL of the generated MP3 audio file.
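For example, downloading the file in Node.js 18+ (the output path is hypothetical):

import { writeFile } from "node:fs/promises";

// Fetch the generated MP3 from audio_url and save it locally.
async function saveAudio(audioUrl: string, path = "speech.mp3"): Promise<void> {
  const res = await fetch(audioUrl);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  await writeFile(path, Buffer.from(await res.arrayBuffer()));
}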
Model
{ // TypeScript, all fields are optional
  audio_url: string      // URL to the generated MP3 audio file
  duration: number       // Duration of the audio in seconds
  is_pass: boolean       // Whether the generation was successful
  job_id: string | null  // Job ID (usually null for synchronous requests)
  word_timestamps: {     // Timing information for each word
    word: string         // The spoken word
    start: number        // Start time in seconds
    end: number          // End time in seconds
  }[]
}
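Since word_timestamps carries per-word timing, captions can be derived from it directly. Below is a sketch that emits one WebVTT cue per word; the cue granularity is our choice, not something the API prescribes.

interface WordTimestamp { word: string; start: number; end: number; }

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function vttTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const hh = String(Math.floor(ms / 3_600_000)).padStart(2, "0");
  const mm = String(Math.floor(ms / 60_000) % 60).padStart(2, "0");
  const ss = String(Math.floor(ms / 1_000) % 60).padStart(2, "0");
  return `${hh}:${mm}:${ss}.${String(ms % 1000).padStart(3, "0")}`;
}

// Build a WebVTT document with one cue per spoken word.
function toVtt(words: WordTimestamp[]): string {
  const cues = words.map(w => `${vttTime(w.start)} --> ${vttTime(w.end)}\n${w.word}`);
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}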
Examples
curl:

curl -X POST "https://api.useapi.net/v1/heygen/tts/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer …" \
  -d '{"email":"[email protected]","voice_id":"en-US-AriaNeural","prompt":"Hello, world!"}'

JavaScript:

const token = "API token";
const email = "Previously configured account email";
const apiUrl = "https://api.useapi.net/v1/heygen/tts/create";
const response = await fetch(apiUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
  },
  body: JSON.stringify({
    email: email,
    voice_id: "en-US-AriaNeural",
    prompt: "Hello, world!"
  })
});
const result = await response.json();
console.log("response", { response, result });

Python:

import requests

token = "API token"
email = "Previously configured account email"
apiUrl = "https://api.useapi.net/v1/heygen/tts/create"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

data = {
    "email": email,
    "voice_id": "en-US-AriaNeural",
    "prompt": "Hello, world!"
}

response = requests.post(apiUrl, headers=headers, json=data)
print(response, response.json())