Generate speech from text
July 4, 2025 (updated August 21, 2025)
The HeyGen API has been decommissioned. We currently have no plans to re-release it and recommend switching to the Mureka API for speech and music generation.
This endpoint generates speech from text using HeyGen's text-to-speech (TTS) engines.
You can execute up to 100 jobs in parallel, but we strongly recommend keeping concurrent jobs per account under 10 to avoid having your account banned by HeyGen.
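Below is a minimal sketch of client-side throttling in TypeScript. The limiter is generic and not part of the HeyGen API; the limit of 10 simply mirrors the recommendation above.

// Minimal concurrency limiter: runs at most `limit` async tasks at a time.
// A sketch only; in production you may prefer a library such as p-limit.
async function runLimited<T>(tasks: Array<() => Promise<T>>, limit = 10): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

Each task would wrap one POST to /tts/create.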
When generating very long text, some models tend to produce speech that gets gradually quieter. To avoid that, we suggest slicing the prompt into chunks of roughly 1,000 characters, as in the sketch below.
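One way to do that is to split on sentence boundaries and pack sentences into chunks of at most ~1,000 characters; the exact chunk size and splitting rule here are our assumptions, not API requirements.

// Split text into chunks of at most ~maxLen characters, preferring
// sentence boundaries so speech never cuts off mid-sentence.
// (A single sentence longer than maxLen still becomes its own chunk.)
function chunkPrompt(text: string, maxLen = 1000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}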
The `engine` parameter is optional and allows you to specify which voice engine to use when generating speech. Supported engines include: `auto`, `aws`, `azure`, `elevenLabs`, `elevenLabsV3`, `fish`, `google`, `openai`, `openaiEmo`, `panda`, and `starfish`.
The chosen engine must be supported for the selected voice (see each voice's `voice_engines` array or the `default_voice_engine` returned by GET /tts/voices).
If you do not specify an engine, the system defaults to `elevenLabs`. If the selected voice does not support `elevenLabs`, it will fall back to that voice's default engine.
For advanced emotional context and control, the `elevenLabsV3` engine supports audio tags.
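Audio tags are embedded inline in the prompt. The specific tag names below ([whispers], [laughs]) are common examples from ElevenLabs' v3 documentation; verify against their current tag list before relying on any particular tag.

// Hypothetical request body using elevenLabsV3 audio tags inline in the prompt.
const body = {
  voice_id: "en-US-AriaNeural",
  engine: "elevenLabsV3",
  prompt: "[whispers] I have a secret to tell you. [laughs] Just kidding!",
};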
Each unique generation is internally cached by HeyGen; if you execute the exact same parameters, the cached result is returned. If you need to trigger re-generation, change one of the parameters, for example by adding spaces to your prompt.
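For instance, appending whitespace leaves the spoken text unchanged while producing a distinct cache key (a client-side workaround, not an official API feature):

// attempt = 1, 2, 3, … gives a different parameter set each call,
// so HeyGen's cache treats it as a new generation.
function bustCache(prompt: string, attempt: number): string {
  return prompt + " ".repeat(attempt);
}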
https://api.useapi.net/v1/heygen/tts/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
  "email": "[email protected]",
  "voice_id": "en-US-AriaNeural",
  "prompt": "Text to be converted to speech",
  "speed": 100,
  "pitch": 0,
  "volume": 100,
  "language_code": "en-US",
  "emotion": "happy"
}
- `email` is optional when only one account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- `voice_id` is required, a valid voice_id from GET /tts/voices.
- `prompt` is required, the text to be converted to speech (maximum 5000 characters).
- `speed` is optional, range from `50` to `150`. Default is `100` (normal speed).
- `pitch` is optional, range from `-100` to `100`. Default is `0` (normal pitch).
- `volume` is optional, range from `0` to `100`. Default is `100` (full volume).
- `language_code` is optional, must be one of the supported language codes from GET /tts/languages.
- `emotion` is optional, must be one of the supported emotion names for the selected voice. Use a valid `voice.settings.clone_emotions.name` from GET /tts/voices/?voice_id=voice_id.
- `engine` is optional and specifies the voice engine to use for text-to-speech generation. It must be one of the supported engines for the selected voice, from the `voice.voice_engines` array or the `default_voice_engine` field in GET /tts/voices. Possible values: `auto`, `aws`, `azure`, `elevenLabs`, `elevenLabsV3`, `fish`, `google`, `openai`, `openaiEmo`, `panda`, `starfish`. If not specified, it defaults to `elevenLabs`; if `elevenLabs` is not supported by the voice, it falls back to `default_voice_engine`. The `elevenLabsV3` engine supports audio tags to express emotional context in speech (see the sketch after this list for one way to validate the engine client-side).
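A sketch of that client-side engine validation, in TypeScript. The function name is hypothetical, and the response shape of GET /tts/voices is inferred from the field names above (voice_id, voice_engines, default_voice_engine); it may differ in practice.

// Pick a supported engine for a voice, falling back to its default engine.
async function pickEngine(token: string, voiceId: string, preferred = "elevenLabs"): Promise<string> {
  const res = await fetch("https://api.useapi.net/v1/heygen/tts/voices", {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`GET /tts/voices failed: ${res.status}`);
  // Assumed shape: an array of voices, each carrying voice_engines: string[]
  // and default_voice_engine: string.
  const voices: any[] = await res.json();
  const voice = voices.find(v => v.voice_id === voiceId);
  if (!voice) throw new Error(`Unknown voice_id: ${voiceId}`);
  return voice.voice_engines?.includes(preferred) ? preferred : voice.default_voice_engine;
}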
Responses
Success:

{
  "audio_url": "https://heygen-media.s3.amazonaws.com/audio/abc123.mp3",
  "duration": 5.2,
  "is_pass": true,
  "job_id": null,
  "word_timestamps": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "world", "start": 0.6, "end": 1.1 }
  ]
}

Error examples:

{ "error": "Invalid emotion angry for voice en-US-AriaNeural, supported values: happy, sad, neutral" }

{ "error": "Invalid engine elevenLabs for voice en-US-AriaNeural, supported values: auto,aws,azure" }

{ "error": "Voice en-US-AriaNeural does not support emotions" }

{ "error": "Unauthorized", "code": 401 }

Daily voice duration limit reached:

{ "error": "Daily voice duration limit reached, please upgrade or try tomorrow." }
The `audio_url` field contains the URL of the generated MP3 audio file.
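For example, downloading the file in Node.js 18+ (the output path is hypothetical):

import { writeFile } from "node:fs/promises";

// Fetch the generated MP3 from audio_url and save it locally.
async function saveAudio(audioUrl: string, path = "speech.mp3"): Promise<void> {
  const res = await fetch(audioUrl);
  if (!res.ok) throw new Error(`Download failed: ${res.status}`);
  await writeFile(path, Buffer.from(await res.arrayBuffer()));
}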
Model
{ // TypeScript, all fields are optional
  audio_url: string      // URL to the generated MP3 audio file
  duration: number       // Duration of the audio in seconds
  is_pass: boolean       // Whether the generation was successful
  job_id: string | null  // Job ID (usually null for synchronous requests)
  word_timestamps: {     // Timing information for each word
    word: string         // The spoken word
    start: number        // Start time in seconds
    end: number          // End time in seconds
  }[]
}
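Since word_timestamps carries per-word timing, captions can be derived from it directly. Below is a sketch that emits one WebVTT cue per word; the cue granularity is our choice, not something the API prescribes.

interface WordTimestamp { word: string; start: number; end: number; }

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function vttTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const hh = String(Math.floor(ms / 3_600_000)).padStart(2, "0");
  const mm = String(Math.floor(ms / 60_000) % 60).padStart(2, "0");
  const ss = String(Math.floor(ms / 1_000) % 60).padStart(2, "0");
  return `${hh}:${mm}:${ss}.${String(ms % 1000).padStart(3, "0")}`;
}

// Build a WebVTT document with one cue per spoken word.
function toVtt(words: WordTimestamp[]): string {
  const cues = words.map(w => `${vttTime(w.start)} --> ${vttTime(w.end)}\n${w.word}`);
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}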
Examples
curl:

curl -X POST "https://api.useapi.net/v1/heygen/tts/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer …" \
  -d '{"email":"[email protected]","voice_id":"en-US-AriaNeural","prompt":"Hello, world!"}'

JavaScript:

const token = "API token";
const email = "Previously configured account email";
const apiUrl = "https://api.useapi.net/v1/heygen/tts/create";
const response = await fetch(apiUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
  },
  body: JSON.stringify({
    email: email,
    voice_id: "en-US-AriaNeural",
    prompt: "Hello, world!"
  })
});
const result = await response.json();
console.log("response", { response, result });

Python:

import requests

token = "API token"
email = "Previously configured account email"
apiUrl = "https://api.useapi.net/v1/heygen/tts/create"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

data = {
    "email": email,
    "voice_id": "en-US-AriaNeural",
    "prompt": "Hello, world!"
}

response = requests.post(apiUrl, headers=headers, json=data)
print(response, response.json())