Generate speech from text
July 4, 2025 (August 6, 2025)
This endpoint generates speech from text using text-to-speech technology.
You can execute up to 100 jobs in parallel, but we strongly recommend keeping concurrent jobs per account under 10 to avoid your account being banned by HeyGen.
When generating very long text, some models tend to produce speech that gets gradually quieter. To avoid that, we suggest slicing the prompt into chunks of roughly 1K characters, as sketched below.
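As a sketch of both recommendations, the snippet below splits a long prompt into roughly 1K-character chunks and submits them with bounded parallelism. The chunk_text helper, the MAX_CONCURRENT value, and the chosen voice_id are illustrative, not API requirements:

import requests
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://api.useapi.net/v1/heygen/tts/create"
TOKEN = "API token"        # your useapi.net API token
MAX_CONCURRENT = 5         # keep well under 10 concurrent jobs per account
CHUNK_SIZE = 1000          # ~1K characters per chunk

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split text into ~size-character chunks, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + len(word) + 1 > size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks

def tts(chunk: str) -> dict:
    # add "email": "..." to the payload when multiple accounts are configured
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"voice_id": "en-US-AriaNeural", "prompt": chunk},
    )
    return response.json()

long_text = "..."  # your long prompt
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(tts, chunk_text(long_text)))
print([r.get("audio_url") for r in results])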
The engine parameter is optional and allows you to specify which voice engine to use when generating speech. Supported engines include: auto, aws, azure, elevenLabs, elevenLabsV3, fish, google, openai, openaiEmo, panda, and starfish.
The chosen engine must be supported for the selected voice (see each voice's voice_engines array or the default_voice_engine returned by GET /tts/voices). If you do not specify an engine, the system defaults to elevenLabs. If the selected voice does not support elevenLabs, it will fall back to that voice's default engine.
For advanced emotional context and control, the elevenLabsV3 model supports audio tags.
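A minimal client-side sketch of that engine selection, assuming GET /tts/voices returns an array of voice objects exposing voice_id, voice_engines, and default_voice_engine (adjust the parsing if the list is wrapped in an envelope); the pick_engine helper is ours, not part of the API:

import requests

TOKEN = "API token"
BASE = "https://api.useapi.net/v1/heygen"

def pick_engine(voice: dict, preferred: str = "elevenLabs") -> str:
    """Use the preferred engine when the voice supports it, otherwise its default engine."""
    if preferred in (voice.get("voice_engines") or []):
        return preferred
    return voice.get("default_voice_engine", "auto")

voices = requests.get(f"{BASE}/tts/voices",
                      headers={"Authorization": f"Bearer {TOKEN}"}).json()
voice = next(v for v in voices if v.get("voice_id") == "en-US-AriaNeural")

response = requests.post(
    f"{BASE}/tts/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "voice_id": voice["voice_id"],
        "prompt": "Hello, world!",
        "engine": pick_engine(voice, preferred="elevenLabsV3"),
    },
)
print(response.json())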
Each unique generation is internally cached by HeyGen; if you execute a request with the exact same parameters, the cached result will be returned. To trigger re-generation, change one of the parameters, for example by adding spaces to your prompt.
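If you rely on this, a trivial (hypothetical) helper is enough to bust the cache without changing the audible text:

def bust_cache(prompt: str, attempt: int = 1) -> str:
    """Append trailing spaces so the parameters differ from the cached request.
    attempt=0 keeps the prompt unchanged and may return the cached audio."""
    return prompt + " " * attempt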
https://api.useapi.net/v1/heygen/tts/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
"email": "[email protected]",
"voice_id": "en-US-AriaNeural",
"prompt": "Text to be converted to speech",
"speed": 100,
"pitch": 0,
"volume": 100,
"language_code": "en-US",
"emotion": "happy"
}
- email is optional when only one account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- voice_id is required, a valid voice_id from GET /tts/voices.
- prompt is required, the text to be converted to speech (maximum 5000 characters).
- speed is optional, range from 50 to 150. Default is 100 (normal speed).
- pitch is optional, range from -100 to 100. Default is 0 (normal pitch).
- volume is optional, range from 0 to 100. Default is 100 (full volume).
- language_code is optional, must be one of the supported language codes from GET /tts/languages.
- emotion is optional, must be one of the supported emotion names for the selected voice. Use a valid voice.settings.clone_emotions.name from GET /tts/voices/?voice_id=voice_id.
- engine is optional and specifies the voice engine to use for text-to-speech generation. It must be one of the supported engines for the selected voice from the voice.voice_engines array or the default_voice_engine field in GET /tts/voices. Possible values: auto, aws, azure, elevenLabs, elevenLabsV3, fish, google, openai, openaiEmo, panda, starfish. If not specified, it defaults to elevenLabs. If elevenLabs is not supported by the voice, it falls back to default_voice_engine. The elevenLabsV3 model supports audio tags to express emotional context in speech.
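Putting the optional parameters together, a hedged request sketch (the values are illustrative; emotion and engine must actually be supported by the selected voice, and email is only needed when multiple accounts are configured):

import requests

response = requests.post(
    "https://api.useapi.net/v1/heygen/tts/create",
    headers={"Authorization": "Bearer …", "Content-Type": "application/json"},
    json={
        "email": "user@example.com",   # placeholder; use your configured account email
        "voice_id": "en-US-AriaNeural",
        "prompt": "Text to be converted to speech",
        "speed": 110,                  # 50..150, default 100
        "pitch": -10,                  # -100..100, default 0
        "volume": 90,                  # 0..100, default 100
        "language_code": "en-US",
        "emotion": "happy",            # from voice.settings.clone_emotions
        "engine": "elevenLabsV3",      # from voice.voice_engines
    },
)
print(response.json())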
Responses
-
{ "audio_url": "https://heygen-media.s3.amazonaws.com/audio/abc123.mp3", "duration": 5.2, "is_pass": true, "job_id": null, "word_timestamps": [ { "word": "Hello", "start": 0.0, "end": 0.5 }, { "word": "world", "start": 0.6, "end": 1.1 } ] }
-
{ "error": "Invalid emotion angry for voice en-US-AriaNeural, supported values: happy, sad, neutral" }
Error examples:
{ "error": "Invalid engine elevenLabs for voice en-US-AriaNeural, supported values: auto,aws,azure" }
{ "error": "Voice en-US-AriaNeural does not support emotions" }
-
{ "error": "Unauthorized", "code": 401 }
-
Daily voice duration limit reached, please upgrade or try tomorrow.
{ "error": "Daily voice duration limit reached, please upgrade or try tomorrow." }
The audio_url field will contain the URL of the generated MP3 audio file.
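A short sketch, based on the response shapes above, that surfaces errors and otherwise saves the generated MP3 (assuming the audio_url is directly downloadable without additional headers):

import requests

def save_audio(result: dict, path: str = "speech.mp3") -> None:
    """Surface API errors, otherwise download the generated MP3 from audio_url."""
    if "error" in result:
        # e.g. unsupported emotion/engine, 401 Unauthorized, or the daily duration limit
        raise RuntimeError(result["error"])
    audio = requests.get(result["audio_url"])
    audio.raise_for_status()
    with open(path, "wb") as f:
        f.write(audio.content)
    print(f"Saved {result['duration']}s of audio to {path}")

# save_audio(response.json())  # where response is the POST /tts/create call from the examples below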
Model
{ // TypeScript, all fields are optional
audio_url: string // URL to the generated MP3 audio file
duration: number // Duration of the audio in seconds
is_pass: boolean // Whether the generation was successful
job_id: string | null // Job ID (usually null for synchronous requests)
word_timestamps: { // Timing information for each word
word: string // The spoken word
start: number // Start time in seconds
end: number // End time in seconds
}[]
}
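To illustrate how word_timestamps can be consumed, here is a sketch (our own helper, not part of the API) that renders the array as SRT captions, one cue per word:

def to_srt(word_timestamps: list[dict]) -> str:
    """Render word_timestamps as an SRT subtitle document, one cue per word."""
    def ts(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    cues = []
    for i, w in enumerate(word_timestamps, start=1):
        cues.append(f"{i}\n{ts(w['start'])} --> {ts(w['end'])}\n{w['word']}\n")
    return "\n".join(cues)

print(to_srt([
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.6, "end": 1.1},
]))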
Examples
-
curl -X POST "https://api.useapi.net/v1/heygen/tts/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer …" \
  -d '{"email":"[email protected]","voice_id":"en-US-AriaNeural","prompt":"Hello, world!"}'
-
const token = "API token";
const email = "Previously configured account email";
const apiUrl = "https://api.useapi.net/v1/heygen/tts/create";
const response = await fetch(apiUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
  },
  body: JSON.stringify({
    email: email,
    voice_id: "en-US-AriaNeural",
    prompt: "Hello, world!"
  })
});
const result = await response.json();
console.log("response", { response, result });
-
import requests

token = "API token"
email = "Previously configured account email"
apiUrl = "https://api.useapi.net/v1/heygen/tts/create"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}

data = {
    "email": email,
    "voice_id": "en-US-AriaNeural",
    "prompt": "Hello, world!"
}

response = requests.post(apiUrl, headers=headers, json=data)
print(response, response.json())