Generate speech from text
July 4, 2025
This endpoint generates speech from text using text-to-speech technology.
You can execute up to 100 jobs in parallel, but we strongly recommend keeping concurrent jobs per account under 10 to avoid your account being banned by HeyGen.
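One way to stay under the recommended limit is to route every request through a bounded worker pool. The sketch below is only an illustration: `run_tts_jobs`, `submit_job` and `MAX_CONCURRENT_JOBS` are hypothetical names, and `submit_job` stands for whatever function in your code performs a single request to this endpoint.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: any value under 10 follows the recommendation above.
MAX_CONCURRENT_JOBS = 8


def run_tts_jobs(prompts, submit_job, max_workers=MAX_CONCURRENT_JOBS):
    """Call submit_job(prompt) for each prompt, at most max_workers at a time.

    submit_job is a hypothetical caller-supplied function that sends one
    request to the tts/create endpoint and returns the parsed response.
    Results come back in the same order as prompts.
    """
    if not 1 <= max_workers < 10:
        raise ValueError("max_workers should be between 1 and 9")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map never runs more than max_workers calls concurrently.
        return list(pool.map(submit_job, prompts))
```

The pool size is the only throttle here; if several processes share one account, each would need its own smaller limit.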
Notes
When generating speech from a very long text, some models tend to produce audio that gets gradually quieter. To avoid that, we suggest splitting the prompt into 1K chunks and generating each chunk separately.
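The chunking suggested above could be sketched as follows. This assumes "1K" means roughly 1,000 characters, which the note does not state explicitly; `split_prompt` is a hypothetical helper, not part of the API. It prefers sentence boundaries and falls back to a hard cut only when a single sentence exceeds the limit.

```python
import re


def split_prompt(text, max_chars=1000):
    """Split text into chunks of at most max_chars, preferring sentence breaks.

    Assumption: the recommended "1K" chunk size is measured in characters.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `prompt`, and the resulting audio files joined in order.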
POST https://api.useapi.net/v1/heygen/tts/create
Request Headers
Authorization: Bearer {API token}
Content-Type: application/json
# Alternatively you can use multipart/form-data
# Content-Type: multipart/form-data
API token is required, see Setup useapi.net for details.
Request Body
{
"email": "[email protected]",
"voice_id": "en-US-AriaNeural",
"prompt": "Text to be converted to speech",
"speed": 100,
"pitch": 0,
"volume": 100,
"language_code": "en-US",
"emotion": "happy"
}
- email is optional when only one account is configured. However, if you have multiple accounts configured, this parameter becomes required.
- voice_id is required, a valid voice_id from GET /tts/voices.
- prompt is required, the text to be converted to speech (maximum 5000 characters).
- speed is optional, range from 50 to 150. Default is 100 (normal speed).
- pitch is optional, range from -100 to 100. Default is 0 (normal pitch).
- volume is optional, range from 0 to 100. Default is 100 (full volume).
- language_code is optional, must be one of the supported language codes from GET /tts/languages.
- emotion is optional, must be one of the supported emotion names for the selected voice. Use a valid voice.settings.clone_emotions.name from GET /tts/voices/?voice_id=voice_id.
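The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, assuming you want to fail fast locally; `build_tts_payload` is a hypothetical helper, and it does not verify `voice_id`, `language_code` or `emotion` against the lookup endpoints.

```python
def build_tts_payload(voice_id, prompt, email=None, speed=100, pitch=0,
                      volume=100, language_code=None, emotion=None):
    """Build a request body for tts/create, enforcing the documented limits."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > 5000:
        raise ValueError("prompt must be at most 5000 characters")

    # Documented ranges: speed 50..150, pitch -100..100, volume 0..100.
    ranges = {
        "speed": (speed, 50, 150),
        "pitch": (pitch, -100, 100),
        "volume": (volume, 0, 100),
    }
    for name, (value, low, high) in ranges.items():
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    payload = {"voice_id": voice_id, "prompt": prompt,
               "speed": speed, "pitch": pitch, "volume": volume}
    # Optional fields are sent only when the caller provides them.
    optional = {"email": email, "language_code": language_code,
                "emotion": emotion}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dictionary can be passed directly as the JSON body of the request.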
Responses
- Success:
{
  "audio_url": "https://heygen-media.s3.amazonaws.com/audio/abc123.mp3",
  "duration": 5.2,
  "is_pass": true,
  "job_id": null,
  "word_timestamps": [
    { "word": "Hello", "start": 0.0, "end": 0.5 },
    { "word": "world", "start": 0.6, "end": 1.1 }
  ]
}
- Invalid request parameter:
{ "error": "Invalid emotion (angry), supported values: happy, sad, neutral" }
- Invalid or missing API token:
{ "error": "Unauthorized", "code": 401 }
The audio_url field contains the URL of the generated MP3 audio file.
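Because success and error bodies have different shapes, it can help to handle both in one place. This is a sketch under the assumption that an `error` key always signals failure and that `is_pass` is true on success; `TtsError` and `audio_url_or_raise` are hypothetical names.

```python
class TtsError(Exception):
    """Raised when a tts/create response does not contain usable audio."""


def audio_url_or_raise(body):
    """Return audio_url from a parsed response body, or raise TtsError."""
    if "error" in body:
        # Error bodies look like {"error": "...", "code": 401}; code may be absent.
        raise TtsError(body["error"])
    if not body.get("is_pass") or not body.get("audio_url"):
        raise TtsError("generation did not succeed or audio_url is missing")
    return body["audio_url"]
```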
Model
{ // TypeScript, all fields are optional
audio_url: string // URL to the generated MP3 audio file
duration: number // Duration of the audio in seconds
is_pass: boolean // Whether the generation was successful
job_id: string | null // Job ID (usually null for synchronous requests)
word_timestamps: { // Timing information for each word
word: string // The spoken word
start: number // Start time in seconds
end: number // End time in seconds
}[]
}
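As an illustration of how the `word_timestamps` field might be consumed, the sketch below groups words into short caption segments. The grouping rule and the `words_to_captions` helper are assumptions for this example, not something the API provides. Since all model fields are optional, a missing list is treated as empty.

```python
def words_to_captions(response, max_words=5):
    """Group word_timestamps into (start, end, text) caption segments."""
    words = response.get("word_timestamps") or []
    captions = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        text = " ".join(w["word"] for w in group)
        # A segment spans from its first word's start to its last word's end.
        captions.append((group[0]["start"], group[-1]["end"], text))
    return captions
```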
Examples
- Curl
curl -X POST "https://api.useapi.net/v1/heygen/tts/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer …" \
  -d '{"email":"[email protected]","voice_id":"en-US-AriaNeural","prompt":"Hello, world!"}'
- JavaScript
const token = "API token";
const email = "Previously configured account email";
const apiUrl = "https://api.useapi.net/v1/heygen/tts/create";
const response = await fetch(apiUrl, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${token}`,
  },
  body: JSON.stringify({
    email: email,
    voice_id: "en-US-AriaNeural",
    prompt: "Hello, world!"
  })
});
const result = await response.json();
console.log("response", {response, result});
- Python
import requests

token = "API token"
email = "Previously configured account email"
apiUrl = "https://api.useapi.net/v1/heygen/tts/create"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}"
}
data = {
    "email": email,
    "voice_id": "en-US-AriaNeural",
    "prompt": "Hello, world!"
}
response = requests.post(apiUrl, headers=headers, json=data)
print(response, response.json())