Kling Lip-Sync Avatars, Motion Control & Image Generation via the Kling API
7 min read • June 22, 2026
Table of contents
- Introduction
- Pricing
- Lip-sync & avatars
- Motion control
- Image generation (Kolors)
- More
- Runnable scripts
- Examples
- Frequently asked questions
- Conclusion
Introduction
Kling AI is not just a text-to-video model — the same account drives a full creative suite: talking lip-sync avatars, motion transfer onto a still image, and KOLORS image generation — and every one of them is reachable from code. Kling AI is the generative-media service from Chinese short-video giant Kuaishou Technology, and useapi.net fronts it with a third-party Kling API that runs your own Kling account over a standard REST endpoint. This guide covers the premium features beyond plain text/image-to-video — for the core video workflow and the model lineup, see How to Generate AI Video with Kling v3.
Three things hold across every feature below. You need a useapi.net API token and a connected Kling account — export the token so the curl examples run as-is:
export USEAPI_TOKEN="user:1234-..."
The email field is required in the body only when you have more than one Kling account configured. And most of these endpoints are asynchronous and task-based: the create call returns a numeric task.id, then you poll GET /v1/kling/tasks/{task_id} until status_final is true (success is status_name: "succeed", status: 99), with the output in the works[] array. The exceptions are noted where they occur — text-to-speech returns synchronously, and avatar creation returns an id directly.
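The polling pattern described above can be sketched in Python with only the standard library. This is a minimal sketch, not an official client. It assumes the poll response carries status_final, status_name, and works at the top level of the JSON body (the text names these fields but not their exact nesting). The function names, the 10-second interval, and the 30-minute timeout are illustrative choices, not values from the API.

```python
import json
import time
import urllib.request

API_BASE = "https://api.useapi.net/v1/kling"


def is_final(task: dict) -> bool:
    """True once the poll response reports a terminal state."""
    return bool(task.get("status_final"))


def is_success(task: dict) -> bool:
    """True for a finished task whose status_name is "succeed" (status 99)."""
    return is_final(task) and task.get("status_name") == "succeed"


def fetch_task(task_id: str, token: str) -> dict:
    """GET /v1/kling/tasks/{task_id} and decode the JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def wait_for_task(task_id, token, interval=10.0, timeout=1800.0, fetch=fetch_task):
    """Poll until status_final is true, then return works[] or raise."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id, token)
        if is_final(task):
            if not is_success(task):
                raise RuntimeError(f"task {task_id} ended as {task.get('status_name')!r}")
            # Assumption: works[] sits at the top level of the response.
            return task.get("works", [])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout} seconds")
        time.sleep(interval)
```

A typical call would be wait_for_task(task_id, os.environ["USEAPI_TOKEN"]) after importing os. The fetch parameter exists only so the loop can be exercised without network access.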
Pricing
This is the consumer-account route. Kuaishou’s official Kling API bills per generation at developer rates on a separate developer account. useapi.net instead automates the consumer account you already pay for, at the website subscription price plus a flat $15/month that covers API access to every supported service with no per-generation surcharge from us. Generations draw from your Kling account’s own credit balance at Kling’s standard rates — text-to-speech is the one feature that is free on every plan, including the free tier. For the per-model credit breakdown, see the core Kling tutorial’s pricing and the Kling API overview live cost calculator.
Lip-sync & avatars
A Kling avatar is a digital character built from a single image that can be animated to speak — either from a pre-recorded audio file or from text you supply, voiced by Kling’s text-to-speech. The flow has up to three pieces.
1. (Optional) Save an avatar — POST https://api.useapi.net/v1/kling/avatars turns an image into a reusable avatar. Upload the image first with POST /assets, then pass the returned URL as imageUrl. Most fields (nickname, prompt, scene, and the TTS voice settings) are auto-filled by AI from the image if you omit them:
curl -X POST "https://api.useapi.net/v1/kling/avatars" \
-H "Authorization: Bearer $USEAPI_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"imageUrl": "https://s21-kling.klingai.com/ai-platform/.../portrait.jpg"
}'
This call returns synchronously with the new avatar’s id, not a task to poll:
{ "id": "300137233318846", "status": "SUCCESS" }
You can skip this step and pass a direct imageUrl to the video call instead — saving an avatar is only worth it if you want to reuse the same character and its voice defaults. List your saved avatars (or browse system templates) with GET /avatars.
2. Generate the talking video — POST https://api.useapi.net/v1/kling/avatars/video. Provide one avatar source (avatarId or imageUrl) and one audio source (audioUrl or text). When you pass text, a speakerId is required — list the available voices, their emotions, and sample clips with GET /tts/voices:
curl -X POST "https://api.useapi.net/v1/kling/avatars/video" \
-H "Authorization: Bearer $USEAPI_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"avatarId": "300137233318846",
"text": "Hello there! Welcome to our product demo.",
"speakerId": "moss_audio_ad5baf92-735f-11f0-8263-fe5a2fe98ec8",
"mode": "std"
}'
This is asynchronous — it returns a task.id. Poll it:
curl "https://api.useapi.net/v1/kling/tasks/<task_id>" \
-H "Authorization: Bearer $USEAPI_TOKEN"
When status_final is true and status_name is "succeed", the MP4 is in the works[] array. To pull the clean, non-watermarked master, take the workId from works[] and call GET /assets/download with fileTypes=MP4. text accepts up to 5000 characters, and mode is std (default) or pro. The speed (0.8–2.0) and emotion (neutral, happy, angry, sad, fearful, disgusted, surprised) options apply when you supply text — not every emotion is available for every voice, so check the voice’s emotions list from GET /tts/voices.
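The input rules for POST /avatars/video can be checked locally before a request is sent. The sketch below is a hypothetical helper, not part of any SDK. It reads "one avatar source and one audio source" as exactly one of each, which is an assumption, and it takes the speed and emotion field names from the prose above. Whether a given voice supports a given emotion still has to be checked against GET /tts/voices.

```python
from typing import Optional

VALID_MODES = {"std", "pro"}
VALID_EMOTIONS = {"neutral", "happy", "angry", "sad", "fearful", "disgusted", "surprised"}
MAX_TEXT_CHARS = 5000


def build_avatar_video_payload(
    *,
    avatar_id: Optional[str] = None,
    image_url: Optional[str] = None,
    audio_url: Optional[str] = None,
    text: Optional[str] = None,
    speaker_id: Optional[str] = None,
    mode: str = "std",
    speed: Optional[float] = None,
    emotion: Optional[str] = None,
) -> dict:
    """Validate the inputs and return the JSON body for POST /avatars/video."""
    # Assumption: exactly one avatar source and exactly one audio source.
    if (avatar_id is None) == (image_url is None):
        raise ValueError("provide exactly one of avatarId or imageUrl")
    if (audio_url is None) == (text is None):
        raise ValueError("provide exactly one of audioUrl or text")
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'std' or 'pro'")

    payload = {"mode": mode}
    if avatar_id is not None:
        payload["avatarId"] = avatar_id
    else:
        payload["imageUrl"] = image_url

    if text is None:
        # speed and emotion only apply to text-to-speech input.
        if speed is not None or emotion is not None:
            raise ValueError("speed and emotion apply only when text is supplied")
        payload["audioUrl"] = audio_url
        return payload

    if not speaker_id:
        raise ValueError("speakerId is required when text is supplied")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is limited to {MAX_TEXT_CHARS} characters")
    payload["text"] = text
    payload["speakerId"] = speaker_id
    if speed is not None:
        if not 0.8 <= speed <= 2.0:
            raise ValueError("speed must be between 0.8 and 2.0")
        payload["speed"] = speed
    if emotion is not None:
        if emotion not in VALID_EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        payload["emotion"] = emotion
    return payload
```

The returned dict can be serialized with json.dumps and sent as the request body, in place of the hand-written JSON in the curl example.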
Two related endpoints round this out. POST /tts/create generates up to 5 minutes of speech from text and returns the MP3 URL synchronously in the resource field (status: 99), with no polling; it is free and unlimited on every Kling plan. And if you already have a finished video with a clear frontal face, POST /videos/lipsync re-syncs its lips to a new audio track (both video and audio are passed as URLs, and the video must be 60 seconds or less). For longer clips or looser face-visibility requirements, Kling’s docs point to PixVerse Lip Sync instead.
Motion control
Motion control transfers the motion from a reference video onto a static image, so the person in your image performs the action from the video. Use POST https://api.useapi.net/v1/kling/videos/motion-create with an imageUrl (a person with a clearly visible pose) and a motionUrl (a 3–30 second reference video). Both must be uploaded via POST /assets first — use the url field from the response for imageUrl and the resourceUrl field for motionUrl. You can also pull an official Kling motion or one you previously uploaded with GET /videos/motions:
curl -X POST "https://api.useapi.net/v1/kling/videos/motion-create" \
-H "Authorization: Bearer $USEAPI_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model_name": "kling-v3-0",
"imageUrl": "https://s15-kling.klingai.com/.../person.jpg",
"motionUrl": "https://v15-kling.klingai.com/.../dance.mp4",
"prompt": "Person dancing energetically",
"keepAudio": true,
"mode": "std"
}'
model_name is kling-v3-0 (default, upgraded motion capture with high facial consistency) or kling-v2-6. The output duration is detected automatically from the motion video. Set keepAudio: true to keep the reference video’s sound, and motionDirection to motion_direction (default — follow the video’s motion) or image_direction (follow the pose from your source image). On kling-v3-0 you can attach a saved element via element_1 for stronger character consistency. The endpoint validates the image up front and returns a 400 with MOTION.PIC_NOT_MATCHED if it can’t detect a person.
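The parameter rules above can be encoded in a small request builder. This is a sketch with hypothetical function names. It treats element_1 as an opaque value passed through unchanged, and it rejects element_1 on kling-v2-6 locally, which is an inference from the text rather than documented server behavior. The error check matches the MOTION.PIC_NOT_MATCHED code as a substring of the raw response body because the error body's JSON shape is not given here.

```python
from typing import Optional

MOTION_MODELS = {"kling-v3-0", "kling-v2-6"}
MOTION_DIRECTIONS = {"motion_direction", "image_direction"}


def build_motion_payload(
    image_url: str,
    motion_url: str,
    *,
    model_name: str = "kling-v3-0",
    prompt: Optional[str] = None,
    keep_audio: Optional[bool] = None,
    motion_direction: Optional[str] = None,
    element_1: Optional[str] = None,
) -> dict:
    """Validate the inputs and return the JSON body for POST /videos/motion-create."""
    if model_name not in MOTION_MODELS:
        raise ValueError(f"model_name must be one of {sorted(MOTION_MODELS)}")
    if motion_direction is not None and motion_direction not in MOTION_DIRECTIONS:
        raise ValueError(f"motionDirection must be one of {sorted(MOTION_DIRECTIONS)}")
    # Assumption: element_1 is rejected client-side on anything but kling-v3-0.
    if element_1 is not None and model_name != "kling-v3-0":
        raise ValueError("element_1 is only available on kling-v3-0")

    payload = {"model_name": model_name, "imageUrl": image_url, "motionUrl": motion_url}
    optional = {
        "prompt": prompt,
        "keepAudio": keep_audio,
        "motionDirection": motion_direction,
        "element_1": element_1,
    }
    # Omit unset fields so the API applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def is_person_not_detected(http_status: int, body_text: str) -> bool:
    """True when a 400 response carries the MOTION.PIC_NOT_MATCHED code."""
    return http_status == 400 and "MOTION.PIC_NOT_MATCHED" in body_text
```

Checking for the person-detection error separately lets a batch script skip a bad source image instead of retrying it.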
The create call is asynchronous — it returns a task.id (type: "m2v_motion_control"). Poll it the same way as the avatar video above, then download the clean master via GET /assets/download:
curl "https://api.useapi.net/v1/kling/tasks/<task_id>" \
-H "Authorization: Bearer $USEAPI_TOKEN"
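Fetching the clean master after a successful poll might look like the sketch below. Several details are assumptions: that the work identifier is passed as a workId query parameter, that each entry in works[] exposes a workId key, and that the download response carries cdnUrl at the top level of its JSON body. Only the endpoint path, fileTypes=MP4, and the cdnUrl field name come from this guide.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.useapi.net/v1/kling"


def work_ids(works: list) -> list:
    """Collect workId values from a works[] array (key name assumed)."""
    return [work["workId"] for work in works if "workId" in work]


def clean_master_url(work_id: str, file_types: str = "MP4") -> str:
    """Build the GET /assets/download URL for one work item."""
    # Assumption: the work id travels as a "workId" query parameter.
    query = urlencode({"workId": work_id, "fileTypes": file_types})
    return f"{API_BASE}/assets/download?{query}"


def fetch_clean_master(work_id: str, token: str) -> str:
    """Call GET /assets/download and return the watermark-free cdnUrl."""
    request = urllib.request.Request(
        clean_master_url(work_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # Assumption: cdnUrl is a top-level field of the JSON body.
        return json.load(response)["cdnUrl"]
```

For a task with several outputs, the same call can be repeated for each id returned by work_ids.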
Image generation (Kolors)
KOLORS is Kling’s image model, exposed at POST https://api.useapi.net/v1/kling/images/kolors. A prompt is required (up to 2500 characters). The default version is kling-v3-0, which supports up to ten reference images — upload each with POST /assets, pass the URLs as image_1 through image_10, and cite them inside the prompt with @image_1, @image_2, and so on:
curl -X POST "https://api.useapi.net/v1/kling/images/kolors" \
-H "Authorization: Bearer $USEAPI_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A woman @image_1 dancing elegantly in a sunlit ballroom",
"version": "kling-v3-0",
"resolution": "2k",
"aspect_ratio": "3:4",
"imageCount": 2,
"image_1": "https://s21-kling.klingai.com/ai-platform/.../woman.jpg"
}'
Like the video features, KOLORS is asynchronous: the create call returns a task.id, and you poll GET /v1/kling/tasks/{task_id} until it succeeds — the generated images land in the works[] array.
curl "https://api.useapi.net/v1/kling/tasks/<task_id>" \
-H "Authorization: Bearer $USEAPI_TOKEN"
resolution is 1k or 2k (default), and aspect_ratio defaults to 16:9 (v3 also accepts auto when reference images are present). For the older kling-v2-1 version, drop the @image_N references and instead use a single reference of face, subject, or restyle together with an imageReference URL — useful for face-similarity, subject-similarity, or style-transfer workflows. Full parameter details are on POST /images/kolors.
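A common mistake with kling-v3-0 references is citing @image_N in the prompt without supplying the matching image_N field. The sketch below is a hypothetical builder, not part of any SDK, that enforces the limits stated above (a 2500-character prompt, at most ten references, resolution of 1k or 2k) and checks that every cited reference exists. It covers only the kling-v3-0 path and passes aspect_ratio through without validation, since the full list of accepted ratios is not given here.

```python
import re

MAX_PROMPT_CHARS = 2500
MAX_REFERENCE_IMAGES = 10
RESOLUTIONS = {"1k", "2k"}
_REFERENCE = re.compile(r"@image_(\d+)")


def build_kolors_payload(
    prompt: str,
    reference_urls=(),
    *,
    resolution: str = "2k",
    aspect_ratio: str = "16:9",
    image_count=None,
) -> dict:
    """Validate the inputs and return the JSON body for POST /images/kolors."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is limited to {MAX_PROMPT_CHARS} characters")
    if len(reference_urls) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    if resolution not in RESOLUTIONS:
        raise ValueError("resolution must be '1k' or '2k'")

    # Every @image_N cited in the prompt needs a matching image_N field.
    cited = {int(number) for number in _REFERENCE.findall(prompt)}
    missing = sorted(n for n in cited if not 1 <= n <= len(reference_urls))
    if missing:
        raise ValueError(f"prompt cites missing references: {missing}")

    payload = {
        "prompt": prompt,
        "version": "kling-v3-0",
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
    }
    for index, url in enumerate(reference_urls, start=1):
        payload[f"image_{index}"] = url
    if image_count is not None:
        payload["imageCount"] = image_count
    return payload
```

The reverse case, a supplied image that the prompt never cites, is allowed here because the guide does not say it is an error.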
Three more image features build on the same account. POST /images/kolors-elements composes up to four subject images (plus optional scene and style references) into one scene. POST /images/virtual-try-on dresses a humanImage in a garment (dressInput, or upperInput/lowerInput). POST /images/upscale takes a task_id + workId from a completed KOLORS task and returns a higher-resolution version. All three are async and resolve through the same task poll.
More
The same account and token cover the rest of Kling’s lineup. Each feature has its own endpoint, documented and Try-It-runnable:
- Effects — apply a named special effect to an image: POST /videos/image2video-effects (list effects with GET /videos/effects).
- Video extend — continue an existing clip past its original length: POST /videos/extend.
- Video elements — generate video from reusable character/object references: POST /videos/image2video-elements and the multi-shot POST /videos/omni.
- Face detection — find faces in an image (for KOLORS faceNo): POST /images/recognize-faces.
For the full endpoint list and a live credit-cost calculator, see the Kling API overview.
Runnable scripts
The kling-api GitHub repo ships ready-to-run Node.js batch scripts for the features in this guide — each reads prompts from prompts.json, submits them, polls, and downloads every result:
- lip-sync/ — talking avatars (POST /avatars/video)
- motion-control/ — motion transfer (POST /videos/motion-create)
- image-generation/ — KOLORS images (POST /images/kolors)
Run any of them with node <script>.mjs <API_TOKEN> <EMAIL>.
Examples
The samples below are real Kling generations produced through this Kling API, straight from our blog walkthroughs.
Lip-sync avatar — POST /avatars/video, talking avatar from an image + TTS text
— from Kling: Elements and Avatars 2.0
Motion control — POST /videos/motion-create, kling-v3-0, 720p 15s motion transfer onto a still image
— from Kling: Motion Control 3.0 with Character Consistency
KOLORS image — POST /images/kolors, kling-v3-0 text-to-image

— from AI Image Models: 16+ Compared
Frequently asked questions
Can I create a talking avatar (lip-sync) through the Kling API? Yes. Optionally save an avatar from an image with POST /avatars, then call POST /avatars/video with an avatar source (avatarId or imageUrl) and an audio source (audioUrl or text + speakerId). It returns a task.id you poll for the finished MP4. See Lip-sync & avatars above.
Is Kling text-to-speech free? Yes. POST /tts/create generates up to 5 minutes of speech from text and is free and unlimited on every Kling subscription plan, including the free tier. It returns the MP3 URL synchronously (no polling). List voices and emotions with GET /tts/voices.
How does Kling motion control work? POST /videos/motion-create applies the motion from a reference video (motionUrl, 3–30s) onto a person in a static image (imageUrl). It uses kling-v3-0 by default for high facial consistency, runs asynchronously, and supports keepAudio, motionDirection, and an element_1 for stronger character identity. See Motion control above.
What image models does the Kling API expose? KOLORS, via POST /images/kolors. The default kling-v3-0 does text-to-image with up to ten @image_N references; kling-v2-1 adds face / subject / restyle reference workflows. Generation is asynchronous (poll the returned task.id). Virtual try-on, multi-element composition, and upscaling are separate endpoints on the same account. See Image generation (Kolors) above.
Why is my generation watermarked? The MP4 returned in the poll’s works[] array is the watermarked preview. To get the clean master, take the workId from the task and call GET /assets/download with fileTypes=MP4 — it returns a cdnUrl to the watermark-free file. This requires a paid Kling account.
How much do these features cost? You keep your normal Kling website subscription, plus a flat $15/month to useapi.net for API access to all services. Generations draw from your Kling account’s own credits at Kling’s standard rates (text-to-speech is free), with no per-generation surcharge from us — the Kling API overview has a live cost calculator.
My request returns a 500 or the poll returns a 404 — what does that mean? Kling reuses 500 for content-moderation rejections as well as genuine server faults — read the error text rather than the generic message to tell them apart (a moderation rejection reads like “The content you uploaded appears to violate the community guidelines.”). On a poll, a 404 means the task was deleted, failed at moderation, or your account ran out of credits — check your balance at GET /accounts/email.
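The error triage from the last answer can be sketched as a small classifier. This is a heuristic, not documented API behavior. It tells a moderation rejection from a server fault by looking for the phrase "community guidelines" in the error text, based only on the sample message quoted above, so other moderation wordings would be reported as server errors. The category names are made up for illustration.

```python
def classify_failure(http_status: int, error_text: str = "") -> str:
    """Map an HTTP status and error text to a coarse failure category."""
    text = error_text.lower()
    if http_status == 500:
        # Kling reuses 500 for moderation rejections, so inspect the message.
        if "community guidelines" in text:
            return "moderation"
        return "server_error"
    if http_status == 404:
        # On a poll: task deleted, failed at moderation, or credits exhausted.
        return "task_missing"
    if http_status == 400:
        return "bad_request"
    return "other"


def should_retry(category: str) -> bool:
    """Only genuine server faults are worth retrying unchanged."""
    return category == "server_error"
```

A "task_missing" result is the cue to check the credit balance with GET /accounts/email before resubmitting.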
Conclusion
Visit our Discord Server or Telegram Channel for any support questions and concerns.
We regularly post guides and tutorials on the YouTube Channel.
The full runnable example is in the kling-api GitHub repo.