Create AI avatars (UGC videos)
Last updated: May 19, 2026
Endpoint: POST /api/v1/creative-hub/generate/avatar (verified in apps/backend/src/routes/api/creative-hub-generate.route.ts). Provider: heygen, the only provider verified in apps/backend/src/providers/creative/types.ts. Parameters: script (the text the avatar speaks), avatar_id (from the HeyGen avatar library), voice_id (the paired voice), and language. Output: a lip-synced video, generated asynchronously by generate-avatar.worker.ts. Use case: UGC-style testimonial ads without filming a real person.
Who is this for
Media buyers running UGC-style ads, testimonial campaigns, or spokesperson explainers without filming real talent.
What an avatar generation produces
A short video of a chosen avatar (stock or custom) speaking your script. Lip-sync is automatic. Voice can be paired to the avatar (HeyGen avatar library has matched voices) or substituted (different language / different vocal style).
Typical outputs: 5-30 second clips of an avatar saying scripted lines, often used as testimonial cuts or product explainer hooks.
The single provider
Wevion uses heygen only; no alternative providers are wired in. HeyGen's avatar library is large: it includes stock avatars, and you can upload custom avatars in the HeyGen UI.
How to generate
Step 1: Pick or upload an avatar in HeyGen
Avatar selection is done in HeyGen's own UI (Wevion references HeyGen avatars by avatar_id):
Browse HeyGen's stock library — many ethnicities, ages, styles
Or upload a custom avatar in the HeyGen UI (you or your spokesperson)
Copy the avatar_id from HeyGen
Wevion may surface a picker that loads HeyGen's library, but ultimately the ID lookup happens against HeyGen.
Step 2: Open the generator in Wevion
/creative-hub → AI Generate → Avatar tab.
Step 3: Configure
| Field | What |
|---|---|
| script | What the avatar should say (text). Max length typically 1000-3000 chars (~30-90 sec). |
| avatar_id | HeyGen avatar ID |
| voice_id | Voice (HeyGen-paired or custom from voice library) |
| language | Language code (en, it, es, fr, de, etc.) |
Step 4: Write the script
Keep the script conversational, with short sentences. Avatars perform better with natural speech patterns than with corporate marketing language. Examples that work:
"Honestly, I tried five different headphones before this one — and the difference is night and day."
"Look at how easy this is — just click, pick, done."
Examples that don't work:
Long compound sentences with subordinate clauses
Marketing buzzwords ("synergize", "leverage", "ecosystem")
Technical jargon without context
Step 5: Submit
Click Generate. The request returns 202 Accepted with a job_id.
Step 6: Track + download
Avatar generation is relatively slow: 2-10 minutes is typical, and longer scripts take longer.
Once completed: download mp4 or use in Campaign Creator.
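If you track the job from a script rather than the UI, the wait can be sketched as a simple polling loop. This is a minimal sketch under assumptions: the article does not document a status endpoint, so `fetch_status` is a hypothetical callable you supply, and the `failed` status is assumed (only `completed` is named in this article). The 15-second interval and 15-minute timeout are arbitrary defaults chosen to fit the 2-10 minute range above.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], str],  # hypothetical: returns the job's current status string
    job_id: str,
    interval_s: float = 15.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = fetch_status(job_id)
        # "completed" comes from the article; "failed" is an assumed terminal status.
        if status in ("completed", "failed"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.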
Endpoint
POST /api/v1/creative-hub/generate/avatar (verified).
Body:
script (required)
avatar_id (required)
voice_id (required)
language (e.g. en, it)
folder_id (optional)
Returns 202 with a job_id. The worker then calls the HeyGen API, polls until the render is done, downloads the mp4, stores it in the Drive folder, and sets creative_job.status to completed.
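As an illustration, the request could be assembled like this in Python. It is a sketch, not the official client: the base URL, the bearer-token header, and the helper name `build_avatar_request` are assumptions, and the ID values are placeholders. Only the path and the body fields come from this article. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: BASE_URL and the bearer-token auth scheme are placeholders,
# not confirmed by this article.
BASE_URL = "https://wevion.example.com"
ENDPOINT = "/api/v1/creative-hub/generate/avatar"
REQUIRED_FIELDS = ("script", "avatar_id", "voice_id")


def build_avatar_request(body: dict, token: str) -> urllib.request.Request:
    """Check required fields and build (but do not send) the POST request."""
    missing = [field for field in REQUIRED_FIELDS if not body.get(field)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_avatar_request(
    {
        "script": "Honestly, I tried five different headphones before this one.",
        "avatar_id": "AVATAR_ID_FROM_HEYGEN",  # placeholder
        "voice_id": "VOICE_ID_FROM_HEYGEN",  # placeholder
        "language": "en",
    },
    token="YOUR_API_TOKEN",
)
# To send it: urllib.request.urlopen(request), then read job_id from the 202 response.
```

Validating the required fields locally avoids spending a round trip on a request that is missing script, avatar_id, or voice_id.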
Cost
HeyGen generation costs more than image or TTS generation. Cost varies by:
Script duration (longer = more)
Resolution
Custom avatar vs stock (sometimes different pricing)
See ch-112 AI credits.
Custom avatars
For custom (you, your spokesperson, an actor):
Upload video footage to HeyGen UI (their consent + capture flow)
HeyGen creates a custom avatar
Use that avatar_id in Wevion
Custom avatars often perform best (audience sees a real person, not stock).
Multi-language strategy
Same script → same avatar → different voice_id + language per locale → N video variants for N languages.
Use cases:
One spokesperson, 5 languages for international campaigns
Cheaper than re-recording with native actors per locale
Watch for: lip-sync accuracy may degrade in some languages; preview each output.
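The per-locale fan-out can be sketched as one request body per language, all sharing the same avatar_id. This sketch assumes the script has already been translated for each locale; the helper name `build_locale_variants` and the ID values are hypothetical placeholders.

```python
from typing import Dict, List


def build_locale_variants(
    avatar_id: str,
    scripts_by_language: Dict[str, str],  # assumed: script already translated per language
    voice_by_language: Dict[str, str],
) -> List[dict]:
    """Return one request body per language, all using the same avatar."""
    variants = []
    for language, script in scripts_by_language.items():
        if language not in voice_by_language:
            raise KeyError(f"no voice_id configured for language '{language}'")
        variants.append(
            {
                "script": script,
                "avatar_id": avatar_id,
                "voice_id": voice_by_language[language],
                "language": language,
            }
        )
    return variants


variants = build_locale_variants(
    "AVATAR_ID_FROM_HEYGEN",  # placeholder
    {"en": "Look at how easy this is.", "it": "Guarda quanto è facile."},
    {"en": "VOICE_ID_EN", "it": "VOICE_ID_IT"},  # placeholders
)
```

Each body in `variants` can then be submitted as its own generation job, and each output previewed for lip-sync quality.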
Best practices
Test with short script first
Run a 10-second script first to validate avatar + voice fit. Then commit to longer scripts.
Match avatar to audience
Choose an avatar your target demographic can relate to (age, ethnicity, style). The stock library helps; a custom avatar is best.
Avoid uncanny scenarios
Don't deliver medical or authority claims with a stock avatar (audiences instinctively distrust it)
Don't make the avatar look at extreme angles (HeyGen may not handle them gracefully)
Don't over-script (long monologues feel off; cut them into multiple short clips)
Pair with B-roll via compositing
An avatar talking head intercut with B-roll footage via compositing (ch-117) feels more produced than a talking head alone.
Common mistakes
Long monologues: cut into 10-15 sec clips for ad pacing
Wrong language for voice / script: mismatch tanks performance; match language to script
Stock avatar for high-credibility claim: audience can spot stock; use custom or different ad format
Skipping preview test: generate short test first, then commit
Common issues
"avatar_id not found": verify the ID in the HeyGen UI; copy-paste errors are common
Lip-sync looks off: language mismatch OR HeyGen model limitation; try different voice
Script too long error: HeyGen has script length limits; split into multiple shorter generations
Custom avatar not ready: HeyGen processes custom avatar uploads asynchronously; wait until HeyGen marks ready before referencing in Wevion
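For the script-length issue, splitting can be done at sentence boundaries so each generation stays short. This is a minimal sketch: the helper name `split_script` is hypothetical, and the 200-character default is an arbitrary illustration, not a HeyGen limit.

```python
import re
from typing import List


def split_script(script: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into clips of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own clip.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    clips: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            clips.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        clips.append(current)
    return clips
```

Each returned clip can be submitted as a separate generation, which also fits the advice above to cut long monologues into short clips.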
Related
Text-to-speech — voice-only narration alternative
Video compositing — combine avatar + B-roll + text
AI best practices — broader creative guidance