Video compositing — assembling multi-element videos
Last updated: May 19, 2026
Endpoint: POST /api/v1/creative-hub/generate/composite (verified in apps/backend/src/routes/api/creative-hub-generate.route.ts). Provider: creatomate (the only one verified in apps/backend/src/providers/creative/types.ts). Compositing combines video clips, images, text overlays, audio (TTS-generated or uploaded), and transitions into a single output mp4. It is template-based: Creatomate defines the layout and you populate the slots. Jobs run asynchronously via composite-video.worker.ts. Typical use case: assemble TTS narration, B-roll video, a product image, and CTA text in one pipeline.
Who is this for
Mediabuyers building polished video ads from multiple AI-generated or uploaded elements. Especially: anyone producing localized variants (same template, different audio per locale).
What compositing does
Stitches multiple media elements + text + audio into a final video, per a defined template:
Final mp4 = template(
  video clip slot,
  image overlay slot,
  text title slot,
  text CTA slot,
  audio track slot (TTS or uploaded),
  transitions per Creatomate spec
)
Templates live in Creatomate's library. Each template defines:
Layout (positions of elements over time)
Animation / transitions
Slot definitions (variable inputs)
Total duration
The single provider
Wevion uses creatomate only for compositing. Creatomate's API supports template-driven video composition; Wevion calls it with the template + slot values.
How to composite
Step 1: Pick or create a template
Templates are managed in Creatomate's own UI (Wevion references them by template ID). Common templates:
Product hero: title text + product video + CTA + outro logo
Testimonial: avatar video + caption overlay + brand bumper
Multi-product carousel: 3-4 product slots cycled with transitions
Animated text: bold typography over background video
Each template exposes named slots (e.g. headline_text, product_video, cta_text).
Step 2: Open the generator in Wevion
/creative-hub → AI Generate → Composite tab.
Step 3: Pick template + populate slots
Per the template's slot definitions, provide values:
Text slots: type the copy
Video slots: pick from Creative Hub (URL reference)
Image slots: pick from Creative Hub
Audio slot: pick a TTS output (from ch-116) or uploaded audio
Wevion's UI surfaces the slot list dynamically based on the template selected.
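When populating slots through the API rather than the UI, it can help to compare your values against the template's slot names before submitting. The following is a minimal sketch, not Wevion's own validation: the helper name validate_slots and the example slot list are hypothetical, and the real slot names come from your Creatomate template.

```python
def validate_slots(template_slots, modifications):
    """Compare supplied slot values against a template's slot names.

    Returns (missing, unknown): slots the template defines that were not
    supplied, and supplied names the template does not define.
    """
    expected = set(template_slots)
    supplied = set(modifications)
    missing = sorted(expected - supplied)
    unknown = sorted(supplied - expected)
    return missing, unknown


# Hypothetical slot list for a "product hero" style template.
template_slots = ["headline_text", "product_video", "cta_text", "audio"]

modifications = {
    "headline_text": "Introducing the X-2000",
    "product_video": "https://example.com/product.mp4",  # placeholder URL
    "cta_txt": "Shop now",  # deliberate typo to show the check
}

missing, unknown = validate_slots(template_slots, modifications)
print("missing:", missing)
print("unknown:", unknown)
```

Here the typo shows up twice: cta_txt is reported as unknown and cta_text as missing, which is usually enough to spot the mismatch before a job is submitted.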
Step 4: Submit
Click Generate. The request returns 202 Accepted with a job_id.
Step 5: Track + download
Compositing time: 30 sec - 5 min depending on template complexity + element count + duration.
Once the job completes, the final mp4 appears in your Drive folder. Use it in Campaign Creator or download it for external use.
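If you track jobs from a script, a simple polling loop with a timeout is usually enough given the 30 sec to 5 min range. The sketch below makes several assumptions: this article does not document a job-status endpoint, so the status lookup is passed in as a function (fetch_status), and the field name status and the values "completed" and "failed" are illustrative rather than confirmed.

```python
import time


def wait_for_job(fetch_status, job_id, interval_s=5.0, timeout_s=600.0,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout elapses.

    fetch_status is any callable that takes a job_id and returns a dict
    with a "status" key (assumed shape, not a documented response).
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} still running after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Demo with a fake status source instead of a real HTTP call.
_fake_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed"},
])
result = wait_for_job(lambda job_id: next(_fake_responses), "job-123",
                      sleep=lambda seconds: None)
print(result["status"])
```

Passing the status lookup in as a callable keeps the loop independent of how you actually query job state, and makes it easy to test without network access.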
Endpoint
POST /api/v1/creative-hub/generate/composite (verified).
Body:
template_id (Creatomate template ID)
modifications (JSON object of slot name → value)
folder_id (optional, target output folder)
Returns 202 + job_id. The worker calls the Creatomate API, polls for completion, downloads the mp4, and stores it.
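As an illustration, the request can be built like this with the Python standard library. The path and the three body fields come from this article; the base URL, the bearer-token header, and the helper name build_composite_request are assumptions for the sketch, so substitute whatever your Wevion account actually uses.

```python
import json
import urllib.request

API_PATH = "/api/v1/creative-hub/generate/composite"


def build_composite_request(base_url, token, template_id, modifications,
                            folder_id=None):
    """Build (but do not send) the POST request for a composite job."""
    body = {"template_id": template_id, "modifications": modifications}
    if folder_id is not None:
        body["folder_id"] = folder_id  # optional target output folder
    return urllib.request.Request(
        url=base_url.rstrip("/") + API_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_composite_request(
    "https://wevion.example",  # placeholder host
    "YOUR_TOKEN",
    "product-hero-template-id",
    {"headline_text": "Introducing the X-2000", "cta_text": "Shop now"},
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then read job_id from the
# 202 response body.
```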
Cost
Compositing cost depends on duration + complexity (more elements = higher rendering cost). Generally lower than video generation since no AI inference is happening — just rendering.
Failed jobs are not charged. See ch-112.
Typical compositing recipes
Product hero (15-30 sec)
template: product-hero-template-id
modifications:
  headline_text: "Introducing the X-2000"
  product_video: <Creative Hub URL to AI-generated product video>
  cta_text: "Shop now →"
  audio: <Creative Hub URL to TTS narration>
Output: branded video ad ready for Reels / feed.
Testimonial (20-45 sec)
template: testimonial-template-id
modifications:
  avatar_video: <Creative Hub URL to HeyGen avatar video>
  caption_text: "From a verified customer"
  brand_logo: <Creative Hub URL to logo image>
Output: testimonial-format ad with avatar + captions + branding.
Multi-locale variants
For N languages: same template + same video slots + different audio slot per language (each pre-generated via TTS in target language). N compositing jobs = N localized ad variants.
Faster than re-generating videos per language.
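A minimal sketch of that fan-out: one request body per locale, identical except for the audio slot. The helper name build_locale_jobs, the slot name audio, and all URLs are placeholders; use the slot names your template actually exposes.

```python
def build_locale_jobs(template_id, shared_modifications, audio_by_locale,
                      audio_slot="audio"):
    """Return one composite request body per locale.

    Every body reuses the shared slot values and swaps only the audio slot.
    """
    jobs = {}
    for locale, audio_url in audio_by_locale.items():
        modifications = dict(shared_modifications)  # copy, do not mutate input
        modifications[audio_slot] = audio_url
        jobs[locale] = {
            "template_id": template_id,
            "modifications": modifications,
        }
    return jobs


shared = {"product_video": "https://example.com/product.mp4"}
audio_by_locale = {
    "en": "https://example.com/tts/en.mp3",
    "de": "https://example.com/tts/de.mp3",
    "es": "https://example.com/tts/es.mp3",
}

jobs = build_locale_jobs("product-hero-template-id", shared, audio_by_locale)
for locale, body in jobs.items():
    print(locale, body["modifications"]["audio"])
```

Text slots such as a headline or CTA can be localized the same way by passing per-locale values instead of a shared one.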
The composition pipeline
Typical full ad creation pipeline using Wevion's tools:
Generate image (ch-113) — product hero shot
Generate video (ch-114) — product in motion (image-to-video using the hero)
Generate avatar (ch-115) — spokesperson saying the script
Generate TTS (ch-116) — additional narration per locale
Composite (this article) — assemble everything into final ad mp4
Each step adds polish; full pipeline takes 10-30 min plus generation latencies.
Best practices
Build templates once, reuse forever
Template setup in Creatomate is one-time work. Once set: just populate slots. For agencies: build per-client templates that match brand guidelines, then populate per-campaign.
Match template duration to placement
Reels / Stories: 6-15 sec
Feed: 6-30 sec
In-stream: 15-30 sec
Choose templates whose intrinsic duration matches placement. Trimming a 30-sec template down to 6 sec usually looks rushed.
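The duration ranges above can be turned into a quick check when choosing a template. This is only a sketch: the placement keys and the helper name fitting_placements are made up for illustration, and the ranges are the ones listed in this section.

```python
# Duration ranges in seconds, taken from the list above.
PLACEMENT_RANGES_SEC = {
    "reels_stories": (6, 15),
    "feed": (6, 30),
    "in_stream": (15, 30),
}


def fitting_placements(template_duration_sec):
    """Return the placements whose range contains the template duration."""
    return [
        name
        for name, (shortest, longest) in PLACEMENT_RANGES_SEC.items()
        if shortest <= template_duration_sec <= longest
    ]


print(fitting_placements(10))  # short template
print(fitting_placements(20))  # mid-length template
print(fitting_placements(45))  # longer than every range listed here
```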
Use compositing for localization
Generate localization-ready assets:
Audio variants per language (cheap)
Same video + image slots (no re-generation)
Composite per language = many localized ad variants from one template
Preview before generating final variants
For new templates: composite once with placeholder content to validate slot mapping. Then commit to real content.
Common mistakes
Template doesn't match brand: spend time on template setup; everything cascades from it
Skipping slot validation: typos in slot names = generation fails or produces unexpected output
Generating 10 variants without preview: validate template behavior first
Forgetting that audio length must match video length: TTS too long = audio cut off; too short = silent gap. Plan duration up front.
Using compositing for simple cuts: if you just need to splice 2 clips, a video editor is faster
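The audio-length mistake is easy to catch before submitting if you know both durations. A small sketch, assuming you already have the audio and template durations in seconds; the 0.5-second tolerance and the helper name check_audio_fit are arbitrary choices for illustration, not values from Wevion or Creatomate.

```python
def check_audio_fit(audio_sec, template_sec, tolerance_sec=0.5):
    """Classify how an audio track fits a template's duration.

    Returns (status, delta) where delta = audio_sec - template_sec.
    """
    delta = audio_sec - template_sec
    if delta > tolerance_sec:
        return "audio_too_long", delta   # tail of the audio would be cut off
    if delta < -tolerance_sec:
        return "audio_too_short", delta  # video would end with a silent gap
    return "ok", delta


for audio_sec in (18.2, 15.3, 12.0):
    status, delta = check_audio_fit(audio_sec, template_sec=15.0)
    print(f"{audio_sec:>5.1f}s audio vs 15.0s template: {status} ({delta:+.1f}s)")
```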
Common issues
Generation failed (slot mismatch): check error_message; common: typo in slot name vs template definition
Output missing element: template expected slot you didn't populate; populate with placeholder or update template to make optional
Audio out of sync: total audio duration doesn't match template's expected duration; adjust TTS length or pick different template
Output quality lower than inputs: compositing re-encodes; some loss expected; verify template's output resolution settings
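For slot-mismatch failures, a fuzzy match against the template's slot names can point at the likely typo. The sketch below uses Python's standard difflib; the slot list and the helper name suggest_slot are hypothetical, and this is a local debugging aid rather than anything Wevion or Creatomate returns.

```python
import difflib


def suggest_slot(name, template_slots, cutoff=0.6):
    """Return the closest template slot name for a misspelled one, or None."""
    matches = difflib.get_close_matches(name, template_slots, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical slot names for illustration.
template_slots = ["headline_text", "product_video", "cta_text", "audio"]

for supplied in ("cta_txt", "prodct_video", "zzz"):
    suggestion = suggest_slot(supplied, template_slots)
    if suggestion is None:
        print(f"{supplied}: no similar slot in template")
    else:
        print(f"{supplied}: did you mean {suggestion}?")
```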
Related
Generate videos — produce video slots
Text-to-speech — produce audio slots
AI best practices — broader creative guidance