Text-to-Speech (TTS) — Voice Generation
Last updated: April 18, 2026
Wevion's Text-to-Speech feature converts written text into natural-sounding voiceover audio. You type or paste your ad script, pick a voice, and receive a ready-to-use audio file in seconds. The generated audio is saved directly to your Creative Hub library, where you can preview it, download it, or feed it into a composite video as a narration track.
TTS is the fastest content type in Wevion's AI generation suite. While video and avatar generation can take minutes, a voiceover typically finishes in under ten seconds — making it practical to iterate on multiple script versions quickly.
Prerequisites
- An active Wevion subscription that includes AI credits
- Media Buyer role or higher in your team workspace
- Access to Creative Hub (visible in the main sidebar)
No external accounts, plugins, or software are required. Wevion handles the entire voice synthesis pipeline internally.
Accessing the TTS Generator
- Navigate to Creative Hub from the left sidebar
- Click the magic wand icon in the toolbar at the top of the page to open the AI Generation panel
- In the left sidebar of the Generation panel, select the Voice tab (microphone icon)
The Voice tab presents a text area for your script, a voice engine selector, and the Voice Over button that starts generation. Your recent generation jobs appear below the form.
Step-by-Step Guide
1. Write Your Script
Type or paste the text you want converted to speech into the Text field.
- Maximum length: 5,000 characters per generation
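- The text field accepts text in any language. The Premium engine detects the language and adapts pronunciation accordingly; the Standard engine primarily supports English (see Available Voices and Languages)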
Tip: Write conversationally. TTS engines produce the most natural results when the input reads like spoken language, not formal writing. Short sentences, contractions, and natural pauses (commas, ellipses) all help.
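If you prepare scripts outside Wevion, the 5,000-character limit can be checked before pasting. The following is a minimal sketch, not a Wevion tool: `split_script` and `MAX_CHARS` are hypothetical names, and the sentence-boundary rule is an assumption about where a break sounds least disruptive.

```python
import re

MAX_CHARS = 5000  # per-generation limit stated in this article


def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script into chunks of at most `limit` characters,
    breaking between sentences where possible."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?…])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut mid-sentence.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be submitted as its own generation. Chunks are rejoined with single spaces, so line breaks in the original script are not preserved.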
2. Select a Voice Engine
Use the Provider dropdown to choose which AI voice engine processes your script:
| Engine | Tier | Best For |
|---|---|---|
| Premium | Professional | Final production audio — highest quality, expressive intonation, multilingual |
| Standard | Budget | Drafts and high-volume work — clean output at roughly 20x lower credit cost |
The Premium engine is selected by default. It uses a multilingual model that supports a wide range of languages natively — it infers the language from your text, so no separate language selector is required.
3. Select a Voice (Optional)
Each engine has a sensible default voice:
- Premium engine: Female voice ("Sarah") — warm, professional tone
- Standard engine: Female voice ("Nova") — clear, neutral tone
To use a different voice, enter its Voice ID in the voice ID field. If you leave the field empty, the engine's default voice is used. Both engines offer a library of voices with distinct characteristics.
4. Generate
Click the Voice Over button at the bottom of the form. It is disabled until you enter at least one character of text.
After clicking:
- The button changes to Generating... with a loading spinner
- A new job appears in the Generation Jobs section with status Pending
- Within seconds, the status moves to Processing, then Completed
- A View Result link appears next to the job, opening the MP3 audio file in a new tab
- The file is automatically saved to your current Creative Hub folder
You can submit multiple voiceovers without waiting for previous ones to finish.
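The job statuses described above (plus the Failed status covered in the FAQ) can be pictured as a small state model. This is an illustrative sketch only: it does not represent a Wevion API, the names `JobStatus`, `TRANSITIONS`, `can_transition`, and `is_terminal` are hypothetical, and whether a job can fail directly from Pending is an assumption.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Allowed moves between statuses. The Pending -> Failed edge is an
# assumption; the article only states the Pending -> Processing ->
# Completed path and that a job can end as Failed.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROCESSING, JobStatus.FAILED},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def can_transition(current: JobStatus, nxt: JobStatus) -> bool:
    """Return True if a job may move from `current` to `nxt`."""
    return nxt in TRANSITIONS[current]


def is_terminal(status: JobStatus) -> bool:
    """Completed and Failed jobs do not change status again."""
    return not TRANSITIONS[status]
```

In this model a Failed job never resumes. Retrying means submitting a new generation, which starts again at Pending.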
Available Voices and Languages
Premium Engine — Studio-quality output with emotional expression and natural pacing. A single multilingual model supports multiple languages natively (including English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Arabic, Hindi, Japanese, Korean, Chinese, and others). Voice settings (stability, expressiveness) are applied automatically and tuned for ad narration.
Standard Engine — Clean, consistent output at lower cost. A set of built-in voices with distinct characteristics. Primarily supports English; basic pronunciation available for other Latin-script languages.
Use Cases for Ad Voiceovers
Narrated Product Demos — Combine a product video with a TTS voiceover explaining features. Generate the audio in Wevion, then assemble the final video with the compositing tool.
UGC-Style Ads — Layer a voiceover on top of lifestyle footage or B-roll to create the feel of a real person narrating their experience.
Multi-Language Variants — Write the same script in multiple languages, generate a voiceover for each, and pair them with the same visual creative. No translation agency or voice actors needed.
Script A/B Testing — Generate voiceover variants from different script angles (benefit-led, problem-led, testimonial-style) and test them in separate ad sets. Fast and inexpensive.
Audio for Composite Videos — TTS audio feeds directly into composite video assembly as the narration layer.
Video Compositing — Assembling Multi-Element Videos
Tips for Natural-Sounding Scripts
Write for the ear, not the eye. Read your script aloud before submitting. If it sounds stilted when you say it, it will sound stilted when the AI reads it.
- Instead of: "Our platform facilitates the optimization of advertising expenditure."
- Write: "Our platform helps you spend less on ads and get better results."
Use punctuation to control pacing. Commas create short pauses, periods create full stops, ellipses (...) create dramatic pauses before a key benefit, and dashes create medium pauses for asides.
Keep sentences short. Aim for 10-20 words. Long sentences cause the AI to rush or lose rhythm.
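For longer scripts, a quick check can flag sentences that exceed the 20-word guideline. This is a rough sketch with a hypothetical helper name (`long_sentences`); it assumes sentences end with a period, exclamation mark, question mark, or ellipsis character followed by whitespace.

```python
import re


def long_sentences(script: str, max_words: int = 20) -> list[str]:
    """Return the sentences in `script` that have more than `max_words` words."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?…])\s+", script.strip())
    return [s for s in sentences if len(s.split()) > max_words]
```

Anything the function returns is a candidate for splitting into two shorter sentences.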
Spell out numbers. Write "fourteen days" instead of "14 days" for more natural delivery.
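Spelling out numbers can be partly automated for English scripts. The sketch below is an assumption-laden example, not a Wevion feature: `number_to_words` and `spell_out_numbers` are hypothetical helpers, they handle only standalone whole numbers from 0 to 999, and they deliberately skip prices, decimals, percentages, and longer numbers such as years so you can edit those by hand.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("only 0-999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return f"{words} {number_to_words(rest)}" if rest else words


# Match 1-3 digit numbers that are not part of a price, decimal,
# thousands-separated number, or percentage.
_NUMBER = re.compile(r"(?<![\d.,$€£])\b\d{1,3}\b(?![.,]\d|%)")


def spell_out_numbers(script: str) -> str:
    """Replace standalone small integers in `script` with English words."""
    return _NUMBER.sub(lambda m: number_to_words(int(m.group())), script)
```

For example, `spell_out_numbers("Try it free for 14 days.")` turns "14" into "fourteen" and leaves the rest of the sentence untouched.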
Match script length to ad duration:
| Ad Duration | Word Count | Characters |
|---|---|---|
| 15 seconds | 35-45 | ~200-250 |
| 30 seconds | 70-90 | ~400-500 |
| 60 seconds | 140-170 | ~800-1000 |
These are estimates. Actual duration depends on voice, language, and engine pacing.
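The table above can also be applied as a quick length check before generating. This is a minimal sketch: the word ranges are copied from the table, while `WORD_RANGES` and `check_script_length` are hypothetical names, and words are counted by splitting on whitespace.

```python
# Word-count ranges from the table above, keyed by ad duration in seconds.
WORD_RANGES = {15: (35, 45), 30: (70, 90), 60: (140, 170)}


def check_script_length(script: str, duration_s: int) -> str:
    """Return "short", "ok", or "long" for a script and target ad duration."""
    low, high = WORD_RANGES[duration_s]  # raises KeyError for other durations
    words = len(script.split())
    if words < low:
        return "short"
    if words > high:
        return "long"
    return "ok"
```

Because the ranges are estimates, treat a "long" or "short" result as a prompt to listen to the generated audio, not as a hard rule.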
FAQ
How long does generation take? Typically under 10 seconds — the fastest content type in Wevion.
What format is the output? MP3, delivered via CDN for fast playback and download.
Is there a character limit? 5,000 characters per generation. For longer scripts, split into separate generations and combine the audio files in a composite video.
Can I generate voiceovers in other languages? Yes. The Premium engine supports multiple languages natively. Write in the target language and the engine handles pronunciation automatically.
How many credits does it cost? Cost depends on the engine and text length. The Premium engine costs more per character but delivers higher quality. The Standard engine is more economical. See your account settings for current pricing.
What happens if generation fails? The job status changes to Failed with an error message. Failed generations do not consume credits. Simply retry.
Can I use TTS audio in composite videos? Yes — this is one of the primary use cases. Generate the voiceover, then add it as an audio element in the compositing tool.
Where is the file saved? In your current Creative Hub folder. If you were viewing a subfolder when you opened the Generation panel, the file is saved there; otherwise it goes to the root folder.