Text-to-Speech (TTS) — Voice Generation

Last updated: April 18, 2026

Wevion's Text-to-Speech feature converts written text into natural-sounding voiceover audio. You type or paste your ad script, pick a voice, and receive a ready-to-use audio file in seconds. The generated audio is saved directly to your Creative Hub library, where you can preview it, download it, or feed it into a composite video as a narration track.

TTS is the fastest content type in Wevion's AI generation suite. While video and avatar generation can take minutes, a voiceover typically finishes in under ten seconds — making it practical to iterate on multiple script versions quickly.


Prerequisites

  • An active Wevion subscription that includes AI credits
  • Media Buyer role or higher in your team workspace
  • Access to Creative Hub (visible in the main sidebar)

No external accounts, plugins, or software are required. Wevion handles the entire voice synthesis pipeline internally.


Accessing the TTS Generator

  1. Navigate to Creative Hub from the left sidebar
  2. Click the magic wand icon in the toolbar at the top of the page to open the AI Generation panel
  3. In the left sidebar of the Generation panel, select the Voice tab (microphone icon)
📸 Creative Hub toolbar with the magic wand icon highlighted, and the AI Generation panel open with the Voice tab selected in the left sidebar

The Voice tab presents a text area for your script, a voice engine selector, and the Voice Over button that starts generation. Your recent generation jobs appear below the form.


Step-by-Step Guide

1. Write Your Script

Type or paste the text you want converted to speech into the Text field.

📸 TTS text area filled with sample ad copy reading "Tired of managing ten different tools for your ads? Wevion brings everything into one place. Try it free for 14 days."
  • Maximum length: 5,000 characters per generation
  • The text field accepts text in any language. The Premium engine detects the language and adapts pronunciation accordingly; the Standard engine primarily supports English

Tip: Write conversationally. TTS engines produce the most natural results when the input reads like spoken language, not formal writing. Short sentences, contractions, and natural pauses (commas, ellipses) all help.

2. Select a Voice Engine

Use the Provider dropdown to choose which AI voice engine processes your script:

Engine | Tier | Best For
Premium | Professional | Final production audio: highest quality, expressive intonation, multilingual
Standard | Budget | Drafts and high-volume work: clean output at roughly 20x lower credit cost

The Premium engine is selected by default. It uses a multilingual model that supports a wide range of languages natively — it infers the language from your text, so no separate language selector is required.

3. Select a Voice (Optional)

Each engine has a sensible default voice:

  • Premium engine: Female voice ("Sarah") — warm, professional tone
  • Standard engine: Female voice ("Nova") — clear, neutral tone

To use a different voice, enter its Voice ID in the voice ID field. Both engines offer a library of voices with distinct characteristics. If you leave the field empty, the engine's default voice is used.

4. Generate

Click the Voice Over button at the bottom of the form. It is disabled until you enter at least one character of text.

📸 The Voice Over button in active state with a microphone icon, below the completed form

After clicking:

  1. The button changes to Generating... with a loading spinner
  2. A new job appears in the Generation Jobs section with status Pending
  3. Within seconds, the status moves to Processing, then Completed
  4. A View Result link appears next to the job, opening the MP3 audio file in a new tab
  5. The file is automatically saved to your current Creative Hub folder
📸 Jobs panel showing a completed TTS job with checkmark, provider name, script preview, credit cost, and View Result link

You can submit multiple voiceovers without waiting for previous ones to finish.


Available Voices and Languages

Premium Engine — Studio-quality output with emotional expression and natural pacing. A single multilingual model supports many languages natively, including English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Arabic, Hindi, Japanese, Korean, and Chinese. Voice settings (stability, expressiveness) are tuned automatically for ad narration.

Standard Engine — Clean, consistent output at lower cost. A set of built-in voices with distinct characteristics. Primarily supports English; basic pronunciation available for other Latin-script languages.


Use Cases for Ad Voiceovers

Narrated Product Demos — Combine a product video with a TTS voiceover explaining features. Generate the audio in Wevion, then assemble the final video with the compositing tool.

UGC-Style Ads — Layer a voiceover on top of lifestyle footage or B-roll to create the feel of a real person narrating their experience.

Multi-Language Variants — Write the same script in multiple languages, generate a voiceover for each, and pair them with the same visual creative. No translation agency or voice actors needed.

Script A/B Testing — Generate voiceover variants from different script angles (benefit-led, problem-led, testimonial-style) and test them in separate ad sets. Fast and inexpensive.

Audio for Composite Videos — TTS audio feeds directly into composite video assembly as the narration layer.

See also: Video Compositing — Assembling Multi-Element Videos


Tips for Natural-Sounding Scripts

Write for the ear, not the eye. Read your script aloud before submitting. If it sounds stilted when you say it, it will sound stilted from the AI.

  • Instead of: "Our platform facilitates the optimization of advertising expenditure."
  • Write: "Our platform helps you spend less on ads and get better results."

Use punctuation to control pacing. Commas create short pauses, periods create full stops, ellipses (...) create dramatic pauses before a key benefit, and dashes create medium pauses for asides.

Keep sentences short. Aim for 10-20 words. Long sentences cause the AI to rush or lose rhythm.
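
The sentence-length guideline can be checked before you paste a script into Wevion. The following is a minimal sketch that runs on your own machine, not a Wevion feature; the function name long_sentences and the simple punctuation-based sentence split are assumptions made for illustration.

```python
import re


def long_sentences(script: str, max_words: int = 20) -> list[str]:
    """Return the sentences in `script` that contain more than `max_words` words."""
    # Split after ., ! or ? when followed by whitespace. This is a rough
    # heuristic: abbreviations such as "e.g." will also trigger a split.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    draft = (
        "Tired of managing ten different tools for your ads? "
        "Wevion brings everything into one place."
    )
    for sentence in long_sentences(draft):
        print("Consider shortening:", sentence)
```

Any sentence the helper returns is a candidate for splitting into two shorter ones.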

Spell out numbers. Write "fourteen days" instead of "14 days" for more natural delivery.
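
To catch digits you may have missed, a short local check can list every numeral in a draft. This is an illustrative sketch only; find_numerals is a hypothetical helper name, and the pattern covers plain digits plus simple comma or decimal groupings.

```python
import re


def find_numerals(script: str) -> list[str]:
    """Return digit sequences such as '14', '2,500', or '9.99' found in `script`."""
    # Digits, optionally followed by groups like ",500" or ".99".
    return re.findall(r"\d+(?:[.,]\d+)*", script)


if __name__ == "__main__":
    draft = "Try it free for 14 days."
    for numeral in find_numerals(draft):
        print("Consider spelling out:", numeral)
```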

Match script length to ad duration:

Ad Duration | Word Count | Characters
15 seconds | 35-45 | ~200-250
30 seconds | 70-90 | ~400-500
60 seconds | 140-170 | ~800-1000

These are estimates. Actual duration depends on voice, language, and engine pacing.
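
The word-count ranges above can be turned into a quick pre-flight check. The sketch below is an assumption-laden illustration rather than anything built into Wevion: the name check_fit is hypothetical, the ranges are copied from this article, and the result is only as accurate as those estimates.

```python
# Word-count ranges from this article: ad duration in seconds -> (min, max) words.
WORD_RANGES = {15: (35, 45), 30: (70, 90), 60: (140, 170)}


def check_fit(script: str, duration_s: int) -> str:
    """Classify a script as 'short', 'ok', or 'long' for a 15, 30, or 60 second ad."""
    if duration_s not in WORD_RANGES:
        raise ValueError("duration_s must be 15, 30, or 60")
    low, high = WORD_RANGES[duration_s]
    words = len(script.split())  # whitespace-separated word count
    if words < low:
        return "short"
    if words > high:
        return "long"
    return "ok"


if __name__ == "__main__":
    draft = (
        "Tired of managing ten different tools for your ads? "
        "Wevion brings everything into one place. Try it free for 14 days."
    )
    print(check_fit(draft, 15))
```

A "short" or "long" result is a prompt to revise the script, not a guarantee about the final audio length.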


FAQ

How long does generation take? Typically under 10 seconds — the fastest content type in Wevion.

What format is the output? MP3, delivered via CDN for fast playback and download.

Is there a character limit? 5,000 characters per generation. For longer scripts, split into separate generations and combine the audio files in a composite video.
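
For scripts over the limit, the split can be prepared locally before pasting each part into the Text field. The following is a minimal sketch under stated assumptions: split_script is a hypothetical helper, it breaks only between sentences, and it treats ., ! and ? followed by whitespace as sentence endings.

```python
import re

MAX_CHARS = 5000  # per-generation character limit stated in this article


def split_script(script: str, limit: int = MAX_CHARS) -> list[str]:
    """Split `script` into chunks of at most `limit` characters, breaking between sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("A single sentence exceeds the character limit")
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # The chunk is full: store it and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    long_script = " ".join(["This is a short sentence."] * 300)
    for i, chunk in enumerate(split_script(long_script), start=1):
        print(f"Part {i}: {len(chunk)} characters")
```

Breaking at sentence boundaries keeps each generated clip ending on a natural pause, which makes the parts easier to join in a composite video.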

Can I generate voiceovers in other languages? Yes. The Premium engine supports multiple languages natively. Write in the target language and the engine handles pronunciation automatically.

How many credits does it cost? Cost depends on the engine and text length. The Premium engine costs more per character but delivers higher quality. The Standard engine is more economical. See your account settings for current pricing.

See also: Understanding AI Credits

What happens if generation fails? The job status changes to Failed with an error message. Failed generations do not consume credits. Simply retry.

Can I use TTS audio in composite videos? Yes — this is one of the primary use cases. Generate the voiceover, then add it as an audio element in the compositing tool.

See also: Video Compositing — Assembling Multi-Element Videos

Where is the file saved? In your current Creative Hub folder. If you were viewing a subfolder when you opened the Generation panel, the file is saved there; otherwise it goes to the root folder.


Related Articles