Our Predictions for AI Dubbing by 2028 (Challenges and Developments)

In today’s global digital economy, video dubbing isn’t just about translation — it’s about access, inclusivity, and storytelling that resonates across languages.
At Kapwing, we’ve spent the past year immersed in the world of AI dubbing. This article explores where the technology is today, what challenges still remain, and what we predict AI dubbing will look like by 2028.
The State of AI Dubbing in 2025
Just a few years ago, high-quality dubbing required expensive voice actors, complex production workflows, and weeks of turnaround time.
Today, thanks to rapid advances in large language models (LLMs), text-to-speech synthesis, and voice cloning, creators can dub videos in multiple languages with near-human voices in a matter of minutes and for a fraction of the cost.
How AI Dubbing Works
AI dubbing involves multiple layers of smart automation. Here’s how we’ve structured the workflow inside Kapwing's AI Dubbing tool:
- Transcription – Converting the video’s audio into text with timestamps.
- Translation – Using an LLM to accurately translate the text, preserving tone and timing.
- Text-to-Speech – Generating synthetic speech in the target language using cloned voices.
- Playback Adjustments – Modifying video and audio timing to ensure natural pacing.
- Collaboration – Allowing teams to edit translations, tweak timing, and ensure natural-sounding speech before export.

Kapwing’s editor makes all of this collaborative and editable, so you can work with native speakers, stakeholders, and reviewers to refine the dubbed video before publishing.
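The workflow above can be sketched end to end in a few lines. Everything here is illustrative: `transcribe`, `translate`, and `synthesize` are toy stand-ins for real ASR, LLM, and TTS services, not Kapwing's actual API.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float                  # seconds into the video
    end: float
    text: str                     # transcribed source-language text
    translation: str = ""
    dubbed_duration: float = 0.0  # length of the generated speech, seconds

def transcribe(video_path: str) -> list[Segment]:
    # A real system returns timestamped segments from speech recognition.
    return [Segment(0.0, 1.2, "Hi team!"), Segment(1.5, 4.0, "Welcome back.")]

def translate(seg: Segment, target_lang: str) -> Segment:
    # Stand-in for an LLM translation call; a tiny glossary for the demo.
    glossary = {"Hi team!": "チームの皆さん、こんにちは！",
                "Welcome back.": "おかえりなさい。"}
    seg.translation = glossary.get(seg.text, seg.text)
    return seg

def synthesize(seg: Segment) -> Segment:
    # Stand-in for TTS; pretend speech length scales with character count.
    seg.dubbed_duration = len(seg.translation) / 8.0
    return seg

def dub(video_path: str, target_lang: str) -> list[Segment]:
    # Transcription -> translation -> text-to-speech, segment by segment.
    return [synthesize(translate(s, target_lang)) for s in transcribe(video_path)]

for s in dub("video.mp4", "ja"):
    print(f"{s.start:.1f}-{s.end:.1f}s: {s.translation} ({s.dubbed_duration:.2f}s)")
```

The important design point is the `Segment`: timestamps travel with the text through every stage, so the playback-adjustment and collaboration steps can compare the dubbed audio's length against the original slot.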
How Accurate Is AI Dubbing Today?
We rigorously test our AI dubbing with human evaluators every two weeks across 13 language pairs. Our current average quality score is 7.2 out of 10 — meaning it gets most of the way there, but still benefits from human refinement. We're optimistic that first-generation output will reach 85%+ accuracy (8.5+ out of 10) by the end of the year.
We’ve seen rapid improvement in quality due to:
- Better prompt engineering for translations
- Real-time adjustment of speech cadence
- Enhanced multi-speaker recognition
- Smart playback speed changes to match voiceovers naturally
These advancements are paired with and fueled by rising demand for video localization tools. We've seen adoption soar in the education, marketing, and news sectors, where global reach is becoming essential.
Problems Ahead for Dubbing
Despite our progress, AI dubbing faces four major remaining challenges:
- Pacing Mismatches – Different languages have different cadences and lengths. “Hi team!” in English might become an 8-syllable phrase in Japanese. We solve this by stretching video segments and refining translations.
- Emotional Tone – Current text-to-speech voices often flatten out emotional nuance. We’re exploring multiple voice clones, voice-to-voice modeling, and fine-grained inflection control.
- Translation Accuracy – Idioms, slang, and proper nouns are easy to mistranslate. We’re building tools like glossaries and inline editors to help creators fix issues quickly.
- Speaker Diarization – Assigning the correct voice to each speaker is hard when the audio changes tone or is interrupted. We allow manual overrides and continually improve our detection models.
We're actively building solutions to these challenges through continuous research and product iteration. While the technology still has ground to cover, it's exciting to see the growing potential for quality improvements as LLMs and related AI technologies evolve.
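The pacing mismatch above can be framed as a split between two levers: speed up the dubbed speech a little, and stretch the video segment to absorb the rest. Here is a minimal sketch of that idea — the clamp bounds and the strategy itself are illustrative assumptions, not Kapwing's actual algorithm:

```python
def pacing_adjustment(segment_duration: float, dubbed_duration: float,
                      max_speedup: float = 1.25, max_slowdown: float = 0.8):
    """Return (audio_speed, video_stretch) so dubbed speech fits the segment.

    audio_speed > 1 plays the dubbed audio faster; video_stretch > 1
    lengthens the video segment to absorb what speed changes alone can't.
    """
    ratio = dubbed_duration / segment_duration
    # Keep speech speed within a natural-sounding band.
    audio_speed = min(max(ratio, max_slowdown), max_speedup)
    # Whatever the speed change can't absorb, the video segment absorbs.
    video_stretch = ratio / audio_speed
    return audio_speed, video_stretch

# A 2.0s English segment whose Japanese dub runs 3.0s:
speed, stretch = pacing_adjustment(2.0, 3.0)
print(f"play audio at {speed}x, stretch video by {stretch:.2f}x")
```

With these numbers the audio is sped up to the 1.25× ceiling and the video segment is stretched 1.2×, so both land at 2.4 seconds. A production system would also handle the opposite case (dubbed speech shorter than the slot) more gracefully, for example by padding silence instead of compressing the video.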
Cost Trends: Where We Are and Where We're Headed
Today, AI dubbing costs range from $0 to $100 per finished hour depending on language, quality, and control. The most expensive part of the process is the text-to-speech generation, which can require multiple voice clones. That's already a fraction of traditional dubbing costs, which can run thousands of dollars per finished hour.
We expect costs to drop another 50–70% by 2028 due to:
- More efficient models
- Scalable infrastructure
- Commoditized voice libraries
Lower costs will open the door not only for high-volume users like large media companies, but also for educators and creators with smaller budgets who were previously priced out of professional dubbing.
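As a back-of-the-envelope check on that projection, here is the arithmetic applied to today's upper bound (the $100/hour figure and the 50–70% drop are the estimates quoted above, not measured data):

```python
def projected_cost(cost_today: float, drop_low: float = 0.50,
                   drop_high: float = 0.70) -> tuple[float, float]:
    """Projected 2028 per-finished-hour cost band, given a 50-70% drop."""
    return cost_today * (1 - drop_high), cost_today * (1 - drop_low)

low, high = projected_cost(100.0)
print(f"$100/hr today -> ${low:.0f}-${high:.0f}/hr by 2028")
```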
5 Predictions for AI Dubbing in 2028
Before we look ahead, it’s worth noting that the pace of innovation in this space has never been faster — and that’s exactly why the next few years will be transformative. Based on what we’ve seen so far, here’s what we think is coming:
- Dubbing becomes default: Like closed captions today, dubbed audio will be expected.
- Hyper-personalized voices: Users will create custom voice clones that reflect accent, tone, and age.
- Real-time dubbing: Live streams will support multilingual dubbing with minimal delay.
- Universal review tools: Editors will auto-flag potential errors in tone, translation, and sync.
- Cultural editing layers: AI will seamlessly localize jokes, references, and tone for each market.
Overall, we believe dubbing will become not just more accessible, but more culturally intelligent.
AI as a Creative Partner in 2028
We firmly believe that humans play a central and lasting role in shaping content created with AI. That won’t change, even as the technology improves. While AI dubbing tools can accelerate the process of video translation, it’s human creators who bring cultural insight, emotional nuance, and creative judgment to the final product.
Think of AI as your dubbing assistant. It gets you 80% of the way, then hands you the reins.
Want to see where AI dubbing is headed? Try Kapwing’s dubbing tool — and join us in shaping a future where every story can be heard in every language.