How to Transcribe Audio to Text

Learn the best way to transcribe audio to text, plus get our best transcription tool recommendations for 2024.

Transcribing audio is a powerful way to unlock new potential from existing content. Busy creatives can turn audio and video content into articles and social media posts, pull insights from interview transcripts, and even send out minutes from meeting recordings.

All you need is an efficient audio transcription workflow. 

We’ll break down the different ways to transcribe audio to text, provide a step-by-step tutorial on how to use our preferred transcription tool, and round up the best transcription options currently available. 

TL;DR: The best way to transcribe audio is by using an automatic transcription tool. Jump to the tutorial.

Our quick recommendations: Kapwing (for professional transcription and longer files), or Microsoft Word (for casual, one-time transcription of shorter files).

The 3 methods for audio transcription

There’s more than one way to convert audio into text, of course, but not all transcription methods are created equal.

Here’s a quick breakdown of the three most common ways to transcribe audio:

1. Manual transcription 

Time-consuming, tedious, and prone to human error, manual transcription is nevertheless one way to turn an audio or video file into text.

How it works:

You listen back to the audio file and type out what you hear as you hear it. Even if you’re a relatively fast typist, you’ll need to pause the playback often and replay certain parts of the audio to catch anything you might have missed. To speed up the process, you can use shorthand, but unless you’re moonlighting as a stenographer, this doesn’t increase efficiency all that much, as you’ll still need to translate your notes into longhand.

This might seem like a good option if you only have to transcribe a very short file, but it’s often still more efficient to use a dedicated transcription tool.

At most, we recommend reading your transcription after it's created using one of the methods below and manually revising sentences or individual words for pinpoint accuracy. But if you value your time, manual transcription isn't the way to go.

2. Paid transcription services

Transcription services, not to be confused with transcription tools, are essentially just outsourced manual transcription.

How it works:

You send your files to a transcription service that will do the manual transcription work for you. These services usually have a team of transcription professionals on staff, but the turnaround time is still at least a few business days. And because manual transcription is tedious, detail-oriented work that requires many hours of labor, these services are often quite expensive.

In the past, people have preferred transcription services to transcription tools because they were more accurate. However, with the continual improvement of AI-powered speech recognition, that’s no longer the case. Projects with lots of background noise or unclear audio are the rare exception to this rule.

3. Automatic transcription tools

The fastest and easiest way to transcribe audio is with an automated transcription tool. 

How it works:

You upload your files into the transcription tool which uses AI-powered speech recognition software to analyze and convert the spoken audio into text. Unlike the other two transcription methods, AI transcription is fast and efficient, taking a few seconds to a few minutes, depending on the length of your file.

Both the speed and affordability of AI-powered transcription tools make them the best choice for most transcription projects. Accuracy for AI transcription is usually derived from Word Error Rate (WER), the share of words the tool gets wrong, and averages in the 85% to 90% range (roughly a 10% to 15% WER), depending on the API used. Most of the errors occur when the audio is unclear or has multiple overlapping speakers. Other common errors are mistranscriptions of proper nouns and homonyms, an issue you can solve with Kapwing's Custom Spelling.

As machine learning progresses and training data sets expand, these errors become less and less frequent. 
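To make the Word Error Rate figure concrete, here is a minimal Python sketch of how WER is commonly computed: the word-level edit distance between a human-checked reference transcript and the tool's output, divided by the number of words in the reference. The function name and the sample sentences are our own illustration, not part of any particular transcription API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # matching or substituted word
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one proper noun out of ten words is mistranscribed.
reference = "welcome to the kapwing podcast about video and audio editing"
hypothesis = "welcome to the capping podcast about video and audio editing"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 10%, accuracy: 90%
```

In this example, a single wrong word in a ten-word sentence gives a 10% WER, which is the same thing as 90% accuracy. A lower WER always means a better transcript.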

How to transcribe audio to text with Kapwing

Using an automatic transcription tool is the method we recommend for turning audio into text. Here’s a step-by-step tutorial on how to use our preferred tool, Kapwing’s transcript generator.

Step 1: Upload your audio file to Kapwing

Kapwing is a browser-based video and audio editing tool, so there's nothing to download. To get started, go to kapwing.com and open a new project.

Upload the file you want to transcribe from your device, the cloud, or from a link if you’ve already published it online. Kapwing supports transcription for video and audio files, so there’s no need to convert your video to mp3 before uploading it.

Step 2: Generate your audio transcription

Once your file is uploaded, open the “Transcript” tab in the left-side menu. There are a few transcript editing options available. Select “Trim with Transcript.”

Set the language for your transcript by choosing from the “What language is your video in?” dropdown menu. This is an important step that will make sure the AI is transcribing words correctly. By default, it’s set to English, but there are over 70 languages to choose from.

When you’re ready, click “Generate Transcript.”

Sit back and let Kapwing generate a fast and accurate transcript for your project. When it’s ready, it will appear in the textbox of the Transcript window.

Step 3: Automatically remove filler words

Your audio transcript is now ready to use, but before you export it, you can use one of Kapwing’s smart tools to remove filler words (“um” and “uh”) from the text.

To do so, click the Smart Tools dropdown at the top of the Transcript window. Select “Cut Filler Words.”

This is helpful if you’re transcribing audio from a podcast, interview, or other unscripted recording that you want to sound a little more polished in written form.
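If you ever need to clean up a transcript outside of Kapwing, the same idea can be sketched in a few lines of Python. This is only a rough illustration of text-level filler removal, not how Kapwing's “Cut Filler Words” tool works internally; the function name and the pattern (limited to “um” and “uh”) are our own assumptions.

```python
import re

# Matches "um"/"uh" (including stretched forms like "ummm") as whole words,
# along with a comma directly before or after the filler.
FILLER_PATTERN = re.compile(r",?\s*\b(?:um+|uh+)\b,?", flags=re.IGNORECASE)


def remove_filler_words(text: str) -> str:
    cleaned = FILLER_PATTERN.sub("", text)
    # Tidy up any doubled spaces left behind, then trim the ends.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # Restore the capital letter if a filler opened the sentence.
    return cleaned[:1].upper() + cleaned[1:]


raw = "Um, so today we're, uh, talking about transcription."
print(remove_filler_words(raw))
# So today we're talking about transcription.
```

A simple pattern like this won't handle every case (hyphenated expressions such as “uh-huh”, for example), so it's still worth reading through the result.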

Step 4: Export your audio transcription

Finally, click the download transcript button. Your audio transcript will be automatically saved to your device as a .txt file. 

You can edit the transcript in any .txt editor or copy and paste it into your preferred word processor. We recommend reading through the transcript for any misspelled words or small errors.

While Kapwing’s transcription tool has a low Word Error Rate, like all AI-powered transcription generators it can miss a few things here and there, particularly proper nouns with abnormal spellings (like our own brand name, Kapwing!). Set up Custom Spelling in your Brand Kit to catch these small mistakes in advance and spend less time making manual edits.
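For transcripts you've already exported, a similar fix can be approximated with a find-and-replace pass. The sketch below is a hypothetical illustration of the concept in Python; it is not Kapwing's Custom Spelling feature, and the dictionary of misspellings is made up for the example.

```python
import re

# Hypothetical examples of how a brand name might be mistranscribed,
# mapped to the spelling we actually want.
CUSTOM_SPELLINGS = {
    "cap wing": "Kapwing",
    "capwing": "Kapwing",
}


def apply_custom_spellings(text: str, spellings: dict) -> str:
    for wrong, right in spellings.items():
        # Whole-word, case-insensitive match so "Cap Wing" is caught too.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement inserts the preferred spelling literally.
        text = pattern.sub(lambda match, fixed=right: fixed, text)
    return text


transcript = "Welcome to the Cap Wing blog, where capwing editors share tips."
print(apply_custom_spellings(transcript, CUSTOM_SPELLINGS))
# Welcome to the Kapwing blog, where Kapwing editors share tips.
```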

The best audio transcription tools

There are plenty of automatic transcription tools on the market now, especially as machine learning continues to improve speech recognition and analysis. Which one you choose is largely dependent on what you need it for. 

We’ve broken down our top tool recommendations by category to better help you find the best tool for your needs. 

1. Best for Microsoft Office Users: Word

Believe it or not, you can transcribe whole audio files to text using a tool you might already have on your computer: Microsoft Word. And for any fellow Mac users, don’t worry—this feature is also available in Word for Web, so you can access it from your browser, too.

It’s a little bit hidden, but you should be able to find the Transcribe tool in the toolbar. You may need to click on the three dot icon to expand the toolbar. Select “Dictate” and then “Transcribe.” Word gives you the option to record audio directly or upload a file.

For a built-in tool, Word’s transcription feature is pretty robust. It supports Speaker Identification, allows you to edit within the transcript window, and allows you to add the whole transcript to your document or just insert certain quotes. 

Users are generally happy with the accuracy of the transcriptions but do find the auto-punctuation feature overzealous, which calls for some manual editing of the transcription.

✅ Reasons to try it:

  • Integrated with Microsoft tools you already use
  • Speaker Identification (labeled as Speaker 1, Speaker 2, etc.)
  • Easy to use and edit
  • Free

❌ Reasons to skip:

  • Auto-punctuation feature is prone to errors and can’t be turned off

💲 Pricing:

  • Free – included with your paid Microsoft Office subscription and the free Word for Web version

2. Best for Google Workspace Users: Google Docs & Google Meet

There are two Google tools that support transcription: Google Docs and Google Meet. They work a little bit differently, though, and neither really provides the full functionality of a dedicated transcription tool. For specific applications, though, they might be what you need.

Google Docs has a feature called “Voice Typing” which allows you to dictate text and have it transcribed directly. Like Microsoft Word’s transcription tool, it’s hidden in a toolbar menu. Go to “Tools” then scroll down to “Voice Typing” or use the keyboard shortcut cmd/ctrl + shift + S. 

As a dictation device, this is pretty useful, but Voice Typing won’t transcribe existing files. 

It also doesn’t automatically punctuate or capitalize anything, so if you are using Voice Typing for live transcription, make sure to speak clearly and indicate any punctuation verbally, e.g., “The blue ‘comma,’ red ‘comma,’ and white paints are all empty ‘period.’”

Google Meet has a transcription function to convert recorded meetings into text. This feature functions much more like a typical transcription tool and even has Speaker Identification. The only downside is that you can only use it to transcribe Google Meet calls, and you have to record the call in order to generate a transcription, which requires a bit of forethought.

Caption: Meeting name, attendee names, and disclaimer in the header of a Google Meet transcript.

That said, Google Meet transcripts are generally accurate and label each individual speaker, pulling names from attendee data. The transcripts are generated as editable Google Docs, which makes it easy to repurpose your meeting recording.

✅ Reasons to try it:

  • Integrated with Google tools you already use
  • Speaker Identification (for Meeting transcripts)
  • Free

❌ Reasons to skip:

  • No way to transcribe existing files
  • Slow and prone to transcription errors

💲 Pricing:

  • Free – included with your Google Workspace

3. Best for transcribing video audio: Kapwing

This is our tool! 

Kapwing gives you the ability to pull accurate transcripts from any file or published video. And because our transcription tool is built into our full-suite video editor, there’s a lot more you can do with your transcript than just export it:

  • Turn your transcript into engaging, animated subtitles.
  • Translate your transcript to reach new audiences.
  • Edit your transcript to edit your video—just like a text doc.
  • Find the highlights in your transcript and automatically turn them into clips.
  • Convert your transcript into an AI-generated voice over.

Even just looking at the transcription feature on its own, though, there’s a lot to recommend it.

✅ Reasons to try it:

  • Fast transcription turnaround
  • Browser-based tool with no download required
  • Works on any operating system
  • Supports transcription in over 70 languages
  • Accurate, reliable transcriptions (10% to 20% WER)
  • Integrated with full studio editor

❌ Reasons to skip:

  • No dedicated mobile app
  • No Speaker Identification

💲 Pricing:

  • Free – 10 minutes of transcription/month
  • Pro – 300 minutes of transcription/month; $16/month, billed annually
  • Business – 900 minutes of transcription/month; $50/member per month, billed annually
  • Enterprise – Custom limits; pricing by request

4. Best for transcribing meetings: Fireflies

If the main thing you want to transcribe is meetings or interviews, Fireflies might be the right option for you. This transcription tool integrates with different video conferencing apps, captures the video and audio recording, and then transcribes the recording once the call ends.

In addition to fast transcription, Fireflies has a few other useful features for meeting reviews and follow-ups. Take advantage of the AI-powered transcription analysis to generate meeting notes, assign action items, and even track speaker talking time.

Fireflies integrates with popular video conferencing apps, like Zoom, Google Meet, and Microsoft Teams, as well as other productivity and collaboration tools your team might use, like Slack, Asana, and Notion.

✅ Reasons to try it:

  • Integrates with video conferencing apps you already use
  • Supports Speaker Identification
  • AI-powered meeting summaries, analysis, and follow up
  • Supports easy collaboration with tool integrations like Notion and Slack
  • Supports audio file upload, not just active meeting recording/transcription
  • 60 transcription languages

❌ Reasons to skip:

  • Might be more functionality than you need for just standard transcription
  • Transcript accuracy could be better
  • Transcription credit system is confusing

💲 Pricing:

  • Free – unlimited transcription*; limited AI summaries
  • Pro – unlimited transcription, unlimited AI summaries; $10/month
  • Business – unlimited transcription, unlimited AI summaries, unlimited storage; $19/month
  • Enterprise – unlimited transcription, unlimited AI summaries, unlimited storage; pricing by request

*For all plans, unlimited transcription only applies to offline meetings or voice notes transcribed with the mobile app, recorded meetings with the Fireflies meeting bot, or Google Meet calls recorded with the Fireflies Chrome extension.

Benefits and best ways to use audio transcription

No matter what tool you choose, there are plenty of uses for audio transcriptions. Here are a few of the top use cases for audio to text transcription.

1. Accessibility

Video and audio aren’t always the most accessible content formats. For deaf and hard of hearing audience members, having a written component to translate the audio is key. Two ways to make content more accessible with transcription include:

  • Generating a transcript to accompany your video or audio recording. Many podcasts will include episode transcripts for audience members who can’t listen directly to the audio.
  • Creating a subtitles file from your transcript. All video content online should have subtitles—80% of young people use subtitles "some or most of the time" when watching video, even without hearing impairments. Most transcript generators also include the option to export the file as an SRT.
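If your tool only gives you timestamped text, the SRT format is simple enough to build yourself. Here is a minimal Python sketch; it assumes your transcript is available as (start seconds, end seconds, text) segments, which is our own hypothetical structure rather than the export format of any specific tool.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: a counter, a time range, the caption, a blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical transcript segments for illustration.
segments = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 6.04, "Today we're talking about transcription."),
]
print(segments_to_srt(segments))
```

Save the output with a .srt extension and most video players and platforms should be able to read it as a subtitle track.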

2. Repurposing

Repurposing certainly feels like the hot new thing in marketing, but not all repurposing involves chopping up larger videos into smaller clips. When you think a little outside the box, transcriptions can help reimagine all sorts of content for an entirely new medium. Here are some valuable ways to repurpose content with transcriptions:

  • Finding clips from a podcast episode. It’s hard to know as you’re recording which moments will make for good episode clips. Reviewing the transcript after the fact, or using a transcript-powered clip finder, can help you find the best highlights.
  • Turning a YouTube video into a blog post. If you have both a YouTube channel and a blog, there’s a lot of opportunity for cross-posting.
  • Pulling important quotes from an interview. Even if you only need one or two direct quotes from an interview, it’s still much faster and easier to pull them from the transcript rather than listening back and transcribing manually.
  • Getting the top takeaways from a webinar. Webinars are full of great insights, locked away inside a 30+ minute recording. Transcribing your webinars lets you share those insights in your follow-up marketing materials with less hassle. 
  • Atomizing long-form content for social media. Not every great point made in a video or podcast episode needs to be a LinkedIn text post… but some of them do.

3. Internal processes

Marketers and content creators aren’t the only ones who use audio transcription. Transcribing audio to text can be quite useful for internal communications and people operations, particularly for:

  • Reviewing or summarizing recorded meetings. Whether you’re taking notes for anyone who missed the meeting, recording minutes for future review, or sharing back insights to stakeholders, it’s easier when you transcribe the recording.
  • Coaching clients or employees. With a transcript, you can get an idea of who spoke most during a meeting, what the general sentiment was, and how many filler words were used. All of this is helpful information for coaching someone on public speaking or sales.
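As a rough illustration of the coaching use case, here is a short Python sketch that tallies talking time and filler words per speaker. It assumes a speaker-labeled transcript shaped as (speaker, start seconds, end seconds, text) segments; that structure and the sample meeting are hypothetical, and sentiment analysis is left out.

```python
import re
from collections import defaultdict

FILLER = re.compile(r"\b(?:um+|uh+)\b", flags=re.IGNORECASE)


def speaker_stats(segments) -> dict:
    """Total talking time, share of the meeting, and filler count per speaker."""
    seconds = defaultdict(float)
    fillers = defaultdict(int)
    for speaker, start, end, text in segments:
        seconds[speaker] += end - start
        fillers[speaker] += len(FILLER.findall(text))
    total = sum(seconds.values())
    return {
        speaker: {
            "seconds": seconds[speaker],
            "share": seconds[speaker] / total if total else 0.0,
            "fillers": fillers[speaker],
        }
        for speaker in seconds
    }


# Hypothetical meeting transcript for illustration.
segments = [
    ("Speaker 1", 0, 30, "Um, thanks everyone for joining the call today."),
    ("Speaker 2", 30, 40, "Happy to be here."),
    ("Speaker 1", 40, 60, "So, uh, let's start with the um quarterly numbers."),
]

stats = speaker_stats(segments)
for speaker, s in stats.items():
    print(f"{speaker}: {s['seconds']:.0f}s ({s['share']:.0%}), {s['fillers']} filler words")
# Speaker 1: 50s (83%), 3 filler words
# Speaker 2: 10s (17%), 0 filler words
```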

Whether you’re repurposing content for a new medium, making your existing content more accessible, or streamlining internal processes, the best way to transcribe audio is with a fast, reliable automatic transcription tool like the ones recommended in this article.
