An audio-to-text converter is an application that takes real-time or pre-recorded speech and automatically converts it to text i.e. transcribes it.
There are many versions of audio-to-text converters, and they have confusingly similar names – voice-to-text apps, speech-to-text apps, speech-recognition software, transcription software, dictation apps, etc.
In essence they all do the same thing – convert audio to text. But each solution differs in capability and functionality.
This post explains what audio-to-text converters can and cannot do, and how to go about selecting the right one for YOU.
First up, here’s a roundup of the best free and paid audio-to-text converters available out there:
List of Free and Paid Audio-to-Text Converters
App | Type | Price |
---|---|---|
1. Dragon Speech Recognition Software | Dictation & Transcription | Starts at $150.00 |
2. Notes from Apple | Dictation | Free |
3. Windows Speech Recognition | Dictation | Free |
4. Google Docs | Dictation | Free |
5. Braina | Dictation | $39.00 per year |
6. Speechnotes | Dictation | Free |
7. Transcribe | Audio Player | Free |
8. Inqscribe | Audio Player | $99.00 |
9. Express Scribe | Audio Player | Free |
10. VoiceBase Web App | Transcription | Free up to 50 hours per month |
11. Descript | Transcription | $0.15 per audio minute |
12. Trint | Transcription | $15.00 per audio hour |
13. Happy Scribe | Transcription | €0.15 per audio minute |
14. Go Transcribe | Transcription | $0.22 per audio minute |
How do Audio-to-Text Converters Work?
Audio-to-text converters work on speech recognition technology.
This technology uses a complex combination of linguistics, mathematics, and computing to understand human speech and convert it into text or commands. (Source: Explainthatstuff.com).
Now speech recognition is a vast topic, but here’s a simple explanation for the purposes of this post:
Speech recognition can be divided into two categories – Speaker Dependent and Speaker Independent.
The Speaker Independent version can understand a limited set of words, irrespective of who is speaking.
This version is often used in phone IVRs where limited vocabulary is required within a known context.
The Speaker Dependent version is capable of holding a larger vocabulary, but needs to be trained to recognize individual speakers’ voices. (Source: Government Technology).
You have probably used this version in some of your favorite apps like Apple’s Siri, Google Assistant, and Amazon’s Alexa.
Today, practically every dictation and transcription app worth talking about uses speaker dependent speech recognition technology in combination with AI/Machine Learning.
Should You Use Audio-to-Text Converters?
The upside of using audio-to-text converters is that they can obviously save you a ton of time and energy by automating what is otherwise a very tedious and manual process.
The downside is that the accuracy of transcripts produced by these applications is inconsistent (ranging between 40-95%).
The accuracy is significantly lower for multi-speaker recordings.
This may work for some people who need transcripts for internal use, but if you want to print or share the transcripts with others, then you’ll need to reserve time for proofreading.
Types of Audio-to-Text Converters
There are 3 types of audio-to-text converters:
1. Dictation apps
2. Transcription apps
3. Audio players for manual transcription
They may go by different names (voice-to-text apps, audio-to-text apps, transcription apps, speech-to-text apps, etc.) but they fall in one of these categories.
Dictation apps
As the name suggests, dictation apps transcribe your speech in real time as you talk into a microphone.
If you’re looking for an app to take notes, write a book, dictate patient reports, etc. then dictation apps would be a good choice.
Transcription apps
Transcription apps allow you to upload a recording and create a transcript from it.
If you need pre-recorded audio or video transcribed, then these are more appropriate for you.
Audio Players
And finally there are audio players that come with hot-keys for playback and time-coding to simplify manual transcription.
If you’re saying ‘But those are not audio-to-text converters’ you’re quite right. However, they will come up in search results if you Google ‘audio-to-text converter’ and so it’s important to know what they do.
If you choose to transcribe your own audio/video files, then these players are a far better choice compared to the in-built players on your PC or Mac.
Things to Consider Before Picking an Audio-to-Text Converter
There are quite a few audio-to-text converters available in the market today, ranging from free and bare-bones to expensive and highly sophisticated.
Ask yourself these questions to decide which one is best for you –
Do You Have Time to Proofread?
No app delivers 100% accuracy (neither do human transcribers for that matter, but that’s a topic for a different post).
The accuracy of audio-to-text converters can range from 30-95% depending on the app’s quality and other factors discussed below.
This means that you would probably have to proofread and re-check transcripts for accuracy.
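To make a figure like "95% accuracy" concrete, the sketch below scores a transcript against a hand-corrected reference by counting wrong, missing, and extra words. This is one common way to measure transcription accuracy and not necessarily how any app in this post reports its numbers; the function name `word_accuracy` and the sample sentences are made up for illustration.

```python
def word_accuracy(reference: str, transcript: str) -> float:
    """Return word-level accuracy (1 minus the word error rate), floored at 0."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Edit distance over words: counts substitutions, insertions, and deletions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # matching or wrong word
            ))
        prev = curr

    errors = prev[-1]
    return max(0.0, 1 - errors / len(ref))


# Hypothetical example: the app dropped "the" and split "today" into two words.
reference = "please send the patient report to the clinic today"
transcript = "please send the patient report to clinic to day"
print(f"{word_accuracy(reference, transcript):.0%}")
```

Here three errors against nine reference words give roughly 67% accuracy, which shows how quickly a few small slips add up to real proofreading work.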
If you are into research, then this shouldn’t be a problem because you would re-listen to the recording anyway.
But if you’re a journalist or podcaster then the proofreading will mean additional time and effort and may affect deadlines.
How Many Speakers Are There on Your Recording?
This is important.
As we know, speech recognition technology is primarily geared towards single-speaker audio. Therefore the accuracy of audio-to-text converters is quite low on multi-speaker audio.
Of course they would still produce some form of a transcript, but you may find that making corrections takes the same amount of time as transcribing the whole thing manually (sometimes longer).
If you mostly dictate or record single-speaker audio, then an audio-to-text converter is a good choice.
But if most of your audio is multi-speaker (interviews, focus groups, meetings, etc.), then you may want to consider manual transcription or outsourcing.
How’s the Audio Quality?
Audio-to-text converters require clear audio to transcribe accurately.
Some apps claim to have ‘noise-cancellation’ features, but that generally fails when the background noise is loud.
Other factors like low volume, echo, or raspy/slurry speech also negatively impact the accuracy of audio-to-text converters.
So if you use a good quality headset or external microphone while dictating or recording audio, then audio-to-text converters would produce good results.
But average-quality audio recorded with, say, the built-in mic on your laptop would produce poor results.
Is the Rate of Speech Steady?
Audio-to-text converters work best when you speak at an even pace and pause regularly between sentences.
This is possible when you are dictating or recording in a controlled environment.
However, during interviews, classroom lectures, focus groups, etc., controlling the speakers’ rate of speech is impossible, and therefore error rates on these types of audio are much higher.
How Complex is the Vocabulary?
Good audio-to-text converters come with a large built-in vocabulary and would understand most of what is said in general parlance.
But if your audio is technical in nature then you may need to ‘train’ the converter to understand the terminology.
This is not a bad thing because training the app will make things easier in the long run. It just takes a little time.
Which Language and Dialect Do the Speakers Use?
In general, audio-to-text converters claim to understand 30-100 languages and dialects.
Accuracy levels may vary from language-to-language, so it’s a good idea to do a trial first.
Punctuation
Audio-to-text converters can’t put in punctuation automatically, so all those commas, semi-colons, question marks and periods have to be dictated or inserted manually while proofreading.
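Dictating punctuation means speaking the name of the mark and having it turned into the symbol. Here is a minimal sketch of that substitution, assuming a small hypothetical table of spoken commands (`SPOKEN_MARKS`) rather than any specific app's command list:

```python
import re

# Hypothetical spoken commands; real dictation apps define their own lists.
SPOKEN_MARKS = {
    "question mark": "?",
    "semicolon": ";",
    "comma": ",",
    "period": ".",
}


def apply_spoken_punctuation(dictated: str) -> str:
    """Replace spoken punctuation names with the marks they stand for."""
    text = dictated
    for spoken, mark in SPOKEN_MARKS.items():
        # Also swallow the space before the command so the mark
        # attaches to the previous word.
        pattern = rf"\s*\b{re.escape(spoken)}\b"
        text = re.sub(pattern, mark, text, flags=re.IGNORECASE)
    return text


print(apply_spoken_punctuation("hello comma how are you question mark"))
```

A plain substitution like this cannot tell a spoken command from the ordinary word (for example "a period of time"), which is one reason dictated text still needs a proofreading pass.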
What is the Format of Your Recording?
Some audio-to-text converters can only transcribe recordings that have been recorded on their proprietary software.
This can be a deal-breaker if you use a voice recorder or other device to create recordings in a format that the converter does not accept.
Do You Require Time Coding?
As of today, no audio-to-text converter has the ability to insert periodic time codes in transcripts. This is something you will have to do manually at the time of proofreading.
14 Audio-to-Text Converters 2018
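If you add time codes by hand, a small helper can at least keep the markers consistent. The sketch below formats the markers you would paste in while proofreading; the function names and the 30-second interval are assumptions chosen only for illustration.

```python
def time_code(seconds: int) -> str:
    """Format a position in the recording as [HH:MM:SS]."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def periodic_time_codes(duration_seconds: int, interval_seconds: int = 30) -> list[str]:
    """List the markers to insert every `interval_seconds` through a recording."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return [time_code(t) for t in range(0, duration_seconds + 1, interval_seconds)]


# Assumed example: a recording that is 95 seconds long.
print(periodic_time_codes(95))
```

You would still have to listen to the recording to place each marker at the right spot in the text; the helper only produces the labels.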
Now that you know how audio-to-text converters work, here’s the list we mentioned earlier:
App | Type | Price |
---|---|---|
1. Dragon Speech Recognition Software | Dictation & Transcription | Starts at $150.00 |
2. Notes from Apple | Dictation | Free |
3. Windows Speech Recognition | Dictation | Free |
4. Google Docs | Dictation | Free |
5. Braina | Dictation | $39.00 per year |
6. Speechnotes | Dictation | Free |
7. Transcribe | Audio Player | Free |
8. Inqscribe | Audio Player | $99.00 |
9. Express Scribe | Audio Player | Free |
10. VoiceBase Web App | Transcription | Free up to 50 hours per month |
11. Descript | Transcription | $0.15 per audio minute |
12. Trint | Transcription | $15.00 per audio hour |
13. Happy Scribe | Transcription | €0.15 per audio minute |
14. Go Transcribe | Transcription | $0.22 per audio minute |
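Because the paid transcription apps quote prices in different units (per minute versus per hour), it helps to put them on the same footing before comparing. Below is a minimal sketch using the dollar prices from the table above; the two-hour recording is an assumed example, and Happy Scribe is left out because its price is in euros and would need an exchange rate.

```python
# Prices in US dollars per audio minute, taken from the table above.
PRICE_PER_MINUTE = {
    "Descript": 0.15,
    "Trint": 15.00 / 60,  # listed as $15.00 per audio hour
    "Go Transcribe": 0.22,
}


def transcription_cost(app: str, audio_minutes: float) -> float:
    """Return the cost in dollars of transcribing `audio_minutes` with `app`."""
    return round(PRICE_PER_MINUTE[app] * audio_minutes, 2)


# Assumed example: a two-hour (120-minute) interview, cheapest app first.
for app in sorted(PRICE_PER_MINUTE, key=PRICE_PER_MINUTE.get):
    print(f"{app}: ${transcription_cost(app, 120):.2f}")
```

Prices change often, so treat the numbers as placeholders and plug in each app's current rate.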
In coming posts we’ll be sharing detailed reviews of each of these apps and some more.
Do you know of a good audio-to-text converter that you think we should include in this list? Leave a note below and we’ll look it up!
Vishal says
Hello! There is another one that does a decent job for a little cheaper than some on here. mebos.com!
Ben says
One of the best speech to text apps is missing in your list – “Dictation Pro” This is the best app to create documents just with your voice. This also provides vocabulary training to correctly type the words. You will only need a good quality microphone and your PC. Consider adding this to your list, it will be helpful for many people. https://www.deskshare.com/dictation.aspx
Alissa Pagels-Minor says
Hello,
VoiceBase is no longer a consumer app product but rather a commercial enterprise B2B solution. VoiceBase processes calls and text chats for contact centers and businesses with 100s of hours of customer interactions.
Transcription is available for large volumes of calls starting at $0.01 cents a minute, but not free transcription.
Alisandra says
Hey, I like your top, but I can also add this service – https://audext.com/. It is really helpful nowadays.
Alan says
An automated service called ebby.co is quite affordable (10c/min) and deals quite well with multiple speakers too.
What I like most about them is their online editor that has an integrated player and helps me quickly proof the transcript and later search for anything that was said in the interview.
Ashley says
Hi! Thank you, nice article. I have to use such resources to make my online learning faster. It was mentioned but I like audext.com as well. I can’t imagine what I would do without these services