We currently perform software-based voicemail transcription, but our transcription accuracy suffers when audio quality is poor (breaks or glitches where the end or beginning of a word has a sharp spike) or when speakers talk abnormally fast without leaving proper pauses between words (I guess an abnormally fast tempo?). I have two questions that it would be great to get some advice on:
For the general audio quality (when there are glitches) - especially when trying to optimize things before transcription - are there specific steps I can try with Audacity that users have experience with?
Can we set up Audacity (or other applications) to detect the tempo of a .wav file and, in our case, if it is above a certain threshold, slow the .wav file tempo down to our ideal range? We'd like to do this in an automated way because we want to be able to process large numbers of files.
If the recording has dropouts where something was spoken but not captured, that can't be corrected afterwards. It has to be fixed at the source, before the recording is made.
Some people use limiting to deal with spikes. If there are “mouth smacks” there is an experimental De-Clicker for speech plugin.
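To give a rough idea of what limiting does to a spike, here is a minimal sketch in plain Python. It is not Audacity's Limiter effect, just an illustration of the general idea: samples below a threshold pass through untouched, and anything louder is squeezed smoothly into the remaining headroom. The function name `soft_limit` and the default threshold of 0.5 are my own assumptions, and it expects samples already converted to floats in the range -1.0 to 1.0.

```python
import math


def soft_limit(samples, threshold=0.5):
    """Squeeze peaks above `threshold` into the range (threshold, 1.0).

    `samples` is a sequence of floats nominally in [-1.0, 1.0].
    Hypothetical helper for illustration; not Audacity's algorithm.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1")

    headroom = 1.0 - threshold
    limited = []
    for s in samples:
        magnitude = abs(s)
        if magnitude <= threshold:
            # Quiet samples are left exactly as they were.
            limited.append(s)
        else:
            # tanh maps the overshoot smoothly into the headroom,
            # so even a large spike stays within full scale.
            squeezed = threshold + headroom * math.tanh(
                (magnitude - threshold) / headroom
            )
            limited.append(math.copysign(squeezed, s))
    return limited
```

A spike of, say, 5.0 comes out just under 1.0, while normal speech levels below the threshold are unchanged. Whether that actually helps your transcription engine is something you would have to test on your own files.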
There is Change Tempo, but it does not work well for slowing down an individual word or vowel because it is not length-accurate. Try Sliding Time Scale / Pitch Shift, which is slower but length-accurate.
Tempo detection is hard. It usually only works (somewhat) for pop songs with strong beats. As far as I know there is no reliable way to detect tempo of speech.
If you are only interested in playing audio more slowly for transcription, and not in exporting the finished file slowed down, you can use the Play-at-Speed button. Like Change Speed, this affects the pitch (slowing down will lower the pitch).
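If you want to know how much the pitch drops for a given slowdown, the relationship is simple arithmetic: a plain speed change scales every frequency by the same factor as the speed, so the shift in semitones is 12 times the base-2 logarithm of that factor. A small sketch (the function names are my own, not anything from Audacity):

```python
import math


def pitch_shift_semitones(speed_factor):
    """Pitch change in semitones for a plain speed change.

    speed_factor is new speed / original speed, e.g. 0.5 for half speed.
    Negative results mean the pitch is lowered.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return 12.0 * math.log2(speed_factor)


def new_duration(duration_seconds, speed_factor):
    """Length of the audio after the speed change."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return duration_seconds / speed_factor
```

For example, `pitch_shift_semitones(0.5)` gives -12.0 (one octave down), and `new_duration(30, 0.75)` gives 40.0, so a 30 second voicemail played at 75% speed lasts 40 seconds.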