How do I improve the voice quality in poorly recorded audio files, such as the one attached to this post?
It’s only a small 10 sec file that I have attached, but I have got hours and hours of such audio files with me.
The voice is quite distorted and I have got hundreds of hours of such audio files to be transcribed. I searched on Google and tried a few things on my own, but the result isn’t any better. So any help would be appreciated.
You may be able to reduce the background noise a little with the Noise Reduction effect (https://manual.audacityteam.org/man/noise_reduction.html) but note that if you try to reduce the noise too much, the remaining sound will be even more garbled than it already is, and so become harder to understand.
Probably the best you will be able to get (for intelligibility) is just amplifying up to 0 dB (the default setting in the Amplify effect).
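For anyone curious what "amplifying up to 0 dB" amounts to, here is a minimal sketch of peak normalization in plain Python. It assumes samples are floats in the range -1.0 to 1.0. The sample values and the function names `peak_gain_db` and `amplify` are made up for illustration; this is not Audacity's own code.

```python
import math


def peak_gain_db(samples, target_db=0.0):
    """Gain in dB that brings the loudest sample up to target_db (dBFS)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        raise ValueError("silent input has no peak to normalize")
    return target_db - 20 * math.log10(peak)


def amplify(samples, gain_db):
    """Scale every sample by the linear factor equivalent to gain_db."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


# Hypothetical quiet recording whose loudest sample is 0.15 of full scale.
samples = [0.0, 0.05, -0.15, 0.02]
gain = peak_gain_db(samples)
louder = amplify(samples, gain)
print(f"gain: {gain:.1f} dB, new peak: {max(abs(s) for s in louder):.2f}")
```

Note that this raises the noise by exactly the same amount as the voice, so it makes the file louder but does not improve the signal-to-noise ratio.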
To preface, I wrote and maintain this guide on doing essentially what you ask. I have cleaned over 1500 voice files of vastly different quality, recorded with very different microphones and with a wide range of background noise types. Sometimes it is possible to do really impressive things. The problem here is that the data simply is not there to recover. The file actually has some great spots from which to take a Noise Reduction (NR) sample, but the noise is much stronger than the signal. I did fiddle with it, but none of the techniques I know were able to do much.
The loudest point in the original file is only 0.15, which means that it is effectively a 13-bit file with around 78 dB of dynamic range. That would normally be fine for clear speech, but that is at the LOUDEST point. Most of the file is around 0.02 to 0.06, which is effectively a 7-9 bit file with about 40-50 dB of dynamic range. The background hiss itself hits that level when normalized, so I BELIEVE that means it is consuming half of the available dynamic range (I’m not entirely sure on that calculation, but the important thing is that it is quite a bit). Further, when the microphone is able to pick up a decent signal, it has a significant amount of distortion, which appears to be consuming most of the remaining available range.
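The arithmetic behind estimates like these can be sketched as follows. It assumes a 16-bit container and the usual rule of thumb of about 6 dB per bit. Both are assumptions, since the post does not say how its numbers were derived, and the function names are hypothetical.

```python
import math


def effective_bits(peak, container_bits=16):
    """Bits actually exercised when the signal only reaches `peak` (0..1 of full scale)."""
    return container_bits - math.log2(1 / peak)


def dynamic_range_db(bits):
    """Approximate dynamic range for a given bit depth (about 6.02 dB per bit)."""
    return 20 * math.log10(2) * bits


for peak in (0.15, 0.06, 0.02):
    bits = effective_bits(peak)
    print(f"peak {peak:.2f}: {bits:.1f} bits, {dynamic_range_db(bits):.0f} dB")
```

Under these assumptions the 0.15 peak lands close to the 13-bit figure quoted above. The quieter 0.02 to 0.06 passages come out nearer 10 to 12 bits by the same rule, so the exact numbers depend on the assumptions used. The overall point stands either way: a large share of the available range is unused or taken up by noise.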
Thanks a lot Steve. That Noise Reduction setting is really helping out in letting me hear the voice clearly.
BTW, I have kept the 20Hz to 60Hz levels to the minimum because I read somewhere that this will reduce the “hiss” and background noise a little bit.
I think in that same page, it mentioned that to hear a human voice ‘clearly’ the frequencies around 3000Hz can be increased, so that’s why I have kept the 2000Hz, 2500Hz, and the 3000Hz frequencies at the max.
Should I reduce/increase any other frequencies to hear the voice clearly (less robotic distortion)?
… I have no idea about audible frequencies, as I’m a newbie transcriber trying to make some extra side income from transcribing sucky audios, so any help would be appreciated.
I agree that boosting the 2 - 3 kHz range, as you suggest, makes it a bit more intelligible (to me), but I’d not boost the 2 kHz (2000 Hz) band as much as in your picture because (for me) it makes the sound very harsh and difficult to listen to for very long. However, this is a subjective matter, so whatever works best for you is what matters.
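To make the idea of a moderate boost in that range concrete, here is a rough sketch of a single peaking filter. It uses the standard peaking biquad from the widely used Audio EQ Cookbook, which is not how Audacity's equalizer works internally. The centre frequency (2750 Hz), gain (6 dB), Q (1.0) and sample rate (44100 Hz) are illustrative assumptions, not recommended settings.

```python
import cmath
import math


def peaking_coeffs(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (b, a), normalized so that a[0] == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(freq, b, a, fs):
    """Gain of the filter in dB at a given frequency."""
    z1 = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


def apply_biquad(samples, b, a):
    """Run the samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


FS = 44100  # assumed sample rate
b, a = peaking_coeffs(f0=2750, gain_db=6.0, q=1.0, fs=FS)
for freq in (100, 2000, 2750, 8000):
    print(f"{freq:>5} Hz: {response_db(freq, b, a, FS):+.1f} dB")
impulse_response = apply_biquad([1.0, 0.0, 0.0], b, a)
```

The printout shows how much each frequency is raised. The boost peaks at the centre frequency and tapers off on either side, which is a gentler shape than pushing several neighbouring sliders to the maximum.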
I don’t see getting much more out of it than you already have, for the reasons I mentioned above. I’m glad that my guide is helpful, but it is worth remembering that it is mainly concerned with removing unwanted audio from otherwise reasonably good quality files. This is related, but much harder. I don’t know how long your typical file is, but the kind of “by-hand” work required to clean up some of these things would be extremely time consuming.
Another thing that occurred to me: are the files you were given 8-bit or 16-bit WAVs? Were they even WAVs? I forgot that Audacity exports to 16-bit by default. If the actual quality of the files you were given is worse than the ones you are exporting…heaven help you.
You can’t AFAIK. I always check it in foobar2000, but VLC will show you as well under [Tools → Media Information → Codec → Bits per sample]
That said, even if it says 16-bit the file may have been 8-bit at some point in the chain so it isn’t an absolute guarantee that the data is still there to be recovered. The proof is in the pudding as they say.
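If you would rather check from a script, here is a minimal sketch using Python's standard `wave` module (uncompressed PCM WAV files only). Besides reporting the bit depth stored in the header, it applies one crude heuristic: if every 16-bit sample has a zero low byte, the file was probably 8-bit audio padded out to 16 bits. That heuristic is an assumption and will miss files that were dithered, resampled or passed through a lossy codec along the way. The names `inspect_wav` and `write_wav` and the demo sample values are hypothetical.

```python
import os
import struct
import tempfile
import wave


def inspect_wav(path):
    """Report the header bit depth and whether the low byte of 16-bit samples is ever used."""
    with wave.open(path, "rb") as wf:
        width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())
    info = {"bits_per_sample": width * 8, "low_byte_unused": None}
    if width == 2:
        # WAV PCM is little-endian, so every even-indexed byte is a sample's low byte.
        info["low_byte_unused"] = len(frames) > 0 and not any(frames[0::2])
    return info


def write_wav(path, samples):
    """Write a mono 16-bit WAV file (used only for the demo below)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))


with tempfile.TemporaryDirectory() as tmp:
    padded = os.path.join(tmp, "padded.wav")
    genuine = os.path.join(tmp, "genuine.wav")
    write_wav(padded, [v * 256 for v in (-3, 0, 5, 100)])  # 8-bit values shifted up
    write_wav(genuine, [-700, 1, 1300, 25601])             # uses the low byte
    padded_info = inspect_wav(padded)
    genuine_info = inspect_wav(genuine)

print(padded_info)
print(genuine_info)
```

To use it on a real file, call `inspect_wav` with the path to your own WAV. A result of `low_byte_unused: False` only means the low bits contain something, which may still be nothing but noise.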