How do I improve the voice quality in such audio files?

Transcriberwala · July 11, 2018, 10:10am

How do I improve the voice quality in audio files with poorly recorded voice quality, such as the one attached with this post?
It’s only a small 10 sec file that I have attached, but I have got hours and hours of such audio files with me.

The voice is quite distorted and I have got hundreds of hours of such audio files to be transcribed. I searched on Google and tried a few things on my own, but the result isn’t any better. So any help would be appreciated.

steve · July 11, 2018, 10:20am

You can make it louder with the Amplify effect: https://manual.audacityteam.org/man/amplify.html

You may be able to reduce the background noise a little with the Noise Reduction effect (https://manual.audacityteam.org/man/noise_reduction.html) but note that if you try to reduce the noise too much, the remaining sound will be even more garbled than it already is, and so become harder to understand.

Probably the best you will be able to get (for intelligibility) is just amplifying up to 0 dB (the default setting in the Amplify effect).
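For anyone curious what amplifying to 0 dB does to the samples, here is a minimal sketch in Python. It assumes the audio is already loaded as a list of floats between -1.0 and 1.0; the function name `amplify_to_peak` is made up for illustration and is not part of Audacity.

```python
def amplify_to_peak(samples, target_db=0.0):
    """Scale samples so the loudest one lands at target_db (dB relative to full scale)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    # Convert the target level from dB to a linear amplitude, then to a gain factor.
    gain = 10.0 ** (target_db / 20.0) / peak
    return [s * gain for s in samples]


quiet = [0.0, 0.05, -0.15, 0.1]
loud = amplify_to_peak(quiet)
print(max(abs(s) for s in loud))
```

Note that the noise is scaled by the same factor as the voice, so this makes the recording louder without changing the signal-to-noise ratio.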

Transcriberwala · July 11, 2018, 12:30pm

Thanks for the reply Steve.

How can I reduce the noise a “little” bit? … What settings would you recommend in the ‘Noise Reduction’ Step 2?

Also, I have set my Equalizer setting to this:

so as to get a clearer voice quality. I’m able to hear the voice better now, but it’s still very “robotic”. Can you suggest an even better Equalizer setting to make the voice sound clearer?

steve · July 11, 2018, 2:35pm

As a starting point, try 6, 6, 3 (Noise reduction in dB, Sensitivity, Frequency smoothing in bands).

Possibly a little easier to listen to if the 2000 Hz slider is not quite as high as that.

nintendoeats · July 11, 2018, 4:04pm

To preface, I wrote and maintain this guide on doing essentially what you ask. I have cleaned over 1500 voice files of vastly different quality, with very different microphones and a wide range of background noise types. Sometimes it is possible to do really impressive things. The problem here is that the data simply is not there to recover. The file actually has some great points to get an NR sample, but the noise is much stronger than the signal. I did fiddle with it, but none of the techniques that I know are able to do much.

The loudest point in the original file is only 0.15, which means that it is effectively a 13-bit file with around 78 dB of dynamic range. That would normally be fine for clear speech, but that is at the LOUDEST point. Most of the file is around 0.02 to 0.06, which is effectively a 7-9 bit file with about 40-50 dB of dynamic range. The background hiss itself hits that level when normalized, so I BELIEVE that means it is consuming half of the available dynamic range (I’m not entirely sure on that calculation, but the important thing is that it is quite a bit). Further, when the microphone is able to pick up a decent signal, it has a significant amount of distortion, which appears to be consuming most of the remaining available range.
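The bit-depth arithmetic can be sketched in Python as below. This assumes a 16-bit container and the usual rule of thumb of about 6 dB of dynamic range per bit; the function names are hypothetical. Under those assumptions a peak of 0.15 comes out at roughly 13 bits, while peaks of 0.02 to 0.06 come out at roughly 10 to 12 bits, which is somewhat higher than the 7-9 bit figure quoted above, so treat either set of numbers as a rough estimate. The overall point, that the quiet passages use far less than the full range, holds either way.

```python
import math


def effective_bits(peak, container_bits=16):
    """Bits actually exercised by a signal whose peak amplitude is `peak` (0 < peak <= 1)."""
    # Each halving of the peak amplitude leaves one fewer bit in use.
    return container_bits + math.log2(peak)


def dynamic_range_db(bits):
    """Approximate dynamic range for a given bit count (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2.0) * bits


for peak in (0.15, 0.06, 0.02):
    bits = effective_bits(peak)
    print(f"peak {peak}: {bits:.1f} bits, {dynamic_range_db(bits):.0f} dB")
```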

Here is the best I could do. Some of that hiss could still come out, but it wouldn’t significantly affect the clarity of the voice.

Transcriberwala · July 11, 2018, 5:05pm

Thanks a lot Steve. That Noise Reduction setting is really helping out in letting me hear the voice clearly.

BTW, I have kept the 20Hz to 60Hz levels to the minimum because I read somewhere that this will reduce the “hiss” and background noise a little bit.
I think in that same page, it mentioned that to hear a human voice ‘clearly’ the frequencies around 3000Hz can be increased, so that’s why I have kept the 2000Hz, 2500Hz, and the 3000Hz frequencies at the max.

Should I reduce/increase any other frequencies to hear the voice clearly (less robotic distortion)?

… I have no idea about audible frequencies, as I’m a newbie transcriber trying to make some extra side income from transcribing sucky audios, so any help would be appreciated.

steve · July 11, 2018, 5:48pm

I agree that boosting the 2 - 3 kHz range, as you suggest, makes it a bit more intelligible (to me), but I’d not boost the 2 kHz (2000 Hz) range as much as in your picture as (for me) it makes the sound very harsh and difficult to listen to for very long. However, this is a subjective matter, so whatever works best for you is what matters.

Transcriberwala · July 12, 2018, 8:21am

Thanks Nintendoeats for the excellent guide. I am going through it step-by-step and learning a lot.

Here’s the Dropbox link to the entire audio file which I cleaned as much as possible via Audacity: https://www.dropbox.com/s/qa98h5nbhbxdy73/T558003%20(01_35_00%20-%2001_40_00).wav?dl=0
and here’s My Transcript of whatever I could hear: https://www.dropbox.com/s/aedcp1olro59dqm/My%20Transcript.odt?dl=0

Is it possible to make the audio quality better than this?

Thank you Steve for that suggestion. I will reduce the 2000Hz setting for all of my future audio files cleanup.

nintendoeats · July 12, 2018, 1:57pm

I don’t see getting much more out of it than you already have, for the reasons I mentioned above. I’m glad that my guide is helpful, but it is worth remembering that it is mainly concerned with removing unwanted audio from otherwise reasonably good quality files. This is related, but much harder. I don’t know how long your typical file is, but the kind of “by-hand” work required to clean up some of these things would be extremely time consuming.

Another thing that occurred to me: are the files you were given 8-bit or 16-bit WAVs? Were they even WAVs? I forgot that Audacity exports to 16-bit by default. If the actual quality of the files you were given is worse than the ones you are exporting…heaven help you.

Transcriberwala · July 12, 2018, 2:46pm

Some of the files are .wav and some are .mp3

BTW, how do I check if an audio file is 8-bit or 16-bit in Audacity?

nintendoeats · July 12, 2018, 2:53pm

You can’t AFAIK. I always check it in Foobar 2000, but VLC will show you as well under [Tools → Media Information → Codec → Bits per sample]

That said, even if it says 16-bit the file may have been 8-bit at some point in the chain so it isn’t an absolute guarantee that the data is still there to be recovered. The proof is in the pudding as they say.
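If Python happens to be installed, its standard-library `wave` module offers another way to read the bit depth. This is only a sketch: it works for uncompressed PCM .wav files, not for .mp3, and the function name `wav_bit_depth` and the file name in the usage comment are placeholders.

```python
import wave


def wav_bit_depth(path):
    """Return the bits per sample stored in the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        # getsampwidth() reports bytes per sample, so multiply by 8 for bits.
        return wav_file.getsampwidth() * 8


# Usage (hypothetical file name):
# print(wav_bit_depth("recording.wav"))
```

As with VLC, this only reports what the file header says now, not what the audio went through earlier in the chain.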