Being unable to attend a few lectures at my university, I asked my friend to record them with his phone. The lecturer and the students asking questions are often very hard to hear - they're speaking from all over the room and the lecturer doesn't speak very loudly. There's also a lot of noise - my friend's laptop fan (the laptop was often sitting right next to the phone doing the recording), the phone being bumped, and more.
Although this is my first time editing audio, I found out about the vocal isolation and noise reduction features, which are sometimes very useful.
The main issue is that after noise reduction the vocals are far too quiet, so I can't hear anything unless I increase the gain to something like +35 dB - which REALLY hurts my ears when some leftover noise peaks. Worse, I still can't always make out the vocals over the noise, no matter which I do first: isolate the vocals or reduce the noise.
What do I need to know about optimizing vocal isolation and noise reduction? And how would you prevent sudden peaks in volume from making you deaf?
Unfortunately, the options for improving very noisy recordings are quite limited (this is why professionals use expensive microphones close to the person speaking, and do everything in their power to minimise background noise).
You can safely cut out very low and very high frequencies using the Equalization effect (pull the curve down below 150 Hz, keep it flat from 150 Hz to 7000 Hz, and pull it down again above 7000 Hz).
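If it helps to see what that EQ step amounts to outside Audacity, here is a rough Python/scipy sketch of the same band-pass idea. The file names and the 4th-order Butterworth filter are my own assumptions for illustration - this is not literally what Audacity's Equalization effect does:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

# Placeholder file name - substitute your own recording.
rate, data = wavfile.read("lecture_raw.wav")      # e.g. 44100 Hz, int16 samples
data = data.astype(np.float32) / 32768.0          # work in float, roughly -1..1

# Keep only the 150 Hz - 7000 Hz band where speech lives
# (4th-order Butterworth band-pass, an arbitrary but reasonable choice).
low, high = 150.0, 7000.0
sos = butter(4, [low / (rate / 2), high / (rate / 2)], btype="bandpass", output="sos")

# Forward-backward filtering avoids phase-shifting the speech.
filtered = sosfiltfilt(sos, data, axis=0)

# Clip and convert back to 16-bit for writing out.
filtered = np.clip(filtered, -1.0, 1.0)
wavfile.write("lecture_bandpassed.wav", rate, (filtered * 32767).astype(np.int16))
```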
The field of artificial intelligence has progressed massively over the past few years, and I have managed to find some interesting work being done on the topic of single-channel source separation, e.g. http://www.csc.kth.se/~cthome/separate/ (if you can’t hear anything, press the 5 dots at the bottom and then go back).
I don’t expect such techniques to yield professional-level audio quality, and certainly not good enough for TV or TED to use. But it appears to me that the raw recording should contain enough data to extract comprehensible vocals, at least theoretically.
I wonder what algorithms Audacity uses. And assuming it's not a state-of-the-art neural network, can someone recommend software or a website that might use such technology?
Audacity has two algorithms for separating sources.
The simpler one is used by "Vocal Remover" (https://manual.audacityteam.org/man/vocal_remover.html). It works by inverting one channel and mixing it with the other, which cancels out sounds panned to the centre (i.e. common to both channels).
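To make that concrete, here is a minimal numpy sketch of the centre-cancellation trick - just an illustration of the principle, not Audacity's actual implementation, and the file names are placeholders:

```python
import numpy as np
from scipy.io import wavfile

# Expects a genuine stereo file, shape (samples, 2).
rate, stereo = wavfile.read("song.wav")
stereo = stereo.astype(np.float32) / 32768.0

left, right = stereo[:, 0], stereo[:, 1]

# Anything identical in both channels (panned dead centre) cancels out here;
# anything panned off-centre survives.
side = (left - right) / 2.0

wavfile.write("centre_removed.wav", rate, (side * 32767).astype(np.int16))
```

Worth noting: this only does something useful on a real stereo mix. On a single-phone recording the two channels are nearly identical, so almost everything would cancel.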