Since I couldn't attend a few lectures at my university, I asked a friend to record them with his phone. The lecturer and the students asking questions are often very hard to hear - questions come from all over the room, and the lecturer doesn't speak loudly. There's also a lot of noise: my friend's laptop fan (which he often placed next to the recording phone), the phone being bumped, and more.
Although this is my first time editing audio, I found the vocal isolation and noise reduction features, which sometimes help.
The main issue is that after noise reduction the vocals are far too quiet, so I can't hear anything unless I increase the gain by something like +35 dB - which REALLY hurts my ears when some leftover noise peaks. Worse, I still can't always make out the vocals over the noise, no matter what I do first: isolate the vocals or reduce the noise.
What do I need to know about getting the most out of vocal isolation and noise reduction? And how would you prevent sudden volume peaks from deafening you?
P.S. I’m using version 2.3.0.
Unfortunately, the options for improving very noisy recordings are quite limited (this is why professionals use expensive microphones close to the person speaking, and do everything in their power to minimise background noise).
You can safely cut out very low and very high frequencies using the Equalization effect (set the curve to a low level below 150 Hz, flat from 150 to 7000 Hz, and low above 7000 Hz).
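If you want to experiment with that same band-pass shaping outside Audacity, here is a rough sketch in Python with SciPy (my own illustration, not the Equalization effect's actual curve; the 150 Hz and 7000 Hz corner frequencies are taken from the suggestion above):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def speech_bandpass(audio, sample_rate, low_hz=150.0, high_hz=7000.0):
    """Attenuate content below low_hz and above high_hz with a Butterworth band-pass."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
    # Zero-phase filtering so the speech is not shifted in time.
    return sosfiltfilt(sos, audio)

# Example: 50 Hz mains hum is removed, a 1 kHz tone (speech range) survives.
fs = 44100
t = np.arange(fs) / fs
hum = 0.5 * np.sin(2 * np.pi * 50 * t)
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = speech_bandpass(hum + tone, fs)
```

This won't remove noise inside the speech band, of course - it only trims rumble and hiss outside it.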
There’s a “Pop Mute” Nyquist plug-in that can automatically reduce loud noises (not only “pops” - any noise that is louder than the “Threshold” setting): https://wiki.audacityteam.org/wiki/Nyquist_Effect_Plug-ins#Pop_Mute
Installation instructions are here: https://manual.audacityteam.org/man/installing_effect_generator_and_analyzer_plug_ins_on_windows.html#nyquist_install
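For the curious, the core idea behind that plug-in - attenuate anything whose level exceeds a threshold - can be sketched in a few lines of Python. This is a simplification, not the plug-in's actual Nyquist code (the real effect applies fades around each muted region to avoid clicks):

```python
import numpy as np

def pop_mute(audio, sample_rate, threshold=0.25, window_ms=10.0, gain=0.0):
    """Silence (or attenuate, via `gain`) regions whose short-term level exceeds `threshold`."""
    win = max(1, int(sample_rate * window_ms / 1000.0))
    # Moving-average envelope of the absolute signal as a crude level detector.
    envelope = np.convolve(np.abs(audio), np.ones(win) / win, mode="same")
    out = audio.copy()
    out[envelope > threshold] *= gain
    return out

# Example: a quiet recording with one loud "pop" in the middle.
fs = 1000
audio = np.full(fs, 0.1)
audio[400:450] = 0.9
out = pop_mute(audio, fs)  # the pop is muted, the quiet part is untouched
```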
expensive microphones close to the person speaking
Any microphone close to the person speaking.
Or, yes, you could use an expensive microphone, too.
Recording a lecture is an insanely difficult recording job, and no, you usually can’t fix it in post-production.
The TED Talks went through a lot of growing pains before they got the sound right. The early shows were nice to look at, but the sound was terrible.
The field of artificial intelligence has progressed massively over the past few years, and I have managed to find some interesting work being done on the topic of single-channel source separation, e.g. http://www.csc.kth.se/~cthome/separate/ (if you can’t hear anything, press the 5 dots at the bottom and then go back).
I don’t expect such techniques to yield professional-level audio quality, and certainly not good enough for TV or TED to use. But it appears to me that the raw recording should contain enough data to extract comprehensible vocals, at least theoretically.
I wonder what algorithms Audacity uses. And assuming it’s not a state-of-the-art neural network, can someone recommend software or a website that might use such technology?
Audacity has two algorithms for separating sources.
The simple type can be seen in “Vocal Remover” (https://manual.audacityteam.org/man/vocal_remover.html). This works by inverting one channel and mixing with the other channel so as to cancel out sounds that are panned centre (common to both channels).
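The principle is simple enough to show in a few lines of NumPy (a toy illustration of the inversion trick, not the effect's actual code):

```python
import numpy as np

def remove_centre(left, right):
    """Cancel centre-panned content: invert one channel, then mix with the other.
    Anything identical in both channels (e.g. a centred vocal) cancels out;
    side content survives at half amplitude."""
    return (left - right) / 2.0

# Toy demo: a "vocal" panned centre plus a "guitar" present only in the left channel.
t = np.linspace(0.0, 1.0, 8000)
vocal = np.sin(2 * np.pi * 440 * t)
guitar = np.sin(2 * np.pi * 220 * t)
left = vocal + guitar
right = vocal
no_vocal = remove_centre(left, right)  # the centred vocal cancels exactly
```

Note this needs a true stereo recording - on a mono phone recording (or a mono file duplicated to two channels) everything is "centre", so everything cancels.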
The other type can be seen in the “Vocal Reduction and Isolation” effect (https://manual.audacityteam.org/man/vocal_reduction_and_isolation.html). This uses FFT analysis to find sounds that are common to both channels, and can reduce them, or (partially) isolate them.
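A very crude sketch of that idea in Python (my own illustration of an FFT similarity mask, not Audacity's actual algorithm): compare the two channels' short-time spectra and keep only the time-frequency bins where they agree.

```python
import numpy as np
from scipy.signal import stft, istft

def isolate_common(left, right, sample_rate, nperseg=1024):
    """Keep time-frequency bins where both channels carry the same content."""
    _, _, L = stft(left, sample_rate, nperseg=nperseg)
    _, _, R = stft(right, sample_rate, nperseg=nperseg)
    # Mask is ~1 where the channels match (centre content), ~0 where they differ.
    mask = 2 * np.abs(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + 1e-12)
    mid = (L + R) / 2.0
    _, out = istft(mask * mid, sample_rate, nperseg=nperseg)
    return out

# A centred tone passes through; content present in only one channel is suppressed.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
same = isolate_common(tone, tone, fs)                      # ~= tone
left_only = isolate_common(tone, np.zeros_like(tone), fs)  # ~= silence
```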
This came up a while back in this topic: https://forum.audacityteam.org/t/reduce-voice-not-eliminate/45504/1
The software referred to is available on GitHub: https://github.com/wslihgt/separateLeadStereo