I have a 30 minute long voice recording of a phone conversation with heavy distortion. One of the voices sounds OK, but the other is quite distorted throughout the whole recording. The voice with the distortion has very heavy clipping, and it looks like the clipping is what is causing the distortion. I know that it’s not possible to fix clipping, so I’m looking for a way to mask the distortion a bit and make the voice sound a bit cleaner.
The only advice I have found in the manual about fixing large amounts of clipping is this:
(From the Audacity Manual)
Where there is mild distortion throughout a recording, using Effect > Equalization to reduce the higher frequencies can help to mitigate the damage. Sometimes a bass cut will help also by making the result sound less “muddy”.
My clipping is way beyond “mild”, but I still gave it a shot. I didn’t find any predefined equalization curves for fixing clipping, so I fiddled with the equalizer to create my own. It didn’t work: my equalization barely improved the sound quality, if at all. As for the “bass cut”, I guessed it means the same as a high-pass filter. I tried a few high-pass filters and none seemed to help.
Should I keep trying with the equalization? If the equalization isn’t the way to go, will anything else help?
I see you’ve added an integrate step (which I omitted), which restores the original equalization (without integration the bass is missing).
Your “second version” code is very effective at a 16 kHz sample rate: it removes most of the harsh clipping noise, leaving just an occasional big click/pop (example attached), but its effectiveness depends on the sample rate.
So users should resample their heavily clipped audio to 16 kHz when using that code.
The differentiator (slope) is effectively a pre-emphasis filter: it translates a block of equal samples into just two non-zero values:
0 1 1 1 0 --> 0 1 0 0 -1
In other words, square waves give single impulses, sine waves become cosine waves, triangle waves → square waves, and so on.
The median filter removes extraneous values by always selecting the middle one.
For the sequence above, this gives all zeros, because every triple here contains at least two zeros, i.e.
0 -1 0 --> 0
Wave forms that do not resemble a square wave are less affected.
0 1 2 3 2 1 0 --> (median filter) 0 1 2 2 2 1 0
The integration brings the bass back. The highpass filter removes DC that is heavily created by modifying the slope values.
Actually, this method seldom works and is more of a last resort, because it is apparently harmful to the consonants.
We can’t tell if it works in this particular case, where we do not have a sample.
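For anyone who wants to experiment outside Audacity, here is a rough pure-Python sketch of the same chain (slope, 3-point median, integration, then a simple DC-blocking highpass). The highpass coefficient is my own guess, not a value from this thread, and a real script would of course read and write actual sample data.

```python
from statistics import median

def slope(x):
    # differentiate: y[n] = x[n] - x[n-1] (acts as a pre-emphasis filter)
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def median3(x):
    # 3-point median: flat-topped (clipped) runs collapse to zero slope
    if len(x) < 3:
        return x[:]
    return [x[0]] + [median(x[i - 1:i + 2]) for i in range(1, len(x) - 1)] + [x[-1]]

def integrate(x):
    # running sum restores the original spectral balance (brings the bass back)
    out, acc = [], 0.0
    for v in x:
        acc += v
        out.append(acc)
    return out

def highpass(x, coeff=0.995):
    # one-pole DC blocker: y[n] = x[n] - x[n-1] + coeff * y[n-1]
    out, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        y = v - prev_x + coeff * prev_y
        out.append(y)
        prev_x, prev_y = v, y
    return out

def declip(samples):
    # the whole chain described above
    return highpass(integrate(median3(slope(samples))))
```

Running `slope` and `median3` on the toy sequences above reproduces the examples in this post: `slope([0, 1, 1, 1, 0])` gives `[0, 1, 0, 0, -1]`, and `median3([0, 1, 2, 3, 2, 1, 0])` gives `[0, 1, 2, 2, 2, 1, 0]`.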
I tried to do that, but it looks like I’m doing something wrong; I didn’t even manage to follow the steps. It might be worth pointing out that I barely have any knowledge of audio editing. I have been using Audacity for years, but only for very simple tasks, and mainly by trial and error.
I attach a few files that might help point out what I’m doing wrong.
These are the steps I followed based on the method suggested:
Nyquist prompt → Nyquist command: (multichan-expand #'slope s) → OK.
Median filter → Leave default values of 5 and 50 → OK.
Amplify: nothing to do here, as the maximum amplitude is already 0 dB. Actually, I’m surprised by what happens in the method example, because in this step you apply a −50 dB amplification and it results in a new amplitude of −28.1 dB, meaning that before the amplification the amplitude was +21.9 dB. I didn’t know that Audacity could manage amplitudes above 0 dB.
For whatever it might be worth, my very rough guess is that somehow I need to allow Audacity to handle amplitudes above 0 dB.
A median filter isn’t going to help “Original.mp3”. Taken over 3 sample points, the median filter only gets rid of high-frequency crackle, and your “Original.mp3” sample is all below 4 kHz, so it doesn’t have any high frequencies to remove.
Audacity already handles sound values over 0 dB: it works internally in 32-bit float, which allows high volumes without damage. Use one of the volume tools, such as Effect > Amplify, to bring the level back down.
If you Export the show with high volume, then yes, there will be significant damage and you’re stuck with it.
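To illustrate the difference (with made-up numbers, not anything from this recording): a 32-bit float sample above full scale survives and can simply be amplified back down, whereas 16-bit integer storage clamps it permanently.

```python
def to_int16(x):
    # 16-bit PCM clamps anything outside [-32768, 32767]: the clipping is permanent
    return [max(-32768, min(32767, round(v * 32768))) for v in x]

def amplify(x, factor):
    return [v * factor for v in x]

# a signal whose peak goes well over full scale (1.0)
loud = [0.5, 2.0, -1.5, 0.25]

# in 32-bit float the overshoot is kept, so Amplify can bring it back intact
recovered = amplify(amplify(loud, 2.0), 0.5)

# exporting to int16 clamps the overshoot; no later gain change can undo it
damaged = to_int16(amplify(loud, 2.0))
```

Here `recovered` equals `loud` exactly, while the over-range values in `damaged` are stuck at ±full scale.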
So the original has very little energy above about 4 kHz, and we are talking about a performance similar to an airplane pilot’s voice. Most of the intelligibility is conveyed through psycho-acoustic recognition rather than plain good quality. Also see: a bad cellphone call with pieces missing, where somehow the message still manages to squeak through.
There is no rescue. All we can do is turn the damage into other damage.
I’ve come up with something similar to Trebor’s plug-in.
It uses linear predictive coding instead of de-whitening in the Fourier domain (;O, what’s this guy talking about?).
In other words, all streets lead to Rome.
The original is actually a bit more clipped; I’ve exported it to WAV with a gain of 22 dB (!).
The filter has been applied after re-importing.
The code is:
(defun pre-emphase (s) (snd-biquad s 1 -0.96 0 0 0 0 0)) ; y[n] = x[n] - 0.96*x[n-1]
(defun de-emphase (s) (snd-biquad s 1 0 0 0.96 0 0 0))   ; inverse: y[n] = x[n] + 0.96*y[n-1]
;; voice model extraction
(setf lpanal-class (send class :new '(sound framesize skipsize npoles)))
(send lpanal-class :answer :isnew '(snd frame-sz skip-sz np) '(
(setf sound (snd-copy snd))
(setf framesize frame-sz)
(setf skipsize skip-sz)
(setf npoles np)))
(send lpanal-class :answer :next '() '(
(let ((samps (snd-fetch-array sound framesize skipsize)))
(cond ((null samps) nil)
(t (snd-lpanal samps npoles))))))
(defun make-lpanal-iterator (sound framedur skiptime npoles)
(send lpanal-class :new sound
framedur skiptime npoles))
(psetq blocksize 200  ; in samples
       advance-by 50  ; in samples
       order 10)      ; modify at will
(setf obj (make-lpanal-iterator (pre-emphase s)
blocksize advance-by order))
;; apply filter to original:
(de-emphase (scale-db -35
(snd-lpreson s obj (/ advance-by *sound-srate*))))
The values after “psetq” can all be changed.
(note that the code is only for mono files)
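As a side note for the curious, the heart of the LPC analysis step can be sketched in plain Python. This is my own illustration of the Levinson-Durbin recursion on the Yule-Walker equations, not the internals of snd-lpanal: it finds an all-pole model that predicts each sample from the ones before it.

```python
def autocorr(x, maxlag):
    # biased autocorrelation estimate r[0..maxlag]
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(maxlag + 1)]

def levinson(r, order):
    # Levinson-Durbin: solve Yule-Walker for predictor coefficients a[],
    # where x[n] is predicted as sum(a[j] * x[n-1-j])
    a = []
    err = r[0]  # prediction error power, shrinks as the order grows
    for i in range(order):
        # reflection coefficient for this model order
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a
```

As a sanity check: for an AR(2) process x[n] = 1.5·x[n-1] − 0.7·x[n-2] + e[n], feeding its theoretical autocorrelation into `levinson(r, 2)` returns the coefficients 1.5 and −0.7.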
Here’s the visual equivalent of before-after median filter …
I agree you can do wonders with pictures whose information doesn’t change that much pixel by pixel, but a closer analogy to this problem is motion video.
People have made “de-noisers” that trade off motion by averaging (or otherwise processing) successive frames whose information doesn’t change that much. The result is either gooey, swimmy, slippery motion that’s completely noise-free, or gritty, noisy, perfect motion.
The free lunch stayed just out of grasp; the product never made the big time.
So we’re trying to identify the damaged portion of the wave and reconstruct the missing bits by guessing what would have been there had the wave been allowed to continue on its journey? Given massive clipping, I’m not shocked that the reconstructed wave is 20 dB higher than anyone is expecting. Also given massive clipping, the slope before and after the damage is less and less representative of the original voice. And given that this was an admittedly “seat of the pants” recording, the system/microphone analog noise will make this an interesting academic exercise with little or no practical value.
What you’ve achieved with that code sounds (and looks) just like increasing-contrast using the “DtBlkFx” plugin (which is Windows & Mac only). It’s effectively like increasing contrast on the spectrogram.
[NB: that “contrast” code can’t cope with silence; if the audio you select includes truly flat-line silence, the effect won’t work and Audacity may crash.]
IMO, setting blocksize to 1000 (rather than 200) works better with a sample rate of 16 kHz (maybe that’s just me though).
(You mean my code?)
Yes, it is a known bug: lpanal returns NaN values (division by zero) for all coefficients.
I forgot to mention that mixing in white noise helps a lot (amp. 0.001).
One can actually use white noise alone instead of the original.
A clean voice will sound whispered because the excitation is missing, whereas filtering a buzz, sawtooth, or similar sound suppresses the unvoiced parts: the famous robot effect.
The number of LPC poles (called “order” in the above code) determines the accuracy of the voice synthesis.
For 4 formants, the order should be twice that plus 2 → 10 poles. “S” and “Z” sounds may require 300 poles to reproduce them exactly.
Your “original.mp3” isn’t clicking or crackling, so de-clicking and de-crackling software aren’t going to help.
Even the paid-for de-noise programs I have don’t make much difference to your “original.mp3”.
The Nyquist code Robert-J-H came up with above, or the DtBlkFx plugin, is about the best you’re going to get: less noisy at the expense of sounding more synthesised.
[The audio recovery seen in Hollywood movies is science fiction.]
Cutting off all the frequencies below 200Hz with the equalizer makes it sound a little less harsh.
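For reference, a 200 Hz bass cut like that is just a standard high-pass biquad. This sketch uses the coefficient formulas from the well-known RBJ Audio-EQ-Cookbook; it is my own illustration, not how Audacity’s equalizer implements it.

```python
import math

def highpass_biquad(x, fs, f0, q=0.7071):
    # RBJ Audio-EQ-Cookbook high-pass coefficients
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    b0 = (1.0 + cosw) / 2.0
    b1 = -(1.0 + cosw)
    b2 = (1.0 + cosw) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cosw
    a2 = 1.0 - alpha
    # direct form I, normalised by a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        y = (b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, v
        y2, y1 = y1, y
    return out
```

At 16 kHz sample rate with f0 = 200 Hz, this removes DC and rumble entirely while passing the speech band almost untouched; q = 0.7071 gives the maximally flat (Butterworth) response.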
If you compare it with the codecs used in radio-telephony, like LPC10 and Speex, the computery-ness isn’t that bad.