I have a 30 minute long voice recording of a phone conversation with heavy distortion. One of the voices sounds OK, but the other is quite distorted throughout the whole recording. The voice with the distortion has very heavy clipping, and it looks like the clipping is what is causing the distortion. I know that it’s not possible to fix clipping, so I’m looking for a way to mask the distortion a bit and make the voice sound a bit cleaner.
The only advice I have found in the manual about fixing large amounts of clipping is this:
(From the Audacity Manual)
Where there is mild distortion throughout a recording, using Effect > Equalization to reduce the higher frequencies can help to mitigate the damage. Sometimes a bass cut will help also by making the result sound less “muddy”.
My clipping is way beyond “mild”, but I still gave it a shot. I didn’t find any predefined equalization curves for fixing clipping, so I fiddled with the equalizer to create my own. It didn’t work: my equalization barely improved the sound quality, if at all. As for the “bass cut”, I guessed it means the same as a high-pass filter. I tried a few high-pass filters and none seemed to help.
Should I keep trying with the equalization? If the equalization isn’t the way to go, will anything else help?
I see you’ve added an integrate step (which I omitted), which restores the original equalization (without integration the bass is missing).
Your “second version” code is very effective at a 16 kHz sample rate: it removes most of the harsh clipping noise, leaving just an occasional big click/pop (example attached), but its effectiveness depends on the sample rate.
So users should resample their heavily clipped audio to 16 kHz when using that code.
The differentiator (slope) is effectively a pre-emphasis filter: it translates a block of equal samples into just two non-zero values:
0 1 1 1 0 --> 0 1 0 0 -1
In other words, square waves give single impulses, sine waves become cosine waves, triangle waves → square waves, and so on.
The median filter removes extraneous values by always selecting the middle one.
For the sequence above, this gives all zeros, because every triple here contains at least two zeros, i.e.
0 -1 0 --> 0
Wave forms that do not resemble a square wave are less affected.
0 1 2 3 2 1 0 --> (median filter) 0 1 2 2 2 1 0
The integration brings the bass back. The highpass filter removes DC that is heavily created by modifying the slope values.
Actually, this method seldom works and is more of a last resort, because it is apparently harmful to the consonants.
We can’t tell if it works in this particular case, where we do not have a sample.
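For anyone who wants to experiment outside Audacity, here is a rough pure-Python sketch of the same chain (slope, 3-point median, integration, then a simple DC-blocking highpass). The highpass coefficient is my own guess, not a value from this thread, and a real script would of course read and write actual sample data.

```python
from statistics import median

def slope(x):
    # differentiate: y[n] = x[n] - x[n-1] (acts as a pre-emphasis filter)
    return [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]

def median3(x):
    # 3-point median: flat-topped (clipped) runs collapse to zero slope
    if len(x) < 3:
        return x[:]
    return [x[0]] + [median(x[i - 1:i + 2]) for i in range(1, len(x) - 1)] + [x[-1]]

def integrate(x):
    # running sum restores the original spectral balance (brings the bass back)
    out, acc = [], 0.0
    for v in x:
        acc += v
        out.append(acc)
    return out

def highpass(x, coeff=0.995):
    # one-pole DC blocker: y[n] = x[n] - x[n-1] + coeff * y[n-1]
    out, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        y = v - prev_x + coeff * prev_y
        out.append(y)
        prev_x, prev_y = v, y
    return out

def declip(samples):
    # the whole chain described above
    return highpass(integrate(median3(slope(samples))))
```

Running `slope` and `median3` on the toy sequences above reproduces the examples in this post: `slope([0, 1, 1, 1, 0])` gives `[0, 1, 0, 0, -1]`, and `median3([0, 1, 2, 3, 2, 1, 0])` gives `[0, 1, 2, 2, 2, 1, 0]`.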
I tried to do that, but it looks like I’m doing something wrong; I didn’t even manage to follow the steps. It might be worth pointing out that I barely have any knowledge of audio editing. I have been using Audacity for years, but only for very simple tasks, and mainly by trial and error.
I attach a few files that might help point out what I’m doing wrong.
These are the steps I followed based on the method suggested:
Nyquist prompt → Nyquist command: (multichan-expand #'slope s) → OK.
Median filter → Leave default values of 5 and 50 → OK.
Amplify: nothing to do here, as the maximum amplitude is already 0 dB. Actually, I’m surprised by what happens in the method example, because in this step you apply a −50 dB amplification and it results in a new amplitude of −28.1 dB, meaning that before the amplification the amplitude was +21.9 dB. I didn’t know that Audacity could manage amplitudes above 0 dB.
For whatever it might be worth, my very rough guess is that somehow I need to allow Audacity to handle amplitudes above 0 dB.
A median filter isn’t going to help “Original.mp3”. Taken over 3 sample points, the median filter only gets rid of high-frequency crackle, and your “Original.mp3” sample is all below 4 kHz, so it doesn’t have any high frequencies to remove.
Audacity already handles sound values over 0 dB: it works internally in 32-bit float, which allows high volumes without damage. Use one of the volume tools, such as Effect > Amplify, to bring the level back down.
If you Export the show with high volume, then yes, there will be significant damage and you’re stuck with it.
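To illustrate the difference (with made-up numbers, not anything from this recording): a 32-bit float sample above full scale survives and can simply be amplified back down, whereas 16-bit integer storage clamps it permanently.

```python
def to_int16(x):
    # 16-bit PCM clamps anything outside [-32768, 32767]: the clipping is permanent
    return [max(-32768, min(32767, round(v * 32768))) for v in x]

def amplify(x, factor):
    return [v * factor for v in x]

# a signal whose peak goes well over full scale (1.0)
loud = [0.5, 2.0, -1.5, 0.25]

# in 32-bit float the overshoot is kept, so Amplify can bring it back intact
recovered = amplify(amplify(loud, 2.0), 0.5)

# exporting to int16 clamps the overshoot; no later gain change can undo it
damaged = to_int16(amplify(loud, 2.0))
```

Here `recovered` equals `loud` exactly, while the over-range values in `damaged` are stuck at ±full scale.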
So the original has very little energy above about 4 kHz, and we are talking about a performance similar to an airplane pilot’s voice. Most of the intelligibility is conveyed through psycho-acoustic recognition rather than plain good quality. Also see: a bad cellphone call with pieces missing, where somehow the message still manages to squeak through.
There is no rescue. All we can do is turn the damage into other damage.
I’ve come up with something similar to Trebor’s plug-in.
It uses linear predictive coding instead of de-whitening in the Fourier domain (;O, what’s this guy talking about?).
In other words, all streets lead to Rome.
The original is actually a bit more clipped; I’ve exported it to WAV with a gain of 22 dB (!).
The filter has been applied after re-importing.
The code is:
(defun pre-emphase (s) (snd-biquad s 1 -0.96 0 0 0 0 0)) ; y[n] = x[n] - 0.96*x[n-1]
(defun de-emphase (s) (snd-biquad s 1 0 0 0.96 0 0 0))   ; inverse: y[n] = x[n] + 0.96*y[n-1]
;; voice model extraction
(setf lpanal-class (send class :new '(sound framesize skipsize npoles)))
(send lpanal-class :answer :isnew '(snd frame-sz skip-sz np) '(
(setf sound (snd-copy snd))
(setf framesize frame-sz)
(setf skipsize skip-sz)
(setf npoles np)))
(send lpanal-class :answer :next '() '(
(let ((samps (snd-fetch-array sound framesize skipsize)))
(cond ((null samps) nil)
(t (snd-lpanal samps npoles))))))
(defun make-lpanal-iterator (sound framedur skiptime npoles)
(send lpanal-class :new sound
framedur skiptime npoles))
(psetq blocksize 200  ; in samples
       advance-by 50  ; in samples
       order 10)      ; modify at will
(setf obj (make-lpanal-iterator (pre-emphase s)
blocksize advance-by order))
;; apply filter to original:
(de-emphase (scale-db -35
(snd-lpreson s obj (/ advance-by *sound-srate*))))
The values after “psetq” can all be changed.
(note that the code is only for mono files)
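As a side note for the curious, the heart of the LPC analysis step can be sketched in plain Python. This is my own illustration of the Levinson-Durbin recursion on the Yule-Walker equations, not the internals of snd-lpanal: it finds an all-pole model that predicts each sample from the ones before it.

```python
def autocorr(x, maxlag):
    # biased autocorrelation estimate r[0..maxlag]
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(maxlag + 1)]

def levinson(r, order):
    # Levinson-Durbin: solve Yule-Walker for predictor coefficients a[],
    # where x[n] is predicted as sum(a[j] * x[n-1-j])
    a = []
    err = r[0]  # prediction error power, shrinks as the order grows
    for i in range(order):
        # reflection coefficient for this model order
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a
```

As a sanity check: for an AR(2) process x[n] = 1.5·x[n-1] − 0.7·x[n-2] + e[n], feeding its theoretical autocorrelation into `levinson(r, 2)` returns the coefficients 1.5 and −0.7.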
Here’s the visual equivalent of before-after median filter …
I agree you can do wonders with pictures whose information doesn’t change that much pixel by pixel, but a closer analogy to this problem is motion video.
People have made “de-noisers” that trade off motion by averaging (or otherwise processing) successive frames whose information doesn’t change that much. The result is either gooey, swimmy, slippery motion that’s completely noise-free, or gritty, noisy, perfect motion.
The free lunch stayed just out of grasp; the product never made the big time.
So we’re trying to identify the damaged portion of the wave and reconstruct the missing bits by guessing what would have been there had the wave been allowed to continue on its journey? Given massive clipping, I’m not shocked that the reconstructed wave is 20 dB higher than anyone is expecting. Also given massive clipping, the slope before and after the damage is less and less representative of the original voice. And given that this was an admittedly “seat of the pants” recording, the system/microphone analog noise will make this an interesting academic exercise with little or no practical value.
What you’ve achieved with that code sounds (and looks) just like increasing-contrast using the “DtBlkFx” plugin (which is Windows & Mac only). It’s effectively like increasing contrast on the spectrogram.
[NB: that “contrast” code can’t cope with silence; if the audio you select includes truly flat-line silence, the effect won’t work and Audacity may crash.]
IMO, setting blocksize to 1000 (rather than 200) works better with a sample rate of 16 kHz (maybe that’s just me though).
(You mean my code?)
Yes, it is a known bug: lpanal returns NaN values (division by zero) for all coefficients.
I forgot to mention that mixing in white noise helps a lot (amp. 0.001).
One can actually use white noise alone instead of the original.
A clean voice will sound whispered because the excitation is missing, whereas filtering a buzz, sawtooth, or similar sound suppresses the unvoiced parts: the famous robot effect.
The number of LPC poles (called “order” in the above code) determines the accuracy of the voice synthesis.
For 4 formants, the order should be twice that plus 2 → 10 poles. “S” and “Z” sounds may require 300 poles to reproduce them exactly.
Your “original.mp3” isn’t clicking or crackling, so de-clicking and de-crackling software aren’t going to help.
Even the paid-for de-noise programs I have don’t make much difference to your “original.mp3”.
The Nyquist code Robert-J-H came up with above, or the DtBlkFx plugin, is about the best you’re going to get: less noisy at the expense of sounding more synthesised.
[The audio recovery seen in Hollywood movies is science fiction.]
Cutting off all the frequencies below 200Hz with the equalizer makes it sound a little less harsh.
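For reference, a 200 Hz bass cut like that is just a standard high-pass biquad. This sketch uses the coefficient formulas from the well-known RBJ Audio-EQ-Cookbook; it is my own illustration, not how Audacity’s equalizer implements it.

```python
import math

def highpass_biquad(x, fs, f0, q=0.7071):
    # RBJ Audio-EQ-Cookbook high-pass coefficients
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    b0 = (1.0 + cosw) / 2.0
    b1 = -(1.0 + cosw)
    b2 = (1.0 + cosw) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cosw
    a2 = 1.0 - alpha
    # direct form I, normalised by a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        y = (b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, v
        y2, y1 = y1, y
    return out
```

At 16 kHz sample rate with f0 = 200 Hz, this removes DC and rumble entirely while passing the speech band almost untouched; q = 0.7071 gives the maximally flat (Butterworth) response.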
If you compare it with the codecs used in radio-telephony, like LPC10 and Speex, the computery-ness isn’t that bad.