Automatic removal of mouth smacks

Does anyone know of a plugin that does a good job of removing mouth smacks from recorded voice?

If not I was wondering how I might write such a thing.

Problems would be first identifying clicks, which are typically 5ms or less, and then repairing them. I have some untested ideas for both.

Deletion of clicks is something I do by hand but would be hard to program. It might be less useful when synchronization with other tracks is important. Would lowpass or band stop applied to very short regions work well enough?

In general, if you didn’t start out being a presenter/announcer, it’s rough to get there with filters, effects and editorial. As you found out, you can’t just slice out the offending sounds, you have to put something back in order to maintain the length of the performance. Whatever you put back has to be not worse than the original problem.

There was another presenter a while ago with this problem. I think he just threw in the towel and put up with the clicks.

There is a magic trick you can try.

Save your clicks in a file and use that as the Noise Removal Profile step. Be absolutely certain not to include any valuable voice in this clip. Just click after click. It’s remotely possible that if you play your cards right and the moons and stars line up, most of the worst of the staccato clicks can be made to soften with the second application of Noise Reduction. It’s also possible that it will destroy the show. It’s impossible to tell.

If you’re not familiar with Noise Removal, it works in two steps. Select some of the offending noise and use that as the profile step. In essence, letting the tool “taste” the problem. Then run the tool again and apply it to the show. It will try to remove the profile sound from the show.


That does not sound very hopeful.

I edit narration and synchronization with other tracks is not a concern, so I often just slice clicks out. I can get very clean results but it is tedious. If you slice a bit of a vowel out, you must zoom in and take out exactly one cycle of the vowel waveform or else you get a “bump” where there was once a click. If I want to automate that procedure, I suppose I need the Nyquist yin function to guess the fundamental.
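The "take out exactly one cycle" step can be sketched roughly as below. This is plain Python rather than Nyquist, only to show the logic. `estimate_period` is a crude autocorrelation stand-in for a proper pitch tracker such as yin, and the function names and the synthetic test signal are made up for the illustration.

```python
import math


def estimate_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest mean autocorrelation.

    Crude stand-in for a real pitch tracker; assumes `samples` is a clean,
    steadily voiced stretch that is longer than max_lag.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def excise_one_cycle(samples, click_index, period):
    """Delete exactly one period of samples, centred on the click."""
    start = max(0, click_index - period // 2)
    return samples[:start] + samples[start + period:]


# Demo: a synthetic 200 Hz "vowel" at 8 kHz with a 3-sample click in it.
rate = 8000
vowel = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
clicked = list(vowel)
for n in range(200, 203):
    clicked[n] += 0.9

period = estimate_period(clicked[:200], 20, 60)  # measured on a clean stretch
repaired = excise_one_cycle(clicked, 201, period)
```

On a real vowel the period drifts, so the estimate would need to be taken right next to the click, and the track ends up one cycle shorter.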

Edit > Find Zero Crossings.

Noise Reduction may work for you and it’s already written.

The Nyquist elves will be along.


It takes more than Find Zero crossings to cut out a cycle of a vowel. A vowel waveform makes many zero crossings.

I tried noise reduction a few times a while ago and came to mistrust it. I don’t really understand how it works. It changed the timbre of leftover sounds. Now perhaps there was too much noise in my recording environment which I have improved since then.

But using it to remove clicks? Have you tried that with success?

The typical click is 5 ms or less. They are often easily seen in spectrogram view as a bright and narrow vertical stripe. If I were to attempt the detection problem, it would go something like this:

Convolve with a single cycle of a sine wave at a certain frequency. Apply snd-avg with OP-PEAK and a window of 10 ms and a skip of 5 ms. Identify windows in which the amplitude is more than a certain percentage of the amplitudes in the neighboring, nonoverlapping windows. The window triples I examine would be staggered by a half window, so I would not miss a click near a window boundary. It would also be centered in another window.

Repeat for several frequencies.

snd-fft essentially computes several convolutions at once, but it might be faster just to convolve a few frequencies, say 500Hz to 8kHz stepping by 500.
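The window comparison described above could be sketched roughly as follows. It is plain Python rather than Nyquist, purely to show the staggered-window logic for one frequency. The function names, the `ratio` threshold of 4 and the synthetic test signal are all assumptions for illustration, and a real version would repeat the search for each frequency of interest.

```python
import math


def sine_kernel(freq_hz, rate):
    """One cycle of a sine at freq_hz: a crude band-pass kernel."""
    n = max(2, round(rate / freq_hz))
    return [math.sin(2 * math.pi * k / n) for k in range(n)]


def convolve(signal, kernel):
    """Direct (slow) convolution, trimmed to the length of `signal`."""
    return [sum(h * signal[i - k] for k, h in enumerate(kernel) if i - k >= 0)
            for i in range(len(signal))]


def window_peaks(signal, window, hop):
    """Peak absolute value per window (the role of snd-avg with OP-PEAK)."""
    return [max(abs(x) for x in signal[s:s + window])
            for s in range(0, len(signal) - window + 1, hop)]


def find_click_windows(signal, rate, freq_hz, ratio=4.0,
                       window_s=0.010, hop_s=0.005):
    """Start times of windows whose band peak exceeds `ratio` times the
    larger peak of the nearest nonoverlapping window on each side."""
    window, hop = round(window_s * rate), round(hop_s * rate)
    step = -(-window // hop)  # ceiling: windows this far apart do not overlap
    peaks = window_peaks(convolve(signal, sine_kernel(freq_hz, rate)),
                         window, hop)
    hits = []
    for i in range(step, len(peaks) - step):
        neighbours = max(peaks[i - step], peaks[i + step], 1e-12)
        if peaks[i] > ratio * neighbours:
            hits.append(i * hop / rate)
    return hits


# Demo: quiet 200 Hz tone at 8 kHz with a 2 ms, 2 kHz burst at t = 0.1 s.
rate = 8000
tone = [0.1 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1600)]
clicked = list(tone)
for j in range(16):
    clicked[800 + j] += 0.8 * math.sin(2 * math.pi * 2000 * j / rate)

hits = find_click_windows(clicked, rate, 2000)
```

Because the windows are staggered by half a window, the burst is flagged in two overlapping windows, so a practical version would merge adjacent hits into one label.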

Am I making sense?

Am I making sense?

Perhaps to the Nyquist elves who aren’t here yet. I believe it’s 06:00 their time.

Have you thought of trying Brian Davies’ ClickRepair software for the “mouth smacks”? It’s primarily designed to deal with the pops and clicks on LPs, but from the way you describe how they look in spectrogram view there’s a reasonable chance that CR might work for you here.

It’s not free software ($40), but Brian lets you have a 14-day free trial so you can try it out first to see if it works.

I would recommend setting the reverse processing mode. I also soften the de-click setting from 50 to 30 and set the “pitch protection” option.



I think that identifying the clicks would be the hardest part. Can your phoneme analyser help with that? Speech segmentation (not recognition!)

There are several easy techniques for reducing mouth smack (and other) clicks.

One approach is to use a low pass filter, but in order to avoid creating clicks at the start/end of the repair it is necessary to have a short transition period between the original and the repaired section. It is not sufficient to just set the start and end at zero crossing points.
Commented example:

; sweep filter frequency range in Hz
(setq nyqf (/ *sound-srate* 2)) ; Nyquist frequency
(setq lowf 2000) ; lowest frequency of the sweep
(setq iter 4)

; one cycle of a sine wave the same duration as the selection.
(setf sine (osc (hz-to-step (/ (get-duration 1)))
                1 *sine-table* 90))
; make amplitude 0 to 1
(setf sine (mult 0.5 (sum 1 sine)))
; make amplitude lowf to nyqf
(setf frq-env (sum lowf (mult sine (- nyqf lowf))))

;; Envelopes to make very short crossfade.
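;; control-srate-abs expects a single expression, so both setf forms
;; are wrapped in progn (the stray closing parenthesis is also removed).
(control-srate-abs *sound-srate*
  (progn
    (setf filt-env (pwlv 0 0.05 1 0.95 1 1 0))
    (setf fade-env (pwlv 1 0.05 0 0.95 0 1 1))))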

(setf s1 s)

;; Filter s1
(dotimes (i iter)
  (setf s1 (lp s1 frq-env)))

;; Crossfade
(sum (mult s fade-env)
     (mult s1 filt-env))

Another approach, specifically for clicks during “silence” is to patch over the click with a bit of “room tone”, which we discussed in these topics:

Well you sent me off on a little experimental journey there Paul …

I’ve habitually used Noise Removal (default settings) on FM broadcasts to remove the FM hiss. So following your comments I took a capture I made from BBC R4 on Saturday and listened carefully on my studio Sennheiser cans. Obviously without NR I still get the FM hiss - but you’re right, the NR does seem to change the sound - you can’t hear it so much when the vocalist (with a big voice - Mary Coughlan) sings, but on the quiet electric piano intro you can really hear some strange tinkly artefacts. The question is, which do I prefer - D’OH !!! :confused: :unamused:

P.S. Both recordings uncompressed audio of course :sunglasses:


The default settings in Noise Removal are a bit too aggressive for my liking, but many users require the settings to be quite aggressive so as to have some impact on bad recordings. Setting the “Noise Reduction (dB)” to about -12 will still have an appreciable effect on low level noise, but will produce much lower tinkly artefacts than the default settings. I generally like to increase the “Frequency Smoothing” a bit too - typically around 500 Hz, but for critical work it’s best to experiment for optimum settings.

The “Sensitivity” control is interesting, but a bit tricky to use effectively. Usually I leave this at the default 0.0 dB.
Some sounds “mask” noise better than others. Sounds that have a significant amount of “breathyness” will cover up (mask) low level hiss much better than say low piano notes.
When the noise is not masked, it will be audible “through” the retained sound. This can be countered by increasing the Sensitivity (dB) control, but note that increasing the sensitivity makes the effect more aggressive so it is also necessary to reduce the “Noise reduction (dB)” setting and possibly also increase the “Frequency Smoothing” and reduce the Attack/decay to zero. For this type of situation I may increase the sensitivity up to about +8 dB and reduce the reduction to around 8 dB.
Reducing the sensitivity control can help to make the effect less aggressive, though personally I find that reducing the “Noise reduction (dB)” control does this better.

Ahhh useful feedback there Steve :ugeek: - thanks for that - and this is me writing that down - as Koz says :slight_smile:

I’ll do some experimenting with my next FM capture…


I do not understand Steve’s code example. lp is iterated with a time varying cutoff frequency. How would that remove clicks anywhere in the selection? Are you supposing a very short selection in which a click is identified already?

The room tone patch is very useful and can be used without precise selections. But I must also deal with short bursts of noise superposed on the vowels.

Also, say “chimney” and you may hear a funny pop between 1 and 2 kHz between m and n.

Also record enough esses and you may hear LOW pitched clicks in some of them that show in spectrogram as red stripes below the cloud of noise, in the range of hundreds of Hz.

Meticulous editing removes these distractions. I wonder if I might at least solve the detection problem well enough that I need not listen for them but can let the program label them.

What they have in common is a rapid increase and decrease over a few ms in some frequency band, which is not periodic with the fundamental of the voice which would usually be in low hundreds of Hz. This led me to the procedure I suggested.


I still do not understand how lp does the work. Is frq-env varying the cutoff frequency during the short selection? Why is that useful?

I have reread the manual for NR and the first thing it says is, Don’t use this for click removal.

I have read the wiki about the algorithm, and the explanation why tinkle bells are unavoidable whatever the variant of the algorithm.

I wonder if the deessing problem can be treated as a sort of dual. Attenuate frequencies matching the profile of a harsh s, if exceeding a threshold, not if below it.

As for my experiments with identifying phoneme boundaries… The various voodoo I tried sometimes gets fooled by a click into making a boundary. But perhaps that is a “feature.” I have not yet done the work of a more complicated division of speech into overlapping regions, with transitions. Clicks might be labelled as transitions but not all transitions would be clicks.

If I can label a vowel or other voiced sound, I might use yin to find a fundamental and so identify cycles, and either excise a cycle or replace it with an averaging of neighboring cycles. The region would serve to identify a region in which yin would deliver a meaningful answer.
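The "replace it with an averaging of neighboring cycles" option could look something like this sketch. It is plain Python rather than Nyquist, and it assumes the period in samples is already known (for example from yin) and stays constant across the three cycles involved. `patch_cycle` is a made-up name for the illustration.

```python
import math


def patch_cycle(samples, start, period):
    """Overwrite one period beginning at `start` with the mean of the
    periods immediately before and after it. Length is unchanged."""
    if start - period < 0 or start + 2 * period > len(samples):
        raise ValueError("need one full cycle on each side of the repair")
    patched = list(samples)
    for k in range(period):
        before = samples[start - period + k]
        after = samples[start + period + k]
        patched[start + k] = 0.5 * (before + after)
    return patched


# Demo: 200 Hz tone at 8 kHz (period 40 samples) with a 3-sample click.
rate = 8000
vowel = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
clicked = list(vowel)
for n in range(200, 203):
    clicked[n] += 0.9

patched = patch_cycle(clicked, 180, 40)
```

Unlike excising a cycle, this keeps the track the same length, so it would also suit material that has to stay in sync with other tracks.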

Just speculating here. My phone finding approach might be too elaborate but has given me so far a somewhat useful deesser that identifies whistling frequencies separately for each sibilant, just as yin might find different fundamentals once I know regions where that is somewhat constant.

However the click identification before fixing might not need any of that. Localized application of filters might be a good dumb approach that doesn’t need awareness of fundamentals.

Let’s say that you’ve got the detection analysis working perfectly and it returns a list of “click” times as number pairs - the first number is the time of the centre of the click and the second number is an approximate “width” of the click. Some experimentation would be required to find the best length for each repair, but for simplicity let’s say that we have determined that 2 x click width works well.

Then you could create a single frequency envelope for the entire track, something like this:

; click list as time/width pairs
(setf clicklist (list 1 0.1 3.5 0.2 5.7 0.15))

; sweep filter frequency range in Hz
(setq nyqf (/ *sound-srate* 2)) ; Nyquist frequency
(setq lowf 2000) ; lowest frequency of the sweep
(setq iter 4)

(setf frqenv (s-rest 0))

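(do ((i 0 (+ i 2)))
    ;; stop once every time/width pair has been used,
    ;; then return the finished frequency envelope
    ((>= i (length clicklist))
     (sum nyqf (mult (- nyqf lowf) frqenv)))
  ;; t is a constant in Lisp, so the start time is named "start"
  (let ((start (- (nth i clicklist) (nth (1+ i) clicklist)))
        (dur (* (nth (1+ i) clicklist) 2.0)))
    ;; blip dips from 0 down to -1 and back to 0 over the repair
    (setf blip
        (mult 0.5
          (sum -1
            (osc (hz-to-step (/ dur)) dur *table* 90))))
    (setf frqenv
      (sim frqenv
           (at-abs start (cue blip))))))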

@Steve: might I be better, for removal of FM hiss, to use a notch filter?


But I still don’t understand Steve’s treatment of a single click. A lowpass filter, with the frequency parameter itself varying sinusoidally from 2 kHz to Nyquist over the little interval of a few ms – what??