Mathematics of the "Click removal" effect

enrico94 · January 12, 2024, 7:38pm

The Click Removal effect (in the Effect menu of Audacity 3.4.2) has a dialogue box with two parameters:
Threshold : possible values between 0 and 900
Max spike width: possible values between 0 and 40.

Now, I’m a physicist, and was always told never to use numbers without units (assuming there is a unit for what you’re trying to describe). At the moment I can’t work out what the units of “Threshold” and “Max spike width” might be.

Does anyone know?

Do we know who developed the click removal algorithm?

Max spike width could be… milliseconds?

I’m not just asking this to be awkward; at the moment, using the click removal effect is completely trial and error. You start with the default values then, if those don’t work, you start cranking those two sliders up and down, blindly, hoping you’ll hit upon a magic combination that will remove one or more clicks from audio.

I can zoom in on an offending click (e.g. from a small scratch in a vinyl record I’m ripping) and get precise duration of the click, its height in absolute dB - and I could even calculate dB of the click relative to the surrounding non-click audio if that was useful. But without knowing what those numbers mean, it’s a fruitless task doing that.

Any thoughts? 0 to 900 and 0 to 40 seem such weird scales!

Thanks

DVDdoug · January 12, 2024, 9:00pm

I don’t know, but they are probably parameters in an algorithm and they may not have traditional units. They may affect more than one variable.

The width could be the sample count but I doubt it because then the time-width would vary with sample rate. Maybe it is milliseconds???

The threshold is NOT the amplitude, because the audio data is floating-point, where sample values normally lie between -1 and +1.

…Audacity is open source if you know any programming and you want to dig-into the code!

steve · January 13, 2024, 3:46am

From the source code:

/**********************************************************************

  Audacity: A Digital Audio Editor

  ClickRemoval.cpp

  Craig DeForest

*******************************************************************//**

\class EffectClickRemoval
\brief An Effect for removing clicks.

  Clicks are identified as small regions of high amplitude compared
  to the surrounding chunk of sound.  Anything sufficiently tall compared
  to a large (2048 sample) window around it, and sufficiently narrow,
  is considered to be a click.

  The structure was largely stolen from Domonic Mazzoni's NoiseRemoval
  module, and reworked for the NEW effect.

  This file is intended to become part of Audacity.  You may modify
  and/or distribute it under the same terms as Audacity itself.

*//*******************************************************************/

That effect is extremely old (> 15 years), and is a very simple effect. Removal of “spikes” is done through simple linear interpolation, which actually works quite well for small spikes. One serious limitation is that it will not detect a “spike” like this, because the spike is not “above” the level of the surrounding waveform:

[Image: Screenshot_2024-01-13_03-46-12]
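The linear interpolation repair described above can be sketched roughly as follows. This is a minimal illustration, not Audacity's code: the helper name `interpolateLinear` and its index convention (the two end points are kept, and everything strictly between them is replaced) are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from ClickRemoval.cpp): overwrite the samples
// strictly between `left` and `right` with a straight line joining
// buffer[left] and buffer[right].
void interpolateLinear(std::vector<float>& buffer, std::size_t left, std::size_t right)
{
    // Nothing to do if there are no samples between the end points,
    // or if the span runs past the end of the buffer.
    if (right <= left + 1 || right >= buffer.size())
        return;

    const float lv = buffer[left];
    const float rv = buffer[right];
    const float span = static_cast<float>(right - left);

    for (std::size_t j = left + 1; j < right; ++j) {
        const float t = static_cast<float>(j - left) / span; // 0..1 across the gap
        buffer[j] = lv + t * (rv - lv);
    }
}
```

A straight line is a reasonable stand-in for the waveform only when the gap is short compared with the period of the underlying audio, which fits the remark that it works well for small spikes.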

enrico94 · January 13, 2024, 9:18am

Thanks @DVDdoug and @steve for the pointers to the code. I’d forgotten that Audacity is open-source, and we can look at the algorithm ourselves.

My main question is around how the current effect detects clicks, rather than how well it repairs them. Vinyl clicks have a distinctive time profile at 33rpm and 45rpm, so it’s crying out for an algorithm that can look for sections of waveform that have that distinctive profile (or frequency spectrum - see below).

There was a very old thread on the Audacity forum (referring to Audacity v1.x.xx) which implied the algorithm used a fast Fourier transform (FFT) to do the analysis, which would tie in with the “spectrogram” method recommended in the tutorial in Audacity Help. Clicks show up as high-energy features in the frequency spectrum and, presumably, that would let the linear interpolation that Steve refers to home in on the offending sections of audio.

I don’t speak C++ but do some coding in other languages, so I’ll have a look in the click removal code. Not planning on altering anything, but more aimed at understanding what it’s doing - and what those mystery numbers mean.

steve · January 13, 2024, 10:57am

That must have been about a different effect.
Click removal calculates the mean square for a large window (ms_seq), and the mean square for a small “is it a click” window (msw). If the mean square within the short window (msw) is greater than or equal to mThresholdLevel * ms_seq[i]/10, then that short window is considered a “click”.
When a click is detected, linear interpolation is applied between the start and end of the msw region.
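The detection test described above can be sketched like this. It is a deliberately simplified illustration, not the code in ClickRemoval.cpp: the function names, the naive recomputation of each window, the centring of the small window, and the guard against all-silent windows are assumptions made for the example. Only the comparison `msSmall >= threshold * msLarge / 10` follows the criterion quoted in this thread.

```cpp
#include <cstddef>
#include <vector>

// Mean of the squared samples in x[start .. start + count).
double meanSquare(const std::vector<float>& x, std::size_t start, std::size_t count)
{
    double sum = 0.0;
    for (std::size_t k = 0; k < count; ++k)
        sum += static_cast<double>(x[start + k]) * x[start + k];
    return sum / static_cast<double>(count);
}

// Hypothetical sketch: return the start index of every small window whose
// mean square is at least (threshold / 10) times the mean square of the
// large window that surrounds it.
std::vector<std::size_t> findClickCandidates(const std::vector<float>& x,
                                             std::size_t largeWindow,
                                             std::size_t smallWindow,
                                             int threshold)
{
    std::vector<std::size_t> hits;
    if (smallWindow == 0 || largeWindow < smallWindow || x.size() < largeWindow)
        return hits;

    // Assumption: place the small window in the middle of the large one.
    const std::size_t offset = (largeWindow - smallWindow) / 2;

    for (std::size_t i = 0; i + largeWindow <= x.size(); ++i) {
        const double msLarge = meanSquare(x, i, largeWindow);
        const double msSmall = meanSquare(x, i + offset, smallWindow);
        // Skip digital silence so that 0 >= 0 is not reported as a click.
        if (msLarge > 0.0 && msSmall >= threshold * msLarge / 10.0)
            hits.push_back(i + offset);
    }
    return hits;
}
```

Because both sides of the comparison are mean squares of the same signal, scaling the whole recording up or down does not change which windows are flagged.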

enrico94 · January 13, 2024, 12:17pm

Ah cool, that makes sense, thank you.

I’ll dig around inside the current function - but those statements you’ve copied inline make sense. Quite a nice little algorithm although, as you pointed out earlier, it won’t pick up a negative-going click.

All I need to know now is how the numbers translate into the operation of that large window mean and small window mean. It would be quite ironic if “trial and error” still ended up the most effective way to use the effect.

steve · January 13, 2024, 12:29pm

I have occasionally used the effect to reduce crackles in spaces between tracks in a vinyl recording. Generally what I’ve found is that if the default settings do not work well enough, then fiddling with the setting is very unlikely to make a significant improvement. In other words, the default settings are pretty much optimal for the algorithm.

enrico94 · January 13, 2024, 6:29pm

OK. I think “max spike width” is in samples, although I’m not 100% sure: the ClickRemoval.cpp code calls a function TimeToLongSamples() to convert the time range selected by the user into the counts for the for loop, and I haven’t managed to find what TimeToLongSamples does yet.

“Threshold” looks like it’s just a multiplier, so (in the line @Steve refers to above), the criterion is:

if(msw >= mThresholdLevel * ms_seq[i]/10)

where msw is the mean square of the small window and ms_seq is the mean square of the surrounding audio (I think; again I’m not 100% sure, as the code isn’t particularly well commented). The ‘divide by 10’ in the statement effectively rescales mThresholdLevel (the number chosen by the user in the dialogue box) from 0 to 900 down to 0 to 90. Because the test compares two mean squares, it should be independent of the absolute level of the audio.

0 to 900 presumably gives more fine scale control of the effect, since the GUI slider only selects integers and 0 to 90 would be quite a coarse scale.
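If the criterion really is a ratio of mean squares, the slider value can be given a conventional unit after all. The sketch below assumes exactly that reading (threshold / 10 is a power ratio) and converts it to decibels and to an RMS amplitude ratio. The function names are made up for the example and are not part of Audacity.

```cpp
#include <cmath>

// Power (mean-square) ratio implied by the slider value, assuming the
// criterion msw >= mThresholdLevel * ms_seq[i] / 10 discussed above.
double thresholdToPowerRatio(int threshold)
{
    return threshold / 10.0;
}

// The same ratio in decibels. A power ratio uses 10 * log10.
// Only meaningful for threshold > 0.
double thresholdToDecibels(int threshold)
{
    return 10.0 * std::log10(thresholdToPowerRatio(threshold));
}

// The same ratio as an RMS amplitude ratio (square root of the power ratio).
double thresholdToRmsRatio(int threshold)
{
    return std::sqrt(thresholdToPowerRatio(threshold));
}
```

Under this reading, a setting of 200 corresponds to a power ratio of 20, which is roughly 13 dB or about 4.5 times the surrounding RMS level, and the maximum of 900 corresponds to about 19.5 dB.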

In use, I’d support @Steve’s comment that the defaults are pretty much optimized. However, for some particularly quiet vinyl ‘ticks’ in sections of music, increasing the “threshold” did force the algorithm to interpolate some clicky sections of waveform that weren’t processed with “threshold” set to the default 200.

I’ll park the FFT idea for now, then, but I’ll keep playing around in Matlab/Octave to see if using a frequency spectrum is any better than what we have now in ClickRemoval.

steve · January 15, 2024, 1:24pm

TimeToLongSamples converts time (seconds) into a whole number of samples. Because Audacity was originally a 32-bit app, and recordings may be very long (too long for 32-bit integer counting), Audacity uses a custom 64-bit count for the number of samples called “sampleCount”, which you can think of as a 64-bit integer.

The function definition for TimeToLongSamples is here: audacity/libraries/lib-mixer/WideSampleSequence.cpp at master · audacity/audacity · GitHub

Most calculations involving time are based on sample counts rather than seconds so as to avoid rounding errors.
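As a rough illustration of that kind of conversion, the sketch below rounds a time in seconds to the nearest whole sample using a plain 64-bit integer. It is not copied from Audacity: `timeToSamples` and `samplesToMilliseconds` are hypothetical names, and round-to-nearest is an assumption about the rounding rule. The second helper goes the other way, which is handy for checking what a width expressed in samples would mean in milliseconds at a given sample rate.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a seconds-to-samples conversion: round to the
// nearest whole sample and keep the result in a 64-bit integer so that long
// recordings do not overflow.
std::int64_t timeToSamples(double seconds, double sampleRate)
{
    return static_cast<std::int64_t>(std::floor(seconds * sampleRate + 0.5));
}

// Duration in milliseconds of a given number of samples.
double samplesToMilliseconds(std::int64_t samples, double sampleRate)
{
    return 1000.0 * static_cast<double>(samples) / sampleRate;
}
```

For example, if “max spike width” is a sample count, a width of 40 at 44100 Hz would be about 0.9 ms, and the same setting would cover a different time span at another sample rate.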
