I am developing a tool to remove the natural click noises that happen during speech and I am pleased with what can be done with the default settings.
Here’s a tip for before and after comparisons: duplicate the track, select one track, and apply the tool with “Isolate” as the action choice. Then listen to the tracks together for “after,” or the first track solo for “before.” Use shift-space to loop part of the track, and click the solo button on and off.
I thought it a good time to post my latest version of things. There are many changes, both to the click detection and to the repair, which is much better now. And some workarounds for a certain Nyquist crash! Yes, the code has ballooned a bit.
There are various option switches near the top of the code, written as setq’s so that they do not clutter up the user interface. Various experiments are in there, but I am undecided about their value.
I’m not sure it’s quite ready for the plug-ins board. I don’t know whether my half-educated method of band-pass filtering with certain convolutions is really a better idea than the Chebyshev filters based on biquads, which I do not understand so well and which were also posted on the board lately, if those could compute faster. Time to dig into the mathematics. DeClicker.ny (53 KB)
I’ve not tested much but as long as the repaired sections occur during quiet sections it seems to work pretty well (which I guess will usually be the case for “mouth smacks”). In music it can make unpleasant “holes” in the sound, so I think that an implementation for music would require a different repair algorithm (but that’s outside of the scope of this plug-in).
For a “release” version this will need to be limited to a safe value, but that’s a minor issue. It’s coming along great and when finished I think it will be very popular with people making audio books.
I have had success removing some very obvious clicks that are not during silence but are superposed on the vowels. I can also remove the rattles that afflict some sibilant sounds, thus removing low frequency noise from high frequency signal. There are also lip closing sounds as in “sp” and “sm” which can be distracting things in the range of hundreds of Hz, and many of those are mitigated.
In the case of music, maybe drumbeats would be wrongly repaired if the block size is longer than the beat. Maybe a growling bass guitar would make many false positives too, if its pitch is below the reciprocal of the block length (taken as a value in Hz). Perhaps different settings could be found that work for different tracks.
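To put a number on that reciprocal: the lowest pitch that completes one full cycle inside a block is the sample rate divided by the block length in samples. Here is a minimal Python sketch of the arithmetic. The function name, the block sizes and the 44100 Hz rate are only illustrative assumptions, not settings taken from the plug-in.

```python
def lowest_pitch_in_block_hz(block_samples, sample_rate=44100.0):
    """Reciprocal of the block duration, expressed in Hz.

    Pitches below this value do not complete a full cycle within one block.
    (Hypothetical helper; the plug-in's real block size is not assumed here.)
    """
    if block_samples <= 0:
        raise ValueError("block_samples must be positive")
    block_seconds = block_samples / sample_rate
    return 1.0 / block_seconds


# Illustrative block sizes only.
for n in (256, 512, 1024):
    print(n, "samples ->", round(lowest_pitch_in_block_hz(n), 1), "Hz")
```

The low E string of a bass guitar is at about 41 Hz, so at 44100 Hz it would sit below this limit for any block shorter than roughly 1070 samples.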
As for the extreme value error, perhaps I should eliminate the choice of linear steps and specify the range by low frequency, step in semitones, and number of steps, with no high value. (Some effects have connected sliders that are not independent, but I don’t know that Nyquist supports that.) Then I would put a lower bound on semitones and on low frequency, because I believe the bad extreme case depends on both of those. The upper limit on the number of bands to avoid the error would depend on two other things, low and high frequency, not one.
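A rough Python sketch of that parameterisation, for illustration only: band frequencies are generated from a low frequency, a step in semitones, and a number of steps, and a second helper counts how many steps fit under some ceiling. The function names and the ceiling are assumptions; what actually triggers the error is not established here, so the ceiling just stands in for whatever the safe limit turns out to be.

```python
def band_frequencies(low_hz, semitones_per_step, num_steps):
    """Frequencies spaced by a fixed number of semitones, starting at low_hz."""
    if low_hz <= 0 or semitones_per_step <= 0 or num_steps < 1:
        raise ValueError("low_hz and semitones_per_step must be > 0, num_steps >= 1")
    ratio = 2.0 ** (semitones_per_step / 12.0)  # 12 semitones per octave
    return [low_hz * ratio ** k for k in range(num_steps)]


def max_steps_below(low_hz, semitones_per_step, ceiling_hz):
    """Largest number of steps whose top frequency stays at or below ceiling_hz.

    ceiling_hz is an assumed safety limit (for example, half the sample rate).
    """
    if low_hz <= 0 or semitones_per_step <= 0:
        raise ValueError("low_hz and semitones_per_step must be > 0")
    if low_hz > ceiling_hz:
        return 0
    ratio = 2.0 ** (semitones_per_step / 12.0)
    steps = 1
    while low_hz * ratio ** steps <= ceiling_hz:
        steps += 1
    return steps


print(band_frequencies(100.0, 12, 4))
print(max_steps_below(100.0, 12, 1000.0))
```

With only these three inputs the implied top frequency is low_hz * 2^(semitones * (steps - 1) / 12), so a bound on the number of steps can be computed rather than left to the user.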
Not yet, but we hope to have that in version 4 plug-ins. Don’t hold your breath though, it could be quite a long wait before we get version 4.
Could perhaps the “number of bands” control be replaced by a “frequency resolution” control (not necessarily those words) so that it is a scale from 0 to 100, minimum number of bands to maximum number of bands? You could then have the maximum (100 on the scale) translate into a safe maximum for the other dependent settings. Just an idea.
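As a sketch of how such a scale could work (in Python for illustration; the function name and the example limits are hypothetical, and the safe maximum would have to come from the other settings):

```python
def bands_from_resolution(resolution, min_bands, max_safe_bands):
    """Map a 0..100 'resolution' slider onto min_bands..max_safe_bands.

    max_safe_bands is assumed to be computed elsewhere from the dependent
    settings, so 100 on the slider can never exceed the safe limit.
    """
    if not 0 <= resolution <= 100:
        raise ValueError("resolution must be between 0 and 100")
    if min_bands < 1 or max_safe_bands < min_bands:
        raise ValueError("need 1 <= min_bands <= max_safe_bands")
    span = max_safe_bands - min_bands
    return min_bands + round(span * resolution / 100.0)


# Illustrative limits only.
for r in (0, 50, 100):
    print(r, "->", bands_from_resolution(r, 2, 30), "bands")
```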
Most users don’t read the manual (sad fact of life), so controls need to be as self-explanatory as possible, even (especially) for technophobes.
Indeed. I already eliminated choices in repair method and used what I thought best – for each click, a composition of eq-band calls with constant parameters, not curves. Just let the crossfade make transitions. Variable parameter eq-band in Nyquist is suspect anyway, I understand, and highpass and lowpass don’t let me use all the information I gather about the noisiness of each band.
I could eliminate stepping by Hz and always go by semitones because that is the more sensible choice for generating the parameters for (logarithmically) equal width bands fixed by eq-band. Even if you are only using the tool for detection. One less mystifying control.
I hope the rest of the controls are self-explanatory, though there are many of them, and I tried to sequence them according to what the user would most likely want to vary.
One of the advantages of making a tool for a specific job (in this case removing mouth smacks) is that there is often a lot of optimisation that can be done for that specific task. So, for example, if you find that one control always works best when set between 45% and 55%, you can probably do without that control and hard code it to 50%.
One way that I’ve done this in some of my plug-ins is to comment out “advanced” controls like this:
I took a close look at this example. There are just seven anomalous samples making a very audible click. Is this a natural noise or some sort of tape noise? I suppose it’s the latter and my tool is just not well adapted to that. But the Repair effect handles it just fine.
I would like to make this tool more broadly useful.
That occurred to me as well. This is the “Paul L Mouth Smack Filter.” The chances of making it work with somebody else’s mouth geometry may be remote. You can certainly make sliders and adjustments so the tool is valuable to multiple different people, but where are you going to get the testers and what are you going to call the sliders? As far as I know, there is not a wide agreement on the name of each mouth noise (“plucking tongue snap,” “epiglottal vacuum release”), and the other option, what I call the “Geek Option,” describes in math what the tool does. Programmers love this one. “Nyquist Differential Velocity Derivation [higher/lower]?”
There was a room echo generator program available a while back and I called them on it. Adjusting the trig calculation functions in the program is not useful. I want a slider for “Room Size.”
If you do manage to resolve all that, you will be a hero to plucking tongue snappers everywhere.
Oh, none taken. Just discovering limitations of these methods.
What do you think caused the seven sample glitch in your example? A defect of old media, rather than a real but undesirable sound picked up by the microphone? Repair might help with the first, then, and my tool with the other. Neither replaces the other.
Here is a brief example of what I can fix, and I’m very pleased to fix it automatically. “Before” is in the left channel, “after” in the right. Zoom in about 0.44 seconds. There is an irritating high frequency click that got mostly filtered away, enough to be inaudible to my ears.
But this is about 240 samples, or 6ms, long, and it’s a “real world” oscillating and decaying sound riding on the vowel that I want to hear. (A bubble of spittle popping under the tongue or some such icky happening.)
It’s an “angry” looking, dense blue scribble if you zoom out to fit 1/10 second on the screen, but it’s actually a very long and low-amplitude defect, compared with Steve’s example of just seven bad samples.
Use shift-click to loop the selection and hear the difference.
If you look at spectrogram views of Steve’s example, which I don’t fix well, and my example, which I do, you can see a difference (and I suggest in Edit, Preferences, spectrograms you choose 256 and Blackman-Harris):
In both cases, the click appears as a bright, narrow, vertical stripe, but in my example, that stripe has a rather definite bottom at about 3kHz, below which the sound just seems to oscillate like the surrounding parts, so those frequencies are left mostly untouched by filtering.
But the anomaly in Steve’s recording is so sharp and narrow that the stripe has no bottom. (“Convolving with the delta function is an identity.” Cosines of all frequencies react to it. Hand wave, hand wave.) Zoom on low frequencies and you still see it. So my automatic detection and filtering tries to attenuate all frequencies and you get a drop-out.
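The hand wave can be checked numerically. Below is a small Python sketch with a plain DFT and no libraries. The two test signals are made up for illustration and are not taken from either recording: a single-sample spike, and a decaying oscillation placed at a high frequency.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 (direct O(N^2) sum, fine for short inputs)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


N = 64

# A one-sample spike: the "stripe with no bottom".
impulse = [0.0] * N
impulse[N // 2] = 1.0

# A decaying oscillation centred on bin 24 (an arbitrary high frequency).
burst = [math.exp(-t / 16.0) * math.sin(2 * math.pi * 24 * t / N) for t in range(N)]

imp_mags = dft_magnitudes(impulse)
burst_mags = dft_magnitudes(burst)

print("impulse min/max magnitude:", min(imp_mags), max(imp_mags))
print("burst peak bin:", burst_mags.index(max(burst_mags)))
print("burst low-bin magnitude relative to peak:", burst_mags[1] / max(burst_mags))
```

The spike has the same magnitude in every bin, so a detector that looks band by band sees it everywhere, while the decaying oscillation has most of its energy near its own frequency and little at the bottom.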
I’m not doing exactly what spectrogram view does mathematically, but something like it, when detecting what to fix.
Now I wonder if the detection methods I use might be combined with the repair method of the Repair effect so that you could indeed have a tool that fixes glitches of this kind without zooming in and precise selection. That wouldn’t be suited to my purposes, but heck, it might even all be rolled up in one tool. Fix things that are “loud” in all frequencies at once with Repair, and things that are loud in only some frequencies with eq-band.
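The decision part of that idea might look something like the sketch below (Python, purely hypothetical). It assumes the detector already yields, for each band, how far the click rises above its surroundings. The names, the thresholds and the "interpolate" label standing in for a Repair-style fix are all assumptions, since how Repair actually works is not known here.

```python
def choose_repair(band_excess_db, threshold_db=6.0, broadband_fraction=0.8):
    """Pick a repair strategy from per-band click measurements.

    band_excess_db: one value per band, the level of the suspected click
    above its surroundings (a hypothetical measurement).
    Returns (strategy, indices_of_noisy_bands).
    """
    if not band_excess_db:
        raise ValueError("need at least one band")
    noisy = [i for i, excess in enumerate(band_excess_db) if excess > threshold_db]
    if not noisy:
        return ("none", [])
    if len(noisy) / len(band_excess_db) >= broadband_fraction:
        # Loud in (nearly) all bands at once: filtering would leave a drop-out,
        # so hand the span to an interpolating repair instead.
        return ("interpolate", noisy)
    # Loud in only some bands: attenuate just those bands.
    return ("attenuate-bands", noisy)


print(choose_repair([12.0, 11.0, 13.0, 12.5, 10.0]))  # spike-like, all bands
print(choose_repair([0.5, 1.0, 0.0, 9.0, 14.0]))      # upper bands only
print(choose_repair([0.5, 1.0, 0.0, 2.0, 1.5]))       # nothing to fix
```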
Except I don’t know what the method of Repair is, and it’s not in easily readable .ny code.