## A Nyquist de-esser, and more?

Using Nyquist scripts in Audacity.

### A Nyquist de-esser, and more?

Hello, I posted an earlier experiment here and I see there were many downloads but little discussion.
viewtopic.php?f=42&t=72160

I have been working off and on at this. I am a narrator. My intention is to identify boundaries between speech segments in vocal tracks, then classify speech segments into broad classes like vowels, pauses, stops, sibilants, then apply effects selectively to segments.

A larger ambition might be a one pass combination speech cleanup tool that could also eliminate many mouth crackles without muddy results elsewhere, by selective low-pass filtering of regions. That is more important, I suppose, for unaccompanied voice than for all vocal tracks. But meanwhile I think I have accomplished a passable de-esser.

Wheel reinvention perhaps, and I have little notion how other such software proceeds, but hey it's educational to try naively.

Problems:

1) Examine fft data and calculate certain statistics that can identify speech sound boundaries.
2) Use those or other statistics to identify sounds as sibilant.
3) Apply certain effects selectively to sibilants.

My provisional solutions (of course I continue experimenting):

1) Compute spectral standard deviation, make boundaries where absolute value of second derivative of that (or of its logarithm) exceeds some threshold; refine boundaries to zero crossings. (I have also tried mean and median and other things. Any might be good enough with the right values just for de-essing but I was also trying to make it separate stops from vowels, without too many extra boundaries. Various mixed success with the different criteria.)
2) Identify a sibilant sound as having the average value in excess of some threshold.
3)
• For now I simply de-amplify by some fixed factor. A more sophisticated approach (not yet tried) might make the factor a function of RMS level and change softer sounds less.
• I also do this: identify the peak frequency in the spectrum and notch it; then repeat. That fixes the occasional s sounds that come out with a painful whistle somewhere between 5 and 8 kHz which is evident when you look at the spectrogram. To my ears, this treatment does not noticeably affect the quality of other sibilant sounds so I apply it indiscriminately.
• Also: perhaps a crossfading of the effect might be desirable, I've written a bit to do that, but my experience tells me I can get away without that provided I solve part 1 precisely enough.
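
Provisional solution 1 above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Nyquist code discussed in this thread, and it assumes a per-frame statistic (for example one spectral standard deviation value per FFT frame) has already been computed. The function names, the merging of adjacent hits, and the example numbers are all hypothetical.

```python
def second_difference(values):
    """Discrete second derivative: v[i-1] - 2*v[i] + v[i+1] for interior frames."""
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


def find_boundaries(stat, threshold):
    """Frame indices where |second difference| of `stat` exceeds `threshold`.

    A run of consecutive hits is reduced to its strongest frame
    (the later frame wins a tie), an assumption made to avoid extra boundaries.
    """
    d2 = second_difference(stat)
    boundaries, run = [], []
    for i, v in enumerate(d2):
        if abs(v) > threshold:
            run.append((abs(v), i + 1))  # d2[0] belongs to frame 1
        elif run:
            boundaries.append(max(run)[1])
            run = []
    if run:
        boundaries.append(max(run)[1])
    return boundaries


def nearest_zero_crossing(samples, index):
    """Move `index` to the closest position where the waveform changes sign."""
    for offset in range(len(samples)):
        for j in (index - offset, index + offset):
            if 1 <= j < len(samples) and samples[j - 1] * samples[j] <= 0.0:
                return j
    return index  # no crossing found: leave the boundary where it was


# Hypothetical statistic that jumps between frame 3 and frame 4.
stat = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
frame_boundaries = find_boundaries(stat, threshold=1.0)

# Hypothetical waveform: refine a rough sample position to a sign change.
samples = [1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
crossing = nearest_zero_crossing(samples, 1)
```

A frame index would still need to be multiplied by the hop size between FFT frames to get a sample position before refining it to a zero crossing.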

Another neat trick I'm using is to return the difference between the fixes and the original rather than the fixed version. Then I duplicate a track, fix one, listen to the combination, or hear the original again solo; if I don't like any fix, I can fix the fix by silencing or fading part of the diff track. Then I mix when all is done.
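
The arithmetic behind the difference-track trick is simple enough to show in a few lines. This is only a sketch in Python with made-up sample values; `difference_track` and `mix` are hypothetical names, not functions from any plug-in in this thread.

```python
def difference_track(original, fixed):
    """Return fixed - original, sample by sample."""
    return [f - o for o, f in zip(original, fixed)]


def mix(a, b):
    """Sum two tracks of equal length, as mixing two tracks would."""
    return [x + y for x, y in zip(a, b)]


original = [0.5, -0.25, 0.75, -0.5]
fixed = [0.5, -0.25, 0.375, -0.25]    # pretend the last two samples were de-essed

diff = difference_track(original, fixed)
restored = mix(original, diff)         # original + diff gives the fixed version

# "Fixing the fix": silence part of the diff track to undo the change there.
edited_diff = diff[:3] + [0.0]
partly_fixed = mix(original, edited_diff)
```

Wherever the diff track is silenced, the mix falls back to the original audio, which is what makes the fixes reversible region by region.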

Who is curious to share code and make suggestions? Or tell me not to waste time and just use this or that package.
Paul L

Posts: 884
Joined: Mon Mar 11, 2013 7:37 pm

### Re: A Nyquist de-esser, and more?

Paul L wrote: Hello, I posted an earlier experiment here and I see there were many downloads but little discussion.

Unfortunately that is often the way. I wish we were able to persuade people who download plug-ins to provide feedback, as it is a great help for plug-in developers.

I found your previous work very interesting, and it's an area that I am interested in, though as I wrote, from a different perspective.

Paul L wrote: Wheel reinvention perhaps

No harm in that - it can be very educational, and you may even come up with a "better wheel".

Paul L wrote: 1) Examine fft data and calculate certain statistics that can identify speech sound boundaries.
2) Use those or other statistics to identify sounds as sibilant.
3) Apply certain effects selectively to sibilants.

The way that most de-essers work is much simpler than that. Usually they are just dynamic compressors that operate on a fairly narrow high frequency band. It will be interesting to see if your approach works better (a "better wheel").

Paul L wrote: 1) Compute spectral standard deviation, make boundaries where absolute value of second derivative of that (or of its logarithm) exceeds some threshold; refine boundaries to zero crossings. (I have also tried mean and median and other things. Any might be good enough with the right values just for de-essing but I was also trying to make it separate stops from vowels, without too many extra boundaries. Various mixed success with the different criteria.)

One advantage of using a compressor as the basis of the processing is that the attack and release time, in effect, "fade" the effect in and out, so that precise alignment to zero crossings is unnecessary. However, I realise that you may want zero crossing detection for other aspects of your voice processing project.

What I think could be interesting would be to use your "sibilant detection" method, and then apply a more conventional "dynamic compression" to the detected sibilants. If the "S detector" works well, then potentially it could provide a de-esser that avoids the "dulling" of other sounds that can occur with more simple effects.
steve

Posts: 46210
Joined: Sat Dec 01, 2007 11:43 am
Operating System: Linux *buntu

### Re: A Nyquist de-esser, and more?

Can you explain "dynamic compression"? Is there any simple version of that one could code for just one sibilant sound? Then in combination with the rest of my work, who knows.
Paul L


### Re: A Nyquist de-esser, and more?

"Dynamic Compression" reduces the dynamic range of the audio. The common name for such an effect is "Compressor".
Audacity has a built-in compressor that is described here: http://manual.audacityteam.org/o/man/compressor.html

In order to operate on one specific frequency band, the audio is usually split into three parts by frequency. The low pass and high pass parts are left unprocessed, while the band pass part is processed through the compressor.
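
As a rough illustration of the split-and-compress idea, here is a minimal Python sketch. It is not the attached Nyquist plug-in and not Audacity's built-in compressor. The one-pole filters are a crude stand-in for steeper filters, and the 4 kHz lower edge, threshold, ratio and release values are arbitrary assumptions.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Very gentle (6 dB/octave) low-pass; a stand-in for a steeper filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def split_three_bands(samples, low_hz, high_hz, sample_rate):
    """Split so that low + mid + high adds back up to the input (up to rounding)."""
    below_low = one_pole_lowpass(samples, low_hz, sample_rate)
    below_high = one_pole_lowpass(samples, high_hz, sample_rate)
    mid = [h - l for h, l in zip(below_high, below_low)]
    high = [x - h for x, h in zip(samples, below_high)]
    return below_low, mid, high


def compress(band, threshold, ratio, release=0.999):
    """Reduce gain while the band's peak envelope is above `threshold`."""
    out, env = [], 0.0
    for x in band:
        env = max(abs(x), env * release)  # instant attack, exponential release
        if env > threshold:
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out.append(x * gain)
    return out


def deess(samples, sample_rate, low_hz=4000.0, high_hz=8000.0,
          threshold=0.05, ratio=4.0):
    """Compress only the middle band, then put the three bands back together."""
    low, mid, high = split_three_bands(samples, low_hz, high_hz, sample_rate)
    mid = compress(mid, threshold, ratio)
    return [l + m + h for l, m, h in zip(low, mid, high)]


fs = 44100
loud = [0.8 * math.sin(2.0 * math.pi * 6000.0 * t / fs) for t in range(4000)]
quiet = [0.01 * s for s in loud]
out_loud = deess(loud, fs)
out_quiet = deess(quiet, fs)
```

Because the three bands are built by subtraction, a signal whose middle band stays under the threshold passes through unchanged, while a loud 6 kHz tone is pulled down.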

I started working on a de-esser some time ago - I'll have a look and dig out the code if you're interested.
steve


### Re: A Nyquist de-esser, and more?

I meant to ask about your de-esser. I found discussion of it with a search of old forum posts but only found a broken link to another site.
Paul L


### Re: A Nyquist de-esser, and more?

Sorry about the delay - it was a long time ago that I was working on a de-esser and it took me a while to find the stuff.
Attached is one of the files that I found. Note that this is experimental, badly written, and probably does not work very well but it demonstrates the basic idea of compressing a high frequency band.

deesser.ny
steve


### Re: A Nyquist de-esser, and more?

How successful do you consider your experiment?

I have not yet tried it nor figured out every detail of the code, but I gather that you have options to apply the effect or to see first the graph of the control signal as one experiments with frequency settings. The complicated thing I put on the PlugIns board earlier similarly had an option to display a non-sound and a great many experimental dials. Its ultimate output was only labels and not an effect.

Surely your one page is less complicated than all of my stuff. I might discover that my calculations from fft data are not worth it but I don't really know yet. I am trying to graph curves whose levels or slopes have some relation to what we would subjectively identify with the boundaries between speech sounds. Absolutely sharp divisions might not exist, yet sibilants at least seem well marked to my eyes in spectrogram view. They are noisy sounds with dispersed spectra, unlike vowels and stops. Spectral standard deviation (not yet in the version that I shared earlier) is one way to quantify that dispersedness of the timbre, independently of the amplitude. Though f and th are the noisiest sounds, they do not tend to harshness. There must be some imprecision in my boundaries, since snd-fft has to skip ahead between frames if I am to have acceptable performance, but I can still catch the brief aspirations of t and k, which is good because I think those sounds often need treatment too.
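
One possible reading of "spectral standard deviation" is the spread of frequency around the spectral centroid, weighted by the power spectrum. The Python sketch below assumes that reading, which may not be exactly the statistic used in the Nyquist code. It uses a naive DFT so that it needs no libraries, and the frame size and test signals are made up.

```python
import cmath
import math
import random


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but fine for short frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_spread(frame, sample_rate):
    """Standard deviation of frequency (Hz), weighted by the power spectrum."""
    power = [m * m for m in magnitude_spectrum(frame)]
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful spread
    bin_hz = sample_rate / len(frame)
    centroid = sum(k * bin_hz * p for k, p in enumerate(power)) / total
    variance = sum((k * bin_hz - centroid) ** 2 * p
                   for k, p in enumerate(power)) / total
    return math.sqrt(variance)


sample_rate = 16000
n = 128
tone = [math.sin(2.0 * math.pi * 16 * t / n) for t in range(n)]  # pure 2 kHz tone
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]                # hiss-like frame

tone_spread = spectral_spread(tone, sample_rate)
noise_spread = spectral_spread(noise, sample_rate)
quiet_noise_spread = spectral_spread([0.1 * s for s in noise], sample_rate)
```

Dividing by the total power is what makes the measure independent of amplitude: the noisy frame scores far higher than the pure tone, and scaling the noise down by a factor of ten leaves its spread essentially unchanged.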

I understand your method for applying the effect is to cut the sound into three bands with lowpass, highpass, and bandpass, then make a control signal based on the amplitude of the middle band, deamplify the middle band, then put the pieces back together. Do other de-essers do that? As I said, I was simply deamplifying sibilant slices by a constant factor but also identifying any whistling frequency and notching it. I find that some harsh esses have a white stripe in the spectrogram that you can simply see and this works to eliminate that. In fact I wrote a standalone effect for just that fix on any selected region and it was useful.
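
The "find the whistling frequency and notch it, then repeat" fix could look something like this in outline. This is a Python sketch, not the standalone Nyquist effect mentioned above. The notch uses the widely published RBJ "Audio EQ Cookbook" biquad formulas, and the Q value, number of passes and test signal are assumptions.

```python
import cmath
import math


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest DFT bin, ignoring DC and Nyquist."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        mag = abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n


def notch(samples, freq_hz, sample_rate, q=8.0):
    """Biquad notch (RBJ cookbook coefficients), direct form I."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def notch_peaks(samples, sample_rate, passes=1, q=8.0):
    """Notch the current spectral peak, then look for the next one, and so on."""
    for _ in range(passes):
        samples = notch(samples, peak_frequency(samples, sample_rate),
                        sample_rate, q)
    return samples


fs = 32000
# Hypothetical sibilant slice: a weak 1 kHz part plus a strong 6 kHz "whistle".
slice_ = [0.2 * math.sin(2.0 * math.pi * 1000.0 * t / fs)
          + 0.8 * math.sin(2.0 * math.pi * 6000.0 * t / fs) for t in range(512)]
whistle_hz = peak_frequency(slice_, fs)
cleaned = notch_peaks(slice_, fs, passes=1)
tail = cleaned[256:]  # skip the filter's start-up transient
```

In practice this would only be run on slices already classified as sibilant, since on a vowel the strongest peak is likely to be a harmonic worth keeping.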
Paul L


### Re: A Nyquist de-esser, and more?

I also see that your default boundary between the modified middle band and the unchanged high frequency band is 8 kHz.

My productions via the Audiobook Creation Exchange ultimately get sold through Audible.

I noticed something about downloaded titles from Audible. If I play them and capture the waveforms in Audacity, the spectrograms look neatly truncated just at 8 kHz in the "4" (highest quality) format. But do the same with the free sample excerpts available with any title, and the cutoff is 10 kHz! Is either truncation perceptible to sharper, younger ears than mine?

If I highpass my own speech at 8 kHz with severe 48 dB rolloff, this discarded part is barely audible whistling to my slightly damaged ears. It sounds like the calls of waxwings. But does it make a subtle difference in the crispness of speech?
Paul L


### Re: A Nyquist de-esser, and more?

Paul L wrote: How successful do you consider your experiment?

That was just one of the more readable parts of a series of experiments. With careful tuning I found that it could be very successful, especially with cases that had severe whistling sibilance. The major problem is to "tune it" correctly, but that is where your work with phonemes looks really interesting.

Paul L wrote: I understand your method for applying the effect is to cut the sound into three bands with lowpass, highpass, and bandpass, then make a control signal based on the amplitude of the middle band, deamplify the middle band, then put the pieces back together. Do other de-essers do that?

The ones that I've looked at use a similar approach, though it may be handled with FFT rather than biquad filters. The key part is the use of compression on the frequencies that need to be reduced. Using compression solves the problems of "fading" the effect in and out by using a little "lookahead" to smoothly reduce the gain of the required frequency band in time to catch the sibilance, then smoothly release the gain back to unity as the sibilance passes.
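
One common way to get that kind of lookahead is to hold the lowest upcoming gain for a short window and then average it, so the gain ramps down before the sibilant starts and ramps back up after it ends. The Python sketch below shows that idea only. It is an assumption about how such smoothing can be done, not a description of any particular de-esser, and the gain values are made up.

```python
def lookahead_gain(desired, lookahead):
    """Smooth a per-sample gain curve using `lookahead` samples of foresight.

    Step 1: at each sample, take the minimum desired gain over the next
            `lookahead` samples, so reductions are seen early.
    Step 2: average the held values over the previous `lookahead` samples,
            which turns the steps into linear ramps.
    The result never exceeds the desired gain at any sample.
    """
    n = len(desired)
    held = [min(desired[i:i + lookahead + 1]) for i in range(n)]
    smoothed = []
    for i in range(n):
        window = held[max(0, i - lookahead):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical control signal: unity gain, a sibilant needing 0.5, then unity.
desired = [1.0] * 20 + [0.5] * 10 + [1.0] * 20
gain = lookahead_gain(desired, lookahead=4)
```

Here the gain starts falling four samples before the sibilant at index 20, is fully reduced when it arrives, and is back at unity a few samples after it passes. In a real effect the audio would be delayed by the lookahead time rather than the control signal being computed from the future.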
steve


### Re: A Nyquist de-esser, and more?

Paul L wrote: do the same with the free sample excerpts available

Could you post a link to an example?

Paul L wrote: Is either truncation perceptible to sharper, younger ears than mine?

I'm not sure that my ears are any younger than yours.
steve
