Help me understand spectrograms

Not sure which is the right board to ask this mathematical question on… anyway:

Where can I learn about the window type options for the Spectrogram preferences?

Generate a sine wave at 172.265625 Hz with amplitude 0.1. That makes the period equal to 256 samples at a 44.1 kHz sample rate, and so equal to the default window size in the Spectrogram settings (Edit, Preferences, Spectrogram).

View the linear scale spectrogram. It’s bright white for the band between 1x and 2x that frequency. But it is nonzero for other bands. It grades away and looks like zero only at the tenth band from the bottom and above that. It is not zero for the DC band.

But the default window is not a simple rectangular one. I change my window type to Rectangular and that’s what I see.

I noticed significant problems with the display. I put a known audio clip in and I didn’t get anything I thought I could use for diagnostics. The answer is that the tool is intentionally “crippled” in order to use minimum resources and complete in a reasonable time. You can greatly increase the quality of the results with the options, but you’ll be making coffee while you wait for it. Those at or near GMT will be making tea. Koz

So that is a special case where the signal lies dead centre of one of the frequency bands.
Take a more typical case where the frequency is not dead centre and you will see that the Rectangular window gives a much wider spread of “side lobes”. Windowing can help to minimise the side lobes, but it is always a compromise between time-domain resolution, frequency-domain main lobe width and frequency-domain side lobe spread.
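To put rough numbers on that, here is a small sketch in plain Python (not Nyquist, and not how Audacity computes its display). It takes one 256-sample frame of a sine that sits exactly on a band centre and one that sits halfway between two bands, then compares the Rectangular and Hann windows. The 44.1 kHz rate, the 0.1 amplitude and the 256-sample window come from the first post; the function names are made up for illustration.

```python
import cmath
import math

SAMPLE_RATE = 44100
N = 256  # window size in samples, as in the first post


def hann_window(n_samples):
    """Periodic Hann window: 0.5 * (1 - cos(2*pi*n/N))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_samples))
            for n in range(n_samples)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, fine for N = 256)."""
    n_samples = len(frame)
    mags = []
    for k in range(n_samples // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def leakage_db(freq_hz, window=None):
    """Level of every bin relative to the strongest bin, in dB.

    window=None means the Rectangular window (no weighting at all).
    """
    frame = [0.1 * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n in range(N)]
    if window is not None:
        frame = [x * w for x, w in zip(frame, window)]
    mags = dft_magnitudes(frame)
    peak = max(mags)
    # Floor the ratio so log10 never sees zero.
    return [20.0 * math.log10(max(m / peak, 1e-12)) for m in mags]


bin_width = SAMPLE_RATE / N  # 172.265625 Hz

rect_on = leakage_db(bin_width)                     # dead centre of band 1
hann_on = leakage_db(bin_width, hann_window(N))
rect_off = leakage_db(1.5 * bin_width)              # halfway between bands 1 and 2
hann_off = leakage_db(1.5 * bin_width, hann_window(N))

for label, levels in (("rect, on-bin", rect_on), ("Hann, on-bin", hann_on),
                      ("rect, off-bin", rect_off), ("Hann, off-bin", hann_off)):
    print("%-14s band 10 level: %8.1f dB" % (label, levels[10]))
```

Band 10 is used as the probe because the first post reports the display fading to nothing around the tenth band. The on-centre Rectangular case is the lucky one; the off-centre Rectangular case leaks far more into band 10 than the Hann case does.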

There’s a good article from National Instruments: Understanding FFTs and Windowing - NI

For Nyquist programming, the Hann (Hanning) window function is often a good choice, both for its characteristics and because it is easy to generate in Nyquist.

; wlen is the window size in seconds (local time)
; (/ 1.0 wlen) is one cycle per window; writing 1.0 avoids integer division
(mult 0.5 (sum 1 (osc (hz-to-step (/ 1.0 wlen)) wlen *sine-table* -90)))

http://en.wikipedia.org/wiki/Window_function#Hann_.28Hanning.29_window
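As a sanity check on that one-liner, the sketch below (plain Python, names invented here) compares the "half of one plus a sine shifted by -90 degrees" form with the raised-cosine Hann formula from the Wikipedia page. It assumes the last argument to osc is a starting phase in degrees, which is my reading of the Nyquist manual rather than something stated in this thread.

```python
import math


def hann_from_sine(t, wlen):
    """The form used above: 0.5 * (1 + sin(2*pi*t/wlen - 90 degrees))."""
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * t / wlen - math.pi / 2.0))


def hann_textbook(t, wlen):
    """The usual raised-cosine form: 0.5 * (1 - cos(2*pi*t/wlen))."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t / wlen))


wlen = 256 / 44100.0                      # one 256-sample window, in seconds
times = [i * wlen / 8 for i in range(9)]  # nine points across the window

max_diff = max(abs(hann_from_sine(t, wlen) - hann_textbook(t, wlen))
               for t in times)
print("largest difference between the two forms:", max_diff)
```

The two agree because sin(x - 90°) = -cos(x), so the window starts at zero, peaks at 1 in the middle and returns to zero at the end.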

I do find spectrogram useful with the defaults. I can often zoom right in on a crackle inside recorded speech, which would be hard to hunt for in waveform view, but I use waveform dB view for a precise selection. The trick of synched tracks with identical contents, one of them mute, is useful.

Anyway, what I want to understand, in crib-sheet fashion and without going deeply into the math, is what each window type choice is useful for. Now that I am playing with snd-fft in Nyquist to make some analyzer tools, I need to know something about how the numbers I examine correspond to the pretty colors. I don’t understand whether I should use the simple rectangular window (nil) or not.

Center, or bottom? 173 Hz seems to correspond to the bottom of the band on the vertical scale.

Centre I believe (but I could be wrong :wink:)
This is the list of frequencies in Plot Spectrum for an FFT size of 256 and a sample rate of 44100 Hz:

Frequency (Hz)
172.265625
344.531250
516.796875
689.062500
861.328125
1033.593750
1205.859375
1378.125000
1550.390625
1722.656250
1894.921875
2067.187500
2239.453125
2411.718750
2583.984375
2756.250000
2928.515625
3100.781250
3273.046875
3445.312500
3617.578125
3789.843750
3962.109375
4134.375000
4306.640625
4478.906250
4651.171875
4823.437500
4995.703125
5167.968750
5340.234375
5512.500000
5684.765625
5857.031250
6029.296875
6201.562500
6373.828125
6546.093750
6718.359375
6890.625000
7062.890625
7235.156250
7407.421875
7579.687500
7751.953125
7924.218750
8096.484375
8268.750000
8441.015625
8613.281250
8785.546875
8957.812500
9130.078125
9302.343750
9474.609375
9646.875000
9819.140625
9991.406250
10163.671875
10335.937500
10508.203125
10680.468750
10852.734375
11025.000000
11197.265625
11369.531250
11541.796875
11714.062500
11886.328125
12058.593750
12230.859375
12403.125000
12575.390625
12747.656250
12919.921875
13092.187500
13264.453125
13436.718750
13608.984375
13781.250000
13953.515625
14125.781250
14298.046875
14470.312500
14642.578125
14814.843750
14987.109375
15159.375000
15331.640625
15503.906250
15676.171875
15848.437500
16020.703125
16192.968750
16365.234375
16537.500000
16709.765625
16882.031250
17054.296875
17226.562500
17398.828125
17571.093750
17743.359375
17915.625000
18087.890625
18260.156250
18432.421875
18604.687500
18776.953125
18949.218750
19121.484375
19293.750000
19466.015625
19638.281250
19810.546875
19982.812500
20155.078125
20327.343750
20499.609375
20671.875000
20844.140625
21016.406250
21188.671875
21360.937500
21533.203125
21705.468750
21877.734375
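That list is just whole multiples of sample rate / FFT size. Here is a short sketch in plain Python (names made up for illustration) that regenerates it and shows what band each entry would cover if the entry is taken as the band centre, which is the assumption made above.

```python
SAMPLE_RATE = 44100
FFT_SIZE = 256

bin_width = SAMPLE_RATE / FFT_SIZE  # 172.265625 Hz


def bin_frequency(k):
    """Centre frequency of bin k."""
    return k * bin_width


# Bins 1..127 reproduce the list above. Bin 0 (DC) and bin 128
# (the Nyquist frequency, 22050 Hz) are not in that list.
listed = [bin_frequency(k) for k in range(1, FFT_SIZE // 2)]


def bin_band(k):
    """(low, high) edges of bin k, ASSUMING the bin frequency is the centre."""
    return (bin_frequency(k) - bin_width / 2, bin_frequency(k) + bin_width / 2)


def nearest_bin(freq_hz):
    """Index of the bin whose centre is closest to freq_hz."""
    return int(round(freq_hz / bin_width))


for k in (1, 2, 10):
    low, high = bin_band(k)
    print("bin %3d: centre %.6f Hz, band %.4f to %.4f Hz"
          % (k, bin_frequency(k), low, high))
```

On the centre reading, bin 1 covers roughly 86 to 258 Hz. A display that draws the same bin from about 172 to 345 Hz is using the listed frequency as the bottom edge instead.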

Yes, but I mean that the spectrogram display puts those frequencies at the bottoms of the corresponding colored bands, not the middle.

That’s just presentation of course. But do you mean that a better display would center the bands on those frequencies? That frequencies close to but not quite equal to one of those, either above or below, contribute to the coefficient?

The sort of experiment I described, I guess, helps answer my own question. I can observe how sine waves of various frequencies display, and get an idea of how they contribute to different bands. I am not sure I know what you mean by “lobes” but I see that with a rectangular window, a frequency that is exactly on that list looks very neat but a frequency far from those on that list bleeds into bands above and below. But it seems a Hann or other window makes the on- and off-frequencies bleed more evenly. Compare 172.265 Hz with 258.4 Hz (1 1/2 times that).

Or for other fun, generate a rising chirp instead of a single tone. With the rectangular window I see a few “cool” vertical lines where the glissando hits the “on” frequencies, but a lot of hot stuff above and below the white stripe elsewhere. Vary the window, and there is less of that hot stuff.

It looks like the Hanning window is one of the “better” ones for making the white diagonal stripe look neat, but Blackman-Harris is “best.”

Am I getting it?

And so now I want to use those windows in my Nyquist programming. You told me the formula for a Hann window. Where can I research how to do the others?

This looks to be pretty close to being centred on 172.3 Hz - perhaps it is a fraction off, but I’d have thought close enough.
firsttrack003.png

Try “Plot Spectrum” as well.
Hanning173.png
rectangle173.png
Hanning190.png
rectangle190.png

“Best” depends on what it’s for. For making clear white lines, yes the Blackman-Harris is pretty good. The article that I posted the link for gives some application notes.

I am using version 2.0.3 on windows 7 and I am sure that when I make a single bright band in the view with rectangular window, it is from about 170 to 340 Hz on the scale.

Perhaps we have identified a minor difference of display behavior across platforms!

The previous screen shot was on Debian Linux.
This one is on Windows XP:
firsttrack000.png

You should try zero padding as well, not only changing the window type.
You’ll get all the formulas you need in the Wikipedia article about windowing.
For speech, 20 ms time segments are fairly common. Use a Hamming window with 50% overlap.
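A rough sketch of those three suggestions in plain Python, in case it helps when porting to Nyquist. The Hann, Hamming and 4-term Blackman-Harris coefficients are the commonly quoted ones from the Wikipedia article. The 44.1 kHz rate is assumed from earlier in the thread, padding to 1024 samples is only an example, and all function names are invented for illustration.

```python
import math

SAMPLE_RATE = 44100

# Coefficients a0, a1, ... of the generalised cosine-sum window.
HANN = (0.5, 0.5)
HAMMING = (0.54, 0.46)
BLACKMAN_HARRIS = (0.35875, 0.48829, 0.14128, 0.01168)


def cosine_sum_window(n_samples, coeffs):
    """Symmetric window: w[n] = sum over k of (-1)^k * a_k * cos(2*pi*k*n/(N-1))."""
    denom = n_samples - 1
    return [sum(((-1) ** k) * a * math.cos(2.0 * math.pi * k * n / denom)
                for k, a in enumerate(coeffs))
            for n in range(n_samples)]


def windowed_frames(signal, frame_len, coeffs=HAMMING, overlap=0.5):
    """Yield windowed frames; each frame starts frame_len * (1 - overlap) later."""
    hop = int(frame_len * (1.0 - overlap))
    window = cosine_sum_window(frame_len, coeffs)
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield [x * w for x, w in zip(signal[start:start + frame_len], window)]


def zero_pad(frame, fft_size):
    """Append zeros so the frame can be fed to a longer FFT."""
    return list(frame) + [0.0] * (fft_size - len(frame))


frame_len = int(0.020 * SAMPLE_RATE)  # 20 ms is 882 samples at 44.1 kHz
one_second = [math.sin(2.0 * math.pi * 440.0 * n / SAMPLE_RATE)
              for n in range(SAMPLE_RATE)]
frame_list = list(windowed_frames(one_second, frame_len))
padded = zero_pad(frame_list[0], 1024)

print("frame length:", frame_len, "samples")
print("frames in one second:", len(frame_list))
print("padded length:", len(padded))
```

Zero padding gives more closely spaced points on the frequency axis, so the picture looks smoother, but it does not narrow the main lobe. That width is still set by the window type and the 20 ms of real signal.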

Hm, I think the placement of the band is varying for me with the zoom level. In a bad, misleading way, actually, sometimes with 173 not even inside the band! But it seems to prefer to put 173 at the bottom of the band if I zoom the scale to put 1722 at the top.