When I play a pure sine wave, the spectrogram shows the graph attached below. The graph is for a 1000 Hz sine wave. I got the file from here: https://www.audiocheck.net/audiofrequencysignalgenerator_sinetone.php
Since a pure sine wave contains only one frequency, a spectrogram should show only one thin horizontal line at that frequency, right? That, however, is not what the spectrogram shows. I don’t mean the transients at the beginning; let’s ignore them and just look at the steady part. Even there the line is thick, already for the part that is uniformly white, and thicker still if we include the other colors. In the most zoomed-in view, the y-axis shows that even the white part of the line spans roughly 950 to 1050 Hz.
If the range had been something like between 999 and 1001 Hz, I could understand that the FFT would not be able to pinpoint it exactly. But a range of 100 Hz? That’s a lot!
So I wonder, why is this? Does it have to do with the FFT algorithm after all? If so, why is the FFT so inexact? And/or does it have to do with something else, and in that case, what?
(Same file, just different amounts of zoom).
A “continuous” sine wave contains only one frequency.
An instantaneous transition from silence to a sine wave (or any instantaneous transition) contains all frequencies from DC up to the Nyquist frequency, but only for a very short time (an “instant”), which you may be able to hear as a slight “click”. If the sine wave starts at a non-zero point in its cycle, the click will be more evident.
The time resolution of the displayed spectrogram depends on the size of the FFT window. If the sine wave fades in gradually, you will see just the single line that you are expecting.
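To put a rough number on the "click", here is a minimal sketch that compares one analysis frame containing an abrupt start with one containing a faded start. The sample rate, frame length, fade length and the 5000 Hz probe frequency are all assumptions chosen for illustration, not values taken from the file or from Audacity.

```python
import cmath
import math

SAMPLE_RATE = 44100   # assumed; not stated in the thread
TONE_HZ = 1000.0
FRAME = 1024          # hypothetical analysis frame length in samples
ONSET = FRAME // 2    # the tone starts halfway through the frame
FADE = 256            # hypothetical fade-in length in samples


def hann(n):
    """Hann window value for sample n of the frame."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * n / FRAME))


def frame_with_onset(fade_samples):
    """Silence, then a tone that starts at its peak value.

    fade_samples == 0 gives an abrupt start; otherwise a linear fade-in.
    """
    frame = []
    for n in range(FRAME):
        if n < ONSET:
            frame.append(0.0)
            continue
        k = n - ONSET
        gain = min(1.0, k / fade_samples) if fade_samples else 1.0
        tone = math.cos(2.0 * math.pi * TONE_HZ * k / SAMPLE_RATE)
        frame.append(hann(n) * gain * tone)
    return frame


def magnitude_at(freq_hz, frame):
    """Magnitude of the frame's Fourier transform at one frequency."""
    w = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(frame)))


def leakage_db(frame, probe_hz=5000.0):
    """Level at probe_hz relative to the level at the tone frequency, in dB."""
    return 20.0 * math.log10(
        magnitude_at(probe_hz, frame) / magnitude_at(TONE_HZ, frame)
    )


abrupt_db = leakage_db(frame_with_onset(0))
faded_db = leakage_db(frame_with_onset(FADE))
print(f"abrupt start: {abrupt_db:.1f} dB, faded start: {faded_db:.1f} dB")
```

The abrupt start should leave noticeably more energy far away from 1000 Hz than the faded one. Note that this only concerns the frame where the tone begins; it says nothing about how wide the line is once the tone is steady.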
What you are writing is correct but it only concerns the transients, not the continuous part of the wave.
I edited the wave to fade in gradually, and the resulting spectrogram is in the image below. No transients, sure, but the line is still just as thick.
So the question remains.
When using FFT analysis, there’s a trade-off between precision in the frequency domain and precision in the time domain. Greater frequency precision can be achieved by increasing the “window size”, but this is at the expense of time precision. The inverse is also true: the smaller the window size, the greater the precision in the time domain, but the less precision in the frequency domain. The window size is configurable in the spectrogram preferences: https://manual.audacityteam.org/man/spectrograms_preferences.html
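The trade-off can be seen with a small experiment. The sketch below measures how wide the peak of a steady 1000 Hz tone appears for a few window sizes. It assumes a 44100 Hz sample rate and a Hann window, and it measures the width 6 dB below the peak; those choices, the window sizes and the function names are illustrative and are not Audacity's actual code or defaults.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumed; not stated in the thread
TONE_HZ = 1000.0


def hann_windowed_tone(window_size):
    """One analysis frame: a steady sine multiplied by a Hann window."""
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / window_size))
        * math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE)
        for n in range(window_size)
    ]


def magnitude_at(freq_hz, frame):
    """Magnitude of the frame's Fourier transform at one frequency."""
    w = -2j * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(frame)))


def peak_width_hz(window_size, lo=700, hi=1300, step=5, drop_db=6.0):
    """Width of the band around the tone that lies within drop_db of the peak.

    The spectrum is sampled every `step` Hz, so the result is only
    accurate to about that step.
    """
    frame = hann_windowed_tone(window_size)
    freqs = list(range(lo, hi + 1, step))
    mags = [magnitude_at(f, frame) for f in freqs]
    threshold = max(mags) * 10.0 ** (-drop_db / 20.0)
    inside = [f for f, m in zip(freqs, mags) if m >= threshold]
    return inside[-1] - inside[0]


widths = {size: peak_width_hz(size) for size in (512, 1024, 4096)}
for size, width in widths.items():
    print(f"window {size:5d} samples -> peak about {width} Hz wide")
```

Under these assumptions the measured width shrinks as the window grows, and a window of around 1024 samples gives a width of the same order as the 950 to 1050 Hz band described in the question. The tone itself has not changed; only the analysis window has.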
Recent versions of Audacity also have options for “reassignment” and “zero padding”. These can both produce narrower lines for simple tones (such as sine tones), but are not without their disadvantages. Reassignment gives the impression of more noise. Zero padding increases processing time and although it “looks” more precise, can be misleading when viewing complex signals.
Audacity’s default settings have been chosen to give a reasonable balance in the various trade-offs for most purposes. The options provide flexibility to tune the algorithm according to need.
The frequency resolution of the spectrogram is directly related to the duration of the analysis window. The longer the window, the more precisely a frequency can be determined. To oversimplify the problem: to distinguish 1000 Hz from 1001 Hz requires a 1 second sample. To distinguish 1000 Hz from 1010 Hz requires only about 0.1 second, and so on. As a result, your 1000 Hz tone is not plotted as a sharp spike but as a relatively broad curve. I don’t know how wide the analysis window is for the Audacity spectrogram, but it is probably fairly short, as it is more desirable to see how the spectrum changes with time than to have great precision for single frequencies.
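The rule of thumb above (a window of T seconds separates frequencies about 1/T Hz apart) can be worked through as plain arithmetic. This is a sketch only: the 44100 Hz sample rate and the window sizes in the loop are assumptions for illustration, and the function names are made up for this example.

```python
import math

SAMPLE_RATE = 44100  # assumed; not stated in the thread


def bin_spacing_hz(window_size, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent FFT bins for a given window size."""
    return sample_rate / window_size


def window_duration_s(window_size, sample_rate=SAMPLE_RATE):
    """Length of the analysis window in seconds."""
    return window_size / sample_rate


def samples_to_separate(delta_hz, sample_rate=SAMPLE_RATE):
    """Rough window size needed to tell apart tones delta_hz apart (T = 1/delta)."""
    return math.ceil(sample_rate / delta_hz)


for size in (256, 1024, 2048, 8192):  # hypothetical window sizes
    print(
        f"{size:5d} samples = {window_duration_s(size) * 1000:6.1f} ms, "
        f"bins {bin_spacing_hz(size):6.2f} Hz apart"
    )

print("1 Hz apart: ", samples_to_separate(1), "samples")
print("10 Hz apart:", samples_to_separate(10), "samples")
```

With a window of about a thousand samples the bins are already tens of Hz apart, and the window function spreads a single tone over a few neighbouring bins, so a band on the order of 100 Hz is not surprising.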
You can explore the effect in more detail with the “Plot Spectrum” analysis tool. It has a control for the width of the sample analysed as well as the window function used.