Default window size.

In Audacity, the default window size for spectrograms is 256. However, it lacks a lot of frequency resolution. I suggest changing it to around 4096.

Listen to 256, 4096 and 32768 spectrograms: https://www.youtube.com/watch?v=ig9GCtJi594

The choice of default window size is a compromise between frequency resolution and processing speed. For long tracks with a large window size, calculating the spectrogram can be very slow. I agree that for modern machines, 256 is a bit small as the default, but I think that 4096 would be too large for many of our users.

Also, temporal resolution decreases as the window size increases, which is another compromise that needs to be taken into account.

Have you clicked the link to listen to these spectrograms? Now listen to the original The Mine Song and hear how different it is from the 256 version. Too much pitch information is lost.

I made samples of more settings: https://www.youtube.com/watch?v=ji1it6awsN8

Notice how pitchless 256 is. A higher setting would be better.

The spectrogram settings in Audacity have no effect on the sound. They only affect the visual display of the spectrogram track view (http://manual.audacityteam.org/man/spectrogram_view.html).

I know. But hearing the spectrograms lets you hear what details each spectrogram shows. If you listen to the 256 spectrogram in Photosounder (see the video link; normalized, inverted, then gamma set to 3.1:1 for a proper volume scale), you can’t hear the pitch of his voice, so this information is missing.

The frequency resolution of “256 - default” at 44100 Hz is 44100/256 ≈ 172 Hz. It can’t distinguish semitones until about 2900 Hz. In contrast, 4096 has 16 times better frequency resolution and distinguishes semitones from about 180 Hz. Please note that the spacing between semitones in Hz is smaller at lower frequencies, so the FFT has worse relative frequency resolution in the bass. And remember that if you need better time resolution, 1024 and 2048 are always available. Spek (a program that shows spectrograms of sounds) went with 2048. Photosounder (a program that lets you edit sounds by editing their spectrograms) went with a different method, but it’s equivalent to using different window sizes for half-semitone resolution, or more frequency resolution once time resolution reaches 1/100 of a second.
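The arithmetic behind the 2900 Hz and 180 Hz figures can be sketched as follows. It uses a rough criterion (a semitone step must be at least one FFT bin wide) and ignores the extra width that a real window function adds to each peak; the helper names are hypothetical.

```python
import math

# Ratio between adjacent equal-tempered semitones (about 1.0595).
SEMITONE_RATIO = 2 ** (1 / 12)

def bin_width_hz(window_size, sample_rate=44100):
    """Spacing between FFT bins: sample_rate / window_size."""
    return sample_rate / window_size

def semitone_threshold_hz(window_size, sample_rate=44100):
    """Lowest frequency at which one semitone step spans at least one bin.

    The semitone above f lies f * (SEMITONE_RATIO - 1) Hz away, so solve
    f * (SEMITONE_RATIO - 1) = bin width for f.
    """
    return bin_width_hz(window_size, sample_rate) / (SEMITONE_RATIO - 1)

for size in (256, 1024, 2048, 4096):
    print(f"{size:5d}: bin {bin_width_hz(size):7.2f} Hz, "
          f"semitones resolvable from about {semitone_threshold_hz(size):6.0f} Hz")
```

Because the threshold is proportional to the bin width, going from 256 to 4096 lowers it by exactly a factor of 16.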

The frequency resolution of 256 - default “is 0”.

Good luck telling what sound this is with “256 - default”. The frequency scale is logarithmic from 27 Hz to 20000 Hz. If you draw the scale, you can see that it’s a rising tone starting at about 260 Hz. To be exact, it is a 12-second sine chirp from 264 Hz to 528 Hz at maximum volume.
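A test tone like this can be generated with a short sketch. The post does not say whether the sweep is linear or exponential, so an exponential sweep (a straight line on a logarithmic frequency axis) is assumed here, and the 50688 Hz sample rate is borrowed from the next post in the thread; the function names are hypothetical.

```python
import math

def exp_chirp(f0, f1, duration, sample_rate):
    """Sine sweep whose frequency rises exponentially from f0 to f1 Hz.

    Assumption: exponential sweep. Amplitude is 1.0 (full scale).
    """
    k = math.log(f1 / f0) / duration  # frequency grows as f0 * exp(k * t)
    n = int(round(duration * sample_rate))
    out = []
    for i in range(n):
        t = i / sample_rate
        # Phase is the integral of 2*pi*f0*exp(k*t) from 0 to t.
        phase = 2 * math.pi * f0 * (math.exp(k * t) - 1) / k
        out.append(math.sin(phase))
    return out

def instantaneous_freq(f0, f1, duration, t):
    """Frequency of the exponential sweep at time t (seconds)."""
    return f0 * (f1 / f0) ** (t / duration)

signal = exp_chirp(264, 528, 12.0, 50688)
print(len(signal), instantaneous_freq(264, 528, 12.0, 6.0))
```

Since 264 Hz to 528 Hz is exactly one octave, the sweep passes through 264·√2 ≈ 373 Hz halfway through.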

And by the way, with “256 - default”, spectral leakage stops at 396 Hz, which has a period of exactly 128 samples at a 50688 Hz sample rate.
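Why 396 Hz is special can be checked with a small DFT sketch. At 50688 Hz a 256-sample window has bins every 198 Hz, and a 396 Hz sine completes exactly two cycles in the window, so it lands on bin 2. With a rectangular window the leakage vanishes; with a periodic Hann window it is confined to the two neighbouring bins. An off-bin tone such as 264 Hz spreads across the whole spectrum. The helper names and the 1e-6 threshold are illustrative choices, not Audacity internals.

```python
import cmath
import math

def dft_magnitudes(x):
    """Plain O(N^2) DFT magnitudes for bins 0..N/2 (fine for N = 256)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]

def sine(freq, sample_rate, n):
    return [math.sin(2 * math.pi * freq * k / sample_rate) for k in range(n)]

def periodic_hann(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def leaky_bins(freq, window, sample_rate=50688, n=256, rel_threshold=1e-6):
    """Bins whose magnitude exceeds rel_threshold times the peak magnitude."""
    x = [s * w for s, w in zip(sine(freq, sample_rate, n), window)]
    mags = dft_magnitudes(x)
    peak = max(mags)
    return [b for b, m in enumerate(mags) if m > rel_threshold * peak]

N = 256
rect = [1.0] * N
print(leaky_bins(396, rect))              # bin-centred tone, rectangular window
print(leaky_bins(396, periodic_hann(N)))  # bin-centred tone, Hann window
print(len(leaky_bins(264, rect)))         # off-bin tone leaks into many bins
```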

Don’t just make 4096 the default (or 2048, or 1024 if you really want time resolution); also make the logarithmic scale the default. Photosounder even goes as far as showing musical octaves (and semitones when vertically zoomed in, more of them the further you zoom) on the right side. On a linear scale, the entire top half covers an octave or less. A linear scale could only be useful at low frequencies, where it is much more compact than a logarithmic one (relative to higher frequencies). Also, a linear scale is not how ears work. I don’t see how the constant frequency resolution of the FFT, or equally spaced overtones, could give useful information. Ears don’t hear overtones as equally spaced.
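The claim that the top half of a linear scale is a single octave can be checked with a quick sketch that maps frequencies to a vertical position on each kind of axis. The 0–22050 Hz linear range assumes 44.1 kHz audio, and the 27–20000 Hz logarithmic range is the one mentioned earlier in the thread; the function names are hypothetical.

```python
import math

def linear_position(freq, f_min, f_max):
    """Fraction of the way up the axis (0 = bottom, 1 = top) on a linear scale."""
    return (freq - f_min) / (f_max - f_min)

def log_position(freq, f_min, f_max):
    """Fraction of the way up the axis on a logarithmic scale."""
    return math.log(freq / f_min) / math.log(f_max / f_min)

# Linear 0..22050 Hz: each octave going down gets half the space of the one above.
nyquist = 22050
top = nyquist
while top > 100:
    bottom = top / 2
    share = linear_position(top, 0, nyquist) - linear_position(bottom, 0, nyquist)
    print(f"{bottom:8.1f}-{top:8.1f} Hz: {share:6.1%} of the linear axis")
    top = bottom

# Logarithmic 27..20000 Hz: every octave gets the same share of the axis.
print(log_position(528, 27, 20000) - log_position(264, 27, 20000))
```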

Paulstretch a song with a stretch factor of 1 and a time resolution of 0.006 s (about the length of 256 samples at 44100 Hz). That’s what “256 - default” sounds like.

A click on a voice recording:
(Image: tracks003.png)

You didn’t show the time scale.

This time, an actual song:

You probably haven’t watched these videos.

Regardless of the absolute time scale, it can clearly be seen that the time resolution with a window size of 256 is much better (16 times better) than with 4096. Similarly, the frequency resolution is much better (16 times better) with a window size of 4096 than with 256. (Are you familiar with the expression “swings and roundabouts”?) There is no “one size fits all”, which is why this is a setting that users can change to suit their needs.
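The 16-fold trade-off follows directly from the window length. This sketch assumes a 44100 Hz sample rate and looks only at the length of a single window, not at how much consecutive windows overlap in the display; the function name is hypothetical.

```python
def window_tradeoff(window_size, sample_rate=44100):
    """Return (window duration in ms, FFT bin spacing in Hz) for one frame."""
    duration_ms = 1000 * window_size / sample_rate
    bin_hz = sample_rate / window_size
    return duration_ms, bin_hz

for size in (256, 1024, 2048, 4096):
    ms, hz = window_tradeoff(size)
    # Duration (s) times bin spacing (Hz) is always 1, so improving one
    # by some factor worsens the other by the same factor.
    print(f"{size:5d}: {ms:6.1f} ms per window, {hz:7.2f} Hz per bin, "
          f"product {ms / 1000 * hz:.3f}")
```

A 256-sample window lasts about 5.8 ms, while a 4096-sample window lasts about 93 ms and has bins about 10.8 Hz apart instead of about 172 Hz.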

Personally, I think that since computers are typically faster than they were 10 years ago, and it is still uncommon for people to process multi-hour recordings, a default window size of 1024 would be a better compromise. But it will always be a compromise and will never suit everyone.

Also, did you know who invented the weird OS name “Linux F*ckbuntu”?

I think “1024” is usable, but “256 - default” is definitely a stretch.

Listen to the actual detail in the spectrograms: https://www.youtube.com/watch?v=ji1it6awsN8