Usage of Audacity "ComputeSpectrum" function

Hi everybody

Not sure if this is the right section but it’s probably better to try the forum before going to the mailing list.

I’m currently experimenting with audio visualization in the browser (see here with a recent Firefox nightly: press compile, then play on the video) and I basically need a spectrogram visualization to prefilter the data.

There are some frameworks for this purpose, but I wanted something small, and the Audacity ComputeSpectrum function seems to do the trick: it only relies on a Fast Fourier Transform (FFT) being available, which means I can pretty much take the code, slap a “main” onto it and recompile.

The problem is that I don’t actually know how to use that function beyond the formal parameters:
bool ComputeSpectrum(float * data, int width, int height, int maxFreq, int windowSize, double rate, float *grayscaleOut, bool autocorrelation);

So maybe, just maybe somebody can explain and/or add the necessary comments to the file.

Here are my basic problems:

float * data, int windowSize
Is this supposed to be one window full of data or the whole stream?
What range is the float data supposed to use? -32k to 32k as if it was 16bit signed? 0 to 1? -1 to 1? -0.5 to 0.5? 0 to 64k?

int width, int height, float *grayscaleOut
Apparently, it’s supposed to output a bitmap, but where to? The grayscaleOut array is one-dimensional, isn’t it? On top of that, it’s an array of floats, which makes for a rather strange bitmap.

int maxFreq
I guess this is simply the frequency of the highest band that should be captured, like 11025 if you want the spectrogram to go up to 11 kHz, correct?

double rate
I guess this is simply the playback rate, like 44100 for a 44.1 kHz stream, correct?

bool autocorrelation
Absolutely no idea…

Any help would be highly appreciated

OK, got a bit further… although I have to say that the argument naming is pretty crazy:

“width” apparently means the length of the input buffer, while “height” means the length of the output buffer, i.e. the number of slots.
Apparently, you’re supposed to pass in the whole audio for which you want the spectrogram, so for example if you want one spectrogram for 0.1 seconds of a 44.1 kHz file, you pass in 4410 floats and width == 4410.
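
The buffer-length arithmetic can be written down as a small sketch. The helper names are hypothetical, and the window count assumes non-overlapping windows, which may not match how ComputeSpectrum actually steps through the buffer.

```cpp
#include <cmath>
#include <cstddef>

// Number of samples covering `seconds` of audio at `rate` samples per second.
std::size_t samplesForDuration(double rate, double seconds) {
    if (rate <= 0.0 || seconds <= 0.0) return 0;
    return static_cast<std::size_t>(std::llround(rate * seconds));
}

// Whole windows that fit into the buffer, ASSUMING the windows do not overlap.
std::size_t wholeWindows(std::size_t bufferLength, std::size_t windowSize) {
    return windowSize == 0 ? 0 : bufferLength / windowSize;
}

// Samples left over after the last whole window (same assumption).
std::size_t leftoverSamples(std::size_t bufferLength, std::size_t windowSize) {
    return windowSize == 0 ? bufferLength : bufferLength % windowSize;
}
```

For 0.1 seconds at 44100 Hz this gives 4410 samples. With a 512-sample window (an example value), that is 8 whole windows plus 314 leftover samples, so a buffer length that is a multiple of the window size avoids a partial window at the end.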

I’ve had some success. The output I get so far is this (encoded as Ogg Theora & Vorbis):
based on this song:
Sunrise (unplugged)
by admiralbob77
Vocals by shannonsongs
2008 - Licensed under Creative Commons
Attribution Noncommercial (3.0)


The arguments I pass, in order:
   data: inputBuffer, a float array holding the audio frames for one video frame, with range 0-1; for example 2205 floats for 20 fps (I used 2048 for the test),
   width: length of the above array,
   height: number of fields in the output (160 in this case),
   maxFreq: unused if the last argument is true,
   windowSize: window size (512 here),
   rate: sample rate (44100 here),
   grayscaleOut: array for the output (160 floats in my case, see argument #3),
   autocorrelation: true (still no idea what that means, but setting it to false produces some strange output)

It’s still not perfect yet and I’m still wondering why. It seems to “miss” a few tones and generally has problems catching short tones… could it be that it’s measuring the frequency of each tone while ignoring the maximum amplitude?
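
One possible explanation, offered as a guess rather than a statement about Audacity's code: if the spectrum for a buffer is the average of the power spectra of several windows, a tone that is present in only one window is diluted by the silent ones. The sketch below demonstrates that effect with a direct single-bin DFT instead of an FFT. All function names are hypothetical, and non-overlapping rectangular windows are an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Power of one DFT bin over a single window (direct sum, no FFT library).
double binPower(const float* window, std::size_t windowSize, std::size_t bin) {
    const double pi = std::acos(-1.0);
    double re = 0.0, im = 0.0;
    for (std::size_t n = 0; n < windowSize; ++n) {
        const double phase = 2.0 * pi * bin * n / windowSize;
        re += window[n] * std::cos(phase);
        im -= window[n] * std::sin(phase);
    }
    return re * re + im * im;
}

// Average of that bin's power over all whole, non-overlapping windows.
double averagedBinPower(const std::vector<float>& data,
                        std::size_t windowSize, std::size_t bin) {
    if (windowSize == 0) return 0.0;
    const std::size_t windows = data.size() / windowSize;
    if (windows == 0) return 0.0;
    double sum = 0.0;
    for (std::size_t w = 0; w < windows; ++w)
        sum += binPower(data.data() + w * windowSize, windowSize, bin);
    return sum / static_cast<double>(windows);
}

// Test signal: a sine exactly on `bin` in the first `activeWindows` windows,
// silence in the remaining ones.
std::vector<float> makeBurst(std::size_t totalWindows, std::size_t activeWindows,
                             std::size_t windowSize, std::size_t bin) {
    const double pi = std::acos(-1.0);
    std::vector<float> data(totalWindows * windowSize, 0.0f);
    for (std::size_t i = 0; i < activeWindows * windowSize && i < data.size(); ++i) {
        const std::size_t n = i % windowSize;  // restart phase in every window
        data[i] = static_cast<float>(std::sin(2.0 * pi * bin * n / windowSize));
    }
    return data;
}
```

Comparing averagedBinPower(makeBurst(4, 4, 512, 10), 512, 10) with averagedBinPower(makeBurst(4, 1, 512, 10), 512, 10) shows the short tone at a quarter of the sustained tone's power, even though both have the same amplitude while they sound. If that is what happens here, passing shorter buffers (fewer windows per call) should make short tones more visible.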