FFT analysis

LSylvia · September 22, 2014, 9:59pm

I have files that contain the output of stimuli from an experiment that I am running. Is there anyone out there who can help me run an FFT analysis and help me interpret what I am seeing? I know what I need to see, but I am not sure how to get to that analysis, and I am not 100% sure of what I am looking at. I live in NJ. Does anyone have a consultant they can recommend? I am willing to pay a consultant fee.

steve · September 22, 2014, 10:32pm

Does Plot Spectrum provide what you need?

Gunnar · September 22, 2014, 11:19pm

If you start with the “raw” PCM data, you will usually cut it into fixed-size “frames” first. Typical frame sizes are powers of two, like 1024, 2048 or 4096 samples per frame. Next, you usually apply some kind of window function to each frame, e.g. the Hann (“Hanning”) window. This is done by simply multiplying the i-th sample in the frame by the i-th weight of the window function. Finally, you compute the FFT of each frame. I would highly recommend not implementing the FFT yourself (except for educational purposes), but using the FFTW library instead. FFTW is open source and extremely fast.
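The framing and windowing steps can be sketched in a few lines. This is only an illustration in plain Python: the helper names `split_frames`, `hann_window` and `apply_window` are made up for this example, the frames do not overlap (real analyses often use overlapping frames), and the test signal is a synthetic 440 Hz sine rather than real recorded data.

```python
import math

def split_frames(samples, frame_size):
    """Cut samples into consecutive, non-overlapping frames; an incomplete tail is dropped."""
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]

def hann_window(frame_size):
    """Weights of the periodic Hann window: w[i] = 0.5 - 0.5 * cos(2*pi*i / N)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_size)
            for i in range(frame_size)]

def apply_window(frame, window):
    """Multiply the i-th sample by the i-th window weight."""
    return [s * w for s, w in zip(frame, window)]

# Assumed test input: 5000 samples of a 440 Hz sine at 44100 Hz.
sample_rate = 44100
samples = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(5000)]

frames = split_frames(samples, 1024)
window = hann_window(1024)
windowed = [apply_window(frame, window) for frame in frames]
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each entry of `windowed` is what would then be handed to the FFT routine.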

Now, how to interpret the FFT data: Computing the FFT of a frame with a size of N samples gives you N FFT coefficients. However, because your input was real-valued, the second half of those coefficients is just the mirror image (the complex conjugate) of the first half, so it carries no new information. So you only need to look at the first N/2 FFT values (strictly speaking, there is one more distinct value at index N/2, the Nyquist frequency, which is usually ignored). This means, if you use, for example, frames of size 2048 samples, you effectively get 1024 FFT coefficients per frame. Those FFT values are complex though! Usually you are interested in the signal magnitudes. So if r[n] is the real part of the n-th FFT value and i[n] is the imaginary part of the n-th FFT value, what you actually want to look at is p[n] = sqrt(r[n]^2 + i[n]^2), where “sqrt” is the square root. Put simply, the value p[n] is the magnitude of the n-th frequency. This is the “spectrum” of your input data. For example, if your input data was a simple sine wave, you would notice that only a single frequency has a significant value (and the index n of that peak indicates the frequency of the sine), while all other frequencies are near-zero (they are not exactly zero due to spectral leakage).
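To make the p[n] formula concrete, here is a small sketch in plain Python. It uses a naive, slow DFT loop purely for illustration; for real data you would use FFTW or a similar library, as recommended above. The function name `dft_magnitudes` and the test sine are assumptions for this example. The sine is chosen to complete exactly 16 cycles in the frame, so there is no spectral leakage and the peak lands cleanly on index 16.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns p[n] = sqrt(r[n]^2 + i[n]^2) for n = 0 .. N/2 - 1."""
    size = len(frame)
    magnitudes = []
    for n in range(size // 2):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * n * t / size)
        magnitudes.append(math.sqrt(acc.real ** 2 + acc.imag ** 2))
    return magnitudes

N = 256
cycles = 16  # the sine completes exactly 16 cycles within the frame
frame = [math.sin(2.0 * math.pi * cycles * t / N) for t in range(N)]

p = dft_magnitudes(frame)
peak = max(range(len(p)), key=lambda n: p[n])
print("peak at index", peak)  # peak at index 16
```

With a sine that does not fit a whole number of cycles into the frame, the neighbouring values of p[n] would also be clearly non-zero; that is the spectral leakage that the window function is meant to reduce.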

LSylvia · September 23, 2014, 1:58am

Thank you for your response but it is WAY beyond my understanding. What I’m looking to find out is whether I have the following 2 stimuli: 1. a bandpass masking noise with a center frequency of 1 kHz, a width of 800 Hz, a level of 30 dB/Hz and a duration of 300 ms, with a target signal of 1 kHz, 20 ms in duration. This is considered my "no-notch" condition,
and 2. a bandstop masking noise with a center frequency of 1 kHz, a width of 1200 Hz with a 400 Hz spectral notch, a level of 30 dB/Hz and a duration of 300 ms. This also has a target stimulus of 1 kHz, 20 ms in duration.

I have some files that were recorded as the output from my headphones. I am not sure how to get a plot that can check these parameters.
I’d love to have the contact name of someone who could help me with this.

Gunnar · September 23, 2014, 8:09pm

Well, what you are going to see in your FFT analysis greatly depends on what your original input signal was, not only on what filter you apply. Anyway, if we assume that your original input signal was “white noise” (which is always good for testing), then, without any filter applied, you would expect that all frequencies have approximately the same magnitude. Therefore, if you compute the FFT values p[n], as explained in my previous post, for such a “white noise” input signal, then you’d expect that all your values p[n], for n in the 0 to (N/2)-1 range, have pretty much the same absolute values - with a certain fluctuation over time, of course.

Now, if you apply a “bandpass” filter to such a signal, it will cancel out the frequencies outside of the filter’s range. In other words, all frequencies below the filter’s lower bound will be removed. And all frequencies above the filter’s upper bound will be removed too. Only frequencies between the filter’s lower and upper bounds remain - probably with some “fade in” and “fade out” near the boundaries. As far as the FFT values are concerned, this simply means there will be some index L for which all p[n] with n < L become (near) zero. And there will be some index U for which all p[n] with n > U become (near) zero. Or in other words, you will see that significant values only remain at p[n] for L <= n <= U.
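Finding the indices L and U from the magnitudes can be sketched as follows. This is a rough illustration, not a real filter: the “band-limited noise” is faked by summing sinusoids with random phases at bins 20 to 40, the 10% threshold is an arbitrary choice, and `dft_magnitudes` and `band_edges` are made-up helper names. The DFT loop is naive and slow; use a proper FFT library for real recordings.

```python
import cmath
import math
import random

def dft_magnitudes(frame):
    """Naive O(N^2) DFT magnitudes p[n] for n = 0 .. N/2 - 1 (illustration only)."""
    size = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * n * t / size)
                    for t, x in enumerate(frame)))
            for n in range(size // 2)]

def band_edges(p, fraction=0.1):
    """Lowest and highest index whose magnitude exceeds fraction * max(p)."""
    limit = fraction * max(p)
    above = [n for n, value in enumerate(p) if value > limit]
    return above[0], above[-1]

# Assumed stand-in for bandpass-filtered noise: energy only in bins 20..40.
random.seed(1)
N = 256
frame = [0.0] * N
for k in range(20, 41):
    phase = random.uniform(0.0, 2.0 * math.pi)
    for t in range(N):
        frame[t] += math.sin(2.0 * math.pi * k * t / N + phase)

p = dft_magnitudes(frame)
L, U = band_edges(p)
print("significant values from index", L, "to index", U)
```

On real filtered noise the edges are not this sharp, so the indices found will depend on the chosen threshold and on how steep the filter is.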

[Image: White noise]

[Image: Same with bandpass filter (1000 to 2000 Hz)]

LSylvia · September 24, 2014, 12:02am

Thank you. Actually the 2nd one is a bandstop filter. So I think the energy between the upper and lower limit of the filter will be removed. There is a 400 Hz spectral notch. In the middle of the notch is a 1000 Hz pure tone, and in the middle of the other masking noise (with no bandstop filter) there is also a 1000 Hz pure tone. I have a pretty good idea of what both of those stimuli would look like, but I’m not sure how to get to it using “plot spectrogram”. The other variables, such as encoding, byte order, start offset, and sample rate, I don’t really understand. Also, the plots I am getting, I think, have time across the top and maybe SPL on the left side.

Gunnar · September 24, 2014, 11:05am

To my understanding, a “bandpass” filter allows frequencies between the filter’s lower and upper bound to pass through, while other frequencies (outside the filter’s range) are suppressed. For the example, I had configured a bandpass filter with 1000 Hz lower bound and 2000 Hz upper bound. And indeed, as can be seen in the picture, frequencies below 1000 Hz and above 2000 Hz have been reduced to near-zero. Only frequencies in the 1000 to 2000 Hz range remain - with some “fade in” and “fade out” at the boundaries. The steepness of the fade-in/fade-out at the boundaries of the filter is controlled by the “order” of the filter. Higher order means “steeper” fade-in/fade-out, and vice versa.

With a “bandstop” filter, on the other hand, we would expect that only frequencies within the filter’s range are removed, while the frequencies outside of the filter’s range remain. So that would be the opposite of what we see here.
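One way to put a number on such a notch is to compare the average magnitude inside the notch with the average magnitude in the noise on either side of it. The sketch below is an assumption-laden illustration in plain Python: the signal is synthetic (sinusoids at full level in the flanking bins and at 1% of that level inside the notch), the bin ranges are arbitrary, and `bin_magnitude`, `mean_level` and `notch_depth_db` are made-up helper names.

```python
import cmath
import math

def bin_magnitude(frame, n):
    """Magnitude p[n] of a single DFT bin (naive sum, illustration only)."""
    size = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * n * t / size)
                   for t, x in enumerate(frame)))

def mean_level(frame, bins):
    """Average magnitude over the given bin indices."""
    return sum(bin_magnitude(frame, n) for n in bins) / len(bins)

def notch_depth_db(frame, flank_bins, notch_bins):
    """How far the notch sits below the flanking noise, in dB."""
    return 20.0 * math.log10(mean_level(frame, flank_bins) / mean_level(frame, notch_bins))

# Assumed test signal: full level in bins 20..29 and 41..50, 1% level in bins 30..40.
N = 256
flank = list(range(20, 30)) + list(range(41, 51))
notch = list(range(30, 41))
frame = [0.0] * N
for k in flank + notch:
    amplitude = 1.0 if k in flank else 0.01
    for t in range(N):
        frame[t] += amplitude * math.sin(2.0 * math.pi * k * t / N)

depth = notch_depth_db(frame, flank, notch)
print("notch depth: %.1f dB" % depth)  # notch depth: 40.0 dB
```

If a pure tone sits in the middle of the notch, as in the stimuli described earlier in the thread, its bin would have to be left out of `notch_bins`, otherwise the tone raises the average and hides the notch.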

Well, pretty much any audio editor has a “Spectrogram” or “FFT” view. But it’s also pretty straightforward to compute yourself, as explained before. Especially if you defer the hard work (the DFT computation) to FFTW.

You don’t really have to care. If you use a library, such as libsndfile, the exact file format (encoding, byte order, start offset, etc.) will be hidden from you. All you do is feed the input file into libsndfile (it supports pretty much anything from WAVE to W64, AU/SND and AIFF), and what you get back from the library is simply a sequence of sample values - stored as floating point values in the -1 to 1 range - that you can pump through your FFT analysis. Couldn’t be easier.
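The same idea, with a library hiding the file format and handing back samples in the -1 to 1 range, can be sketched with Python’s built-in `wave` module. Note that this is a stand-in, not libsndfile: it only reads plain 16-bit PCM WAV files, and the function name `read_wav_16bit` is made up for this example. To keep the sketch runnable without a recording, it first writes a short 1000 Hz test tone to a temporary file and then reads it back.

```python
import math
import os
import struct
import tempfile
import wave

def read_wav_16bit(path):
    """Read a 16-bit PCM WAV file; return (sample_rate, first-channel samples in -1..1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Keep only the first channel and scale the integers to floating point.
    return sample_rate, [v / 32768.0 for v in values[::channels]]

# Assumed test file: 0.1 s of a 1000 Hz tone at half of full scale, mono, 44100 Hz.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
tone = [int(32767 * 0.5 * math.sin(2.0 * math.pi * 1000.0 * t / 44100))
        for t in range(4410)]
with wave.open(path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(struct.pack("<%dh" % len(tone), *tone))

rate, samples = read_wav_16bit(path)
print(rate, len(samples))
```

The returned `rate` is the sample rate stored in the file header, so it does not have to be guessed.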

The sample rate does matter, of course. According to Nyquist’s Theorem, the highest frequency you can have is the sample rate divided by two. So if you have a sample rate of 44100 Hz (i.e. 44100 samples per second), for example, then the highest frequency that is retained in that signal is 22050 Hz. That becomes important as soon as you want to map the FFT coefficients to the corresponding frequency in Hertz!

(If I know the sample rate is 44100 Hz, then I know the highest FFT index corresponds to a frequency of 22050 Hz, for example)
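The mapping between FFT index and frequency is a simple proportion: index n corresponds to n * sample_rate / N Hz, where N is the frame size. A small sketch, with `bin_to_hz` and `hz_to_bin` as made-up helper names and a 2048-sample frame at 44100 Hz as assumed settings:

```python
def bin_to_hz(n, frame_size, sample_rate):
    """Frequency in Hz at the centre of FFT index n."""
    return n * sample_rate / frame_size

def hz_to_bin(freq_hz, frame_size, sample_rate):
    """FFT index whose centre frequency is closest to freq_hz."""
    return round(freq_hz * frame_size / sample_rate)

frame_size = 2048
sample_rate = 44100

print(bin_to_hz(1, frame_size, sample_rate))     # spacing between indices, about 21.5 Hz
print(bin_to_hz(1024, frame_size, sample_rate))  # 22050.0, the Nyquist frequency
print(hz_to_bin(1000, frame_size, sample_rate))  # 46
print(hz_to_bin(800, frame_size, sample_rate))   # 37
print(hz_to_bin(1200, frame_size, sample_rate))  # 56
```

So with these settings, a 1000 Hz tone should show up around index 46, and a notch from 800 to 1200 Hz (one reading of a 400 Hz notch centred on 1 kHz) would span roughly indices 37 to 56. A larger frame size gives finer frequency spacing, at the cost of coarser time resolution.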

steve · September 24, 2014, 1:37pm

This would probably be easier to discuss if we had access to an example file.

@ LSylvia
If your audio samples are short, you can attach them directly to your forum post - see here for details: https://forum.audacityteam.org/t/how-to-post-an-audio-sample/29851/1