elementary snd-fft question

Am I stupid or is this documentation page inconsistent? Does snd-fft return an array of values, or does it return an object that is repeatedly queried for arrays of values? http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/manual/part8.html#index611

SND-FFT returns an array.

SND-FFT is a “class”, which normally you would want to access with an “iterator” so that you can step through the sound and get a series of arrays that define the sound. IMHO object orientated programming in Nyquist is not very pleasant, which is probably why you will find few examples, but it can be used to good effect in some specific cases such as when using FFT or performing bit-wise DSP.

I’ve attached the FFT tutorial from the Nyquist source documentation which you may find useful. You will need to unzip the file, then open the html file in your web browser.
fft_tutorial.html.zip (9.84 KB)

There’s one example of object orientated programming here: https://forum.audacityteam.org/t/a-study-in-pink/28026/4

“snd-fft” is a function, but it is also a class, and snd-fft “returns” an array? … this doesn’t quite map to the notions of the C++ I did in a former life.

Sorry, I’m misleading and confusing you - my multi-tasking went a bit wonky :confused:
I’ve also got the opposite problem to you - I’m coming from LISP and trying to get to grips with C++ terminology.

Yes SND-FFT is a function that returns an array.
SND-FFT is usually used in a method. I think the tutorial that I attached should make a lot more sense than I can :wink:

Okay, let me see if I’m getting this. The page I first linked to says:

“Instead, Nyquist violates its “pure” functional model and resorts to objects for FFT processing. A sequence of frames is represented by an XLISP object. Whenever you send the selector :next to the object, you get back either NIL, indicating the end of the sequence, or you get an array of FFT coefficients.”

That talk of :next is not describing any class built into the Nyquist library. That is what misled me. Rather, it is describing the typical usage, such as in the examples you sent me. You demonstrate how to make an iterator class whose :next method invokes the snd-fft function. The page I saw looks like a truncation of your tutorial so context was missing.

The snd-fft function is like snd-fetch: it has a side effect on its sound argument, advancing some sort of built-in cursor. So if I scan a sequence of frames for a sound, but also want to scan its samples, I must use snd-copy first. In this sense every sound is an “object” with state.

I could dispense with the object oriented stuff, not building an iterator class, and just query a sound repeatedly with snd-fft, getting a sequence of arrays to examine.
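A minimal sketch of that approach, assuming only what the manual says about snd-fft (each call returns an array, and NIL once the sound is exhausted). The helper name map-frames is hypothetical; it takes the frame source as a function, so the loop itself is plain XLISP and no class is needed:

```lisp
;; Hypothetical helper: call NEXT-FRAME (a function of no arguments)
;; until it returns NIL, apply FN to every frame it returned, and
;; collect the results in frame order.
(defun map-frames (next-frame fn)
  (let ((results nil))
    (do ((frame (funcall next-frame) (funcall next-frame)))
        ((null frame) (reverse results))
      (setf results (cons (funcall fn frame) results)))))

;; Possible use in a plug-in (not run here): take the DC coefficient
;; of each 1024-point frame, hopping 512 samples, with no window.
;; snd-copy leaves the global s untouched, since snd-fft consumes it.
;; (let ((snd (snd-copy s)))
;;   (map-frames (lambda () (snd-fft snd 1024 512 nil))
;;               (lambda (frame) (aref frame 0))))
```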

That’s about right.
To me, objects of this kind are bad because they do not allow random access (compare the Show coeffs functions). The objects always have to be gone through from start to end, no matter which frame you want to inspect.
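To illustrate the point about random access (a sketch; nth-frame is a hypothetical name, and the frame source is any function that returns the next frame or NIL, such as repeated snd-fft calls): getting frame k means computing and throwing away every frame before it.

```lisp
;; Hypothetical helper: return frame K (counting from 0) of a
;; sequential frame source, or NIL if the source runs out first.
;; There is no way to seek: frames 0 .. K-1 are computed and discarded.
(defun nth-frame (next-frame k)
  (do ((i 0 (1+ i))
       (frame (funcall next-frame) (funcall next-frame)))
      ((or (null frame) (= i k)) frame)))
```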

Okay! I wrote and debugged some simple code (maddening: I make stupid errors and Nyquist just chokes without giving a clue, but anyway).

I’ve got something that iterates over frames. But, now what? How do I interpret the numbers in the frames? What are their units? If I make a control for my analyzer with a -dB value, and negate it and db-to-linear it, I can compare that number to a sample value from snd-fetch: but can I compare it to a coefficient out of snd-fft?

Or rather, square that number, and compare to the sum of squares of two successive coefficients, if I’m interested in some frequency and want to ignore phase?

The FFT-mod (FM) example somewhat hides the code that converts the real cosine/sine coefficients to amplitude (and phase).

No, that doesn’t answer my question. That shows how you combine corresponding entries of two corresponding frames. Scale the coefficients of one by the rms’s of pairs of coefficients of the other. That doesn’t tell me how I compare coefficients with a scalar dB value.

The simple exercise I’m trying, to test my understanding, is to label the “hot spots” at a certain frequency. I want to look at a spectrogram view, and see where it is “bright” for some frequency; then, make an analyzer that puts labels at (closely enough) the same places.

I want to know whether I can take a pair of coefficients from the array, square, add them, and then compare that to the square of the answer from (db-to-linear x) where x is a non-positive dB value. Or is that wrong?
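A sketch of that comparison, assuming the array layout from the manual (DC, then real/imaginary pairs, Nyquist last) and unnormalized coefficients, so the threshold amplitude has to be scaled by N/2 before squaring. The function name is hypothetical; in Nyquist, (db-to-linear db) could replace the expt call.

```lisp
;; Hypothetical helper: is the energy in bin K of an N-point frame at
;; or above a threshold given in dB?  Valid for 0 < K < N/2.
;; The pair for bin K is at indices 2K-1 and 2K.  Because the
;; coefficients are assumed unnormalized, a sinusoid of linear
;; amplitude A in bin K has sqrt(re^2 + im^2) = A * N/2, so the
;; threshold amplitude is scaled by N/2 as well.
(defun bin-above-threshold-p (frame k db)
  (let* ((n (length frame))
         (re (aref frame (1- (* 2 k))))
         (im (aref frame (* 2 k)))
         (threshold (* (expt 10.0 (/ db 20.0)) (/ n 2.0))))
    (>= (+ (* re re) (* im im)) (* threshold threshold))))
```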

If I made no mistake in my code, then the sum of the squares of coefficients for the same frequency is sometimes greater than (db-to-linear 0) even without any clipping, so I must be wrong.

I can’t follow you.
The value calculated by the code I mentioned (from one frame only), which you describe as RMS, is just the amplitude for the given frequency, and this is of course directly comparable with a scalar.
It is only written in linear form, i.e. 0.5 = -6 dB.
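For reference, the linear/dB relation behind “0.5 = -6 dB” (this is what Nyquist’s db-to-linear and linear-to-db compute), together with the scaling of a bin pair, assuming an unnormalized N-point transform and a sinusoid of amplitude A exactly at bin k with coefficients a_k, b_k:

```latex
A_{\mathrm{lin}} = 10^{\,\mathrm{dB}/20}
\quad\Longleftrightarrow\quad
\mathrm{dB} = 20\log_{10} A_{\mathrm{lin}},
\qquad
20\log_{10} 0.5 \approx -6.02\ \mathrm{dB},
\qquad
\sqrt{a_k^2 + b_k^2} = A\,\frac{N}{2}
```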
Maybe this will help you more:

;; fft magnitude and phase
(defun fft-mag-phase (frame)
  (let* ((n (length frame))
         (phase (make-array (/ (- n 2) 2)))
         ; n/2 + 1 magnitudes: DC, the (n-2)/2 bin pairs, and Nyquist
         (magnitude (make-array (1+ (/ n 2)))))
    (dotimes (i (length phase))
      (setf (aref phase i)
            (* (/ (atan (aref frame (1+ (* i 2))) (aref frame (* (1+ i) 2))) pi) 180))
      (setf (aref magnitude (1+ i))
            (sqrt (+ (expt (/ (aref frame (1+ (* i 2))) (/ n 2)) 2.0)
                     (expt (/ (aref frame (* (1+ i) 2)) (/ n 2)) 2.0)))))
    (setf (aref magnitude 0) (/ (aref frame 0) n))
    ; Nyquist goes in its own last slot instead of overwriting the last bin pair
    (setf (aref magnitude (/ n 2)) (/ (aref frame (1- n)) n))
    (vector magnitude phase)))

You can delete the second vector and its calculation, since you’re not interested in it, and I leave it up to you to format the code properly.
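A magnitude-only version along those lines might look like this (a sketch with a hypothetical name, assuming an even frame length): it returns N/2 + 1 values (DC, the N/2 - 1 bin pairs, and Nyquist), each scaled back to linear amplitude.

```lisp
;; Hypothetical helper: linear amplitudes from one snd-fft frame.
;; Element 0 is DC (divided by N), elements 1 .. N/2-1 come from the
;; real/imaginary pairs (divided by N/2), element N/2 is Nyquist
;; (divided by N).  Assumes an even frame length N.
(defun fft-magnitudes (frame)
  (let* ((n (length frame))
         (half (/ n 2))
         (mags (make-array (1+ half))))
    (setf (aref mags 0) (/ (aref frame 0) (float n)))
    (dotimes (i (1- half))
      (let ((re (aref frame (1+ (* 2 i))))
            (im (aref frame (+ 2 (* 2 i)))))
        (setf (aref mags (1+ i))
              (/ (sqrt (+ (* re re) (* im im))) (float half)))))
    (setf (aref mags half) (/ (aref frame (1- n)) (float n)))
    mags))
```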

sqrt (a * a + b * b) is not the rms of a and b, but sqrt(2) times that, duh I misspoke.

Are we talking about the same code? I thought you meant Steve’s fft-modulator-class example on this page.


Samples from frames of two sounds are combined, but I don’t see where a sample is compared to a scalar sample threshold.

If the fft returned by snd-fft is as defined by the first formula in this page, then I think I see my misunderstanding. The first coefficient, for instance, is not the average DC offset, because it is not divided by the length of the sample window. So if you fft’d a constant non-zero signal, that coefficient depends on window size. Similar scaling for the other coefficients. Do I get it now?
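One way to check that reading is to compute the same kind of frame by hand. The sketch below is a plain O(N²) DFT (not snd-fft itself) that fills an array in the layout the manual describes; it assumes snd-fft is unnormalized and N is even, and the sign chosen for the imaginary part is an assumption (it does not affect magnitudes). A constant 0.5 over 8 samples then gives a DC coefficient of 4.0 (0.5 × 8), and a unit cosine in bin 1 gives a real coefficient of 4.0 (1 × 8/2).

```lisp
;; Sketch: unnormalized DFT of an even-length array of samples, packed
;; as DC, re1, im1, re2, im2, ..., Nyquist (N values in total).
;; For checking scale factors only, not a replacement for snd-fft.
(defun naive-dft-frame (samples)
  (let* ((n (length samples))
         (half (/ n 2))
         (frame (make-array n))
         (two-pi (* 8.0 (atan 1.0))))
    (dotimes (k (1+ half))
      (let ((re 0.0)
            (im 0.0))
        ;; correlate the samples with cosine and (negated) sine at bin k
        (dotimes (j n)
          (let ((angle (/ (* two-pi k j) n)))
            (setf re (+ re (* (aref samples j) (cos angle))))
            (setf im (- im (* (aref samples j) (sin angle))))))
        (cond ((= k 0) (setf (aref frame 0) re))
              ((= k half) (setf (aref frame (1- n)) re))
              (t (setf (aref frame (1- (* 2 k))) re)
                 (setf (aref frame (* 2 k)) im)))))
    frame))
```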


I see you divide the values from the frame by (/ n 2) before squaring. What is the relation between the length of the frame and the length in samples of the window?

Frame size and window size are identical.
32 samples return 32 coefficients. The frequencies are samplerate/window size Hz apart (from DC to Nyquist).
But as you’ve mentioned earlier, snd-fft always grabs chunks of 1020 samples (matching the destructive effect of snd-fetch).
Or are you referring to the window the sound is multiplied with? That has the same size (an error is raised if not).
snd-fft can take frame sizes that are not a power of two, but I don’t know if this is just done by zero padding. In any case, snd-ifft does not accept odd lengths (a pity).
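The frame layout and bin spacing described above, as arithmetic (a sketch; both function names are hypothetical): bin k of an N-point frame sits at k × samplerate / N Hz, and for 0 < k < N/2 its coefficients are array elements 2k-1 and 2k. With N = 32 at 44.1 kHz the bins are 1378.125 Hz apart.

```lisp
;; Hypothetical helper: centre frequency in Hz of bin K in an
;; N-point frame at sample rate SRATE.
(defun bin-frequency (k srate n)
  (/ (* k (float srate)) n))

;; Hypothetical helper: array indices of the (real imaginary) pair for
;; bin K, 0 < K < N/2, in the layout DC, re1, im1, ..., Nyquist.
(defun bin-pair-indices (k)
  (list (1- (* 2 k)) (* 2 k)))
```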

I hadn’t heard about the number 1020.

Should snd-fft be given only window sizes of 2^n samples?

So the first coefficient is the DC offset, then there are pairs of coefficients for multiples of sample rate / window size. I got that.

The value 1020 is nowhere mentioned, but you can take it for granted that it is true.
I stumbled over it only by chance.
If it is useful for your analysis to take odd numbers, do so.
For example, to inspect a signal for 60 Hz hum it is ideal to take a window size of 735 (at 44.1 kHz), because then each bin pair represents a harmonic of 60 Hz (44100 / 735 = 60).
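A small helper for that choice (hypothetical, a sketch): it returns the window length in samples that holds exactly one period of FREQ, or NIL when the period is not a whole number of samples. With such a window, bin k lands exactly on k × FREQ.

```lisp
;; Hypothetical helper: number of samples in one period of FREQ at
;; SRATE, or NIL if that is not (within 0.001) a whole number.
(defun exact-window-length (freq srate)
  (let* ((period (/ (float srate) freq))
         (rounded (truncate (+ period 0.5))))
    (if (< (abs (- period rounded)) 0.001)
        rounded
        nil)))
```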
Maybe the limit applies only within an iterator object. I did get this error message once, though.
By the way, the phase output from my code above is not reliable yet: it returns a value of 180 for a pure sine wave with 0 phase. I must check what results atan gives for the different cases and correct the values properly.
Additionally, DC and Nyquist are not included in the phase vector, because their sine component is always 0, so the phase of their cosine is also 0.
To get the phase for a given amplitude, the array index is therefore n-1, i.e. a pair consists of element n of the magnitude vector and element n-1 of the phase vector.

Note that there is an error in the snd-fft documentation: http://audacity.238276.n2.nabble.com/Nyquist-FFT-Tutorial-Help-td7265789.html

Update – I fixed one stupid bug where I computed 2/x when I wanted x/2.

So here’s a little screenful of code. As explained I want to attach labels to the “hot spots” at a certain frequency. It’s doing something approximately like that.

I don’t yet know the intelligent things to do with the window size and skip. For now they are multiples of the period of the chosen frequency and are not restricted to powers of 2. I pass nil for the window parameter of snd-fft. I want to figure out a more useful way to place the endpoints of the labels too. I should do some zero crossing calculation.
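For the zero crossing part, a sketch of one simple approach (hypothetical function name): scan an array of samples and report every index where the sign changes between neighbours. In Nyquist such an array could come from snd-samples on a copy of the sound; that usage is an assumption and is shown only as a comment.

```lisp
;; Hypothetical helper: indices I where SAMPLES[I-1] and SAMPLES[I]
;; lie on opposite sides of zero (0.0 counts as non-negative).
(defun zero-crossing-indices (samples)
  (let ((result nil))
    (do ((i 1 (1+ i)))
        ((>= i (length samples)) (reverse result))
      (let ((prev (aref samples (1- i)))
            (cur (aref samples i)))
        (when (or (and (< prev 0.0) (>= cur 0.0))
                  (and (>= prev 0.0) (< cur 0.0)))
          (setf result (cons i result)))))))

;; Possible Nyquist use (not run here):
;; (zero-crossing-indices (snd-samples (snd-copy s) 10000))
```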

;nyquist plug-in
;version 1
;type analyze
;categories "http://lv2plug.in/ns/lv2core#AnalyserPlugin"
;name "Find Hot Spots"
;action "Finding sound..."

;control control-frequency "Frequency [Hz]:" real "" 1500.0 0.1 44100
;control control-amplitude "Amplitude [-dB]:" real "" 24 0 60
;control control-window-length "Window Length [cycles]:" int "" 5 1 100
;control control-skip-length "Skip Length [cycles]:" int "" 1 1 100

(defun push-label (l start end length skip srate)
  ; find the times in seconds that delimit the start of the start-th frame
  ; and the end of the end-th frame
  ; do not make touching or overlapping labels
  (let ((scaled-start (/ (* start (float skip)) srate))
        (scaled-end (/ (+ length (* end (float skip))) srate)))
    (if (or (null l) (> scaled-start (cadar l)))
        ; new label
        (cons (list scaled-start scaled-end "") l)
        ; overlap or touch the previous label, so extend it
        (cons (list (caar l) scaled-end "") (rest l)))))

(defun scan-frames (snd freq-hz amp-db window-length skip-length)
  (let* ((my-s (snd-copy s))
         (srate (snd-srate s))
         (period-samples (/ srate freq-hz))

         ;to do: find more intelligent values for these three
         (length (truncate (* window-length period-samples)))
         (skip (truncate (* skip-length period-samples)))
         (window nil)

         ; the bin pair for window-length cycles per window
         (index1 (* 2 window-length))
         (index0 (1- index1))

         ; take the threshold in dB, convert to linear, and multiply
         ; by length / 2 to make it comparable to the coefficients computed
         ; by fft which have the same factor in them (apart from dc and
         ; Nyquist frequencies for which it is length).
         (amp-linear (/ (* length (db-to-linear amp-db)) 2.0))
         (ampsq (* amp-linear amp-linear))

         ; frame index where the current label began, or nil
         (start-of-label nil)
         ; labels found so far, most recent first
         (result nil))
    (do* ((n 0 (1+ n))
          (frame (snd-fft my-s length skip window)
                 (snd-fft my-s length skip window)))
         ((not (arrayp frame))
          ; close a label that is still open at the end of the sound
          (if start-of-label
              (push-label result start-of-label (1- n)
                          length skip srate)
              result))
      (let* ((coeff0 (aref frame index0))
             (coeff1 (aref frame index1))
             (coeffsq (+ (* coeff0 coeff0) (* coeff1 coeff1)))
             (below (< coeffsq ampsq)))
        (cond ((and (not start-of-label) (not below))
               (setq start-of-label n))
              ((and (numberp start-of-label) below)
               (setq result (push-label result start-of-label (1- n)
                                        length skip srate))
               (setq start-of-label nil)))))))

(or (scan-frames s control-frequency (* -1 control-amplitude)
                 control-window-length control-skip-length)
    "Found no hot spots")

I don’t see any obvious errors or bugs, but I do find the code to be a bit convoluted. The first thing that I would do would be to tidy the code a bit.
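One possible way to tidy it (a sketch; hot-runs is a hypothetical name): separate the label bookkeeping from the FFT loop so that it can be tested on its own. hot-runs turns a list of per-frame flags into (first last) frame-index pairs, which code like push-label could then convert to seconds.

```lisp
;; Hypothetical helper: FLAGS is a list of T/NIL, one per frame, in
;; order.  Returns a list of (first last) index pairs, one for each
;; run of consecutive T frames.
(defun hot-runs (flags)
  (let ((runs nil)
        (start nil)
        (i 0))
    (dolist (flag flags)
      (cond ((and flag (null start)) (setf start i))
            ((and (not flag) start)
             (setf runs (cons (list start (1- i)) runs))
             (setf start nil)))
      (setf i (1+ i)))
    ;; close a run that is still open after the last frame
    (when start
      (setf runs (cons (list start (1- i)) runs)))
    (reverse runs)))
```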

There have been some conventions established for Nyquist plug-ins that were not in place in some of the very early released plug-ins, so I’ll comment on those too. (Nyquist plug-in conventions are generally not “must do” things, but “preferred”).


Round brackets are preferred in interface elements (applies also to built-in effects), so:

;control control-frequency "Frequency (Hz)" real "" 1500.0 0.1 44100

rather than:

;control control-frequency "Frequency [Hz]:" real "" 1500.0 0.1 44100

Where negative number input is required (for example for dB level) it is preferred to use negative numbers rather than negative units, thus:

;control control-amplitude "Amplitude (dB)" real "" -24 -60 0

rather than:

;control control-amplitude "Amplitude [-dB]:" real "" 24 0 60

Single letter variable names are generally discouraged. Better to use something like “labels” rather than “l”. Possible exceptions are within small self-contained functions or macros where the meaning is clear, for example:

;;; square of numbers or signals
(defun square (x)
  (mult x x))

Also note that a colon is automatically added after the control text, so no need to add a colon in the quoted text.


Conventional indentation:

(let ((val1 init1)
      (val2 init2)
      (val3 init3))
  body)

(do ((val1 init1 next1)
     (val2 init2 next2)
     (val3 init3 next3))
    ((test) return-val)
  body)

(if (test)
    then-form
    else-form)

(when (test)
  body)

Global variables.

Variables defined in the ;control statements are global, so no need to redefine them in functions. This can aid readability and thus bug fixing.
If you are using linear gain or amplitude within the code, but inputting as dB, it can improve readability to convert the (global) dB value to linear from the start, rather than putting the conversion into the function(s).
*sound-srate* is a global for the sample rate of the track, so there is no need to calculate it as a local variable (srate (snd-srate s)) within a function. If you need it as an integer you can just use (let ((srate (truncate *sound-srate*))) for the integer value and the already defined *sound-srate* where you need it as a float.

Thus, your function call:

(scan-frames s control-frequency (* -1 control-amplitude)

could be simply:

(scan-frames s)

or probably better:

(scan-frames (snd-copy s))

(currently you have “snd” defined as a local variable in scan-frames, but you never use it)

“length” is the name of a LISP function, so it is best (less confusing) not to use it as the name of a variable.

I have rewritten some things since then, such as renaming length.

I have a prejudice against global variables… I wanted to write functions that might be part of my little library and need inputs that are not directly from controls, so pardon my preference.

As for the controls – I was following examples like SilenceMarker.ny which is in the distribution but does not comply with those guidelines!

Tell me, do Nyquist functions mult and integrate work on arrays of floats as they do on sounds? I gather some generality of inputs is allowed. Suppose I want to examine the power spectrum of a frame f: could I assign (mult f f) to a variable and then just fetch and add pairs from it?

My little project now is “speech recognition.” Hardly in the full blown sense, but I wonder if it’s easy enough to teach an Analyzer to distinguish the vowels, voiced stops, unvoiced stops, and sibilants. Selecting a sentence and hitting a key and having neat boundaries drawn around those could be quite nice. I could tab through them and apply the usual fixes to those kinds of sounds, without much zooming in and out. Voiced stops with noise in them, for instance, are often improved with a lowpass filter that you would certainly not want applied to the whole sentence.

Making this tool so trustworthy that I just automate it as an effect would be really nice.

For now my challenge is understanding how to use these spectra and a good zero crossings finder.

I have added these magic words to my .emacs file

(put 'if 'lisp-indent-function nil)
(put 'when 'lisp-indent-function 1)
(put 'unless 'lisp-indent-function 1)

(put 'do 'lisp-indent-function 2)
(put 'do* 'lisp-indent-function 2)

Any others I should add? And how do I tell emacs to enter lisp-mode for .ny files? I saw no recommended magic words for let.
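For the second question, the standard Emacs mechanism is auto-mode-alist; a line like this in .emacs should open .ny files in lisp-mode (the regexp matches file names ending in .ny):

```lisp
;; Open Nyquist (.ny) files in lisp-mode.
(add-to-list 'auto-mode-alist '("\\.ny\\'" . lisp-mode))
```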