Calculate quantiles of audio data

I’m a bit at sea with Lisp, and would appreciate any advice.

This is a cut-down toy version of what I’m trying to do, but if anyone could get me this far, I think I could figure out the rest of it. So let’s suppose I wanted to calculate the upper and lower quartiles of a track’s sample values, and then linearly rescale the track (i.e., add an offset and constant gain) so that those quantiles moved to 0.4 and 0.6. How could I do that in Nyquist?
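To make the toy concrete, here is the same calculation sketched in Python (not Nyquist — just the arithmetic I have in mind; the function name is invented for illustration):

```python
from statistics import quantiles

def rescale_to_quartiles(samples, lo_target=0.4, hi_target=0.6):
    # Compute the lower and upper quartiles, then apply a constant
    # gain and offset so they land on lo_target and hi_target.
    q1, _, q3 = quantiles(samples, n=4, method="inclusive")
    gain = (hi_target - lo_target) / (q3 - q1)
    offset = lo_target - gain * q1
    return [gain * x + offset for x in samples]
```

Because the transform is linear, whatever sample value sat at the lower quartile ends up at 0.4, and the upper quartile at 0.6.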

One issue, even before worrying about the syntax, is how to compute the quantile. The naive way is to sort the data first, but that seems awfully inefficient. On a more familiar platform, I’d be confident writing a quick iterative method based on Ross Millikan’s answer. But it’s pretty daunting in Lisp! And for all I know, Nyquist or some well-known plug-in already has a percentile function.
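The iterative idea, sketched in Python (purely illustrative — the name is invented, and this simply bisects on the cumulative proportion rather than sorting):

```python
def quantile_by_bisection(samples, p, tol=1e-5):
    # Approximate the p-th quantile without sorting: bisect on the
    # sample value, counting what proportion of the data lies at or
    # below the midpoint on each step.
    lo, hi = min(samples), max(samples)
    n = len(samples)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        below = sum(1 for x in samples if x <= mid) / n
        if below < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Each iteration is a single linear pass over the data, so the cost is O(n log(range/tol)) with no sort.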

Any ideas? :slight_smile:

We get into trouble every time a scientist tries to use Audacity for data jobs. Audacity is not a WAV editor. It’s an audio production manager and the goal is to sound good, not have every byte line up perfectly. Much of the time they don’t line up and experiments can give unexpected results.


Ha ha yeah, as I said, the problem I described above is just a toy example of what I have in mind. And it is of course an audio effect, which is why I’m hoping to do it in Audacity rather than a more familiar platform. I don’t have any ulterior number-crunching motivations here! :laughing:

In Nyquist, a “sound” is a data type. So where we might have “integers”, “floats”, “characters”, “strings” … as data types in other programming languages, in Nyquist we also have "sound"s.

As you no doubt know, in digital audio a sound is composed of a series of “samples”, where each “sample” is a number. That is also the case in Nyquist, and it is possible to iterate through the samples one at a time (usually with SND-FETCH, which grabs the next sample value). However, looping through a sound one sample at a time in a LISP loop is slow and inefficient (remember that Nyquist is an interpreted language). Much better, when possible, is to operate on “sounds” directly.

In Nyquist, all sample values are represented as 32-bit floats. A value of +1.0 is at the top of the audio track, and a value of -1.0 is at the bottom. A series of zeros (0.0) is absolute silence.

A stereo sound in Nyquist is an array containing two sounds. The first element is the left channel sound, and the second element is the right channel sound. Nyquist itself can handle multi-channel sounds (as arrays with “sounds” as each channel), though this is rarely useful in Audacity as Audacity can currently only handle mono / stereo.

Nyquist provides a lot of “DSP primitives” (functions that act on sounds), which in many cases are much faster and much more efficient than manually looping through samples. For example, if you wish to set all sample values greater than +/-0.5 to exactly +/-0.5, you can do it by “clipping” the sound:

(clip *track* 0.5)

or if you wish to clip only the top (positive) part of the waveform, you could use the S-MIN function:

(s-min *track* 0.5)

For more complex manipulation of sample values above/below specified threshold points, the function SHAPE can be used. An example can be found in the “Limiter” plug-in code:

Thanks Steve, that’s a lot of help. If I can write a function to tell me what proportion of a selected audio takes a value below an input y, then I can just use a few rounds of linear interpolation to get a good enough approximation.

I’ve run into another hitch, though. To experiment on, I have a ten-second audio track that’s actually just a straight line from -1 at 0s to +1 at 10s. I am applying my function to it with an input cutoff of y=-0.9. The proportion of the audio that’s below -0.9 is actually 0.05, so that’s the output I hope to see. Instead I get 0.5. The factor of ten suggests it might be related to the length of the track, but I don’t quite understand. Below is what I’m entering at the Nyquist Prompt. (I know the mean function’s ugly, and feel free to suggest improvements, but it does seem to work; the “cumulative” function is the one that’s going awry.)

(defun mean (s)
  ; Returns the mean of the signal. Adapted from acx-check.ny
  (setq chunk 4096)
  (setq blen (truncate (/ (snd-length s ny:all) chunk)))
  (setq result s)
  (when (> blen 1)
    (setq result (snd-avg result blen blen op-average)))
  (setq blen (snd-length result ny:all))
  (setq result (snd-fetch (snd-avg result blen blen op-average))))

(defun cumulative (y s)
  ; Returns the proportion of the signal that's at or below y.
  ; The cut-off y should be between -1 and +1.
  (mean (shape s (pwlv 1 (+ 1 y) 1 (+ 1 y) 0 2 0) 1)))

; Check these functions:
(print (mean *track*)) ; correctly gives 0
(print (mean (mult *track* *track*))) ; correctly gives 0.3333
(print (cumulative -0.9 *track*)) ; incorrectly gives 0.5

Thanks for any assistance!

This is because of the way Audacity manipulates the Nyquist environment for ;process type effects.

What Audacity does, and this is usually very convenient, is to map 1 “unit” of Nyquist time to the length of the selection.
This is easiest to see with an example.

;type generate
(osc 72 0.5)  ;generates half a second sine wave

;type process
(osc 72 0.5)  ;generates a sine wave, half the length of the selection

Unfortunately, there are some situations where this can become confusing, and using SHAPE is one such situation.

In your code:

(pwlv 1 (+ 1 y) 1 (+ 1 y) 0 2 0)

This creates a sound (at the control sample rate) that is double the length of the selection. So, for example, if your selection is 10 seconds long, then you are generating a low sample rate sound that is 20 seconds long. However, the “origin” of the table is set to 1.0 seconds, so you don’t have the table that you think you do.

The simple workaround / solution, is to generate this control signal within Nyquist’s “default environment”. The default environment maps 1 “unit” of time to 1 second.

(abs-env (pwlv 1 (+ 1 y) 1 (+ 1 y) 0 2 0))

Another little tip: When using SHAPE with a low sample rate table, extend the end of the table a little so as to avoid a glitch at the end.

(defun cumulative (y sig)
  ; Returns the proportion of the signal that's at or below y.
  ; The cut-off y should be between -1 and +1.
  (setf table
      (abs-env (pwlv 1 (+ 1 y) 1 (+ 1 y) 0 2.1 0)))
  (mean (shape sig table 1)))

As expected, the result is close to 0.05 when applied to a ramp from -1 to +1.

Yes it does look a little peculiar. If, for example, the length of the selection is 10000 samples, then your step size in the first SND-AVG is 2.
Perhaps better to run your first SND-AVG with a step size of a few thousand samples, and then (if necessary) run SND-AVG again with a step size equal to the length of the first pass?


;; Sets step size between 1000 and 100000 samples
(setf step 1000)
(if (> len (* 100 step))
    (let ((temp (truncate (/ len step))))
      (setf step (truncate (/ len temp))))
    (setf step (truncate len)))

(print step)

You’re a legend, Steve. It’s all working correctly (and quickly) now. Here’s the result, as a plug-in, in case anyone’s interested:

;nyquist plug-in
;version 4
;type analyze
;name "Quantile..."
;action "Compute a specified quantile of a sound"
;author "David Bulger"
;control pin "Proportion whose quantile you want" float "" 0.5 0 1

; 2019-09-24
; Calculate the pth quantile of a sound.

(defun mean (s)
 ; Returns the mean of the signal.
 (setf result s)
 ; Use large (approx equal) block size at each pass, to minimise any edge effects:
 (setf numpasses (round (+ 0.5 (/ (log (float (snd-length s ny:all))) (log 100000.0)))))
 (setf bloxize (round (power (snd-length s ny:all) (recip numpasses))))
 (dotimes (passnum numpasses)
  (setf result (snd-avg result bloxize bloxize op-average)))
 (snd-fetch result))

(defun cumulative (y s)
 ; Returns the proportion of the signal that's at or below y.
 ; The cut-off y should be between -1 and +1.
 (mean (shape s (abs-env (pwlv 1 (+ 1 y) 1 (+ 1 y) 0 2.01 0)) 1)))

(defun quantile (p s)
  (do ((loy -1) (lop 0)  ;  lower bound for quantile & corresponding cumulative proportion
       (hiy 1) (hip 1))  ;  same for high
      ((< hiy (+ loy 0.00001))  ;  convergence criterion
       (/ (+ loy hiy) 2))  ;  when loy & hiy are close enough, just return their midpoint
    (setf midy (+ loy (/ (* (- p lop) (- hiy loy)) (- hip lop))))  ;  linear interpolation
    (setf midp (cumulative midy s))
    ; Now either the left or right subinterval should contain the quantile:
    (cond ((< midp p) (setf lop midp) (setf loy midy))
          (t (setf hip midp) (setf hiy midy)))))

(quantile pin *track*)
(setf numpasses (round (+ 0.5 (/ (log (float (snd-length s ny:all))) (log 100000.0)))))

Doesn’t that give you:
For tracks less than 100000 samples (~2.27 seconds @ 44100 Hz), 1 pass
For tracks greater than 2.27 seconds, 2 passes.

What are you actually trying to do there?

PS. For readability, I’d recommend a default indent of 2 spaces, and try to avoid really long lines of code - no-one should need to count parentheses to work out what the code says.

Yeah, that’s right. Or for tracks longer than a couple of months, three passes (though that’s certainly not my use-case any time soon).

What I’m trying to do here is to determine at the outset how many passes (say P) I’m going to do, and then calculate approximately the Pth root of the number of samples, and use that for the block size, so that each pass is doing about the same amount of work. That’s based on concerns that might be irrelevant—I dunno enough about how Nyquist functions work to be sure. But as an example of what I want to avoid, suppose a have a track with 101000 samples, and I break it into two blocks, one with the first 100000 samples, and the second with the remaining 1000. Then I average the values in each block; that gives me two values. Then, on the second pass, I compute the (unweighted) mean of those two values. That will produce a very incorrect estimate of the overall mean, in general.
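Concretely, that pitfall looks like this in Python (an illustrative sketch, not Nyquist):

```python
def unweighted_two_pass_mean(samples, block):
    # First pass: average each block (the last block may be short).
    # Second pass: average the block means WITHOUT weighting by length.
    means = [sum(samples[i:i + block]) / len(samples[i:i + block])
             for i in range(0, len(samples), block)]
    return sum(means) / len(means)

# 101000 samples: 100000 zeros followed by 1000 ones.
data = [0.0] * 100000 + [1.0] * 1000
true_mean = sum(data) / len(data)             # 1000/101000, about 0.0099
bad = unweighted_two_pass_mean(data, 100000)  # (0 + 1)/2 = 0.5
```

The short final block gets the same weight as a full one, so the estimate is wildly off whenever its contents differ from the rest.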

I may be wrong, but I worked out 3 passes at around 64 hours. Regardless of the exact figure, the code will never get that far. The second argument for SND-LENGTH is “maxlen” - the maximum number of samples to be calculated (Nyquist Functions). NY:ALL is a “preset” large integer. By default:
NY:ALL = 1000000000 (a bit over 6 hours @ 44100 Hz)

(print ny:all)  ;returns 1000000000

Even for sound lengths of a few hours, this code is unnecessarily slow and uses a lot of RAM, because it reads the entire sound into RAM just to calculate the length.

It may look a bit hacky to a purist, but there’s a very useful (and efficient) workaround. Audacity already knows the length (in samples) of the selected audio, and makes that available to Nyquist as the value of “LEN”. As we know that “S” is the track, we don’t need to calculate the length because we’ve already got it as “LEN”. Note that “LEN” is a float.

So a much more efficient way to get the same result is:

(if (< len 100000)
    (setf numpasses 1)
    (setf numpasses 2))

It’s an interesting problem.

SND-AVG has a maximum step size of a little over 2 million samples.

Clearly, if the step size is very small on each pass, then we will need a lot of passes, which will be slow and inefficient.
If the step size is very large on each pass, then we risk inaccuracy due to “left over” samples, or increased complexity in dealing with those samples separately.
I assume that a small amount of error is acceptable.

SND-AVG applies a rectangular window of length blocksize to the audio, and calculates the average value within that window, then advances the window by stepsize and repeats.

In our case, where blocksize = stepsize, unless LEN is exactly divisible by blocksize, the final window will not be full, and the “empty” part of the window is padded with zeros. One might expect that zero-padding to drag the final output sample below the actual average, but that is not necessarily what happens: the function attempts to maintain the correct length of the output (at a lower sample rate), so if the final block is mostly empty it is excluded from the output altogether. The length of the output is rounded to the nearest sample.
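Assuming round-to-nearest as described (this is an assumption drawn from the behaviour above, not from the Nyquist sources), the output length can be sketched as:

```python
def snd_avg_out_len(length, step):
    # With blocksize == stepsize, the low-rate output notionally has
    # length/step samples; assume this is rounded to the nearest whole
    # sample, so a mostly-empty final window is dropped.
    return round(length / step)

# 10 full blocks of 1000 plus a nearly-empty final window: dropped.
print(snd_avg_out_len(10200, 1000))  # 10
# A mostly-full final window counts as an extra output sample.
print(snd_avg_out_len(10800, 1000))  # 11
```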

Taking this all into account, one fairly simple strategy could be (pseudo-code):
IF LEN < 2,000,000
    THEN blocksize = (int) LEN
    ELSE blocksize = (int) (LEN / 1000)
RETURN blocksize
This gives us, in all cases, a final block that is less than 1000th of the selection length (so a very small error), and a blocksize of at least 2000 samples (unless the selection is less than that).

We can then either calculate the number of times that we will need to run SND-AVG, or simply run it in a loop until the output length is 1 sample.
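The “loop until one sample remains” approach looks like this in plain Python (an illustrative sketch that ignores the zero-padding subtleties of the real SND-AVG):

```python
def mean_by_repeated_avg(samples):
    # Repeatedly block-average until a single value remains.  Blocksize
    # is the whole length when that fits in one pass, otherwise
    # 1/1000th of the current length, per the strategy above.
    vals = list(samples)
    while len(vals) > 1:
        n = len(vals)
        block = n if n < 2_000_000 else n // 1000
        vals = [sum(vals[i:i + block]) / len(vals[i:i + block])
                for i in range(0, n, block)]
    return vals[0]
```

For selections shorter than 2,000,000 samples this reduces to a single exact pass; longer selections incur only the small final-block error discussed above.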

Hi Steve,

Well yes, I was happy to have errors, as long as they were small, but the more I thought about it, the harder it was to be satisfied with that. I want to stress that we are now purely in the realm of curiosity. But I’ve now tried to write a mean function that does an exact calculation:

(defun meanOfAllSamples (s)
  ; Returns the mean sample value of the signal, accounting exactly for block size edge effects.
  (setf block 100000)
  (setf numWholeBlocks (round (- (/ len block) 0.5)))

  ; The "leftover" is the shorter block at the end, and the cutTime is where it begins:
  (setf lenLeftover (round (- len (* block numWholeBlocks))))
  (setf cutTime (/ (* block numWholeBlocks) *sound-srate*))

  ; Calculate the mean and weight of the leftover:
  (cond
    ((> lenLeftover 0)
      (setf endTime (/ len *sound-srate*))
      (setf weightLeftover (/ (float lenLeftover) block))
      (setf meanLeftover (snd-fetch (snd-avg
        (extract-abs cutTime endTime s) lenLeftover lenLeftover op-average))))
    (t (setf weightLeftover 0) (setf meanLeftover 0)))

  ; Calculate the mean of the string of full-size blocks:
  (setf mean (snd-fetch
      (snd-avg
        (snd-avg (extract-abs 0 cutTime s) block block op-average)
        numWholeBlocks numWholeBlocks op-average)))

  ; Now correct the mean by including the leftover:
  (setf mean (/ (+ (* mean numWholeBlocks) (* meanLeftover weightLeftover))
                (+ numWholeBlocks weightLeftover))))

(print (meanOfAllSamples *track*))
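The weighted correction can be sanity-checked outside Nyquist. Here is a plain-Python mirror of the same arithmetic (names invented; the leftover block is weighted by its length relative to a full block, which makes the recombined mean exact):

```python
def exact_mean_via_blocks(samples, block=100000):
    # Average the full-size blocks, average the leftover separately,
    # then recombine with the leftover weighted by its relative length.
    n = len(samples)
    num_whole = n // block
    body = samples[:num_whole * block]
    leftover = samples[num_whole * block:]
    mean_body = sum(body) / len(body) if body else 0.0
    if leftover:
        weight_left = len(leftover) / block
        mean_left = sum(leftover) / len(leftover)
    else:
        weight_left, mean_left = 0.0, 0.0
    return ((mean_body * num_whole + mean_left * weight_left)
            / (num_whole + weight_left))
```

A little algebra shows the numerator is (total sum)/block and the denominator is n/block, so this equals the true mean regardless of how the block boundary falls.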

There are a few things I’m unsure about:

  • I guess I don’t know exactly how it decides sample inclusion when the samples are right on the boundary, so there could be off-by-one errors there.
  • I’ve had to use time (seconds) for EXTRACT-ABS, but samples for LEN and SND-AVG, and that’s meant some conversion. Maybe if I knew the language better, I could work entirely in samples, and avoid the conversion hassles.
  • I’m using LEN and *SOUND-SRATE* to get the length and sample rate, but I think they report the length and sample rate of the track, which in general might not be the same as the input sound “s”. So that might fail if I applied this function to some other sound.

Anyhow, thanks for your help. If you have comments about this, I’ll certainly read them, but I’m not still stuck & frustrated!

Good :slight_smile:

This is how I think it works (by example):

The length of the selection in samples is LEN.
The sample rate of the selection is SOUND-SRATE.
The duration of the selection in seconds is (/ len sound-srate).
Let’s say the stepsize & blocksize are STEP.

Then the sample rate of the result of SND-AVG is (/ sound-srate step)
So the length of the result in samples is (* (/ len sound-srate) (/ sound-srate step)) = (/ len step)
and because there can only be a whole number of samples, this is rounded to nearest (though the boundary case of half a sample may round either way due to the vagary of the SND-AVG function). To be sure of the length of the result in samples, we could use SND-LENGTH.

How about:

(setf numWholeBlocks (truncate (/ len block)))