working with audio samples

shravani · February 13, 2014, 8:43am

hello,

While accessing the audio track, I want to save the amplitude and time of samples that are above a particular threshold. How can I do that? I had thought of using an array of structures, as we do in C, but structures are not supported in Nyquist.
Please help me through this.

Hoping for a reply soon.
Thanks

steve · February 13, 2014, 5:38pm

Generally with Nyquist it is better to work with “sounds” rather than with “samples”.
For example, if you want all of the samples that are above +0.5 in value (-6 dB positive-going peaks), you could write a loop that reads the sample values and writes the index and value into a list of (index value) pairs:

(setf peak-list
  (do ((val (snd-fetch s)(snd-fetch s))
       (i 0 (1+ i))
       (values ()))
      ((not val) values)
    (if (> val 0.5)
        (push (list i val) values))))

; test it
(format t "~a samples over 0.5 found.~%Last found sample index: ~a~%~
            Last found value: ~a"
  (length peak-list)
  (first (nth 0 peak-list))
  (second (nth 0 peak-list)))

(print "done")

However, running through samples in a LISP loop is pretty slow.
A much more efficient way is to use Nyquist built-in functions and work with “sounds”:

(setf peak-list-as-sound (s-max s 0.5))
(format nil "Done.~%Returned ~a samples." (snd-length peak-list-as-sound ny:all))

The second code calculates a “sound” that is the maximum of 0.5 and the original sound, so every sample below 0.5 comes out as 0.5. It is very much faster than the LISP loop (even though it is also calculating the length of the returned sound).
To see the output sound from the second code, just run:

(s-max s 0.5)

This will be a bit slower because Audacity has to calculate the waveform display and write the returned data to disk, but still a lot faster and more efficient than the LISP loop.

Whether or not you can work entirely with sounds and avoid looping through samples depends on your application (which you have not described), but if it is possible to do so, work with “sounds” rather than samples.

If you really have to work directly with samples, then it is often more efficient to convert the sound to a lower sample rate, so there are fewer samples to loop through. See “Silence Finder” (SilenceMarker.ny in your Audacity Plug-ins folder) as an example.

shravani · February 14, 2014, 3:42pm

Thanks for the code. I will surely try and execute it.

The code below is from the Analyze menu, Beat Finder.
I do not understand how it works. Could you please explain it?

(do ((c 0.0)
     (l NIL)
     (p T)
     (v (snd-fetch s2)))
    ((not v) l)
  (if (and p (> v thres))
      (setq l (cons (list c "B") l)))
  (setq p (< v thres))
  (setq c (+ c 0.001))
  (setq v (snd-fetch s2)))

Thanks

steve · February 14, 2014, 4:33pm

I’ve added code tags and line indentation to the code that you posted, making it easier to read.
To add code tags, click on the “Code” button above the message composing box, then insert your code between the code tags like this:

[code]
... code goes here ...

[/code]

The code is a “Loop”. It is described in the XLisp manual here: http://www.audacity-forum.de/download/edgar/nyquist/nyquist-doc/xlisp/xlisp-ref/xlisp-ref-093.htm

After the “DO” command, there are 4 local variables set. “Local” means that they only exist within this block of code.
“c” is set to an initial value of 0.0
“l” to NIL
“p” to “true” [note that “T” or “t” is a special symbol that has the boolean value “true”]
“v” is set to the value of a sample which is read with (snd-fetch s2), where “s2” must be a “sound”. http://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html#index260

The next line is the end test, which decides when the loop should stop: (not v)
and then the return value “l”.

Note that it is NOT generally good practice to use single characters as variable names. It may be OK where the variable is only used within a very limited context, but code is usually easier to read, easier to maintain and easier to bug fix if concise but meaningful names are used. “l” (lower case “L”) is a particularly bad name to use as in some fonts it looks like the number “1”.

So the loop initially sets these values, and then continues looping until “not v”, which means until “v” is NIL. “v” will be NIL when there are no more samples to fetch.

Next comes the body of the loop:

(if (and p (> v thres))
    (setq l (cons (list c "B") l)))

If “p” is “true” (not NIL) and “v” is greater than “thres”
then
set “l” to (cons (list c "B") l)

(cons (list c "B") l) adds a list containing the value of “c” and the string value "B" to the front of the list “l”.

This could be written a lot more clearly.
If we used variable names:
“time” instead of “c”
“below-thresh” instead of “p”
“label-list” instead of “l”
“val” instead of “v”

(if (and below-thresh (> val thres))
    (push (list time "B") label-list))

“push” is described here: http://www.cs.cmu.edu/~rbd/doc/nyquist/part13.html#index1000

The last part of the body sets new values for p, c and v.

For what it’s worth, I’d probably have written it something like this:

(do ((label-list ())
     (below-thresh T)
     (time 0.0 (+ time 0.001))
     (val (snd-fetch s2) (snd-fetch s2)))
    ((not val) label-list)
  (if (and below-thresh
           (> val thres))
      (push (list time "B") label-list))
  (setq below-thresh (< val thres)))

With lots of comments:

(do ((label-list ())    ; Initialise empty list.
     (below-thresh T)   ; Initialise flag as 'true'.
     ; Initialise 'time = 0'. Add 0.001 on each loop.
     (time 0.0 (+ time 0.001))
     ; Get sample values.
     (val (snd-fetch s2) (snd-fetch s2)))
    ; Until 'val = NIL'. Return label-list.
    ((not val) label-list)
  ; If crossing from below to above threshold
  (if (and below-thresh
           (> val thres))
      ; Add label to list.
      (push (list time "B") label-list))
  ; is val below threshold?
  (setq below-thresh (< val thres)))
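
As a smaller illustration of the same DO structure (bindings with optional step forms, then an end test followed by the return value), here is a minimal sketch that sums the integers below a limit. The function name `sum-below` is made up for this example; it uses only plain XLisp forms and does not touch any audio.

```lisp
;; Illustrative only: SUM-BELOW is not part of Nyquist or Beat Finder.
(defun sum-below (limit)
  (do ((i 0 (1+ i))            ; variable, initial value, step form
       (total 0 (+ total i)))  ; step forms see the values from the previous pass
      ((= i limit) total)))    ; end test, then the value to return

(print (sum-below 5))  ; 0 + 1 + 2 + 3 + 4 = 10
```

The loop body is empty here because the step forms do all the work. In the beat-finding loop the body is where the threshold test and the flag update go.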

shravani · February 15, 2014, 7:21am

Hello,

Thank you so much. The Beat Finder is finally understood perfectly. Here the time is incremented by 0.001. But when I tried to increment the time by 1 or 2, the output remained the same as for 0.001.

When the wave rises above the threshold and, after some time, falls below it again, I want to mark the maximum dB value within that section as the beat. I want to save that particular amplitude and the time at which it occurs. Can you please tell me which function can be used to extract the corresponding time?

Also, another query is, where and how are the variables declared? I also want to know about the global variables.

Thanks in advance.

s2v2 · February 15, 2014, 9:07am

Hello,
I added your code to a plug-in as follows:

;nyquist plug-in
;version 3

;type analyze
;categories "http://audacityteam.org/namespace#OnsetDetector"
;name "Testing"
;action "Calculation"

;; Released under terms of the GNU General Public License version 2:
;; http://www.gnu.org/licenses/old-licenses/gpl-2.0.html 

;control thresval "Threshold Percentage" int "" 75 5 100
        (setf peak-list
      (do ((val (snd-fetch s)(snd-fetch s))
           (i 0 (1+ i))
           (values ()))
          ((not val) values)
        (if (> val 0.5)
            (push (list i val) values))))

    ; test it
    (format t "~a samples over 0.5 found.~%Last found sample index: ~a~%~
                Last found value: ~a"
      (length peak-list)
      (first (nth 0 peak-list))
      (second (nth 0 peak-list)))

    (print "done")

But initially it gives the message “Nyquist returned value: 75” (75 is the threshold value that we have set), and it gives the following error after clicking on the Debug button.

error: bad argument type - #(#<Sound: #b08f9450> #<Sound: #b08f94f0>)
Function: #<Subr-SND-FETCH: #8ffbae0>
Arguments:
#(#<Sound: #b08f9450> #<Sound: #b08f94f0>)
Function: #<FSubr-DO: #8ffceb4>
Arguments:
((VAL (SND-FETCH S) (SND-FETCH S)) (I 0 (1+ I)) (VALUES NIL))
((NOT VAL) VALUES)
(IF (> VAL 0.5) (PUSH (LIST I VAL) VALUES))
Function: #<FSubr-SETF: #8ff9670>
Arguments:
PEAK-LIST
(DO ((VAL (SND-FETCH S) (SND-FETCH S)) (I 0 (1+ I)) (VALUES NIL)) ((NOT VAL) VALUES) (IF (> VAL 0.5) (PUSH (LIST I VAL) VALUES)))
1> error: unbound variable - PEAK-LIST
if continued: try evaluating symbol again
Function: #<Subr-SND-FETCH: #8ffbae0>
Arguments:
#(#<Sound: #b08f9450> #<Sound: #b08f94f0>)
Function: #<FSubr-DO: #8ffceb4>
Arguments:
((VAL (SND-FETCH S) (SND-FETCH S)) (I 0 (1+ I)) (VALUES NIL))
((NOT VAL) VALUES)
(IF (> VAL 0.5) (PUSH (LIST I VAL) VALUES))
Function: #<FSubr-SETF: #8ff9670>
Arguments:
PEAK-LIST
(DO ((VAL (SND-FETCH S) (SND-FETCH S)) (I 0 (1+ I)) (VALUES NIL)) ((NOT VAL) VALUES) (IF (> VAL 0.5) (PUSH (LIST I VAL) VALUES)))
2> “done”
“done”
2> 1>

Can you tell me where I am going wrong?
Thanks

steve · February 15, 2014, 12:32pm

Variables do not need to be declared before use. A symbol becomes bound just by setting a value for it.
For simple values you can just use SET, but the symbol must be quoted so that LISP does not try to evaluate it before it is bound.

(set value 3)  ; error: unbound variable - VALUE
(print (* 2 value))

(set 'value 3)  ; VALUE is quoted
(print (* 2 value)) ; returns 6

“SET QUOTED” is so commonly used that it has shorthand notation:

(setq value 3)
(print (* 2 value)) ; returns 6

This is the standard way to bind a value to a symbol.

In some more complex situations we need a more powerful command than SETQ, for example if we want to set the value of an array element. In this case we use SETF [set field].

(setq myarray (vector 1 2 3 4)) ; zero indexed array
(setf (aref myarray 2) 30)
(print myarray) ; returns #(1 2 30 4)

You are applying the code to a stereo track.
A stereo track is passed to Nyquist as an array of 2 sounds, bound to the symbol “S”.
(snd-fetch s) tries to get the next sample value from the sound “s”, but if “s” is not a sound it will fail.

See here for how to handle stereo sounds: Missing features - Audacity Support
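
A minimal sketch of one way around the error, assuming it is acceptable to analyse only the left channel of a stereo track. The helper name `first-channel` is made up for this example; `arrayp` and `aref` are standard XLisp functions.

```lisp
;; Illustrative helper: FIRST-CHANNEL is not a built-in Nyquist function.
;; A stereo track arrives as an array of two sounds, a mono track as one sound.
(defun first-channel (track)
  (if (arrayp track)
      (aref track 0)   ; stereo: take the left channel
      track))          ; mono: use the sound as it is

;; In the plug-in, fetch samples from the single sound, for example:
;; (snd-fetch (first-channel s))
```

With this in place the same loop can run on mono and stereo tracks, because SND-FETCH is always handed a single sound.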

Robert_J_H · February 15, 2014, 1:27pm

Some additional facts about “Set” and related functions.

‘Setq’ is widely regarded as obsolete. You can always use ‘setf’ instead, since it does everything that ‘setq’ does and also handles other places such as array elements.
‘Set’ is useful to set a global variable when a variable of the same name is within a local construct. Thus:

(let (var)
   (setf var 10); local assignment
   (set 'var 8); global assignment
   (print var)); local print --> 10
(print var); global print --> 8

Another useful function for multiple assignments is ‘psetq’:

(psetq var 5 next-var 10 message "done")

steve · February 15, 2014, 5:36pm

In some dialects of Lisp SETQ may be less fashionable, but “obsolete”?
Even if it were deprecated in Common Lisp (which it isn’t), Nyquist is based on a relatively old version of XLisp. At some point in the future SETQ “may” be deprecated, at which point programmers should stop using it.

Personally I prefer to use SETQ when setting a variable to a value and SETF for lists, arrays, sounds and strings. By using the same scheme consistently I find it serves as a reminder and aids readability. As soon as I see SETF in my code I know that it is not a single value, whereas if I see SETQ I know that it is.

Nice tip regarding use of SET to set a global variable within a local construct, though unless writing a very large program it should be easy to avoid re-using a global variable name, so for clarity I’d prefer to use a different variable name within the local construct.

(let (lvar)
   (setq lvar 10); local assignment
   (setq var 8); global assignment
   (print lvar)); local print --> 10
(print var); global print --> 8
(print lvar); error: unbound variable - LVAR

s2v2 · February 17, 2014, 4:46pm

Hello,
When the wave rises above the threshold and, after some time, falls below it again, I want to mark the maximum dB value within that section as the beat. I want to save that particular amplitude and the time at which it occurs. Can you please tell me which function can be used to extract the corresponding time?

Thanks in advance.

steve · February 17, 2014, 5:23pm

In the previous examples the time is calculated from the sample count.
time = sample-count / sample-rate
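
To show that formula in use, here is a rough sketch of the bookkeeping asked about above: it records where each run of samples above the threshold starts and ends, and the highest value inside it. It works on a plain list of sample values so that the logic is easy to follow; in a plug-in the values would come from SND-FETCH instead. The name `find-regions` and its arguments are assumptions made for this example, and the threshold is linear, not dB.

```lisp
;; Sketch only: FIND-REGIONS and its argument names are illustrative.
;; SAMPLES is a list of sample values, SRATE the sample rate in Hz.
;; Returns a list of (start-time end-time peak-value) lists, one for each
;; run of samples above THRESHOLD. Times are in seconds.
(defun find-regions (samples threshold srate)
  (let ((regions ())   ; finished regions, newest first
        (start nil)    ; index where the current region began, or NIL
        (peak 0.0)     ; highest value seen in the current region
        (index 0))     ; sample count so far
    (dolist (val samples)
      (cond ((> val threshold)
             (if (not start)
                 (progn (setq start index) (setq peak val))
                 (if (> val peak) (setq peak val))))
            (start
             ;; first sample back below the threshold: close the region
             (push (list (/ start (float srate)) (/ index (float srate)) peak)
                   regions)
             (setq start nil)))
      (setq index (1+ index)))
    ;; close a region that is still open at the end of the audio
    (if start
        (push (list (/ start (float srate)) (/ index (float srate)) peak)
              regions))
    (reverse regions)))

;; Two regions in eight samples at a (toy) sample rate of 4 Hz:
(print (find-regions '(0.1 0.6 0.9 0.7 0.2 0.1 0.8 0.3) 0.5 4))
```

The end time reported here is the time of the first sample that is back below the threshold. This sketch stores only the peak value; to get the time of the peak as well, store the index alongside `peak` and convert it in the same way.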

shravani · February 21, 2014, 6:56pm

hi,

Thanks for all your help until now.

As you wanted to know what exactly we are trying to implement: we are working on a project for our final-year Engineering course. We want to add a plug-in to Audacity that measures the tempo, i.e. beats per minute, of a sound clip. We have thought of a way to implement it. We need to store the time instant at which the sound wave rises above a given input threshold value (in decibels), and also the instant when the wave falls below the threshold. We also need to store the peak value, in decibels. To make the situation clearer, we have uploaded an image. According to the figure, we need to save the values of t1, t2, t3, …, which are in seconds, and the values of p1, p2, …, which are the peak values in decibels. The peak values will be considered as the beats, which will be counted.

This is what we need according to what we have thought. If you can think of a better solution then please do let us know.

If this is a success we are willing to contribute this plugin to the Audacity.

Hoping for a reply soon.
Thanks in advance.

shravani · February 21, 2014, 6:59pm

Here is the URL for the image that is referred to in the above post. If it does not load, please refer to this URL.

https://plus.google.com/photos/111381929354498427442/albums/5982921943066307665/5982921946881729042?enfplm&hl=en&utm_source=lmnavbr&utm_medium=embd&utm_campaign=lrnmre&rtsl=1&pid=5982921946881729042&oid=111381929354498427442

Robert_J_H · February 21, 2014, 10:47pm

Have you seen my recent plug-in “Tempo Teller”?
The current version is:
rjh-tempo-teller.ny (3.73 KB)
I’ve decided not to work with single sample values. If you wanna do that, you should resample the audio track.

My algorithm is roughly as follows:

1. Take 30 s, resample to 51200 (100 * 2^9).
2. Create a curve, following the peaks.
3. Downsample to get 4096 samples.
4. Take the FFT.
5. Each bin represents one bpm.
6. Take only the magnitude for each bin.
7. Make a harmonic product spectrum (*).
8. Search for peaks that are enclosed by smaller magnitude values.
9. Search for the highest point on the curve defined by these three points.
10. Store the middle bin plus the offset and its magnitude.
11. Sort the list by the magnitude and return the (interpolated) bpm value.

(*) The harmonic product spectrum HPS is employed to eliminate higher harmonics from the spectrum. I have actually used summation instead of multiplication.
It works as follows:
the spectrum with 4096 bins is downsampled to 1/2, 1/3 … 1/6 and all is added up.
For instance:
‘0 4 0 4 0 3 0 2’ plus
‘4 4 3 2’ equals
‘4 8 3 6 0 3 0 2’
You can see that the fundamental frequency is now higher than the second harmonic.
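
The summation step can be sketched in plain XLisp as follows. This is only an illustration of the arithmetic in the example above, not the code from the Tempo Teller plug-in; the name `harmonic-sum` is made up, and a list is used for readability where a real implementation would more likely use an array.

```lisp
;; Illustrative sketch: HARMONIC-SUM is not taken from any plug-in.
;; SPECTRUM is a list of magnitudes; counting from 1, element k is bin k.
;; For each bin k the result holds the sum of bins k, 2k, ... MAX-FACTOR * k,
;; leaving out any multiple that lies beyond the end of the spectrum.
(defun harmonic-sum (spectrum max-factor)
  (let ((bins (length spectrum))
        (result ()))
    (dotimes (i bins)
      (let ((k (1+ i))
            (total 0))
        (dotimes (j max-factor)
          (let ((harmonic (* k (1+ j))))
            (if (<= harmonic bins)
                (setq total (+ total (nth (1- harmonic) spectrum))))))
        (push total result)))
    (reverse result)))

;; The example from the text: original spectrum plus its 1/2 downsampled copy.
(print (harmonic-sum '(0 4 0 4 0 3 0 2) 2))  ; prints (4 8 3 6 0 3 0 2)
```

Calling it with a MAX-FACTOR of 6 corresponds to adding the copies downsampled to 1/2, 1/3 … 1/6.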

Although I can’t see your image, I think that you’re working on a fully time domain based solution.

The first problem that I see is the threshold for the peaks. It’s always a dangerous thing to use absolute values; at least it isn’t elegant for “blind tempo estimation”. Of course, you can urge the user to set this value, but he will soon skip this task and use the default, which probably produces a wrong result.
Even a learning phase with hundreds of songs won’t help you much, as each song has its own dynamic range.
A common approach is to get the local peaks in comparison to the global average and to adjust the threshold dynamically.
Here’s a sample snippet that isolates strong peaks (actually RMS values) (and returns the audio):

(defun drum-filter (s ol-ratio blk blk2 cut &key isolate)
(let* ((r-blk (/ *sound-srate* blk)) 
   (rms (rms (reson s cut (/ cut 20) 2) r-blk (round (/ blk ol-ratio))))
   (avg (snd-avg rms blk2 1 op-average))
   (ctrl (snd-shape (mult avg (recip rms)) (snd-pwl 0 44100 
      (list 0 0.01 70000 0.01 75000 0.95 88201 1.0 88201)) 1))
   (csr (snd-srate ctrl))
   (drums-cut (mult s (snd-avg (snd-chase ctrl (/ 1 csr) (/ 1 csr)) 15 1 op-average))))
(if isolate (diff s drums-cut) drums-cut)))
(multichan-expand 'drum-filter s 0.1 147 1000 100 :isolate t)

The function is called in the last line.
The first value is the hop size; 0.1 means that the window advances by 10 %.
Then comes the window size in samples; those will be the local peaks, or better, the local RMS values in this function.
The third value is the number of windows that make up the environment for the local peak, in the sample code 147000 samples or 1000 windows.
There’s also a center frequency in Hz, 100 here. That’s to get mostly kick drum hits, for example. 10000 Hz would also work well for snare drum, cymbals etc.
There are some other, rather complicated functions used, but they serve only one purpose, namely to return a smoother audio.
“T” or “NIL” after “:isolate” will decide if the beats should be isolated or attenuated.

You can of course pursue your original idea; it is possible to gather the sample values in one vector (single-dimensional array) and the on-/offset times in another one.

Vanita · February 24, 2014, 4:20pm

Hello,

We want to save the time values when the sound wave rises above the input threshold value and falls below it, so as to locate the beats. Could you please help us with the Nyquist coding? Our main aim is to calculate beats per minute, for which we have added a new plug-in to the Audacity Analyze menu.

Hoping for reply soon,

Thank you.

steve · February 24, 2014, 5:55pm

Have a look at the code for “beat.ny” in the Audacity plug-ins folder.

Robert_J_H · February 24, 2014, 11:49pm

Is that what you want?

;; return first derivative
(defun differentiator (s-in)
  (biquad s-in  1 -1 0  1 0 0))
;; main function
(defun get-times (snd-in threshold hold result)
  (let* (
     (*sr* (snd-srate snd-in))
     (hour (round (* 3600 *sr*)))
     (peaks (snd-oneshot (s-abs snd-in) threshold hold))
     (start-stop (differentiator peaks))
     (index (snd-pwl 0 *sr* (list 0 0.0 hour (float hour) (1+ hour))))
     (times (mult start-stop index))
     (shortened  (snd-compose times 
        (snd-inverse (integrate (s-abs start-stop)) 0 *sr*))))
(eval result) ))
;;
;; Definitions and outputs to Debug screen
;; The input sound
(setf sig (snd-from-array 0 *sound-srate*
   #(0.1 0.3 0.5 0.3 0.1 0 -0.1 -0.3 -0.5 -0.3 -0.1 0 0 0)))
(psetq threshold 0.2); = -14 dB
(setf stay&hold (/ 3 *sound-srate*)); hold for min 3 samples 
;;
;; Input sound
(snd-display sig)
(terpri); New line
;; First sound, 1 for all that is above threshold:
(snd-display (get-times sig threshold stay&hold 'peaks))
(terpri)
;; differentiate sound to get only start (1) and stop (-1):
(snd-display (get-times sig threshold stay&hold 'start-stop))
(terpri)
;; Multiply with the sample indices:
(snd-display (get-times sig threshold stay&hold 'times))
(terpri)
;; Remove all samples that are zero:
(snd-display (get-times sig threshold stay&hold 'shortened))

The start and stop times are returned as a sound. They are expressed here as sample indices, but you can easily change them to seconds by replacing ‘(float hour)’ with 3600.0 in the ‘index’ definition.
The Stay&Hold variable is to bridge valleys in the wave form. You can simply assign a value in seconds, e.g. 0.05 for 50 ms.
You have to add this value to the negative end times to get the proper time, where the peak goes under the threshold.
Change in the above example ‘3’ to ‘4’ and the zero crossing will be ignored. However, the returned sound will only have the start time, but the end time will be simply the total length - 4.
It is probably best to gather those pairs in a list for easier manipulation.
You have to tell us some more details about your algorithm, otherwise, we aren’t able to help you further.

Robert_J_H · February 25, 2014, 1:36am

I’ve used the above submitted code to “filter” out some prominent beats in the following example.
The original loop is at the beginning. This loop is then analysed at 12000 Hz (that’s made with “reson”), this leaves kick and snare (the high pitch of beater and stick are found).
The third part holds the bass beats alone, the filter is set to 50 Hz, 10 Hz wide.
The line that does the whole filtering for the kick drum is:

(mult s (get-times (reson s 50  10 1)  0.02 0.05 'peaks))

Note that ['peaks] is taken from the function, that’s the one that writes 1 for all samples above the threshold.
The hold value is 50 ms and the threshold -34 dB (= 0.02).
As I’ve mentioned before, finding the right threshold will be your greatest problem. Modern music often has a prominent drum accompaniment; older music less so. The tempo can of course also be found from instrument onsets, if cleverly analyzed.
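
The dB figures quoted for the thresholds follow from linear = 10^(dB / 20). Nyquist already provides `db-to-linear` and `linear-to-db` for this; the sketch below only spells out the arithmetic, and the names `db->linear` and `linear->db` are made up so that they do not clash with the built-in functions.

```lisp
;; Illustrative only: these two helpers restate the dB formulas.
(defun db->linear (db)
  (expt 10.0 (/ db 20.0)))

(defun linear->db (x)
  (* 20.0 (/ (log x) (log 10.0))))   ; LOG is the natural logarithm

(print (linear->db 0.02))   ; close to -34, the threshold used above
(print (linear->db 0.2))    ; close to -14
(print (db->linear -6.0))   ; close to 0.5
```

A threshold entered by the user in dB can be converted once with this formula, and the loop or the sound functions can then compare plain sample values against the linear result.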

weldo5 · October 6, 2016, 7:09pm

Since I work with music, samples in that context would be a harmonic sound/noise that loops in a sample editor, and make notes with say, a pattern editor. Audacity can create “samples” from certain sounds, to be looped in music software (once exported as a wav or soft or flac)