How to run YIN starting at sound start + 0.25 s?

stepheneb · January 24, 2018, 10:51pm

In my Extended Pitch Detect plugin, I select 5 s of audio from one track, starting at the beginning of the track. The plugin generates a header row and one row of data that looks like this:

time        duration    frequency   RMS         Confidence
0.000       0.500       309.40      0.295       0.992

I’d like to generate 7 more rows of data, incrementing the start time by 0.25 s for each row.

As far as I can tell, yin analyzes the sound it is given, starting from the beginning of that sound: (yin sound minstep maxstep stepsize)

see: https://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html#index575
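For reference, yin itself returns a time series rather than a single value. According to the Nyquist manual, the result is an array of two sounds (the pitch estimate in steps, and a confidence measure where lower is better), sampled at the input rate divided by stepsize. The following is a minimal sketch that prints every frame. It assumes a mono track in the Nyquist Prompt, and the min-hz and max-hz values are illustrative only.

```lisp
;; Sketch: run yin once over the whole selection and print every frame.
;; Assumptions: mono track, Nyquist Prompt, illustrative MIN-HZ / MAX-HZ values.
(let* ((min-hz 40)
       (max-hz 1000)
       (stepsize (truncate (/ (* 4 *sound-srate*) min-hz)))  ; samples between estimates
       (frame-dur (/ stepsize (float *sound-srate*)))         ; seconds between estimates
       (yin-out (yin *track* (hz-to-step min-hz) (hz-to-step max-hz) stepsize)))
  (do ((i 0 (1+ i))
       (step (snd-fetch (aref yin-out 0)) (snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1)) (snd-fetch (aref yin-out 1))))
      ((not step) "Results in debug log")
    ;; approximate frame time, frequency in Hz, confidence (lower = more confident)
    (format t "~a\t~a\t~a~%" (* i frame-dur) (step-to-hz step) conf)))
```

Because stepsize sets the spacing of the estimates, reading successive frames like this may be an alternative to re-running yin on shifted copies of the sound.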

So to generate the next row, it looks like I need to pass yin a new sound that starts 0.25 s into the existing sound.

Reading this section of the Nyquist docs, I’m not seeing an obvious way to clip 0.25 s from the beginning of a sound.

https://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html

Anyone know how to do this?

Thanks!
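One possible way to do the clipping is extract-abs, which the plugin's getyin function already uses with a start time of 0. A non-zero start drops the beginning of the sound. This sketch assumes a mono selection that Nyquist sees as starting at time 0. Since yin works on the samples it receives, the start time of the extracted sound should not affect the pitch values, but treat that as an assumption worth checking.

```lisp
;; Sketch: pitch estimates for the 0.5 s slice that starts 0.25 s into the selection.
;; START and WINDOW are illustrative; assumes a mono track in the Nyquist Prompt.
(let* ((start 0.25)
       (window 0.5)
       (slice (extract-abs start (+ start window) *track*))  ; audio from 0.25 s to 0.75 s
       (yin-out (yin slice (hz-to-step 40) (hz-to-step 1000)
                     (truncate (/ (* 4 *sound-srate*) 40)))))
  ;; print each pitch estimate yin produced for this slice
  (do ((step (snd-fetch (aref yin-out 0)) (snd-fetch (aref yin-out 0))))
      ((not step) "Results in debug log")
    (format t "~a Hz~%" (step-to-hz step))))
```

To get one row per slice, repeat this with start set to 0, 0.25, 0.5, and so on.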

steve · January 25, 2018, 1:22am

One way to step through a sound in blocks is to grab a section of the sound as an array, and then convert the array back into a sound for analysis (or analyze the array directly, if appropriate).

Here’s a simple example that grabs a block of 1000 samples, performs a very simple analysis (prints the peak level), then steps on to the next 1000 samples, and so on.

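;; Fetch 1000 samples at a time (advancing 1000 each time) until the track runs out.
(do ((ar (snd-fetch-array *track* 1000 1000) (snd-fetch-array *track* 1000 1000)))
    ((not ar) "Results in debug log")   ; snd-fetch-array returns NIL at the end
  ;; Turn the array back into a sound so ordinary Nyquist functions can analyze it.
  (let ((audio-block (snd-from-array 0 *sound-srate* ar)))
    (print (peak audio-block 1000))))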
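snd-fetch-array takes separate length and step arguments, so the same loop can produce overlapping windows. The following is a sketch with 0.5 s windows that advance by 0.25 s. It assumes a mono track, and the window and hop values are illustrative. It works on a snd-copy because snd-fetch-array consumes the sound it reads from.

```lisp
;; Sketch: 0.5 s windows that advance by 0.25 s, so consecutive windows overlap by half.
;; Assumes a mono track; the window and hop sizes are illustrative.
(let ((win (round (* 0.5 *sound-srate*)))     ; window length in samples
      (hop (round (* 0.25 *sound-srate*)))    ; hop size in samples
      (snd (snd-copy *track*)))               ; leave *track* itself untouched
  (do ((ar (snd-fetch-array snd win hop) (snd-fetch-array snd win hop))
       (n 0 (1+ n)))
      ((not ar) "Results in debug log")
    (format t "~a s\tpeak ~a~%"
            (/ (* n hop) (float *sound-srate*))                 ; window start time
            (peak (snd-from-array 0 *sound-srate* ar) win))))
```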

stepheneb · January 25, 2018, 6:56am

Thanks Steve for the sample!

I’ve adapted it, together with some of my code and your pitch-detect YIN code, to create a table, but I’m having trouble getting YIN to produce correct results.

Here’s a screenshot of a pluck of a high E string and the report generated by my Extended Pitch Detect plugin. The track is about 5 s long. The fundamental frequency at the tension I’m testing is about 309 Hz.

When I run my adapted code, which generates a table, YIN reports a fundamental frequency of around 62 Hz.

I’m pretty sure the do loop is iterating properly because the reported RMS values trend down in just the way I expect.

I’m wondering whether YIN needs more than the sound I’m generating and binding to audio-block…

Maybe it’s something I’ve got wrong because I’m just learning Lisp. I’m a bit confused about the differences between setq and setf.

Here’s what the table code generates. It is intended to duplicate what I currently do by hand: adjusting the selection and running the Extended Pitch Detect plugin. The sample window is 0.5 s, and it moves forward 0.25 s on every iteration.

The step size varies a bit because I’m correcting for integer rounding errors when determining the size of the array of sound values.

sample-index is the position, in the original track, of the first sample of audio-block.
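The rounding correction described above amounts to computing each window's first sample from the elapsed time, rather than adding a fixed, already rounded step on each iteration. Here is a sketch in plain XLISP. It assumes the 11025 Hz rate implied by the sample-index column in the table that follows (index 11025 at 1.000 s).

```lisp
;; Sketch: derive each window's first sample from the elapsed time, so the rounding
;; error of one step is not carried into the next. SRATE 11025 is inferred from the
;; table (sample-index 11025 at 1.000 s).
(let ((srate 11025)
      (hop-time 0.25)
      (prev 0))
  (dotimes (k 8)
    (let ((index (round (* srate hop-time (1+ k)))))   ; first sample of window k+1
      ;; time, sample-index, and step (difference from the previous index)
      (format t "~a\t~a\t~a~%" (* hop-time (1+ k)) index (- index prev))
      (setq prev index))))
```

At this rate, 0.25 s is 2756.25 samples, so the step is usually 2756 and occasionally 2757. A fixed step of 2756 would drift by one sample every four windows. The exact pattern at the .5 ties depends on how round breaks ties.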

time          duration      frequency     Confidence    RMS           sample-index  step
0             0.500         62.904        0.992         0.295         0             2756
0.250         0.500         62.877        0.994         0.203         2756          2757
0.500         0.500         62.886        0.994         0.117         5513          2756
0.750         0.500         62.886        0.994         0.087         8269          2756
1.000         0.500         62.886        0.994         0.062         11025         2756
1.250         0.500         62.864        0.995         0.046         13781         2757
1.500         0.500         62.864        0.995         0.033         16538         2756
1.750         0.500         62.864        0.995         0.025         19294         2756
2.000         0.500         62.864        0.995         0.019         22050         2756
2.250         0.500         62.864        0.995         0.016         24806         2757
2.500         0.500         62.864        0.995         0.015         27563         2756
2.750         0.500         62.864        0.995         0.012         30319         2756
3.000         0.500         62.864        0.995         0.011         33075         2756
3.250         0.500         62.864        0.995         0.010         35831         2757
3.500         0.500         62.864        0.995         0.008         38588         2756
3.750         0.500         62.864        0.995         0.007         41344         2756
4.000         0.500         62.864        0.995         0.007         44100         2756
4.250         0.500         62.864        0.995         0.005         46856         2756
4.500         0.500         62.864        0.995         0.004         49612         2757
4.750         0.500         62.864        0.995         0.005         52369         2756
5.000         0.500         62.864        0.995         0.005         55125         2756
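A value near 62.9 in the frequency column looks like YIN's raw output, which is a pitch in steps (MIDI note numbers), not Hz. Nyquist's step-to-hz converts it using hz = 440 × 2^((step − 69) / 12). Here is a short check, computed both ways:

```lisp
;; yin reports pitch in steps (MIDI note numbers); step-to-hz converts to Hz.
(print (step-to-hz 62.904))                              ; about 309.4 Hz
(print (* 440.0 (expt 2.0 (/ (- 62.904 69) 12.0))))     ; the same formula by hand
```

This agrees with the 309.40 Hz reported by the plugin.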

And here’s the lisp code which I’m running in the Nyquist Prompt effect to generate the table:

;; Initializations
(setq f0 nil)       ; initialise detected frequency
(setq confidence 1) ; initialise confidence
(setq sample-window-time 0.5)
(setq sample-step-time (/ sample-window-time 2))
(setq sample-index 0)
(setq sample-length (round (* *sound-srate* sample-window-time)))
(setq time 0)

(setq *float-format* "%1.3f")

(psetq min-hz 40 max-hz 8000)

;; Set range in steps (MIDI note numbers)
(psetq minstep (hz-to-step min-hz)
       maxstep (hz-to-step max-hz))

;;; Apply YIN to first DUR seconds
(defun getyin (sig dur)
  (let ((srate (min *sound-srate* (* 8 max-hz))))
    (if (< srate *sound-srate*)
        (progn
          (setf sig
            (if (arrayp sig)
                (sum
                  (extract-abs 0 dur (force-srate srate (aref sig 0)))
                  (extract-abs 0 dur (force-srate srate (aref sig 1))))
                (extract-abs 0 dur (force-srate srate sig))))
          (setq srate (snd-srate sig)))
        (setf sig
          (if (arrayp sig)
              (sum
                (extract-abs 0 dur (aref sig 0))
                (extract-abs 0 dur (aref sig 1)))
              (extract-abs 0 dur sig))))
    (let ((stepsize (truncate  (/ (* 4 srate) min-hz))))
      (yin sig minstep maxstep stepsize))))

;;; Find most confident frequency
(defun bestguess (yin-out)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
     ;(format t "~a Hz \t ~a %~%" (step-to-hz step) (* 100 (- 1 conf)))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

(round (* *sound-srate* sample-step-time))
(round (* *sound-srate* (- time (/ sample-index *sound-srate*))))

(defun generate-table ()
  (format t "~a\t~a\t~a\t~a\t~a\t~a\t~a~%" "time" "duration" "frequency" "Confidence" "RMS" "sample-index" "step")
  (do* ((index 1 (+ index 1))
        (sample-step (round (* *sound-srate* sample-step-time))
          (round (* *sound-srate* (- (+ time sample-step-time) (/ sample-index *sound-srate*)))))
        (ar (snd-fetch-array *track* sample-length sample-step)
          (snd-fetch-array *track* sample-length sample-step)))
      ((not ar) "Results in debug log")
    (setf audio-block (snd-from-array 0 *sound-srate* ar))
    (setf f0 (bestguess (getyin audio-block sample-window-time)))
    (format t "~a\t~a\t~a\t~a\t~a\t~a\t~a~%"
      time
      sample-window-time
      f0
      (- 1.0 confidence)
      (snd-fetch (rms audio-block))
      sample-index
      sample-step)
    (setq sample-index (+ sample-index sample-step))
    (setq time (+ time sample-step-time))))

(generate-table)

steve · January 25, 2018, 2:44pm

There are two schools of thought:

Always use SETF (it does everything that SETQ does and more)
Use SETQ for setting simple numeric values, and use SETF for everything else.

I think that, historically, SETQ came first. It is just a shorthand way of writing (set (quote variable) value).
The QUOTE function, which may be written as a single quote character, tells Lisp not to evaluate the variable.
These assignments are just different ways of writing the same thing:

(set (quote my-var) 42)
(set 'my-var 42)
(setq my-var 42)

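In a fresh session, before MY-VAR has been given a value, this will throw an error, because SET evaluates its first argument: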

(set my-var 42)
;; error: unbound variable - MY-VAR

The SETF command is more powerful, and will allow other types of assignments, such as setting the value of an element in an array:

(setf ar (make-array 3))
(setf (aref ar 1) "My String Value")
(print ar)  ; prints (and returns) the whole array

Which prints to the debug window:

#(NIL "My String Value" NIL)

We could use (setq ar (make-array 3)),
but (setq (aref ar 1) "My String Value") will fail because (aref ar 1) is a function call and not a simple “symbol”. SETQ can only be used with symbols (simple variables).
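The same rule applies to other "places" besides array elements. Here is a small sketch using list accessors, which XLISP's SETF also accepts. The variable names are illustrative.

```lisp
;; SETF accepts any "place"; SETQ accepts only a symbol.
(setf lst (list 1 2 3))
(setf (car lst) 10)       ; lst is now (10 2 3)
(setf (nth 2 lst) 30)     ; lst is now (10 2 30)
(setq x 5)                ; on a plain symbol, SETQ and SETF do the same thing
(print lst)
```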

For looking things up in XLISP, the XLISP manual is more detailed than the Nyquist manual, and it provides examples for most functions.
The XLISP manual is here: XLisp

Regarding the bigger question, I’ll need to spend some time with your code, and I don’t have time to do that right now.
Can you narrow down the problem by writing short test scripts for each of your functions?

stepheneb · January 25, 2018, 9:14pm

The most obvious bug I fixed: I had forgotten to convert the result with step-to-hz before reporting the frequency value!

But there is still a more subtle problem. I made a simpler implementation, which can be run in the Nyquist Prompt, to generate a table of frequency estimates. It shifts the sample window by approximately 0.5 s on every iteration and asks YIN to calculate the frequency on a 0.1 s slice.

First, however, I used some plugins to generate a simple waveform that descends in pitch and amplitude, from A3 (220 Hz) to G3 (196 Hz). I’ll use it as a reference waveform to test the implementation.

Generate Tone: 220 Hz, 0.8 amplitude for 5s
Sliding Time Scale/Pitch Shift: final pitch shift: -2 semitones
Adjustable Fade: S-Curve Out
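A similar reference tone can be generated directly in the Nyquist Prompt. This is only an approximation of the three steps above. The sketch assumes a 5 s tone, uses a glide that is linear in Hz rather than in semitones, and substitutes a simple linear fade for the S-curve. abs-env makes the envelope times absolute seconds instead of fractions of the selection.

```lisp
;; Sketch: approximate reference tone, 220 Hz (A3) gliding to 196 Hz (G3) over 5 s.
;; The linear glide and linear fade are simplifications of the effects used above.
(abs-env
  (mult (pwlv 0.8 2 0.8 5 0)          ; amplitude: hold 0.8 for 2 s, then fade to 0 at 5 s
        (hzosc (pwlv 220 5 196))))    ; frequency in Hz, gliding from 220 to 196
```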

A screenshot of the results:

Here’s the code which is running in the Nyquist Prompt:

;; Initializations
(setq time 0.0)
(setq window-time 0.5)
(setq window-length (+ 1 (round (* window-time *sound-srate*))))
(setq f0 nil)       ; initialise detected frequency
(setq confidence 1) ; initialise confidence
(setq *float-format* "%1.3f")

(setf sndcopy (snd-copy *track*))

(psetq min-hz 40 max-hz 8000)

;; Set range in steps (MIDI note numbers)
(psetq minstep (hz-to-step min-hz)
       maxstep (hz-to-step max-hz))

;;; Apply YIN to first DUR seconds
(defun getyin (sig dur)
  (let ((srate (min *sound-srate* (* 8 max-hz))))
    (if (< srate *sound-srate*)
        (progn
          (setf sig
            (if (arrayp sig)
                (sum
                  (extract-abs 0 dur (force-srate srate (aref sig 0)))
                  (extract-abs 0 dur (force-srate srate (aref sig 1))))
                (extract-abs 0 dur (force-srate srate sig))))
          (setq srate (snd-srate sig)))
        (setf sig
          (if (arrayp sig)
              (sum
                (extract-abs 0 dur (aref sig 0))
                (extract-abs 0 dur (aref sig 1)))
              (extract-abs 0 dur sig))))
    (let ((stepsize (truncate  (/ (* 4 srate) min-hz))))
      (yin sig minstep maxstep stepsize))))

;;; Find most confident frequency
(defun bestguess (yin-out)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

(defun generate-frequency-table ()
  (format t "~a\t\t~a\t~a~%" "time" "frequency" "RMS")
  (do ((ar (snd-fetch-array sndcopy window-length window-length)
        (snd-fetch-array sndcopy window-length window-length)))
      ((not ar) "Results in debug log")
    (setf audio-block (snd-from-array 0 *sound-srate* ar))
    (format t "~a\t~a\t\t~a~%"
      time
      (step-to-hz (bestguess (getyin audio-block 0.1)))
      (snd-fetch (rms audio-block)))
    (setq time (+ time (/ window-length *sound-srate*)))))

(generate-frequency-table)

Here’s a table of the results, with an extra column on the right showing the frequency estimated separately for each window:

time    frequency   RMS         calculated separately
0.000   219.948     0.458       219.95
0.500   219.948     0.535       217.66
1.000   219.948     0.518       215.38
1.500   219.948     0.437       213.01
2.001   219.948     0.363       210.68
2.501   208.306     0.287       208.26
3.001   208.306     0.196       205.83
3.501   208.306     0.116       203.45
4.001   208.306     0.053       200.95
4.501   208.306     0.013       198.45

The RMS value changes each time through the loop (it rises slightly, then falls as the fade progresses), so the audio processed on each iteration appears to correctly represent successive 0.5 s slices of the original sound.

I suspect a subtle problem in the functions getyin or bestguess…

stepheneb · January 25, 2018, 11:09pm

Fixed.

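The problem was the global variable confidence, which bestguess uses to hold the best (lowest) confidence value seen so far. It needs to be reset to 1 every time bestguess is called. Otherwise, a window’s estimate only replaces f0 if it is more confident than the best frame of every earlier window. That is why the frequency column stayed stuck at 219.948 and then at 208.306.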

Here’s the updated bestguess function:

;;; Find most confident frequency
(defun bestguess (yin-out)
  (setq confidence 1)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

steve · January 25, 2018, 11:17pm

You beat me to it. I was about to give you a cryptic clue: “you’re getting too confident”

You have discovered why it is so often said that “globals are bad / evil” (not that they are - you just have to be very careful when and where you use them).
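Following the point about globals, one way to make bestguess immune to this kind of bug is to keep the running best values in local variables, so that every call starts fresh. This is a sketch, not a drop-in replacement. It returns NIL, rather than the previous window's f0, when no frame passes the test. Callers such as generate-frequency-table would therefore need to check for NIL before calling step-to-hz.

```lisp
;;; Find the most confident frequency, keeping all state local.
;;; Returns the best pitch in steps (MIDI note numbers), or NIL if no frame qualified.
(defun bestguess (yin-out)
  (let ((best-conf 1.0)    ; same starting value as the original, reset on every call
        (best-step nil))
    (do ((step (snd-fetch (aref yin-out 0)) (snd-fetch (aref yin-out 0)))
         (conf (snd-fetch (aref yin-out 1)) (snd-fetch (aref yin-out 1))))
        ((not step) best-step)
      (when (and (= conf conf)          ; NaN is never equal to itself
                 (< conf best-conf))    ; lower value = more confident
        (setq best-conf conf)
        (setq best-step step)))))
```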

JimWI · January 5, 2021, 3:53pm

Hello: I just tried your “updated” example on a file of mine with 5 mandolin notes recorded. The output I received was:
Nyquist !
%1.3f
I’d love to be able to get the frequencies and the time durations, as shown in your table.
Any suggestions?
Jim

steve · January 5, 2021, 4:14pm

My guess is that you are trying to use the code on a stereo track.
stepheneb’s code is written for mono tracks only.
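stepheneb’s script copies *track* with snd-copy and feeds it to snd-fetch-array, which expects a single sound. For a stereo track, *track* is an array of two sounds. Here is a possible workaround, assuming that averaging the two channels is acceptable for pitch detection: mix down first, and use this in place of the (setf sndcopy (snd-copy *track*)) line.

```lisp
;; Sketch: give the block-by-block analysis a mono sound, even for a stereo selection.
;; For stereo, *track* is an array of two sounds; for mono, it is a single sound.
(setf sndcopy
  (if (arrayp *track*)
      (mult 0.5 (sum (aref *track* 0) (aref *track* 1)))  ; average the two channels
      (snd-copy *track*)))
```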

JimWI · January 5, 2021, 4:36pm

Thanks for the reply.
When I import the file, Audacity says: Mono, 44100Hz, 32-bit float
Steve: Your code works on the file, but it only gets the 1st note. It’s very accurate. If I create a new file with the 1st note removed, your code works on that as well.

JimWI · January 5, 2021, 4:49pm

If it helps, I put 2 test files here:
https://chambersislandusa.com/mando/audacity/testnotes.wav (4 secs long)
https://chambersislandusa.com/mando/audacity/scalenotes.wav (12 secs)

stepheneb · January 5, 2021, 5:04pm

Yes.

When I record in Audacity, I record in stereo, but I delete one of the channels so that my scripts run on mono data.

steve · January 5, 2021, 5:58pm

Results from “testnotes.wav”:

time		frequency	RMS
0.000	5512.501		0.002
0.500	441.234		0.041
1.000	495.712		0.104
1.500	495.898		0.042
2.000	556.505		0.117
2.500	554.872		0.028
3.000	590.785		0.026
3.500	659.309		0.092
4.000	659.394		0.022

steve · January 5, 2021, 6:01pm

Please post the full code that you are using.
When posting, wrap the code in “code tags” (the “</>” button), like this:

[code]
code goes here
[/code]

When you run the code in the Nyquist prompt, click the “Debug” button rather than the “OK” button.
After the code has run, the debug window will open. Copy the contents of the debug window and include it (in “code” tags) in your reply.

JimWI · January 5, 2021, 7:41pm

Steve and stepheneb:

Thanks!! Both files are mono.
Got it: I had to run it from the Nyquist Prompt. I was running it from the Analyze menu.
Is there an ‘easy’ way to save the output to a file?

Thanks again!
Jim

steve · January 5, 2021, 8:16pm

Saving can be written into the Nyquist script, but now that you have it working, here is the “easy” way:

“Help menu > Diagnostics > Show Log”
In the log window, click “Clear”
Run the Nyquist script again, but use the “OK” button this time.

Notice that the output appears in the Audacity log window.

You can now either “Save” the contents of the log window, or copy and paste from the log window.
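For completeness, the harder way (writing the file from inside the Nyquist script) might look roughly like this. It uses XLISP's open, format, and close. The file name is hypothetical. The sketch also assumes that open returns NIL when the file cannot be created.

```lisp
;; Sketch: send the table to a text file instead of the debug window.
;; OUT-PATH is hypothetical; a full path is safer, because a relative name is resolved
;; against whatever working directory Audacity happens to have.
(setq out-path "pitch-table.txt")
(setq fp (open out-path :direction :output))   ; assumed to return NIL on failure

(cond (fp
       (format fp "~a\t~a\t~a~%" "time" "frequency" "RMS")
       ;; ...in the analysis loop, write each row with (format fp ...) instead of (format t ...)
       (close fp)
       (format nil "Table written to ~a" out-path))
      (t "Could not open the output file"))
```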

JimWI · January 8, 2021, 4:18pm

I got the output window to show results for both my test files. I noticed that the output shows frequencies at 0.5 s intervals. What do I change to get frequencies at smaller intervals, such as 0.1 s? Does it have something to do with the ‘dur’ variable?
Thanks in advance.

JimWI · January 8, 2021, 5:06pm

I found it. I wasn’t looking very hard. It’s the “window-time” variable.
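Shrinking window-time also shrinks the amount of audio available to YIN in each row. One way to get a row every 0.1 s while still analysing 0.5 s of audio is to separate the hop from the window. This sketch assumes that getyin, the fixed bestguess, min-hz, max-hz, minstep and maxstep from stepheneb’s script are already defined. The function name generate-overlapping-table is hypothetical.

```lisp
;; Sketch: one row every HOP-TIME seconds, each analysing WINDOW-TIME seconds of audio.
;; Assumes getyin, bestguess, min-hz, max-hz, minstep, maxstep are already defined.
(setq hop-time 0.1)
(setq window-time 0.5)

(defun generate-overlapping-table ()
  (let ((hop (round (* hop-time *sound-srate*)))      ; samples between rows
        (win (round (* window-time *sound-srate*)))   ; samples per analysis window
        (snd (snd-copy *track*)))                     ; keep *track* itself intact
    (format t "~a\t~a~%" "time" "frequency")
    (do ((ar (snd-fetch-array snd win hop) (snd-fetch-array snd win hop))
         (n 0 (1+ n)))
        ((not ar) "Results in debug log")
      (let ((est (bestguess (getyin (snd-from-array 0 *sound-srate* ar) window-time))))
        (format t "~a\t~a~%"
                (/ (* n hop) (float *sound-srate*))   ; start time of this window
                (if est (step-to-hz est) "none"))))))

(generate-overlapping-table)
```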
Thanks for all the help.
Jim