how to run YIN starting at sound start + 0.25s?

stepheneb
Posts: 16
Joined: Wed Jan 17, 2018 3:51 am
Operating System: OS X 10.11 El Capitan or later (macOS)

how to run YIN starting at sound start + 0.25s?

Post by stepheneb » Wed Jan 24, 2018 10:51 pm

In my Extended Pitch Detect plugin I'm selecting 5s of audio from one track starting at the beginning of the track and the plugin is generating headers and one row of data that looks like this:

Code:

time        duration    frequency   RMS         Confidence
0.000       0.500       309.40      0.295       0.992

I'd like to generate seven more rows of data, incrementing the start time by 0.25 s with each row.

yin operates on the start of the sound for a specific duration: (yin sound minstep maxstep stepsize)

see: https://www.cs.cmu.edu/~rbd/doc/nyquist ... l#index575

So to generate the next row, it looks like I need to pass it a new sound, based on the existing sound, starting at 0.25 s.

Reading this section of the Nyquist documentation, I'm not seeing an obvious way to clip 0.25 s from the beginning of a sound.

https://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html

Anyone know how to do this?

Thanks!

steve
Site Admin
Posts: 47940
Joined: Sat Dec 01, 2007 11:43 am
Operating System: Linux *buntu

Re: how to run YIN starting at sound start + 0.25s?

Post by steve » Thu Jan 25, 2018 1:22 am

One way to step through a sound in blocks is to grab a section of the sound as an array, then convert the array back into a sound for analysis (or analyze the array directly if appropriate).

Here's a simple example that grabs blocks of 1000 samples and performs very simple analysis (prints the peak level), then steps through to the next 1000 samples, and so on.

Code:

(do ((ar (snd-fetch-array *track* 1000 1000)(snd-fetch-array *track* 1000 1000)))
    ((not ar) "Results in debug log")
  (setf audio-block (snd-from-array 0 *sound-srate* ar))
  (print (peak audio-block 1000)))
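
For overlapping windows, another approach that might work is extract-abs, which takes a start and a stop time in seconds. The following is only a sketch under some assumptions: the track is mono, and WINDOW, HOP and DUR are names made up for the illustration. It prints the peak level of each 0.5 s window, moving the window start forward by 0.25 s each time.

```lisp
;; Sketch only: assumes *track* is a mono sound (a stereo track is an array).
;; WINDOW, HOP and DUR are illustrative names, not part of any plug-in.
(setf window 0.5)            ; window length in seconds
(setf hop 0.25)              ; distance between window starts in seconds
(setf dur (get-duration 1))  ; length of the selection in seconds

(do ((start 0.0 (+ start hop)))
    ((> (+ start window) dur) "Results in debug log")
  ;; extract-abs takes absolute start and stop times in seconds
  (setf audio-block (extract-abs start (+ start window) *track*))
  (format t "~a\t~a~%"
          start
          (peak audio-block (round (* window *sound-srate*)))))
```

The same audio-block could be passed to an analysis function in place of the call to peak.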

Post by stepheneb » Thu Jan 25, 2018 6:56 am

Thanks Steve for the sample!

I've adapted it with some of my code and your pitch-detect YIN code to create a table and am having trouble getting YIN to produce correct results.

Here's a screenshot of a pluck of a high E string and the report generated by my Extended Pitch Detect plugin. The track's about 5s. The fundamental frequency under testing tension is about 309 Hz.
extended-pitch-309Hz.png

When I run my adapted code, which generates a table, YIN reports that the fundamental frequency is around 62 Hz.

I'm pretty sure the do loop is iterating properly because the reported RMS values trend down in just the way I expect.

Am wondering if YIN needs more than the sound I'm generating and binding to audio-block ...

Maybe something I've got wrong because I'm just learning lisp ... confused a bit about differences between setq and setf ...

Here's what the table code generates -- intended to duplicate what I am doing manually now by adjusting the selection and running the Extended Pitch Detect plugin. The sample window is 0.5 s and it is moved forward 0.25 s with every iteration.

The size of step varies a bit because I'm correcting for integer rounding errors when determining the size of the array of sound values.

sample-index is the position, within the original sound in *track*, of the first sample of audio-block.

Code:

time          duration      frequency     Confidence    RMS           sample-index  step
0             0.500         62.904        0.992         0.295         0             2756
0.250         0.500         62.877        0.994         0.203         2756          2757
0.500         0.500         62.886        0.994         0.117         5513          2756
0.750         0.500         62.886        0.994         0.087         8269          2756
1.000         0.500         62.886        0.994         0.062         11025         2756
1.250         0.500         62.864        0.995         0.046         13781         2757
1.500         0.500         62.864        0.995         0.033         16538         2756
1.750         0.500         62.864        0.995         0.025         19294         2756
2.000         0.500         62.864        0.995         0.019         22050         2756
2.250         0.500         62.864        0.995         0.016         24806         2757
2.500         0.500         62.864        0.995         0.015         27563         2756
2.750         0.500         62.864        0.995         0.012         30319         2756
3.000         0.500         62.864        0.995         0.011         33075         2756
3.250         0.500         62.864        0.995         0.010         35831         2757
3.500         0.500         62.864        0.995         0.008         38588         2756
3.750         0.500         62.864        0.995         0.007         41344         2756
4.000         0.500         62.864        0.995         0.007         44100         2756
4.250         0.500         62.864        0.995         0.005         46856         2756
4.500         0.500         62.864        0.995         0.004         49612         2757
4.750         0.500         62.864        0.995         0.005         52369         2756
5.000         0.500         62.864        0.995         0.005         55125         2756
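
The varying step values in the table come from the rounding correction. As a rough sketch of the arithmetic, assuming a sample rate of 11025 Hz (inferred from the sample-index of 11025 at 1.000 s, not stated anywhere in this thread), a 0.25 s step is 2756.25 samples, so the leftover quarter sample adds up to one extra sample on every fourth step. WIN-START, SAMPLE-POS and STEP are names made up for the illustration.

```lisp
;; Sketch of the step arithmetic only; it does not touch any audio.
;; SRATE is an assumption inferred from the table, not a measured value.
(setf srate 11025.0)
(setf step-time 0.25)

(do ((i 0 (+ i 1))
     (win-start 0.0 (+ win-start step-time))
     (sample-pos 0))
    ((= i 8) "Results in debug log")
  ;; samples from SAMPLE-POS to the rounded start of the next window
  (setf step (round (- (* srate (+ win-start step-time)) sample-pos)))
  (format t "~a\t~a\t~a~%" win-start sample-pos step)
  (setf sample-pos (+ sample-pos step)))
```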

And here's the Lisp code that I'm running in the Nyquist Prompt effect to generate the table:

Code:

;; Initializations
(setq f0 nil)       ; initialise detected frequency
(setq confidence 1) ; initialise confidence
(setq sample-window-time 0.5)
(setq sample-step-time (/ sample-window-time 2))
(setq sample-index 0)
(setq sample-length (round (* *sound-srate* sample-window-time)))
(setq time 0)

(setq *float-format* "%1.3f")

(psetq min-hz 40 max-hz 8000)

;; Set range in steps (MIDI note numbers)
(psetq minstep (hz-to-step min-hz)
       maxstep (hz-to-step max-hz))

;;; Apply YIN to first DUR seconds
(defun getyin (sig dur)
  (let ((srate (min *sound-srate* (* 8 max-hz))))
    (if (< srate *sound-srate*)
        (progn
          (setf sig
            (if (arrayp sig)
                (sum
                  (extract-abs 0 dur (force-srate srate (aref sig 0)))
                  (extract-abs 0 dur (force-srate srate (aref sig 1))))
                (extract-abs 0 dur (force-srate srate sig))))
          (setq srate (snd-srate sig)))
        (setf sig
          (if (arrayp sig)
              (sum
                (extract-abs 0 dur (aref sig 0))
                (extract-abs 0 dur (aref sig 1)))
              (extract-abs 0 dur sig))))
    (let ((stepsize (truncate  (/ (* 4 srate) min-hz))))
      (yin sig minstep maxstep stepsize))))

;;; Find most confident frequency
(defun bestguess (yin-out)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
     ;(format t "~a Hz \t ~a %~%" (step-to-hz step) (* 100 (- 1 conf)))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

(round (* *sound-srate* sample-step-time))
(round (* *sound-srate* (- time (/ sample-index *sound-srate*))))

(defun generate-table ()
  (format t "~a\t~a\t~a\t~a\t~a\t~a\t~a~%" "time" "duration" "frequency" "Confidence" "RMS" "sample-index" "step")
  (do* ((index 1 (+ index 1))
        (sample-step (round (* *sound-srate* sample-step-time))
          (round (* *sound-srate* (- (+ time sample-step-time) (/ sample-index *sound-srate*)))))
        (ar (snd-fetch-array *track* sample-length sample-step)
          (snd-fetch-array *track* sample-length sample-step)))
      ((not ar) "Results in debug log")
    (setf audio-block (snd-from-array 0 *sound-srate* ar))
    (setf f0 (bestguess (getyin audio-block sample-window-time)))
    (format t "~a\t~a\t~a\t~a\t~a\t~a\t~a~%"
      time
      sample-window-time
      f0
      (- 1.0 confidence)
      (snd-fetch (rms audio-block))
      sample-index
      sample-step)
    (setq sample-index (+ sample-index sample-step))
    (setq time (+ time sample-step-time))))

(generate-table)

Post by steve » Thu Jan 25, 2018 2:44 pm

stepheneb wrote: confused a bit about differences between setq and setf

There are two schools of thought:
1) Always use SETF (it does everything that SETQ does and more)
2) Use SETQ for setting simple numeric values, and use SETF for everything else.

I think that historically SETQ came first. It is just a shorthand way of writing (set (quote variable) value)
The QUOTE special form, which may be written as a single quote character, tells Lisp not to evaluate the variable.
These assignments are just different ways of writing the same thing:

Code:

(set (quote my-var) 42)
(set 'my-var 42)
(setq my-var 42)
This will throw an error:

Code:

(set my-var 42)
;; error: unbound variable - MY-VAR
The SETF command is more powerful, and will allow other types of assignments, such as setting the value of an element in an array:

Code:

(setf ar (make-array 3))
(setf (aref ar 1) "My String Value")
(print ar)  ;prints the whole array
Which prints to the debug window:

Code:

#(NIL "My String Value" NIL)
We could use (setq ar (make-array 3)),
but (setq (aref ar 1) "My String Value") will fail because (aref ar 1) is a function call and not a simple "symbol". SETQ can only be used with symbols (simple variables).

For looking up things about XLISP, the XLISP manual is more detailed than the Nyquist manual, and provides examples for most functions.
The XLISP manual is here: http://www.audacity-forum.de/download/e ... -index.htm

Regarding the bigger question, I'll need to spend some time with your code, which I don't have time to do right now.
Can you narrow down the problem by writing short test scripts for each of your functions?

Post by stepheneb » Thu Jan 25, 2018 9:14 pm

The most obvious bug I fixed was to remember to use step-to-hz before reporting the frequency value!
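
As a quick check of that fix, YIN reports pitch in steps (MIDI note numbers), so the 62.904 in the first row of my earlier table was a step value rather than a frequency. A minimal sketch that can be pasted into the Nyquist Prompt:

```lisp
;; YIN reports pitch as steps (MIDI note numbers), so convert before printing.
(print (step-to-hz 69))      ; step 69 is A4, roughly 440 Hz
(print (step-to-hz 62.904))  ; first row of the earlier table, roughly 309.4 Hz
```

That agrees with the 309.40 Hz reported by the Extended Pitch Detect plugin.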

But there is still a more subtle problem. I made a simpler implementation to generate a table of frequency estimations which can be run in the Nyquist Prompt. It shifts the sample window over approximately 0.5s on every iteration and asks YIN to calculate frequency on a 0.1s slice.

First, however, I used some plug-ins to generate a simple waveform that descends in pitch and amplitude from A3 to G3. I'll use this as a reference waveform to test the implementation.
  • Generate Tone: 220 Hz, 0.8 amplitude for 5 s
  • Sliding Time Scale/Pitch Shift: final pitch shift: -2 semitones
  • Adjustable Fade: S-Curve Out
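
Something close to this reference tone could probably also be generated directly in the Nyquist Prompt. This is a rough sketch rather than an equivalent of the three plug-ins: it assumes a 5 s selection on a mono track, uses a linear fade instead of the S-curve, and glides the frequency exponentially from 220 Hz to 196 Hz (about two semitones lower).

```lisp
;; Sketch only: replaces the selection with a gliding, fading sine tone.
;; In a process-type script, time 1 in PWEV/PWLV is the end of the selection.
(mult (pwlv 0.8 1 0)              ; amplitude: 0.8 fading linearly to 0
      (hzosc (pwev 220 1 196)))   ; frequency: 220 Hz gliding down to 196 Hz
```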
A screenshot of the results:
simnplest-frequency-table.png
Here's the code which is running in the Nyquist Prompt:

Code:

;; Initializations
(setq time 0.0)
(setq window-time 0.5)
(setq window-length (+ 1 (round (* window-time *sound-srate*))))
(setq f0 nil)       ; initialise detected frequency
(setq confidence 1) ; initialise confidence
(setq *float-format* "%1.3f")

(setf sndcopy (snd-copy *track*))

(psetq min-hz 40 max-hz 8000)

;; Set range in steps (MIDI note numbers)
(psetq minstep (hz-to-step min-hz)
       maxstep (hz-to-step max-hz))

;;; Apply YIN to first DUR seconds
(defun getyin (sig dur)
  (let ((srate (min *sound-srate* (* 8 max-hz))))
    (if (< srate *sound-srate*)
        (progn
          (setf sig
            (if (arrayp sig)
                (sum
                  (extract-abs 0 dur (force-srate srate (aref sig 0)))
                  (extract-abs 0 dur (force-srate srate (aref sig 1))))
                (extract-abs 0 dur (force-srate srate sig))))
          (setq srate (snd-srate sig)))
        (setf sig
          (if (arrayp sig)
              (sum
                (extract-abs 0 dur (aref sig 0))
                (extract-abs 0 dur (aref sig 1)))
              (extract-abs 0 dur sig))))
    (let ((stepsize (truncate  (/ (* 4 srate) min-hz))))
      (yin sig minstep maxstep stepsize))))

;;; Find most confident frequency
(defun bestguess (yin-out)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

(defun generate-frequency-table ()
  (format t "~a\t\t~a\t~a~%" "time" "frequency" "RMS")
  (do ((ar (snd-fetch-array sndcopy window-length window-length)
        (snd-fetch-array sndcopy window-length window-length)))
      ((not ar) "Results in debug log")
    (setf audio-block (snd-from-array 0 *sound-srate* ar))
    (format t "~a\t~a\t\t~a~%"
      time
      (step-to-hz (bestguess (getyin audio-block 0.1)))
      (snd-fetch (rms audio-block)))
    (setq time (+ time (/ window-length *sound-srate*)))))

(generate-frequency-table)
Here's a table of the results with one extra column on the right that shows the frequency estimations individually calculated:

Code:

time    frequency   RMS         calculated separately
0.000   219.948     0.458       219.95
0.500   219.948     0.535       217.66
1.000   219.948     0.518       215.38
1.500   219.948     0.437       213.01
2.001   219.948     0.363       210.68
2.501   208.306     0.287       208.26
3.001   208.306     0.196       205.83
3.501   208.306     0.116       203.45
4.001   208.306     0.053       200.95
4.501   208.306     0.013       198.45
The RMS value is dropping each time through the loop -- so the audio being processed each time through the loop appears to correctly represent 0.5s slices of the original sound.

I suspect some subtle problem in the functions getyin or bestguess ... ??

Post by stepheneb » Thu Jan 25, 2018 11:09 pm

Fixed.

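The problem was that the global variable confidence, which holds the best confidence value found by bestguess, needs to be reset to 1 every time bestguess is called. Without the reset, a later window could only replace f0 if it beat the best value from all the earlier windows, which is why the reported frequency stayed stuck for several rows.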

Here's the updated bestguess function:

Code:

;;; Find most confident frequency
(defun bestguess (yin-out)
  (setq confidence 1)
  (do ((step (snd-fetch (aref yin-out 0))(snd-fetch (aref yin-out 0)))
       (conf (snd-fetch (aref yin-out 1))(snd-fetch (aref yin-out 1))))
      ((not step))
    (when (and (= conf conf)  ; protect against nan
               (< conf confidence))
      (setq confidence conf)
      (setq f0 step)))
  f0)

Post by steve » Thu Jan 25, 2018 11:17 pm

You beat me to it. I was about to give you a cryptic clue: "you're getting too confident" :D

You have discovered why it is so often said that "globals are bad / evil" (not that they are - you just have to be very careful when and where you use them).
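
One way to avoid the global altogether might be to keep the running best values in local variables and return them. This is only a sketch: BESTGUESS-LOCAL, BEST-CONF and BEST-STEP are names made up for the illustration, and it assumes the same two-channel YIN output that bestguess already reads.

```lisp
;;; Find most confident frequency, keeping the running best values local.
;;; Returns a list: (step conf). STEP is NIL if no frame beat the initial 1.
(defun bestguess-local (yin-out)
  (let ((best-conf 1)
        (best-step nil))
    (do ((step (snd-fetch (aref yin-out 0)) (snd-fetch (aref yin-out 0)))
         (conf (snd-fetch (aref yin-out 1)) (snd-fetch (aref yin-out 1))))
        ((not step))
      (when (and (= conf conf)  ; protect against nan
                 (< conf best-conf))
        (setq best-conf conf)
        (setq best-step step)))
    (list best-step best-conf)))

;; Possible use inside the table loop:
;; (setf result (bestguess-local (getyin audio-block 0.1)))
;; (step-to-hz (car result))   ; frequency in Hz, if (car result) is not NIL
;; (- 1.0 (cadr result))       ; confidence as shown in the table
```

Because the variables are created afresh by LET on every call, nothing carries over from one window to the next.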