Understanding yin

The next thing I’m trying out is yin. The following seems to work for estimating the fundamental frequency of a sound a few cycles long. But I am not sure of the best inputs for yin. I gave it the bottom and top notes of the piano, but what’s best for the last argument? Is it a number of samples, or do I misunderstand?

How few cycles can I select and still get good answers?

I want to develop it into a tool that will label a whole number of cycles of a spoken vowel. Ever looked at the waveform of “ah”? You can have a dozen zero crossings per cycle! So if I select a piece of a vowel and press Z (At Zero Crossings), it doesn’t always give me a whole number of cycles. Press Shift+Space to loop-play it, and you don’t always hear a smooth sound, or the same note that was spoken.

This is why I’ve wished for Nyquist plug-ins that could make Audacity simply adjust the selection boundaries for you. Making labels that you have to delete later is not quite so convenient.

;nyquist plug-in
;version 1
;type analyze
;categories "http://lv2plug.in/ns/lv2core#AnalyserPlugin"
;name "Label Cycles..."
;action "Labelling..."

;control debug-me "Debug Me" int "" 10 1 10000

(setq twelfth-root-2 (expt 2.0 (/ 12.0)))
(setq note-0 (/ 440 (expt twelfth-root-2 69)))

;;; takes the output of yin, returns Hz or nil
(defun find-best-frequency (y)
  (let* ((notes (aref y 0))
	 (confidences (aref y 1))
	 best-confidence
	 best-note)
    (do ((note (snd-fetch notes) (snd-fetch notes))
	 (confidence (snd-fetch confidences) (snd-fetch confidences)))
	((or (not note) (not confidence)))
      ;; beware the IND values that sometimes appear in confidences;
      ;; plusp should screen them out
      (if (plusp confidence)
	  (if (or (not best-confidence) (< confidence best-confidence))
	      (setq best-confidence confidence
		    best-note note))))
    (and best-note
	 ;; convert to Hz
	 (* note-0 (expt twelfth-root-2 best-note)))))

(defun label-cycles (snd)
  (let* ((len (snd-length snd ny:all))
	 (y (yin snd 21 108 (/ len debug-me)))
	 (freq (find-best-frequency y)))
    freq))

(label-cycles s)
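For reference, yin's second and third arguments are pitches in MIDI steps, so 21 and 108 are A0 and C8, the bottom and top piano keys. A quick check at the Nyquist Prompt (a sketch that assumes only the standard step-to-hz function and *sound-srate*) shows that range in Hz and how long one period of the lowest note is in samples:

```lisp
;; MIDI steps 21 (A0) and 108 (C8) are the lowest and highest piano keys.
(print (step-to-hz 21))    ; 27.5 Hz
(print (step-to-hz 108))   ; roughly 4186 Hz
;; Length of one period of A0, in samples at the current sound rate
;; (about 1604 samples at 44100 Hz).
(print (/ *sound-srate* (step-to-hz 21)))
```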

It depends on how periodic the waveform is. For highly periodic waveforms, 10 or so cycles will probably be enough. For waveforms that are less periodic, you are likely to need more.

The step size is in samples. You may need to experiment for optimum results, but I found that about 4x the lowest frequency worked well.
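Because the step size is a count of samples, one way to experiment is to pick it from a hop time instead. The helper below is hypothetical (yin-with-hop is not a Nyquist function); it only converts seconds to a whole number of samples at the input's rate and then calls yin with the piano range used in the draft.

```lisp
;; Hypothetical helper: call yin with a hop given in seconds rather than
;; samples. The pitch range 21..108 (piano A0..C8) is the one used above.
(defun yin-with-hop (snd hop-seconds)
  (let ((stepsize (max 1 (truncate (+ 0.5 (* hop-seconds (snd-srate snd)))))))
    (yin snd 21 108 stepsize)))
```

For example, (yin-with-hop s 0.01) asks for one pitch estimate roughly every 10 ms (441 samples at 44.1 kHz), whereas the draft divides the selection length by debug-me.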

Does the rest of my math look good?
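Writing the first draft's conversion out, with r standing for twelfth-root-2 and f_0 for note-0, shows that it reduces to the usual equal-temperament formula with A4 (step 69) at 440 Hz, which is also what step-to-hz computes:

```latex
r = 2^{1/12}, \qquad f_0 = \frac{440}{r^{69}} = 440 \cdot 2^{-69/12} \approx 8.176\ \text{Hz}

f(n) = f_0 \, r^{n} = 440 \cdot 2^{(n-69)/12}

f(21) = 440 \cdot 2^{-4} = 27.5\ \text{Hz}, \qquad f(69) = 440\ \text{Hz}
```

So the hand-written math matches the built-in step-to-hz, which the second draft switches to.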

Here is a fairly final draft. It seems to pick zero crossings selectively, so that I get whole cycles of my voiced speech sounds. I also developed my zero-crossing-finding technique, using an array obtained from a sound made by snd-inverse.

;nyquist plug-in
;version 1
;type analyze
;categories "http://lv2plug.in/ns/lv2core#AnalyserPlugin"
;name "Label Cycles..."
;action "Labelling..."

;control debug-me "Debug Me" int "" 10 1 10000

(defun make-zero-crossing-finder (snd)
  (let* ((srate (snd-srate snd))
	 (blip (snd-from-array 0 srate (vector srate)))
	 (blips (trigger snd blip))
	 (count-of-crossings-snd (integrate blips))
	 (inv-snd (snd-inverse count-of-crossings-snd 0 1))
	 (inv-array (snd-fetch-array inv-snd (snd-length inv-snd ny:all) 1))
	 (inv-array-len (length inv-array))
	 (count-of-crossings-array
	  (snd-fetch-array (snd-copy count-of-crossings-snd)
			   (snd-length count-of-crossings-snd ny:all) 1))
	 (count-of-crossings-len (length count-of-crossings-array)))
    ;; global time
    ;; find the nearest crossing to either side
    ;; or return nil if out of bounds
    #'(lambda (time)
	(let ((index (truncate (* time srate))))
	  (and (>= index 0)
	       (< index count-of-crossings-len)
	       (let ((num (truncate
			   (aref count-of-crossings-array index))))
		 (and (>= num 0)
		      (< num inv-array-len)
		      (let*
			  ((t2 (aref inv-array num))
			   (t1 (if (plusp num)
				   (aref inv-array (1- num))
				   t2))
			   (diff2 (abs (- time t2)))
			   (diff1 (abs (- time t1))))
			(if (< diff1 diff2) t1 t2)))))))))

;;; takes the output of yin, returns Hz or nil
(defun find-best-frequency (y)
  (let* ((notes (aref y 0))
	 (confidences (aref y 1))
	 best-confidence
	 best-note)
    (do ((note (snd-fetch notes) (snd-fetch notes))
	 (confidence (snd-fetch confidences) (snd-fetch confidences)))
	((or (not note) (not confidence)))
      ;; beware the IND values that sometimes appear in confidences;
      ;; plusp should screen them out
      (if (plusp confidence)
	  (if (or (not best-confidence) (< confidence best-confidence))
	      (setq best-confidence confidence
		    best-note note))))
    (and best-note (step-to-hz best-note))))

(defun label-cycles (snd)
  (let* ((srate (snd-srate snd))
	 (len (snd-length snd ny:all))
	 (seconds (/ len srate))
	 (y (yin snd 21 108 (/ len debug-me)))
	 (freq (find-best-frequency y))
	 (text ""))
    ;;(text (format nil "~A Hz" freq)))
    (if (not freq)
	"Can't find the fundamental"
	(let*
	    ((period (/ freq))
	     (finder (make-zero-crossing-finder snd))
	     (prev-zero (funcall finder 0.0))
	     results)
	  (if (numberp prev-zero)
	      (do* ((next-zero (funcall finder (+ prev-zero period))
			       (funcall finder (+ prev-zero period))))
		  ((not (and (numberp next-zero) (< prev-zero next-zero))))
		(setq results (cons (list prev-zero next-zero text) results)
		      prev-zero next-zero)))
	  ;; what to do at the right boundary, if it rises to zero and
	  ;; does not cross?  Make one more label?
	  (if (and (numberp prev-zero)
		   ;; improve this criterion for good-enough?
		   (> (- seconds prev-zero) (* .9 period)))
	     (setq results (cons (list prev-zero seconds text) results)))
	  results))))

(or
 (label-cycles s)
 "Can't find zeroes")

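The zero-crossing finder in this draft combines two steps: trigger plus integrate builds a running count of rising zero crossings (transitions from <= 0 to > 0), and snd-inverse turns that count back into the time of each crossing, so the nearest crossing can be looked up by index. The same two steps on a plain vector of samples look like the sketch below; rising-crossings and nearest-crossing are hypothetical illustration helpers, not part of the plug-in.

```lisp
;; Illustration only: the same two steps on a plain vector of samples.
;; 1. Record the sample index of each rising crossing (<= 0 to > 0),
;;    the transition that TRIGGER responds to.
(defun rising-crossings (samples)
  (let (result)
    (dotimes (i (1- (length samples)))
      (if (and (<= (aref samples i) 0)
               (> (aref samples (1+ i)) 0))
          (setq result (cons (1+ i) result))))
    (reverse result)))

;; 2. Given those indices, return the one nearest to INDEX, or nil.
(defun nearest-crossing (crossings index)
  (let (best)
    (dolist (c crossings best)
      (if (or (null best)
              (< (abs (- c index)) (abs (- best index))))
          (setq best c)))))
```

For example, (rising-crossings (vector -1 1 -1 -1 2)) gives (1 4), and (nearest-crossing '(1 4) 3) gives 4. The linear scan in nearest-crossing is what the plug-in's snd-inverse array avoids: it indexes the crossing times directly by count.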
Seems OK to me.

What I learned today: You can feed yin one of your vowels, but don’t give it a sibilant! Even with a stepsize parameter that works well with a vowel of similar length, yin given a sibilant starts spinning the CPU and eating memory. I walked away for a minute and came back and saw “Nyquist did not return audio.”

A caution for me, if my crude “speech recognition” goes anywhere: be sure I can identify the noisier speech segments by other means first.

Did you mean “INF” (rather than “IND”)?
If so, you need to be careful with “plusp”, because it screens out -inf but not +inf.
If you get the same results on Windows (below), then checking for equality looks like the best bet.

(setq test (linear-to-db 0))
(print test) ; returns -inf on Linux

(if (numberp test) ; returns "Is number" 
  (print "Is number")
  (print "Is not number"))

(if (= test test) ; returns "Is not number"
  (print "Is number")
  (print "Is not number"))

(if (plusp test) ; returns "Is not number"
  (print "Is number")
  (print "Is not number"))

(if (minusp test) ; returns "Is number"
  (print "Is number")
  (print "Is not number"))



(setq test (- (linear-to-db 0)))
(print test) ; returns inf on Linux

(if (numberp test) ; returns "Is number" 
  (print "Is number")
  (print "Is not number"))

(if (= test test) ; returns "Is not number"
  (print "Is number")
  (print "Is not number"))

(if (plusp test) ; returns "Is number"
  (print "Is number")
  (print "Is not number"))

(if (minusp test) ; returns "Is not number"
  (print "Is number")
  (print "Is not number"))

I do mean -1.#IND, which I saw when I experimented with yin in the Nyquist Prompt. I don’t know what IND means as opposed to INF, but plusp does return false for it.

Besides which, if yin ever really does report infinity as a confidence value, then there might be no harm in treating that as a number greater than all finite numbers. I want to use the frequency for the smallest confidence that is a number or an infinity.

Google is a wonderful thing :-)
-1.#IND is how Windows represents a NaN (not a number).
A useful property of a NaN is that it is not equal to anything, not even itself, so (/= x x) is a good way to test for it.
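One caveat, going by the Linux results posted above: in XLISP, (= x x) also came out false for -inf and +inf, so (/= x x) rejects infinities as well as NaNs. If infinities should count as very large numbers, as suggested earlier, a test built from two ordered comparisons separates them, since every ordered comparison with a NaN is false. A sketch with hypothetical helper names, working on plain lists rather than yin's sounds:

```lisp
;; True for ordinary numbers and for -inf/+inf; false for a NaN,
;; because every ordered comparison involving a NaN is false.
(defun not-nan-p (x)
  (or (< x 0) (>= x 0)))

;; Hypothetical variant of the selection loop in find-best-frequency:
;; return the note whose confidence is smallest, skipping NaNs.
;; A +inf confidence is kept but never beats a finite one.
;; Unlike the plusp test, a confidence of exactly 0 is accepted.
(defun best-note (notes confidences)
  (let (best-confidence best)
    (do ((ns notes (cdr ns))
         (cs confidences (cdr cs)))
        ((or (null ns) (null cs)) best)
      (let ((c (car cs)))
        (if (and (not-nan-p c)
                 (or (null best-confidence) (< c best-confidence)))
            (setq best-confidence c
                  best (car ns)))))))
```

For example, (best-note '(60 62 64) '(0.3 0.1 0.2)) returns 62.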

The best way is to eliminate the -1.#IND values right in the sound itself.

;; create a sound containing all the undesired values, plus -1, 0 and 1
(setf sig (s-log (snd-from-array 0 1 (vector
  -1.0 0.0 (- (linear-to-db 0)) (/ 2.718281828) 1.0 2.718281828))))
(snd-display sig) (terpri)
;; set positive infinity and -1.#IND to 2
(setf sig2 (s-min 2 sig))
(snd-display sig2) (terpri)
;; set negative infinity and -1.#IND to -2
(setf sig3 (s-max -2 sig))
(snd-display sig3) (terpri)
;; set NaNs and positive infinity to 3, negative infinity to -2
(snd-display (s-max -2 (s-min 3 sig)))

For the confidence sound that yin returns, it is enough to use ‘(s-min 1 (aref yin-result 1))’, since all valid values are positive and 1 means non-periodic.
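Applied to the draft plug-in, that suggestion could look like the sketch below. best-yin-step is a hypothetical name; it clamps yin's confidence sound with s-min so that NaN and +inf frames read as 1 (non-periodic), relying on the s-min behaviour shown in the example above, and it only accepts frames whose clamped confidence is below 1.

```lisp
;; Hypothetical sketch: like find-best-frequency, but the NaN/+inf
;; confidences are clamped to 1 (non-periodic) with S-MIN first.
;; Returns the MIDI step of the most periodic frame, or nil.
(defun best-yin-step (snd minstep maxstep stepsize)
  (let* ((y (yin snd minstep maxstep stepsize))
         (steps (aref y 0))
         (confidences (s-min 1 (aref y 1)))
         best-confidence
         best-step)
    (do ((st (snd-fetch steps) (snd-fetch steps))
         (c (snd-fetch confidences) (snd-fetch confidences)))
        ((or (null st) (null c)) best-step)
      ;; after clamping, c should be at most 1, so plusp is not needed;
      ;; frames judged fully non-periodic (c = 1) are skipped
      (if (and (< c 1)
               (or (null best-confidence) (< c best-confidence)))
          (setq best-confidence c
                best-step st)))))
```

In an analyze plug-in this could be used as (let ((step (best-yin-step s 21 108 1000))) (if step (step-to-hz step) "Can't find the fundamental")).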

I don’t understand the usefulness of that. My sound contained no illegal values. It was just a recording of voice. Yin returns two arrays in which some of the frequencies are legal values but far from the correct answer, and corresponding confidence values for them were non-numbers.

yin returns an array of sounds, not two arrays.
My method sets all illegal values to a confidence of 1 which means non-periodic.

I misspoke, technically, but of course the “sounds” from yin are disguised sequences of numbers, practically arrays in another form, and that’s how I think of them. They are not “sounds” in the everyday sense of the word.
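When the array view is the convenient one, both of yin's sounds can be read out into arrays in one go, keeping the sample rate so each element can still be mapped back to a time. A sketch; yin-arrays is a hypothetical helper, using the same snd-length / snd-fetch-array pattern as the zero-crossing finder in the draft:

```lisp
;; Hypothetical helper: read both of yin's output sounds into arrays,
;; returning (list frame-rate pitch-array confidence-array).
(defun yin-arrays (snd stepsize)
  (let* ((y (yin snd 21 108 stepsize))
         (pitch (aref y 0))
         (confidence (aref y 1))
         (rate (snd-srate pitch)))
    (list rate
          (snd-fetch-array pitch (snd-length pitch ny:all) 1)
          (snd-fetch-array confidence (snd-length confidence ny:all) 1))))
```

Each array element is one analysis frame, and the first list element gives frames per second, since that is the sample rate of yin's output sounds.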

I understand you now. This method does not prevent bad results from coming out of yin; it only filters them out. It is not something to apply to the inputs.

But they are sounds in the technical sense of the word, in that they have a sample rate, start time, stop time and logical stop time.

… and you can profit from their structure as sounds.
You can use them as envelopes and control signals.
You can use the first one (once converted from steps to Hz) as the cutoff frequency in any variable filter (lp, hp, eq-band, etc.), and the second one to attenuate noisy parts of a signal.
Furthermore, sounds use 4 bytes per sample, whereas arrays use 14.
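As an illustration of using the confidence sound to attenuate noisy parts, here is a process-style sketch (attenuate-aperiodic is a hypothetical name): the confidence is clamped to 0..1, turned into a gain of 1 - confidence, resampled to the audio rate with force-srate, and multiplied with the input. It makes no attempt to align yin's analysis frames precisely with the audio, and the output may be cut short where yin's output ends.

```lisp
;; Hypothetical sketch: fade down the parts of SND that yin judges
;; non-periodic. yin's confidence is 0 for periodic, 1 for non-periodic.
(defun attenuate-aperiodic (snd stepsize)
  (let* ((y (yin snd 21 108 stepsize))
         ;; clamp into 0..1; NaN frames become 1 via s-min, as in the
         ;; s-min example above
         (conf (s-max 0 (s-min 1 (aref y 1))))
         ;; gain = 1 - confidence
         (gain (sum 1 (mult -1 conf))))
    ;; bring the control signal up to the audio rate before applying it
    (mult snd (force-srate (snd-srate snd) gain))))
```

At the Nyquist Prompt this would be run as (attenuate-aperiodic s 441) on the selected audio.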