Increasing accuracy of Silence Marker

SilenceMarker1.2

;Reducing the samples per second should improve the performance and decrease
;the accuracy of the labels. Increasing the samples per second will do the
;opposite. The more samples checked, the longer it takes. The more samples
;checked, the more precisely the program can place the silence labels.
;my-srate-ratio determines the number of samples in my-s. Set the number after (snd-srate s)
;higher to increase the number of samples.

(defun my-s (s-in)
 (setq my-srate-ratio (truncate (/ (snd-srate (mono-s s-in))(/ *sound-srate* 100))))
 (snd-avg (mono-s s-in) my-srate-ratio my-srate-ratio OP-PEAK)
)
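To see what the "100" in that line controls, the arithmetic can be worked through by hand. This is a sketch in Python rather than Nyquist, and the helper name `detector_rate` is made up for illustration. It assumes that snd-avg with equal block and step sizes emits one value per block, so the detector runs at the track rate divided by my-srate-ratio.

```python
def detector_rate(track_rate, project_rate, divisor=100):
    """Mirror (truncate (/ track-rate (/ project-rate divisor))).

    Returns (ratio, rate): ratio is the block size handed to snd-avg,
    rate is the assumed number of detector values per second.
    """
    ratio = int(track_rate / (project_rate / divisor))  # int() truncates toward zero
    return ratio, track_rate / ratio


# With matching track and project rates the ratio is just the divisor.
print(detector_rate(44100, 44100))       # (100, 441.0)
print(detector_rate(48000, 48000, 500))  # (500, 96.0)
```

Under that assumption, raising the divisor gives fewer detector values per second and lowering it gives more.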



Set the number after (snd-srate s) higher to increase the number of samples.

This plug-in is very fast - on my machine, on the order of four seconds per hour of audio at 44100 Hz. If it were more accurate and took twice as long, I would be satisfied with the time consumption. How do I “increase the number of samples”?

You and I have both customized this so I have attached the entire plug-in source.
SilenceMarker1.2.zip (2.25 KB)

There’s quite a lot that should be updated in this plugin, so I’d not suggest spending too much time on it (unless you want to rewrite it from scratch).

For personal use, just try different values for “my-srate-ratio”.

Try changing the line:

 (setq my-srate-ratio (truncate (/ (snd-srate (mono-s s-in))(/ *sound-srate* 100))))

I may rewrite it in C++ as trying to read the lisp is giving me headaches. How should I change that line?

 (setq my-srate-ratio XXX)

If so, what are reasonable minimum and maximum values of XXX? Otherwise, should I be changing some of those variables? (I tried changing the “100” to 500, 3000 and 30,000 - none of these values had any effect on execution time or accuracy.)

Try this in the Nyquist prompt.
Note: if you’re using 2.1.0 alpha, enable “legacy” mode.
The first three lines are for user input. If you want this as a plugin, make those into “controls”.

(setf threshold -20)  ;dB
(setf min-length 2.0) ;seconds
(setf label-text "Silence")  ;the label text

;; --------- End of user input -------------

(setf threshold (db-to-linear threshold))
(setf minlen (* min-length *sound-srate*))

(if (arrayp s)
    (setf s (s-max (s-abs (aref s 0))
                   (s-abs (aref s 1)))))

(do ((val (snd-fetch s)(snd-fetch s))
     (labels ())
     (count 0 (1+ count))
     (start 0)
     (silcount 0))
    ((not val) (if (> (length labels) 0)
                   labels
                   "No silence found."))
  (cond
    ((< (abs val) threshold)
      (if (= silcount 0)
          (setf start count))
      (incf silcount))
    ((> silcount minlen)
      (push
        (list (/ start *sound-srate*)
              (/ count *sound-srate*)
              label-text)
        labels)
      (setf silcount 0))
    (T (setf silcount 0))))

This is rather slow on my machine, taking about 1:40 for a 10:00 mono track, but it’s accurate.

Very strange - with Audacity 2.0.5 (I did not know how to use “legacy” mode), I tried this on a '70s-era Aerosmith vinyl; there were at least three seconds of almost-silence between tracks. The result was no labels created and a mess of the audio:
Aero.png

In Audacity 2.1.0 alpha, there is a checkbox near the bottom of the Nyquist Prompt effect: “Use legacy (version 3) syntax”.
To run “version 1, 2 or 3” code in the Nyquist Prompt effect, that checkbox must be selected.

The “mess” with the audio is that it has been “rectified” (sample values converted to absolute values, so negative samples become positive).
Are you sure that you copied the entire code (33 lines)?
If you did, please try it with the Debug button and post any debug info that there may be.

Oops, hang on a second - there seems to be a problem. The code on the forum is not the same as the code that I posted.

Well I’ve not seen that happen before.
The code on the forum had an extra line break.
I’ve reposted it, and I think it’s OK now.

Very peculiar, but here you can see the difference between my original code, and what I copied from the post (scroll down to the bottom of the image):
error.gif
Anyway, I’ve checked the reposted code and it’s working.

Way too slow - three minutes on a 39 minute stereo track. Also, not very useful to me; I tried your default settings then I used these settings:

(setf threshold -26)  ;dB
(setf min-length 1.0) ;seconds
(setf label-text "")  ;the label text

which gave me way too many labels:
Smith.png
I want point labels centered in the track. I also tried:

(setf threshold -36)  ;dB
(setf min-length 2.5) ;seconds
(setf label-text "")  ;the label text

which eliminated all the false positives but missed about one third of the real track gaps.

Don’t spend any more time on it for me, I have got the C++ code well in hand.

That’s why we don’t do it like that - but it is extremely “accurate”.

Not the same as “accurate”. To get good gap detection on real-world recordings, you will need noise filtering prior to the threshold detection. A simple form is bandpass filtering to reduce rumble, hiss, clicks and crackle. That will throw out any measurements that you may have made of the noise level, so you will need to compensate with your settings. You can be quite aggressive with the filtering because you are not actually going to hear the filtered sound - it’s only for the detector.
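As a rough illustration of filtering a detector copy of the audio, here is a minimal Python sketch. It uses plain first-order (one-pole) filters, which are much gentler than anything a real tool would likely use, and the function names and the 150 Hz / 4000 Hz corner frequencies are assumptions chosen for the example. The output always has the same number of samples as the input.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """High-pass built by subtracting the low-passed signal from the input."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def detector_bandpass(samples, sample_rate, low_hz=150.0, high_hz=4000.0):
    """Remove rumble below low_hz, then soften content above high_hz."""
    no_rumble = one_pole_highpass(samples, low_hz, sample_rate)
    return one_pole_lowpass(no_rumble, high_hz, sample_rate)


rate = 8000
dc_offset = [0.5] * rate  # one second of pure DC, the extreme case of rumble
tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(rate)]

filtered_dc = detector_bandpass(dc_offset, rate)
filtered_tone = detector_bandpass(tone, rate)
print(len(filtered_dc) == len(dc_offset))    # length is preserved
print(abs(filtered_dc[-1]) < 1e-6)           # the DC offset has decayed away
print(max(abs(x) for x in filtered_tone[-1000:]) > 0.5)  # a 1 kHz tone survives
```

Only the filtered copy feeds the threshold test; the audible track is left untouched.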

OK, so I make a copy of the audio, apply aggressive filtering (as long as it never changes the length of the audio - maybe even caching absolute peak value), do something like normalizing, figure out the timestamps for my point labels, throw out the copy of the audio then create my label track immediately under the current audio track. I will need to figure out how to apply aggressive filtering in C++…

Thanks for all the help Steve!

Give this one a go. It’ll be a lot faster than the previous one, and it includes filtering:

(setf threshold -30)  ;dB
(setf min-length 1.0) ;seconds
(setf label-text "Silence")  ;the label text

;; --------- End of user input -------------

(setf step 100) ;tweak this for performance
(setf threshold (db-to-linear threshold))



(setf s (highpass2 s 150))

(setf s 
  (if (arrayp s)
      (s-max (s-abs (aref s 0))
             (s-abs (aref s 1)))
      (s-abs s)))

(setf s
  (mult 2
    (snd-avg s step step op-average)))

(let* ((sr (snd-srate s))
       (minlen (* min-length sr))
       (labels ())
       (start 0)
       (silcount 0))
  (do ((val (snd-fetch s)(snd-fetch s))
       (count 0 (1+ count)))
      ((not val) (if (> (length labels) 0)
                     labels
                     "No silence found."))
    (cond
      ((< val threshold)
        (if (= silcount 0)
            (setf start count))
        (incf silcount))
      ((> silcount minlen)
        (push
          (list (/ (+ start count) (* 2 sr)) label-text)
          labels)
        (setf silcount 0))
      (T (setf silcount 0)))))

Almost perfect! I bumped the minimum length up to 2.5 (my experience has been that vinyl pressed before 1970-ish has gaps of about 5-6 seconds and thereafter a very uniform three seconds). The original effect (processing a 39 minute A+B sides stereo track) took three seconds, this one took 4.5 seconds. The original threw a number of false positives and missed about a third of the gaps. This version threw no false positives and correctly identified each track gap.

The only problem with this is that it is putting the point label 0.5 seconds in front of the whole gap:
Aerosmith.png
Close inspection will show you that the track gap is 3.470 seconds, and the created point label is 0.512 seconds before the beginning of the track gap. I want the label to fall in the center of the track gap.

It looks to me like:

(setf s
  (mult 2
    (snd-avg s step step op-average)))

is averaging the left & right channels and then doubling that value. I would like to try just summing the two channels (which should be quicker and result in the same mathematical value). I would also like to try twice the value of the loudest channel but I have no idea how to express either of those in Nyquist. What I am looking for is to eliminate what few false positives I have seen. They seemed predominantly to be when one channel is silent and the other channel has a musician “sneaking in”.

No, the code snippet works only with a single channel.
Step refers to samples, i.e. step samples are averaged and then the buffer advances step samples to average those again.
In other words, the chunk size (the first step) is equal to the hop size (the second step) in ‘snd-avg’.
The resulting sample rate is the track’s sample rate divided by (2nd) step.
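The chunk and hop idea can be sketched in a few lines of Python. This is not the Nyquist implementation, only an illustration; the function name is invented, and dropping an incomplete final block is an assumption of the sketch.

```python
def block_average(samples, chunk, hop):
    """Average `chunk` samples, advance by `hop` samples, repeat.

    Incomplete trailing blocks are dropped (an assumption of this sketch).
    """
    out = []
    for start in range(0, len(samples) - chunk + 1, hop):
        block = samples[start:start + chunk]
        out.append(sum(block) / chunk)
    return out


rectified = [0.5, 1.0, 0.25, 0.25, 0.0, 0.5]
print(block_average(rectified, 2, 2))  # [0.75, 0.25, 0.25]

# One output value per hop, so the detector rate is the track rate / hop.
print(44100 / 100)  # 441.0
```

With chunk equal to hop the blocks do not overlap, so each input sample contributes to exactly one output value.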

You can get the per-sample maximum of the two channels like this:

(setf s (s-max (s-abs (aref s 0)) (s-abs(aref s 1))))

Does this help?

Not quite what I need. I need either the sum of the individual channel absolute values or twice the maximum channel’s absolute value. Given:

T is the target - anything at or below this is "silence"
L = AbsoluteValue (channel 0 sample)
R = AbsoluteValue (channel 1 sample)
B = R + L
M = 2 * (the maximum of R and L - whichever is greater)

I want to try:

If B <= T Then the sample is silence

and I also want to try:

If M <= T Then the sample is silence

and I might even try:

If (B <= T) Or (M <= T) Then the sample is silence
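The three tests can be compared directly with a small Python sketch (the function name is invented for illustration). Because the larger of R and L is at least as big as either one, M = 2 * max(R, L) is never smaller than B = R + L. So whenever M <= T holds, B <= T holds too, and the combined "B or M" test always gives the same answer as the B test alone.

```python
def silence_tests(left, right, target):
    """Evaluate the three candidate silence tests for one stereo sample."""
    r, l = abs(right), abs(left)
    b = r + l              # sum of the absolute values
    m = 2 * max(r, l)      # twice the louder channel
    return {
        "B": b <= target,
        "M": m <= target,
        "B or M": b <= target or m <= target,
    }


# One channel silent while the other has a musician sneaking in.
print(silence_tests(left=0.0, right=0.04, target=0.05))
# {'B': True, 'M': False, 'B or M': True}
```

In this case the M test is the stricter one: it refuses to call the sample silent, while the sum-based test accepts it.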

I have attached the code that I am currently using as a Nyquist effect file (note that it has no GUI). In Steve’s original implementation the label was always placed before the actual region of silence so I must pad the label location. Unfortunately, I do not know how to calculate the center of the silence and place the label there (which is what I want). The current padding is almost good enough when tested on “modern” (created since 1970) vinyl which has fairly uniform 2.8-3.0 second track gaps. I processed all 23 of the recordings I had in the can - about 400 tracks - with about 5 false positives, about 10 missed gaps and all other labels falling within the track gap. The missed gaps were invariably caused by “clicks” within the track gap - I think that even a very rudimentary click repair would probably suppress these enough to eliminate all of these missed gaps but I have done no experimentation.
AlbumGapMarker.ny (1.38 KB)

As far as I can see, the code works correctly and takes the maximum value of the two channels.
I’m not sure why you want to multiply by 2. Are you aware that doubling the signal (+6 dB) makes the -30 dB threshold act like roughly -36 dB on the undoubled average?
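The effect of the doubling on the threshold can be checked with a little arithmetic. The Python below is only a worked calculation with hypothetical helper names: testing 2x < T is the same as testing x < T/2, and halving an amplitude lowers it by about 6.02 dB.

```python
import math


def db_to_linear(db):
    """Convert decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def linear_to_db(x):
    """Convert a linear amplitude ratio to decibels."""
    return 20.0 * math.log10(x)


threshold = db_to_linear(-30.0)             # what the user typed
effective = linear_to_db(threshold / 2.0)   # what the undoubled average must stay under
print(round(effective, 2))  # -36.02
```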
Instead of the snd-avg function, you can use the rms function.

(rms s >target-samplerate<)

where target samplerate means the track’s samplerate divided by the step size:

(rms s (/ *sound-srate* step) step)

Again, the second step is the hop size and is optional. Use it only if you want a smaller value; that smooths the curve while preserving the accuracy.
Also, no low-pass filter is applied yet; one would of course reduce spurious clicks as well.

I gather you want to apply it in a chain, don’t you?
Otherwise, it would be easier to simply tell the program how many tracks there are on the LP and to let it search for those x longest gaps.
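A minimal Python sketch of that idea follows. It assumes the candidate gaps have already been found as (start, end) pairs in seconds, and that an album with N tracks has N - 1 gaps between them; the function name and the sample data are made up for illustration.

```python
def pick_track_gaps(gaps, track_count):
    """Keep the (track_count - 1) longest gaps, returned in time order.

    gaps: list of (start_seconds, end_seconds) candidate silences.
    """
    wanted = max(track_count - 1, 0)
    longest = sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)[:wanted]
    return sorted(longest)


candidates = [(10.0, 10.4), (200.0, 203.0), (390.0, 390.5), (410.0, 413.5)]
print(pick_track_gaps(candidates, track_count=3))
# [(200.0, 203.0), (410.0, 413.5)]
```

The short pauses inside songs are discarded without any threshold on gap length, because only the ranking matters.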
Another improvement is to use the zero crossing rate as well.
This is e.g. done in GSM standards to detect voice activity.
For instance:
A song might fade out with a droning bass tone. The amplitude might soon drop below the threshold, so the detected gap will be longer than it should be. Placing the point label in the center could then split the fading tone.
The zero crossing rate for the bass tone is pretty low while it is quite high for silence (or rather white or coloured noise).
The gap would be calculated from the intersection of the amplitude below the threshold and a ZCR above a secondary threshold (e.g. 6000 Hz).
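A rough Python sketch of combining the two conditions follows. It reads the ZCR as sign changes per second, which is an assumption about what the 6000 Hz figure means, and the function names are invented. A strictly alternating signal stands in for hiss so that the example is deterministic.

```python
import math


def zero_crossing_rate(samples, sample_rate):
    """Sign changes per second over the whole block."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings * sample_rate / len(samples)


def looks_like_gap(samples, sample_rate, level_threshold, zcr_threshold=6000.0):
    """Quiet AND noise-like: low peak level together with a high ZCR."""
    peak = max(abs(x) for x in samples)
    return (peak < level_threshold
            and zero_crossing_rate(samples, sample_rate) > zcr_threshold)


rate = 44100
# A quiet 50 Hz drone, like the tail of a fading bass note.
drone = [0.01 * math.sin(2.0 * math.pi * 50.0 * n / rate) for n in range(rate)]
# A quiet, rapidly alternating signal standing in for surface hiss.
hiss = [0.01 if n % 2 == 0 else -0.01 for n in range(rate)]

print(looks_like_gap(drone, rate, level_threshold=0.03))  # False
print(looks_like_gap(hiss, rate, level_threshold=0.03))   # True
```

Both signals are below the level threshold, so the amplitude test alone would call both of them silence; the ZCR condition is what rejects the drone.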
As for the center:
the last cond expression searches for pauses that are longer than minlen.
However, once this length is reached, the counting begins again. Thus the center would always be half of minlen.

The following code counts until the end of the silent run:

    (cond
      ((> val threshold)
        ;; the silent run has just ended; label it if it was long enough
        (when (> silcount minlen)
          (setf start (- count (/ silcount 2)))
          (push
            (list (* (recip sr) start) label-text)
            labels))
        (setf silcount 0)) ;always reset, even after a short run
      ((<= val threshold)
        (incf silcount)))))
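For anyone porting this to another language, the same idea can be sketched in Python. This is an illustration under assumptions, not a translation of the plug-in: the function name is invented, `levels` stands for the rectified and averaged detector values, and `bias` is a hypothetical knob (0.5 marks the center of the gap, values nearer 1.0 move the label toward the next song). The sketch resets the run counter after every loud value, so short quiet passages cannot add up to a false gap.

```python
def gap_labels(levels, threshold, min_len, rate, bias=0.5):
    """Return label times in seconds for quiet runs longer than min_len values.

    levels: detector values at `rate` values per second.
    bias:   hypothetical position of the label within the run (0.0 to 1.0).
    """
    labels = []
    run = 0
    for count, val in enumerate(levels):
        if val <= threshold:
            run += 1
        else:
            if run > min_len:
                start = count - run
                labels.append((start + bias * run) / rate)
            run = 0  # reset after every loud value, long run or not
    return labels


# 1 s loud, 2 s quiet, 1 s loud, 0.3 s quiet, 0.5 s loud, at 10 values per second.
levels = [1.0] * 10 + [0.0] * 20 + [1.0] * 10 + [0.0] * 3 + [1.0] * 5
print(gap_labels(levels, 0.5, 10, 10.0))             # [2.0]
print(gap_labels(levels, 0.5, 10, 10.0, bias=0.75))  # [2.5]
```

A quiet run that lasts until the very end of the input produces no label here, because a label is only written when a loud value ends the run.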

I would personally push the point nearer to the next song since fade-outs are much more common than fade-ins.
Again, the ZCR could be helpful for those overlong gaps.

There seems to be some confusion about the code that I posted, so here is a heavily commented version that describes what it does:

(setf threshold -30)  ;dB
(setf min-length 1.0) ;seconds
(setf label-text "Silence")  ;the label text

;; --------- End of user input -------------

;;"step" is the number of samples taken when
;;we are 'averaging' the sound in (snd-avg)
(setf step 100) ;tweak this for performance

;; convert "threshold" from db to linear.
(setf threshold (db-to-linear threshold))


;; apply second order butterworth highpass filter
;; corner frequency = 150 Hz.
(setf s (highpass2 s 150))

;; If the sound is stereo (an array), take the maximum
;; absolute sample value of the two channels.
;; Thus, if one channel has a sample value of -0.8 and
;; the other channel has a sample value of +0.5, the resulting
;; sample will be +0.8 [maximum of the absolute values].
;; For mono sounds, "S" becomes the absolute value of each
;; sample, thus a sample value of -0.5 becomes +0.5
(setf s 
  (if (arrayp s)
      (s-max (s-abs (aref s 0))
             (s-abs (aref s 1)))
      (s-abs s)))


;; find the average of "step" number of samples [100], then
;; step forward by "step" number of samples to the next 
;; "step" number of samples. Thus, each 100 samples produce 1
;; sample that is an average of 100 sequential samples.
;; The average value will obviously be a lot lower than the peak
;; value, so it is then multiplied by 2 (+6 dB) as a ballpark
;; figure to make the output sound have roughly the same amplitude
;; as the input sound.
;; Taking the average value will substantially reduce the effect
;; of snap crackle and pop. [in effect, a lowpass filter].
(setf s
  (mult 2
    (snd-avg s step step op-average)))

;; initialise some variables:
(let* ((sr (snd-srate s)) ;the sample rate
       (minlen (* min-length sr)) ;the "minimum length" in samples
       (labels ()) ;an empty list for our labels
       (start 0) ;the start time [in samples] of the label
       (silcount 0)) ;the number of sequential "silent" samples
  ;; The main "while" loop.
  (do ((val (snd-fetch s)(snd-fetch s)) ; fetch the next sample
       (count 0 (1+ count))) ;increment "count"
      ; continue until no more samples [val=NIL]
      ; and return the labels, or an error.
      ((not val) (if (> (length labels) 0)
                     labels
                     "No silence found."))
    (cond
      ((< val threshold) ;value is below threshold
        (if (= silcount 0) ;the first "silent" sample
            (setf start count)) ;set the label start to the current sample number
        (incf silcount)) ;increment the count of sequential silent samples
      ;; OR [so we must be above the threshold]
      ((> silcount minlen) ;the count of silent samples is greater than "minlen"
        (push ;add a label to the list "labels"
          (list (/ (+ start count) (* 2 sr)) label-text)
          labels)
        (setf silcount 0)) ;reset "silcount" to zero
      ;; OR [so we are above the threshold, but 
      ;; we don't have a long enough series of silent samples
      (T (setf silcount 0))))) ;ensure that the silent sample counter is zero.