;Reducing the samples per second should improve the performance and decrease
;the accuracy of the labels. Increasing the samples per second will do the
;opposite. The more samples checked, the longer it takes. The more samples
;checked, the more precisely the program can place the silence labels.
;my-srate-ratio determines the number of samples in my-s. Set the number after (snd-srate s)
;higher to increase the number of samples.
(defun my-s (s-in)
  (setq my-srate-ratio (truncate (/ (snd-srate (mono-s s-in)) (/ *sound-srate* 100))))
  (snd-avg (mono-s s-in) my-srate-ratio my-srate-ratio OP-PEAK))
Set the number after (snd-srate s) higher to increase the number of samples.
This plug-in is very fast - on my machine, on the order of four seconds per hour of audio at 44100. If it were more accurate and took twice as long I would be satisfied with the time consumption. How do I “increase the number of samples”?
You and I have both customized this so I have attached the entire plug-in source. SilenceMarker1.2.zip (2.25 KB)
I may rewrite it in C++ as trying to read the lisp is giving me headaches. How should I change that line?
(setq my-srate-ratio XXX)
If so, what are reasonable minimum and maximum values of XXX? Otherwise, should I be changing some of those variables? I tried changing the “100” to 500, 3000 and 30,000; none of these values had any effect on execution time or accuracy.
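For what it is worth, here is a rough sketch of the arithmetic in that line. `block-size` is a made-up helper, not part of the plug-in, and the rates are assumed values. When the track rate equals the project rate, the block size passed to snd-avg is simply the number that currently reads 100. snd-avg returns one value per block, so a smaller number means more values checked per second (slower, finer placement) and a larger number means fewer.

```lisp
;; Hypothetical helper (not in the plug-in): the block size that the
;; (setq my-srate-ratio ...) line computes for a given divisor.
(defun block-size (track-rate project-rate divisor)
  (truncate (/ track-rate (/ project-rate divisor))))

;; Assumed rates: a 44100 Hz track in a 44100 Hz project.
(print (block-size 44100.0 44100.0 100.0))  ; 100 input samples per block
(print (block-size 44100.0 44100.0 10.0))   ; 10 input samples per block

;; Values per second that snd-avg returns for each block size:
(print (/ 44100.0 100))  ; 441 values per second
(print (/ 44100.0 10))   ; 4410 values per second
```

Writing (setq my-srate-ratio 10) directly should give the same block size as putting 10 in place of the 100, as long as the track and project rates match.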
Very strange - in Audacity 2.0.5 (I did not know how to use “legacy” mode) I tried this on a '70s era Aerosmith vinyl; there were at least three seconds of almost-silence between tracks. The result was no labels created and a mess of the audio:
In Audacity 2.1.0 alpha, there is a checkbox near the bottom of the Nyquist Prompt effect: “Use legacy (version 3) syntax”.
To run “version 1, 2 or 3” code in the Nyquist Prompt effect, that checkbox must be selected.
The “mess” with the audio is that it has been “rectified” (sample value converted to absolute values, so negative value samples become positive).
Are you sure that you copied the entire code (33 lines)?
If you did, please try it with the Debug button and post any debug info that there may be.
That’s why we don’t do it like that - but it is extremely “accurate”.
Not the same as “accurate”. To get good gap detection on real world recording, you will need noise filtering prior to the threshold detection. A simple form is bandpass filtering to reduce rumble, hiss, clicks and crackle. That will throw out any measurements that you may have made of the noise level, so you will need to compensate with your settings. You can be quite aggressive with the filtering because you are not actually going to hear the filtered sound - it’s only for the detector.
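A minimal sketch of that kind of detector-only filtering in Nyquist. The 150 Hz and 5000 Hz corner frequencies are guesses for illustration, not measured values, and `detect` is just a name picked for this example. The selection S itself is left alone, so the filtering only affects what the detector sees.

```lisp
;; Band-limit a copy of the selection for the detector only.
;; highpass2 reduces rumble; lowpass2 reduces hiss and softens clicks.
;; Both corner frequencies are assumptions, tune them to the recording.
(setf detect (lowpass2 (highpass2 s 150) 5000))
```

Run on its own in the Nyquist Prompt this expression returns the filtered sound, so it is meant as the first step of a detector rather than as a stand-alone effect.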
OK, so I make a copy of the audio, apply aggressive filtering (as long as it never changes the length of the audio - maybe even caching absolute peak value), do something like normalizing, figure out the timestamps for my point labels, throw out the copy of the audio then create my label track immediately under the current audio track. I will need to figure out how to apply aggressive filtering in C++…
Almost perfect! I bumped the minimum length up to 2.5 (my experience has been that vinyl pressed before 1970-ish has gaps of about 5-6 seconds and thereafter a very uniform three seconds). The original effect (processing a 39 minute A+B sides stereo track) took three seconds, this one took 4.5 seconds. The original threw a number of false positives and missed about a third of the gaps. This version threw no false positives and correctly identified each track gap.
The only problem with this is that it is putting the point label .5 seconds in front of the whole gap:
Close inspection will show you that the track gap is 3.470 seconds, and the created point label is .512 seconds before the beginning of the track gap. I want the label to fall in the center of the track gap.
(snd-avg s step step op-average)))
is averaging the left & right channels and then doubling that value. I would like to try just summing the two channels (which should be quicker and result in the same mathematical value). I would also like to try twice the value of the loudest channel but I have no idea how to express either of those in Nyquist. What I am looking for is to eliminate what few false positives I have seen. They seemed predominantly to be when one channel is silent and the other channel has a musician “sneaking in”.
No, the code snippet works only with a single channel.
Step refers to samples, i.e. step samples are averaged and then the buffer advances step samples to average those again.
In other words, the chunk size (first step) is equal to the hop size (second step) in ‘snd-avg’.
The resulting sample rate is the track’s sample rate divided by (2nd) step.
You can get the per-sample maximum of the two channels like this:
(setf s (s-max (s-abs (aref s 0)) (s-abs(aref s 1))))
Not quite what I need. I need either the sum of the individual channel absolute values or twice the maximum channel’s absolute value. Given:
T is the target - anything at or below this is "silence"
R = AbsoluteValue (channel 0 sample)
L = AbsoluteValue (channel 1 sample)
B = R + L
M = 2 * (the maximum of R and L - whichever is greater)
I want to try:
If B <= T Then the sample is silence
and I also want to try:
If M <= T Then the sample is silence
and I might even try:
If (B <= T) Or (M <= T) Then the sample is silence
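Both combinations can be sketched in Nyquist in a few lines, assuming S is a stereo selection (an array of two sounds) and keeping the R/L naming from the list above. The variable names are only for illustration. One thing worth noting: R + L can never exceed twice the larger of the two, so B <= M for every sample. Any sample that passes the M test also passes the B test, which makes the combined "Or" test identical to the B test alone.

```lisp
;; Assumes S is stereo: (aref s 0) and (aref s 1) are the two channels.
(setf r (s-abs (aref s 0)))      ; R = absolute value of channel 0
(setf l (s-abs (aref s 1)))      ; L = absolute value of channel 1
(setf b (sum r l))               ; B = R + L
(setf m (mult 2 (s-max r l)))    ; M = 2 * max(R, L)
```

To try one of them, use B or M in place of the s-max expression before the snd-avg step.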
I have attached the code that I am currently using as a Nyquist effect file (note that it has no GUI). In Steve’s original implementation the label was always placed before the actual region of silence so I must pad the label location. Unfortunately, I do not know how to calculate the center of the silence and place the label there (which is what I want). The current padding is almost good enough when tested on “modern” (created since 1970) vinyl which has fairly uniform 2.8-3.0 second track gaps. I processed all 23 of the recordings I had in the can - about 400 tracks - with about 5 false positives, about 10 missed gaps and all other labels falling within the track gap. The missed gaps were invariably caused by “clicks” within the track gap - I think that even a very rudimentary click repair would probably suppress these enough to eliminate all of these missed gaps but I have done no experimentation. AlbumGapMarker.ny (1.38 KB)
As far as I can see, the code works correctly and takes the maximum value of the two channels.
I’m not sure why you want to multiply by 2. You’re aware that the -30 dB threshold is afterwards -26 dB?
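A quick way to check the size of that shift in the Nyquist Prompt, using the built-in db-to-linear and linear-to-db. Testing (2 * value) < threshold is the same as testing value < threshold / 2, so by this arithmetic the offset works out at about 6 dB: a -30 dB setting behaves like roughly -36 dB measured against the unscaled average.

```lisp
;; Gain of a factor of 2, in dB.
(print (linear-to-db 2.0))                           ; about 6.02

;; Effective threshold on the unscaled signal when the
;; signal is doubled before comparing against -30 dB.
(print (linear-to-db (/ (db-to-linear -30.0) 2.0)))  ; about -36.02
```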
Instead of the snd-avg function you can take the rms function instead.
(rms s >target-samplerate<)
where target samplerate means the track’s samplerate divided by the step size:
(rms s (/ *sound-srate* step) step)
Again, the secondary step is the hop size and is optional. Use it only if you want a smaller value; that smooths the curve while preserving the accuracy.
Also, there’s no lowpassfilter applied yet, this would of course reduce spurious clicks as well.
I gather you want to apply it in a chain, don’t you?
Otherwise, it would be easier to simply tell the program how many tracks there are on the LP and to let it search for those x longest gaps.
Another improvement is to use the zero crossing rate as well.
This is e.g. done in GSM standards to detect voice activity.
A song might fade out with a droning bass tone. The amplitude might soon drop under the threshold, so the detected gap will be longer than the real one. Placing the point label in the center could then split the fading tone.
The zero crossing rate for the bass tone is pretty low while it is quite high for silence (or rather white or coloured noise).
The gap would be calculated from the intersection of the amplitude below the threshold and a ZCR above a secondary threshold (e.g. 6000 Hz).
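A rough sketch of a zero crossing count, to make the idea concrete. The helper name and the list-based interface are assumptions chosen to keep the example self-contained; a real plug-in would have to read blocks of samples from the sound first. It counts sign changes between neighbouring non-zero samples and scales the count to crossings per second. The list must not be empty.

```lisp
;; Hypothetical helper: zero crossings per second in a list of samples.
(defun zero-crossing-rate (samples srate)
  (let ((crossings 0)
        (prev (car samples)))
    (dolist (x (cdr samples))
      (cond ((= x 0))                ; ignore exact zeros
            ((< (* prev x) 0)        ; sign changed: count a crossing
             (incf crossings)
             (setf prev x))
            (t (setf prev x))))
    (/ (* crossings (float srate)) (length samples))))

;; Toy data at an assumed rate of 8 samples per second:
(print (zero-crossing-rate '(1 -1 1 -1 1 -1 1 -1) 8))  ; 7 per second (noise-like)
(print (zero-crossing-rate '(1 1 1 1 -1 -1 -1 -1) 8))  ; 1 per second (bass-like)
```

A block would then count as gap only if its level is below the amplitude threshold and its crossing rate is above the second threshold.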
As for the center:
the last cond expression searches for pauses that are longer than minlen.
However, once this len is reached, the counting begins again. Thus the center would always be half of the minlen.
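For reference, the center of a gap is just the mean of its first and last sample index, converted to seconds. A small sketch with a made-up helper name: `start` and `end` are indices into the down-sampled sound and `sr` is that sound's sample rate (441 Hz here, assuming 44100 Hz audio and a step of 100). For this to give the true center, `end` has to be the index where the level first rises above the threshold again, not the point where minlen was reached.

```lisp
;; Hypothetical helper: time in seconds of the middle of a silent run.
(defun gap-centre (start end sr)
  (/ (+ start end) (* 2.0 sr)))

;; Silence from 10 s (index 4410) to 13 s (index 5733) at 441 Hz:
(print (gap-centre 4410 5733 441))  ; 11.5 seconds
```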
There seems to be some confusion about the code that I posted, so here is a heavily commented version that describes what it does:
(setf threshold -30) ;dB
(setf min-length 1.0) ;seconds
(setf label-text "Silence") ;the label text
;; --------- End of user input -------------
;;"step" is the number of samples taken when
;;we are 'averaging' the sound in (snd-avg)
(setf step 100) ;tweak this for performance
;; convert "threshold" from db to linear.
(setf threshold (db-to-linear threshold))
;; apply second order butterworth highpass filter
;; corner frequency = 150 Hz.
(setf s (highpass2 s 150))
;; If the sound is stereo (an array), take the maximum
;; absolute sample value of the two channels.
;; Thus, if one channel has a sample value of -0.8 and
;; the other channel has a sample value of +0.5, the resulting
;; sample will be +0.8 [maximum of the absolute values].
;; For mono sounds, "S" becomes the absolute value of each
;; sample, thus a sample value of -0.5 becomes +0.5
(setf s
  (if (arrayp s)
      (s-max (s-abs (aref s 0))
             (s-abs (aref s 1)))
      (s-abs s)))
;; find the average of "step" number of samples, then
;; step forward by "step" number of samples to the next
;; "step" number of samples. Thus, each 100 samples produce 1
;; sample that is an average of 100 sequential samples.
;; The average value will obviously be a lot lower than the peak
;; value, so it is then multiplied by 2 (+6 dB) as a ballpark
;; figure to make the output sound have roughly the same amplitude
;; as the input sound.
;; Taking the average value will substantially reduce the effect
;; of snap crackle and pop. [in effect, a lowpass filter].
(setf s (mult 2 (snd-avg s step step op-average)))
;; initialise some variables:
(let* ((sr (snd-srate s)) ;the sample rate
(minlen (* min-length sr)) ;the "minimum length" in samples
(labels ()) ;an empty list for our labels
(start 0) ;the start time [in samples] of the label
(silcount 0)) ;the number of sequential "silent" samples
;; The main "while" loop.
  (do ((val (snd-fetch s) (snd-fetch s)) ; fetch the next sample
       (count 0 (1+ count)))             ;increment "count"
      ; continue until no more samples [val=NIL]
      ; and return the labels, or an error.
      ((not val) (if (> (length labels) 0)
                     labels
                     "No silence found."))
    (cond
      ((< val threshold)        ;value is below threshold
       (if (= silcount 0)       ;the first "silent" sample
           (setf start count))  ;set the label start to the current sample number
       (incf silcount))         ;increment the count of sequential silent samples
      ;; OR [so we must be above the threshold]
      ((> silcount minlen)      ;the count of silent samples is greater than "minlen"
       (push                    ;add a label to the list "labels"
         (list (/ (+ start count) (* 2 sr)) label-text)
         labels)
       (setf silcount 0))       ;reset "silcount" to zero
      ;; OR [so we are above the threshold, but
      ;; we don't have a long enough series of silent samples]
      (T (setf silcount 0)))))  ;ensure that the silent sample counter is zero.