Looking for a programmer - details inside (Beat counting)

So - when you open an audio file in Audacity (or any other software that can display a spectrogram),
it instantly converts the sound into a 2D graph

Now let’s say the background audio is mostly silent
And the only sound heard from time to time is DRUM BEATS, so every time a drum beat occurs, it is CLEARLY visible in the spectrogram.
It just fills it up in lots of “frequency ranges” and then drops off back into silence.

My question is:
Would there be a way to code something that could tell apart WHEN each of those sounds starts?
and then - be able to COUNT it?

so for example - silence - DRUM BEAT - the millisecond that the spectrogram shows a drumbeat-like pattern - then BOOM, recognize it, say: in this exact millisecond, a drumbeat was heard.
count as one.

so the goal of the first question is:
would it be possible to create a program that could COUNT the number of drumbeats heard in an audio file?
(I realize this may be possible without a spectrogram, but I think it’s a much better way to visualize it, and might be easier)

Second question:

if a drumbeat is heard, the sound “lingers on” for a moment before fading into silence
if there were to be a SECOND drumbeat during the “drop-off” period, then would it be possible to teach the program to recognize it?
what if both sounds were super close together? 300ms apart? 200ms apart? 100ms apart? 50ms apart?

what are the limitations that one would face when trying to code something like this, if even possible?

apologies if I didn’t explain it in the best way possible, but I really am curious if I could get something like this done
hopefully someone can enlighten me on it, so I could hire a programmer to get it done!

It’s much easier to work programmatically with amplitudes rather than frequency spectra.
If the background sound is all low level, then the drum beats will be clearly visible in the Waveform view. These can be detected quite easily, for example with the “Beat Finder” tool https://manual.audacityteam.org/man/beat_finder.html
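The basic idea - counting the points where the amplitude rises above a threshold - can be sketched in Python (just an illustration of the principle, not the actual Beat Finder code; the signal below is synthetic):

```python
def count_beats(samples, threshold):
    """Count the number of times |sample| rises above the threshold."""
    count = 0
    above = False
    for s in samples:
        if abs(s) > threshold and not above:
            count += 1          # rising edge: a new beat starts here
        above = abs(s) > threshold
    return count

# Three short "beats" (bursts) separated by near-silence:
signal = ([0.0] * 10 + [0.8, 0.5, 0.3] +
          [0.0] * 10 + [0.9, 0.4] +
          [0.0] * 10 + [0.7])
print(count_beats(signal, 0.1))  # -> 3
```

Each burst is counted once, however long it lingers above the threshold, because only the rising edge increments the count.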

“Beat Finder” is a Nyquist plug-in (https://manual.audacityteam.org/man/nyquist.html). These kinds of plug-ins are relatively easy to write and/or modify.

I’m not for hire, but if you are interested in learning how to program such an effect in Nyquist, I’d be happy to help you.

wow - i’m surprised that “beat finder” is a thing
I’ve also noticed “sound finder” - seems to have more options in the GUI at least, but not sure which one would be better

I’ll be reading all I can on both to see which one would work best (Trying various “thresholds” with beat finder - some work pretty well, but fail in certain parts)

regarding the amplitude vs frequency spectra bit:
I guess that’s exactly what Beat Finder is using, but by using low-pass/high-pass filters I could technically focus on the sounds that interest me

because the problem is - that I have simplified it quite a lot in my original post, because I didn’t know if it’d be possible in the first place
so the background audio is not that silent (but thinking about it - it wouldn’t affect it that much)

I’m glad that you’d be willing to help me if I wanted to program something like this through Nyquist (by the sound of it, it’s EXACTLY the thing that I need - I thought this was more in the realm of neural networks!)

but two questions that I would want answered for me to get a full grasp on this:

  1. would it be possible to tell the beat/sound finder to look for a specific type of sound? example:
    I select one “beat” - I tell my program: THIS is the sample that you’ll have to look for
    so that it can try to see all the sounds which would be closest to it?
    example: drumbeat, drumbeat, drumbeat, flute note, drumbeat, drumbeat - it would label all five drumbeats correctly, but not count the flute as one

  2. this will definitely be the most complicated question
    if I have a specific drumbeat sound, EACH lasting around 30 milliseconds

and then I were to have an INCREDIBLY rapid succession of them in a short span
(example: 24 of those exact beats, in the span of 1 second)

would it be possible, through specific settings, to get them all labeled correctly?
would changing the tempo, for example, make it easier for my program to label each of the 24 ones correctly?

thank you so much for pointing me in the right direction :)

Drum sounds are typically short, and flute sounds are typically long.
Looking at the amplitude of the sound, you could look to see where the amplitude exceeds a specified threshold, and then exclude places where the sound remains above the threshold for more than a specified amount of time. This simple algorithm would allow you to exclude flute sounds, provided that they are not very short flute sounds, but it would also exclude drum sounds that occur during a flute sound.
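That exclusion rule might look something like this in Python (illustrative only - the threshold, run lengths and signal are made up, and this is not code from any Audacity plug-in):

```python
def count_short_events(samples, threshold, max_len):
    """Count above-threshold runs shorter than max_len samples."""
    count = 0
    run = 0
    for s in samples:
        if abs(s) > threshold:
            run += 1
        else:
            if 0 < run < max_len:
                count += 1      # short burst: likely a drum hit
            run = 0             # a long run (flute-like) is ignored
    if 0 < run < max_len:       # handle a burst at the very end
        count += 1
    return count

drum = [0.8, 0.6]               # 2 samples above threshold
flute = [0.5] * 20              # 20 samples above threshold
silence = [0.0] * 5
signal = silence + drum + silence + flute + silence + drum + silence
print(count_short_events(signal, 0.1, 10))  # -> 2 (the flute is rejected)
```

As described above, a drum hit that occurs while the flute is still sounding would be swallowed by the long run and missed.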

As you suggested, filtering can be helpful.

Analysing the spectrum is a better approach in a lot of ways, but it is much more difficult to program.

Yes, Beat Finder looks at the amplitude. If I recall correctly, it also uses a low pass filter so that it is more sensitive to low frequency thumps (such as a bass drum).
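The effect of such a filter can be sketched with a simple one-pole low-pass in Python (an illustration of the principle only; this is not the filter Beat Finder actually uses, and the cutoff and test tones are made up):

```python
import math

def one_pole_lowpass(samples, cutoff_hz, srate):
    """Simple one-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / srate)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

srate = 44100
# One second each of a low "thump-like" tone and a high "cymbal-like" tone:
low = [math.sin(2 * math.pi * 50 * n / srate) for n in range(srate)]
high = [math.sin(2 * math.pi * 5000 * n / srate) for n in range(srate)]

print(max(abs(y) for y in one_pole_lowpass(low, 100, srate)))   # mostly passes
print(max(abs(y) for y in one_pole_lowpass(high, 100, srate)))  # strongly attenuated
```

After filtering, the 50 Hz tone keeps most of its amplitude while the 5000 Hz tone is knocked down, so a threshold detector applied afterwards responds mainly to the bass thumps.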

In an ideal test case, I’d guess that it would be possible. However, with 24 beats of 30 ms duration, the gaps between the beats would be around 12 ms. That’s about the same duration as the distance between peaks of a 43 Hz tone. Given that drum beats tend to have a sharp attack followed by a rapid decay, the beats would be easier to detect if you are only looking at the tips of the peaks, but then you will miss quiet drum beats.
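Checking the arithmetic on those gaps:

```python
# 24 beats of 30 ms each, packed into one second:
beats = 24
beat_len = 0.030                        # seconds per beat
gap = (1.0 - beats * beat_len) / beats  # silence between consecutive beats
print(round(gap * 1000, 1))             # gap in milliseconds -> 11.7
```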

“Tuning” the algorithm to pick up what you want while rejecting false positives can be tricky. The more complex the sound, the more tricky this becomes.

Yes it does. Here’s the Nyquist code: https://github.com/audacity/audacity/blob/master/plug-ins/beat.ny

Beat Finder uses the SND-FOLLOW function to track the amplitude of the waveform. I’d suggest using S-AVG rather than SND-FOLLOW.

I’d also recommend developing the effect with mono tracks only. Handling stereo complicates things a bit, and can be left until later if required.

To handle rapid sequences of beats, you will need a granularity of 10 ms or finer. For a sample rate of 44100 Hz, that’s 441 samples. Finer granularity will improve precision, but will increase processing time. If too fine, the amplitude tracker will tend to follow individual cycles in the waveform rather than the beat as a whole, so it’s probably best to compromise somewhere between 1 ms and 10 ms.

This code looks at the waveform in blocks of 100 samples (about 2.27 ms at a sample rate of 44100 Hz). For each 100 samples, it returns one sample equal to the absolute peak within those 100 samples.

(s-avg *track* 100 100 op-peak)

When that code is applied to this track:

[image: First Track000.png]
the result is a much shorter track (each 100 samples has been replaced by one sample), that follows the peak amplitude:

[image: First Track001.png]
If we then look for each point where the waveform rises above 0.1, that will catch the start of most of the beats.
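That block-peak reduction can be sketched in Python for illustration (the plug-in itself is Nyquist; the sample values below are made up):

```python
def block_peak(samples, block_size):
    """Replace each block of block_size samples with the block's absolute
    peak, mimicking Nyquist's (s-avg sig block-size block-size op-peak)."""
    return [max(abs(s) for s in samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]

signal = [0.0, 0.05, -0.9, 0.1, 0.02, 0.6, -0.1, 0.0]
print(block_peak(signal, 4))  # -> [0.9, 0.6]
```

Each block of 4 samples collapses to its peak, giving a much shorter envelope that still shows where the loud moments are.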

This code prints the times at which the waveform exceeds a threshold of 0.1:

;debugflags trace

(setf threshold 0.1)

(let ((sig (s-avg *track* 100 100 op-peak))
      (silence-flag t))
  ; get the exact sample rate of 'sig'.
  (setf srate (snd-srate sig))
  ;; Loop through the samples in 'sig'.
  (do ((val (snd-fetch sig) (snd-fetch sig))
       (count 0 (+ count 1)))
      ; until snd-fetch returns NIL (no more samples).
      ; finally, return the number of samples tested.
      ((not val) (format nil "Tested ~a samples" count))
    ;; When not inside a beat, and current sample is above
    ;; the threshold...
    (when (and silence-flag (> val threshold))
      ; Convert sample count to time in seconds, and print.
      (print (/ count srate)))
    ; Set silence-flag to t ('true') when sample is below
    ; threshold, else nil ('false')
    (setf silence-flag (< val threshold))))

The first line (“;debugflags trace”) tells Audacity to show the debug window so that we can see a list of times.
All other lines that begin with a semicolon are “comments” (ignored by Nyquist - just there to help understand the code).
All of the functions used can be looked up in the Nyquist manual: https://www.cs.cmu.edu/~rbd/doc/nyquist/indx.html
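For comparison, the same rising-edge loop sketched in Python (the envelope values are assumed; 441 Hz is the rate that results from the 100:1 reduction of 44100 Hz):

```python
def onset_times(envelope, srate, threshold):
    """Return the time in seconds of each sample where the envelope rises
    above the threshold, mirroring the Nyquist loop above."""
    times = []
    silence = True
    for count, val in enumerate(envelope):
        if silence and val > threshold:
            times.append(count / srate)  # rising edge: record onset time
        silence = val < threshold
    return times

# A short made-up envelope with two bursts:
envelope = [0.0, 0.0, 0.5, 0.4, 0.05, 0.0, 0.3, 0.02]
print(onset_times(envelope, 441.0, 0.1))  # two onsets, at samples 2 and 6
```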

and here’s a modified version that returns a list of labels. A label is created each time the level exceeds the threshold:

(setf threshold 0.1)

(let ((sig (s-avg *track* 100 100 op-peak))
      (silence-flag t)
      (labels ()))
  (setf srate (snd-srate sig))
  (do ((val (snd-fetch sig) (snd-fetch sig))
       (count 0 (+ count 1)))
      ((not val) labels)
    (when (and silence-flag (> val threshold))
      (push (list (/ count srate) "") labels))
    (setf silence-flag (< val threshold))))

As you see here, there are quite a lot of false positives, so the algorithm will need tweaking for accurate results:


steve you’re simply out of this world, I wish I could give you a rundown on everything you’ve done so far but it will take me some time to start testing. I’ve re-read this a few times and I am SHOCKED. pure disbelief.

the “peak” algorithm alone, checking each X ms range, is… a LEAP forward by itself.
it will simplify what I had in mind by a lot

I’ll start doing lots of testing with various thresholds / millisecond ranges to see if I can make some progress with what I had in mind. I’ll be sure to update as soon as possible - no idea how to thank you for this!