Onset detection algorithm (Beat tracking)

Hi everyone!

I am new to audio programming; in fact, I have never really done it before. I don’t know much about audio processing either. I only know the basics of audio programming (playing a loaded music file, pausing, stopping, etc.) and the very basics of audio processing (such as quantization and sampling).
Recently I was given the task of working on a beat detection algorithm. I asked Mr. Google and he gave me a lot of answers, but they are still a bit hard for me to understand. What I’ve understood so far is that multiple steps are needed to track the onsets (beats) in a sound file.
There are terms like Fourier Transform, Comb Filtering, Smoothing, and more that I don’t understand.
If anyone really understands the complete method of beat detection, I would appreciate it if you could explain it to me. A detailed step-by-step explanation would be great.
Thanks in advance =)

What exactly are you trying to program? In what application - Audacity? Your own application?


Also see:


Basically, I’m creating an application for Android: a rhythm game along the lines of Dance Dance Revolution. The user selects a song and the program detects the beats of the music being played. It doesn’t have to be very accurate. I’ve browsed many resources about onset detection, but since I’m new to audio programming and audio technology, I find it hard to tell which method suits me best.
I’m using Java for the application, since it’s being developed in the Eclipse IDE.

I found one source from http://www.docstoc.com/docs/23999316/Java-Beat-Detection-Program
Do I have to go through all the processes described there?
What really confuses me is what kind of data from the music file can be used to generate the arrow signs (like in DDR).
Is it possible to retrieve data from the sine wave in the music file? So far, my understanding is that a sine wave is how music data is represented in the computer.

Also, I would appreciate any suggestions for an open source library. So far I’ve only found Minim and I’m trying to figure out how it works.

P.S. Sorry if this is a noob question, I’m really new to audio programming :blush:

The simplest form of beat detection is to just look for the peaks in the waveform. With some music this will work, but it is likely to produce a lot of false detections.
Accuracy can be improved by band-limiting the frequency range that is being analysed. For example, to detect kick drum beats you could filter the audio to pass only bass frequencies, then look for the peaks in the filtered signal (I think that this is the method that Audacity uses).
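
As a rough illustration of the "filter, then look for peaks" idea, here is a minimal Java sketch. It is not Audacity's implementation: the class name, the one-pole low-pass filter, the threshold and the minimum gap between detections are all assumptions chosen for the example, and the input is assumed to be mono samples in the range -1.0 to 1.0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class BassPeakSketch {

    /** One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1]). */
    static float[] lowPass(float[] x, double cutoffHz, double sampleRate) {
        double a = 1.0 - Math.exp(-2.0 * Math.PI * cutoffHz / sampleRate);
        float[] y = new float[x.length];
        double prev = 0.0;
        for (int n = 0; n < x.length; n++) {
            prev += a * (x[n] - prev);
            y[n] = (float) prev;
        }
        return y;
    }

    /**
     * Returns the index of the first sample whose absolute level reaches the
     * threshold, then skips ahead until at least minGap samples have passed
     * before accepting the next one.
     */
    static List<Integer> peaks(float[] x, float threshold, int minGap) {
        List<Integer> found = new ArrayList<>();
        int last = -minGap; // lets a detection happen at index 0
        for (int n = 0; n < x.length; n++) {
            if (Math.abs(x[n]) >= threshold && n - last >= minGap) {
                found.add(n);
                last = n;
            }
        }
        return found;
    }
}
```

Typical use would be something like `peaks(lowPass(samples, 150.0, 44100.0), 0.5f, 11025)`, where the cutoff, threshold and gap are guesses that would need tuning per song.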

Possibly a better way is to look at the (unfiltered) RMS level and look for the peaks there; this gives an indication of where the acoustic energy peaks. From my own tests this works better with dance music than peak level detection. The RMS level of a short window can then be compared with the RMS level of a longer window of a couple of seconds or so; if the short-window level is significantly higher than the long-window level, that point is marked as a beat.
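
The short-window versus long-window comparison could be sketched in Java roughly as follows. This is only a sketch under assumptions: the class and method names are made up, the window sizes and the ratio that counts as "significantly higher" are parameters you would have to tune, and the long window here is simply the stretch of audio ending at the end of the current short window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class RmsBeatSketch {

    /** Root-mean-square level of samples[from, to). */
    static double rms(float[] samples, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += samples[i] * samples[i];
        }
        return Math.sqrt(sum / (to - from));
    }

    /**
     * Steps through the audio in short windows and returns the start index of
     * each short window whose RMS exceeds ratio times the RMS of the long
     * window that ends at the same sample. Consecutive loud windows count as
     * one beat.
     */
    static List<Integer> detect(float[] samples, int shortWin, int longWin, double ratio) {
        List<Integer> beats = new ArrayList<>();
        boolean wasLoud = false;
        for (int start = 0; start + shortWin <= samples.length; start += shortWin) {
            int end = start + shortWin;
            int longFrom = Math.max(0, end - longWin);
            double shortRms = rms(samples, start, end);
            double longRms = rms(samples, longFrom, end);
            boolean loud = longRms > 0 && shortRms > ratio * longRms;
            if (loud && !wasLoud) {
                beats.add(start);
            }
            wasLoud = loud;
        }
        return beats;
    }
}
```

At 44.1 kHz, a call such as `detect(samples, 1024, 88200, 1.5)` would compare roughly 23 ms windows against the last two seconds; those numbers are starting points, not recommendations.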

The second part of the problem (which Audacity beat detection does not implement) is to analyse the detected beats and look for statistical correlation to a regular rhythm within a sensible tempo range. For example, suppose 50% of the detected beats are at intervals that suggest 120 BPM, 30% suggest 170 BPM, 10% suggest 180 BPM and the remainder suggest greater than 200 BPM. The very fast beats can be ignored, and 180 is 3/2 times 120 and so suggests an off-beat, so statistically the suggestion would be 120 BPM. Statistical analysis is difficult to get right but can massively improve the accuracy of beat detection.
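
A very simplified version of that voting idea might look like the Java sketch below. It only covers the first part of the reasoning: each interval between consecutive detected beats votes for a tempo, tempos outside a sensible range are ignored, and the most common tempo wins. It does not try to recognise related tempos such as the 3/2 off-beat case. The class name, method name and the rounding to whole BPM are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class TempoVoteSketch {

    /**
     * Estimates the tempo from beat times given in seconds (ascending).
     * Returns the BPM value with the most votes, or -1 if no interval
     * falls inside [minBpm, maxBpm].
     */
    static int estimateBpm(double[] beatTimesSec, int minBpm, int maxBpm) {
        Map<Integer, Integer> votes = new HashMap<>();
        int best = -1;
        int bestVotes = 0;
        for (int i = 1; i < beatTimesSec.length; i++) {
            double interval = beatTimesSec[i] - beatTimesSec[i - 1];
            if (interval <= 0) {
                continue; // ignore duplicate or out-of-order detections
            }
            int bpm = (int) Math.round(60.0 / interval);
            if (bpm < minBpm || bpm > maxBpm) {
                continue; // too slow or too fast to be the main tempo
            }
            int count = votes.merge(bpm, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = bpm;
            }
        }
        return best;
    }
}
```

For beats detected at 0.0, 0.5, 1.0, 1.5, 1.6, 2.0 and 2.5 seconds with a range of 60 to 200 BPM, the 0.1 s interval is discarded as too fast and most of the remaining intervals vote for 120 BPM.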

OK, I think I’ve got the basic concept of the onset detection algorithm now. But I’m still a bit unclear about the basics of audio programming itself.
From what I’ve learnt, the only way to analyze an audio file programmatically is to access its PCM data using a decoder (which is provided by some library). In my case, Java doesn’t recognize MP3, so I will use the external library “jLayer” to decode the MP3 files and get the PCM data.
After that, I need to store the PCM data (which should be an array of numbers) in a temporary array (the buffer), one frame at a time, with a specified frame size (normally 1024).
Only then will I be able to analyze the audio file. Is this correct so far? I really need to understand the whole process, since I have nearly zero experience in both audio programming and audio processing. I know the basic concept of the whole process, but I need more detail on each step.
Thanks in advance, please bear with my lack of knowledge :cry:
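
For reference, the buffering step described above could look roughly like this in Java. It is a minimal sketch that assumes the decoder has already produced 16-bit signed mono PCM as a short[]; the decoding itself (for example with jLayer) is not shown, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of any library.
final class PcmFrameSketch {

    /**
     * Splits 16-bit signed PCM samples into frames of frameSize samples and
     * scales each sample to the range -1.0 to 1.0. A trailing partial frame
     * is dropped.
     */
    static float[][] toFrames(short[] pcm, int frameSize) {
        int frameCount = pcm.length / frameSize;
        float[][] frames = new float[frameCount][frameSize];
        for (int f = 0; f < frameCount; f++) {
            for (int i = 0; i < frameSize; i++) {
                frames[f][i] = pcm[f * frameSize + i] / 32768f;
            }
        }
        return frames;
    }
}
```

Each `frames[f]` can then be passed to whatever analysis is being done, one frame of 1024 samples at a time.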

I’m not aware of any Java experts on this forum, so your best friend will probably be Google.

Also, Minim looks to have comprehensive documentation: http://code.compartmental.net/tools/minim/