Need pointers on adding feature to search pattern


I needed a feature in Audacity but it’s currently not present. Therefore I’d like to code it myself, but need some initial pointers/comments.

The feature would let me select a small section of an audio waveform and locate all occurrences of that section in the complete audio file.

This could be achieved in various ways, the simplest being cross-correlation, which I intend to implement. I have downloaded the Audacity source code (I’m not familiar with the code structure yet).

I think the task can be divided into the following steps:

  1. Store the selected waveform amplitude in an array pattern[]. Some sort of averaging approximation can be used, as we don’t need to store one number for each sample.

  2. Store amplitude numbers for the complete audio in completeAudio[], using the same logic as in step 1.

  3. Perform sliding cross-correlation and store the results in another array crossCorr[].

  4. Search for maxima in crossCorr[].

Do you think this approach will work? Where should I look in the code base for this task (maybe some of these steps are already taken care of in some function)?
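
A minimal sketch of the four steps above, written in plain Python rather than against the Audacity code base. The function names are hypothetical, and the correlation is normalised (mean removed, divided by the two norms), which is an assumption added here so that a perfect match scores 1.0 regardless of level.

```python
import math

def block_average(samples, block):
    """Steps 1 and 2: one mean absolute amplitude per block of samples."""
    return [
        sum(abs(x) for x in samples[i:i + block]) / block
        for i in range(0, len(samples) - block + 1, block)
    ]

def sliding_correlation(haystack, pattern):
    """Step 3: normalised cross-correlation for every lag (values in -1..1)."""
    n = len(pattern)
    p_mean = sum(pattern) / n
    p = [x - p_mean for x in pattern]
    p_norm = math.sqrt(sum(x * x for x in p))
    result = []
    for lag in range(len(haystack) - n + 1):
        window = haystack[lag:lag + n]
        w_mean = sum(window) / n
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        denom = p_norm * w_norm
        result.append(sum(a * b for a, b in zip(p, w)) / denom if denom else 0.0)
    return result

def find_maxima(corr, threshold):
    """Step 4: indices of local maxima that reach the threshold."""
    hits = []
    for i, c in enumerate(corr):
        left = corr[i - 1] if i > 0 else -math.inf
        right = corr[i + 1] if i + 1 < len(corr) else -math.inf
        if c >= threshold and c > left and c >= right:
            hits.append(i)
    return hits
```

In use, pattern[] and completeAudio[] would both come from block_average with the same block size, and each index returned by find_maxima is a block offset, so it has to be multiplied by the block size to get back to a sample position.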


The general case of selecting an arbitrary sound, then searching for that sound, is very difficult to implement.

There are specific cases that are much simpler. For example, looking for “all occurrences of sound that have a peak level above a specified threshold” is relatively simple to achieve. This is essentially what the “Sound Finder” effect does: Audacity Manual
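
To illustrate the threshold case, here is a small Python sketch. It is not the Sound Finder code; the function name and the min_gap parameter are assumptions made for this example.

```python
def regions_above(samples, threshold, min_gap):
    """Return (start, end) index pairs where |sample| reaches the threshold.

    Loud samples separated by fewer than min_gap quiet samples are merged
    into one region; `end` is exclusive.
    """
    regions = []
    start = None
    last_loud = None
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            elif i - last_loud > min_gap:
                # The quiet stretch was long enough: close the old region.
                regions.append((start, last_loud + 1))
                start = i
            last_loud = i
    if start is not None:
        regions.append((start, last_loud + 1))
    return regions
```

Dividing the returned indices by the sample rate gives times that could be used as label positions.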

Moving one step up in complexity, looking for a “beep” at a specific frequency is not too complicated. The audio can be filtered to pass only the narrow frequency bands that correspond to the frequencies contained in the “beep”, then the filtered audio can be searched for places where each of those frequency bands has a signal within a specified amplitude range.
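
A rough Python sketch of the “beep” case for a single frequency. Instead of a band-pass filter it uses the Goertzel recurrence, which measures the power at one frequency per block; that substitution, the function names and the power limits are assumptions for illustration.

```python
import math

def goertzel_power(block, sample_rate, freq):
    """Power of `block` at `freq`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def find_beeps(samples, sample_rate, freq, block_size, min_power, max_power):
    """Start indices of blocks whose power at `freq` lies in the given range."""
    hits = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        power = goertzel_power(block, sample_rate, freq)
        if min_power <= power <= max_power:
            hits.append(start)
    return hits
```

A beep with several frequency components would need one goertzel_power call per component, with a hit only where all of them fall inside their ranges.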

Moving further up in terms of complexity, the “needle” sound could be analysed for its frequency content using FFT analysis, and the “haystack” audio could then be searched for a similar spectrum profile.
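
A toy Python sketch of comparing spectrum profiles. It uses a naive DFT for brevity (a real implementation would use an FFT), and cosine similarity is an assumed measure of “similar”; the names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT, only for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def cosine_similarity(a, b):
    """1.0 for identically shaped spectra, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def spectral_matches(haystack, needle, hop, threshold):
    """Start indices of haystack frames whose spectrum resembles the needle's."""
    profile = magnitude_spectrum(needle)
    n = len(needle)
    hits = []
    for start in range(0, len(haystack) - n + 1, hop):
        frame = haystack[start:start + n]
        if cosine_similarity(profile, magnitude_spectrum(frame)) >= threshold:
            hits.append(start)
    return hits
```

This compares a single profile per frame, so it ignores how the spectrum changes over the duration of the needle.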

For the general case you need to be searching for a sequence of spectrum profiles. A little about this has been written here: Missing features - Audacity Support

Cross-correlation is a reasonable starting point, though it has its drawbacks.
Let’s take a simple example:

The search pattern is roughly in the first 18000 samples – nearly a second at 22.05 kHz.
We will now try to search for it with convolution.
Convolution is nothing but correlation if the pattern is reversed.
This snippet does it:

;; Store pattern as array
(setf pattern (snd-samples s 18000))
;; reverse array
(do* ((i 0 (1+ i)) (j (1- (length pattern)) (1- j)) temp) 
     ((or (= j (1- i)) (= i j))) 
      (setf temp (aref pattern i))
      (setf (aref pattern i) (aref pattern j))
      (setf (aref pattern j) temp))
;; Back to a sound
(setf pattern (snd-from-array 0 *sound-srate* pattern))
;; cross correlation by convolution
(setf result  (convolve s pattern))
;; peak is way too high (thousands of times), so scale it down
(setf result (scale (/ 1.0  (peak result  36000)) result)) 
;; The mask will last for 18000 samples,
;; after a threshold of 0.3 is detected
;; Since the peak is at the end of the pattern,
;; all is shifted back by this amount
(setf mask (extract-abs (/ 18000 *sound-srate*) 3600 
      (snd-oneshot result 0.3 (/ 18000 *sound-srate*))))
;; mask original
(mult s mask)

Copy the code into the Nyquist Prompt and press OK.
(Of course, you have to import the sample file above first.)
Only the occurrences of “Audacity” should now be audible in the returned sound.
The threshold of 0.3 is somewhat arbitrary.
That’s perhaps something that would have to be set by a plug-in control.
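
The identity the snippet relies on can be checked with a few lines of Python (an illustration outside Nyquist; the function names are made up for this example). Convolving with the reversed pattern gives the same numbers as a sliding correlation, shifted by the pattern length minus one, which is why the peak lands at the end of each match.

```python
def convolve(signal, kernel):
    """Direct linear convolution, output length len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def correlate(signal, pattern):
    """Sliding dot product for every lag where the pattern fully overlaps."""
    n = len(pattern)
    return [
        sum(signal[lag + j] * pattern[j] for j in range(n))
        for lag in range(len(signal) - n + 1)
    ]

def correlation_by_convolution(signal, pattern):
    """Same values as correlate(), obtained by convolving with the reversed pattern."""
    n = len(pattern)
    full = convolve(signal, pattern[::-1])
    # Output index lag + n - 1 of the convolution corresponds to correlation lag.
    return full[n - 1:len(signal)]
```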

Also, the marked places do not exclusively hold “Audacity” because the word is sometimes longer and sometimes shorter.
Thus a perfect pattern matching algorithm has to work in two dimensions.

The code above is admittedly slow because I’ve not averaged the samples.
The results get worse the more averaging we do. However, you can try downsampling, taking the RMS values, or whatever you want.

Very nice Robert.

Thank you Steve,
It is after all only an example under “clinical” conditions.
I’ve just tried the snippet with the audio book extract from “Manipulating Decibels” at 16 kHz.
It is interesting that the code marks all places with the same intonation–which is actually more difficult for humans than finding a single word.
For practical applications, the convolution would have to be implemented with FFT.
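
For reference, a compact Python sketch of what FFT-based convolution means: transform both signals, multiply the spectra, transform back. The recursive radix-2 FFT here is written out only to keep the example self-contained; a real implementation would use an optimised FFT library and process long audio in blocks.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(signal, kernel):
    """Linear convolution via zero-padded FFTs."""
    size = len(signal) + len(kernel) - 1
    n = 1
    while n < size:
        n *= 2
    a = fft(list(signal) + [0.0] * (n - len(signal)))
    b = fft(list(kernel) + [0.0] * (n - len(kernel)))
    product = fft([x * y for x, y in zip(a, b)], inverse=True)
    # The inverse transform above is unnormalised, so divide by n here.
    return [v.real / n for v in product[:size]]
```

The cost grows roughly with N log N instead of with the product of the two lengths, which is what matters for an 18000-sample pattern.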

Brilliant stuff Robert. It works! (within reasonable limits).

I will try to come up with something more sophisticated. By the way, is it possible to bring up the Nyquist Prompt with a keyboard shortcut? I don’t see an option for that under the keyboard shortcut preferences.

The Nyquist Prompt is an “Effect”, so you can set a keyboard shortcut in the same way as for other effects,
but it would probably be better to make the code into a plug-in.

You can update a plug-in without needing to restart Audacity, but to add a new plug-in Audacity must be restarted before it will appear in the menu.

Thanks Steve, but under Effect in the keyboard preferences there is only one key binding available, called “Repeat Last Effect”.

Which version of Audacity are you using? (look in “Help > About Audacity”). The current version is 2.0.5 which is available here:

Have you entered any character in the filter field?
Just choose “All Commands”, change to the list and press “N” until Nyquist Prompt gets focus.

I wouldn’t program at all without a shortcut to the Nyquist prompt.
Another Tip: Use Control-Enter in the NP if you don’t want to always click on Ok. But you’ll probably need the Alt-G shortcut even more…

I’m running Ubuntu Precise, so the latest version for me is 2.0.0. Maybe that’s why the shortcut is missing.

@Robert, I tried that (hitting N under All), Nyquist Prompt didn’t come up.

I think the version is too old for that.
I don’t know if by chance an alpha version between 2.0.0 and 2.0.1 with this feature is available and compatible with your OS.

You could uninstall your package of Audacity 2.0.0, then install the “2.0.5+” Audacity version for Ubuntu 12.04 from Audacity Daily Build: “Audacity Team” team. That build is actually the latest Audacity source code, 2.0.6-alpha, which is not “stable”. Note that you will get updates for it most days unless you remove it from your sources list.

You could instead compile Audacity. Then you can either build latest Audacity HEAD or build the 2.0.5 release tarball. See Redirecting to: for the tarball and Missing features - Audacity Support for help with compiling.



Has this worked for you? I need a similar tool to identify identical advertisements.

I am using audacity to record streaming audio from radio stations. My recordings are normally 10 hr long. In a usual 10hr recording there are about 50-100 advertisements replayed about 4-8 times in each 10hr span. I need a tool that will match identical advertisements in each 10hr recording. Any ideas???

Any progress on this?

I sometimes have the need to analyze a track for a specific sound that re-occurs in isolation (no other background noises to interfere) throughout the track and put a marker at those spots. I can’t seem to find any programs that do this–seems like something people would use fairly often when manipulating sound files, as well as something “basic” that audio editing programs could do (if the sound is very close to exactly the same), but maybe it’s more complex than I imagine. Or maybe I’m just not using the correct search terms. (Although I know some basics with audio editing stuff, I’m not terribly adept.)

Any recommendations for programs that can do something like this? Cheers!

Did you see Robert’s post near the start of this topic?

Yeah, but I’m afraid I’m not knowledgeable enough to modify the code to make it work with a sample from my own audio. I tried manipulating it a bit, but I can’t seem to get it.

If anyone has time and could let me know what I need to do to get the right numbers from a sample and input them, I’d appreciate it. :slight_smile:

(The reason I asked if there was any progress is because I saw that audio-enthusiast mentioned something about “coming up with something more sophisticated”, so I thought perhaps a plug-in or something…)

Okay, I figured out part of why I couldn’t figure things out: the code only seems to work with mono files and I was working with a stereo file.

A few questions:

  1. So the only two numbers I care about are the number of samples and the threshold, right? (The 36000 and 3600 aren’t anything I’d want to change?)
  2. Also, is there an efficient way to do this with stereo tracks, or had I better just change to mono to find reoccurring audio?
  3. Is there a way to change part of the code to insert bookmarks rather than deleting the non-sampled audio? (Otherwise it seems like the easiest way for me is to paste the analyzed track below the original to find the points of re-occurrence.)

I can help you through it, I think.
Point 1:
1/36000 is just a scaling factor; we could just as well have scaled the threshold to e.g. 12000 (instead of 0.3).
Other pattern durations need other scaling factors.
If we keep the current behaviour (the first x samples are the pattern), then the normalization can be done after the convolution by searching for the highest peak in those x processed samples and multiplying the rest by 1/peak.
The reason is that we won’t encounter any higher value since we have found the first perfect match already.
Thus, a threshold of almost 1 would only mark perfect matches.
However, this will probably only produce one match since even equal audio can be off by e.g. half a sample during recording.
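
A small Python sketch of the normalisation described in Point 1, assuming the pattern sits at the very start of the audio so that its match with itself appears within the first pattern_len output values. The function names are hypothetical.

```python
def normalise_by_self_match(corr, pattern_len):
    """Scale the correlation so that the pattern's match with itself,
    found in the first pattern_len outputs, becomes 1.0."""
    reference = max(abs(v) for v in corr[:pattern_len])
    if reference == 0:
        return [0.0] * len(corr)
    return [v / reference for v in corr]

def above_threshold(scaled, threshold):
    """Indices whose scaled correlation reaches the threshold."""
    return [i for i, v in enumerate(scaled) if v >= threshold]
```

With a threshold of exactly 1.0 only the self-match is likely to survive, so a value somewhat below 1 is needed to catch repeats that are slightly off.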

3600 means that the whole audio to be analysed can’t be longer than 1 hour; this limit is set arbitrarily.

Point 2:
Stereo is an advantage rather than a drawback, since we can throw out positives that are not common to both channels. However, time consumption naturally increases.

Point 3:
The found matches can be returned as labels in a separate track.
This needs a search for the samples that are over the threshold and are the highest ones in a neighbourhood of pattern length.
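
Points 2 and 3 could be sketched like this in Python. This is an illustration only: the names, the tolerance parameter and the label tuple layout are assumptions, and the peak index is taken to mark the end of a match, as in the convolution snippet earlier in the topic.

```python
def pick_peaks(scaled, threshold, pattern_len):
    """Keep one index per match: the highest value over the threshold
    within any neighbourhood of pattern_len samples."""
    candidates = sorted(
        (i for i, v in enumerate(scaled) if v >= threshold),
        key=lambda i: scaled[i],
        reverse=True,
    )
    kept = []
    for i in candidates:
        # Strongest candidates are visited first; weaker neighbours are dropped.
        if all(abs(i - k) >= pattern_len for k in kept):
            kept.append(i)
    return sorted(kept)

def common_to_both(left_peaks, right_peaks, tolerance):
    """Keep left-channel peaks that have a right-channel peak within `tolerance` samples."""
    return [p for p in left_peaks
            if any(abs(p - q) <= tolerance for q in right_peaks)]

def to_labels(peaks, sample_rate, pattern_len, text):
    """(start_seconds, end_seconds, text) tuples, one per match."""
    return [
        ((p - pattern_len + 1) / sample_rate, (p + 1) / sample_rate, text)
        for p in peaks
    ]
```

Running pick_peaks once per channel and passing both results to common_to_both discards the positives that show up in only one channel.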

It would be somewhat better to wait until the next release (due this month) in order to implement the code as a plugin.
The pattern could then be in another track or be the first/last clip within the track itself.
A multiple choice control would ask where to look for it.
Other controls would be pattern length (if not already known), the threshold and the kind of return (silenced audio or labelled occurrences).

Any other ideas?


Thanks for the explanation, Robert!

Makes sense to wait till the next release to create plug-in. I can’t really think of anything else for the plug-in, but then, I’m not terribly knowledgeable with this sort of stuff. So hopefully those with more experience will make suggestions. Seems like a plug-in that could be quite beneficial to those needing operations such as this!