Audio Search (EXPERIMENTAL)

This plug-in is based on the innovative “cross correlation” code snippet by Robert J. H. that he posted here: https://forum.audacityteam.org/t/need-pointers-on-adding-feature-to-search-pattern/33729/3

NOTE: THIS PLUG-IN IS AN EXPERIMENTAL PROOF OF CONCEPT. Use at your own risk.
The effect also has a lot of limitations as described below.

The main reason I posted this code is that I spent ages yesterday looking for Robert’s script, which was buried in the middle of a long topic on the “General Audio Programming” board. Having eventually located it, I found that it was written using the old “version 3” syntax, so I updated it for the current (2.3.0) version of Audacity, and made it into a plug-in for easier testing.

Unfortunately, the convolution code in Nyquist is still extremely slow, and even though I have increased the speed by reducing the sample rate, it is still quite a slow effect. This effect may become more practical in future versions of Audacity as I see that the current “stand alone” version of Nyquist has a much faster convolution algorithm.

Don’t expect this effect to match spoken words, other than exact copies of the word. There is usually too much variation in natural speech to achieve a match.

Note that this effect works best at matching very short sounds. At the maximum (half second) duration setting, matches are not likely to be found for much other than pure tones.

WARNING: During development of this plug-in I found that some settings would cause Audacity to crash. The final version published here appears to be reasonably stable on Audacity 2.3.1 alpha, but crashes very regularly on the Linux version of 2.3.0. Use at your own risk.


The concept:

The idea of this plug-in is to take the first fraction of a second of sound from the start of the selection, and search for it occurring again later in the selection. If a match is found, the plug-in adds a label at each match.
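
For anyone who wants to see the idea in code, here is a minimal sketch in Python. It is not the plug-in's Nyquist code and not Robert's snippet; it only illustrates a normalized cross-correlation search under simple assumptions. The function names (`normalized_correlation`, `find_matches`) and the plain list-of-floats input are made up for illustration.

```python
import math

def normalized_correlation(a, b):
    """Correlation coefficient (-1.0 to 1.0) of two equal-length sample lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A window with no variation (e.g. pure silence) cannot correlate.
    return num / den if den > 0 else 0.0

def find_matches(samples, pattern_len, threshold=0.7):
    """Take the first pattern_len samples as the pattern and return the
    sample offsets later in the list where it occurs again."""
    pattern = samples[:pattern_len]
    hits = []
    pos = pattern_len  # start searching just after the pattern itself
    while pos + pattern_len <= len(samples):
        window = samples[pos:pos + pattern_len]
        if normalized_correlation(pattern, window) >= threshold:
            hits.append(pos)
            pos += pattern_len  # skip past this match
        else:
            pos += 1
    return hits

# Toy example: a 4-sample "click", silence, the same click again, silence.
click = [0.0, 1.0, -1.0, 0.5]
signal = click + [0.0] * 10 + click + [0.0] * 10
print(find_matches(signal, pattern_len=4))  # -> [14]
```

Sliding the window one sample at a time like this is the slow, brute-force way to do it, which goes some way to showing why the search gets expensive on long selections.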

Controls:

  • Duration of search pattern (ms) [1ms to 501ms (default 100ms)]
    The duration at the start of the selection that forms the “pattern” that the plug-in then searches for.
  • Correlation (high = better match) [0.5 to 1.0 (default 0.7)]
    The higher the Correlation coefficient, the better the match must be in order for the plug-in to label it.
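
To make the two controls a bit more concrete, here is a small Python sketch of how a duration in milliseconds might map to a number of samples, and how match positions might map back to label times. The helper names are hypothetical, and the 8000 Hz rate is only an assumption taken from the 8 kHz internal rate mentioned later in this topic.

```python
def pattern_length_in_samples(duration_ms, sample_rate):
    """Convert the 'Duration of search pattern' setting to a sample count."""
    return max(1, round(duration_ms * sample_rate / 1000))

def label_times(hit_positions, sample_rate):
    """Convert match positions (in samples) to label times in seconds."""
    return [pos / sample_rate for pos in hit_positions]

# Assumed internal rate of 8000 Hz (hypothetical for this sketch).
print(pattern_length_in_samples(100, 8000))  # -> 800
print(label_times([8000, 12000], 8000))      # -> [1.0, 1.5]
```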

And the plug-in:
pattern-match.ny (2.45 KB)

I did give pattern-match.ny (2.45 KiB) a go (Audacity 2.3.0, Windows Vista),
but it’s not finding any matches (other than the target), not even on audio that includes identical copies of the target.

censor bleep (1000Hz sine) not found.gif

Try this to get the hang of it:

  1. Generate a Rhythm track (default settings)
  2. Select the entire track (the first beat is at the start of the track)
  3. Apply Pattern Match with default settings.

Applying pattern-match to a Rhythm track (or Rhythm track + censor bleep) just labels all of the selected audio…

Rhythm-track & censor bleep (1000Hz sine).gif
Maybe I’m not using it correctly: is it a 2-stage process like Noise Reduction?

As a thought experiment:

Compress the model or goal file to very high data reduction and then do the same thing to the show. It doesn’t matter if you can understand the music or words any more. That’s not the goal. Then, with greatly reduced data go ripping through the search. Even if you get multiple hits, you should be able to complete the search in jig time. Then expand the accuracy/quality of the show and the model and see which of the accidental matches is “real.”

You should be done and eating lunch before other techniques finish.

Comparing every leaf and flower at the beginning no matter how well you do it isn’t going to be efficient or fast.

The only shortcoming I can see is original work that’s already been compressed. If the model or goal is taken from that work, it’s still going to succeed.

If the show is highly compressed and the model is clear, perfect, real time, it’s likely nobody’s matching is going to work.

Koz

Maybe you’re using the wrong plug-in.

PatternMatch.png

It already does that. If you feed it a 48 kHz stereo track, it gets converted to 8 kHz mono before processing. That makes the processing considerably faster than it would otherwise be, but unfortunately does make the pattern matching a little less effective (not too bad at 8 kHz, but it gets noticeably worse with lower sample rates).
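
As a rough illustration of that kind of data reduction (again in Python, not the plug-in's own code), the sketch below mixes stereo to mono and then averages each block of 6 samples, which takes 48 kHz down to 8 kHz. Block averaging is only a crude stand-in for a proper resampler with a real low-pass filter, and the function names are invented for this example.

```python
def to_mono(left, right):
    """Average the left and right channels sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def decimate(samples, factor):
    """Average each block of `factor` samples into one output sample.
    Any leftover samples at the end that do not fill a block are dropped."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]

# One second of (silent) 48 kHz stereo becomes 8000 mono samples.
left = [0.0] * 48000
right = [0.0] * 48000
reduced = decimate(to_mono(left, right), 6)
print(len(reduced))  # -> 8000
```

With one sixth of the samples, and a pattern that is also one sixth as long, a brute-force comparison has far less work to do, at the cost of discarding everything above 4 kHz.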

Sorry about that. I’ve got the right one now …

Find othercensor bleep (1000Hz sine).gif

That makes the processing considerably faster than it would otherwise be, but unfortunately does make the pattern matching a little less effective

But you don’t get multiple hits, right? You insist on getting the hero hit first time out? I’m after multiple passes. And I don’t just want to wreck the sound with a low sample rate, I want to MP3-type compress it. That should retain much more of the intelligence, up the hit rate and lower the data load.

Convert to bad MP3 > collect multiple possible hits > Choose between the multiple hits in original high quality.

Or possibly, read out the multiple hits in the case of more than one hero hit.

Koz
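
A sketch of that multi-pass idea, in Python: a loose first pass over heavily reduced data collects candidate hits, then each candidate is re-checked against the full-quality audio. Note the swap: this uses simple block averaging as the data reduction, not MP3-type compression, and all names, thresholds and the reduction factor are assumptions chosen for illustration.

```python
import math

def correlation(a, b):
    """Correlation coefficient of two equal-length sample lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def decimate(samples, factor):
    """Crude data reduction: average each block of `factor` samples."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]

def scan(samples, pattern, threshold):
    """Every offset where the pattern correlates at or above threshold."""
    n = len(pattern)
    return [p for p in range(len(samples) - n + 1)
            if correlation(pattern, samples[p:p + n]) >= threshold]

def coarse_to_fine(samples, pattern, factor=4,
                   coarse_threshold=0.5, fine_threshold=0.9):
    """Pass 1: loose search on reduced data. Pass 2: verify each
    candidate at full quality, within one reduced sample either side."""
    n = len(pattern)
    candidates = scan(decimate(samples, factor),
                      decimate(pattern, factor), coarse_threshold)
    hits = []
    for c in candidates:
        lo = max(0, (c - 1) * factor)
        hi = min(len(samples) - n, (c + 1) * factor)
        scored = [(correlation(pattern, samples[p:p + n]), p)
                  for p in range(lo, hi + 1)]
        best_score, best_pos = max(scored)
        if best_score >= fine_threshold and best_pos not in hits:
            hits.append(best_pos)
    return hits

# Toy example: the "model" sits 32 samples into the "show".
model = [0.0] * 4 + [1.0] * 4 + [-1.0] * 4 + [0.5] * 4
show = [0.0] * 32 + model + [0.0] * 32
print(coarse_to_fine(show, model))  # -> [32]
```

The first pass is allowed to be sloppy and return several accidental matches, because the second pass only has to examine a handful of positions at full quality.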

This is an “Analyze” plug-in. It outputs labels and does NOT modify the audio track (it just uses a lower sample rate internally so as to reduce processing time).

This is it working on a Click Track that is 3 minutes and 20 seconds long. Note that it labels each click that matches the first click (the first 100ms of the selection):

pattern-match.gif

Hello, I would like to know if it is possible with this plug-in to find my voice saying a particular sentence, for example: “yes, I did that”. Let’s assume I have audioA.mp3 and audioB.mp3.

audioA.mp3 contains me saying: “yes, I did that”.
audioB.mp3 is 1 hour long, and in parts of it I say “yes, I did that”.

Is it possible, with Audacity and this plug-in, to find the places where I said “yes, I did that” with the same pitch and everything, or is there another plug-in and/or other software that can?

Thanks for the help

From the original post:

Don’t expect this effect to match spoken words, other than exact copies of the word. There is usually too much variation in natural speech to achieve a match.

OK, so I might be lucky with just 1 word then, if it is an exact copy, as in the example I was asking about.

But other than that, is this the only solution at the moment for finding sounds within a sound? Aren’t there any other solutions around?

thanks