Can We Remove Repeated Sections of a Track?

I don’t mean noise reduction.

I just mean like you might cut out a section of track.

Select a section that represents a sound repeated often on the track, have Audacity recognise it, go down the length of the track, and remove every instance of it.

i.e. matching that exact waveform?

Noise removal works differently somehow. I don’t know the technical details, but to remove ‘this’ that exists ‘there’ it will also remove whatever components of ‘this’ exist in any other place. Won’t it? Doesn’t it?

So if you’ve got ‘pops’ on your track and you tell noise removal to learn that and then remove it, you’ll lose the pops alright, but all the other waveforms will be diminished too. You can see it. Hear it. They’ve all been modified.

Audacity does not have anything built in to do that.

In the simplest case of a short, simple sound (such as a “beep”) that is “identical” in each occurrence, it would be possible to write a “Nyquist script” to search for the specific pattern.

For more complex cases of “real world” sounds (such as a repeated song chorus), the task becomes very much more difficult. There was some work done on this in Audacity, called “Audio Diff”, but that work has been abandoned for many years due to a lack of developers to progress it.
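For the simplest “identical beep” case, the search could be sketched in a few lines. This is a hypothetical Python illustration, not a Nyquist script (Nyquist plug-ins are written in a Lisp dialect); `find_matches` and `remove_matches` are made-up names, and real recordings would need a non-zero `tolerance` rather than exact equality:

```python
# Hypothetical sketch: locate and delete every occurrence of a short
# reference clip by sliding-window comparison of sample values.
# `tolerance` allows small per-sample differences between occurrences.

def find_matches(track, pattern, tolerance=0.0):
    """Return the start indices where `pattern` occurs in `track`."""
    hits = []
    n = len(pattern)
    for start in range(len(track) - n + 1):
        if all(abs(track[start + i] - pattern[i]) <= tolerance
               for i in range(n)):
            hits.append(start)
    return hits

def remove_matches(track, pattern, tolerance=0.0):
    """Delete every match and close up the gap, as a plain Delete does."""
    out = []
    n = len(pattern)
    i = 0
    while i < len(track):
        window = track[i:i + n]
        if len(window) == n and all(abs(window[j] - pattern[j]) <= tolerance
                                    for j in range(n)):
            i += n  # skip the matched section entirely
        else:
            out.append(track[i])
            i += 1
    return out
```

For example, with `track = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]` and `pattern = [1.0, 2.0, 1.0]`, `find_matches` reports starts at 2 and 7, and `remove_matches` closes up both occurrences.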

That’s a shame. Ah well.

I wonder if they’re trying too hard - the programmers, when they looked at implementing this feature?

Because Audacity displays these sounds graphically as waveforms no problem at all, doesn’t it?

So a graphical matching or ‘mapping’ would do fine. That’s the way I do it myself. Not auditory but graphical.

I look at the display and say ‘ah, there it is, I don’t want that’, and then I look down the timeline, identifying each further instance of it, and delete them.

Unfortunately, clear and simple as the idea seems to be, my programming skills are totally inadequate for it.

But it does seem straightforward, doesn’t it? The pattern, the graphical pattern, is there, Audacity has made it. The whole track is one great long graphical pattern.

All that has to be done is identify the ‘sub pattern’ we don’t want and go down the track matching and deleting.

Relating the recurrences back to the audio track by time location, I guess, and just removing that segment. Replacing it with what, I don’t know. Replacing with whatever Audacity currently replaces with when you delete a section. Nothing, I think. Is that right? It is, it just closes up. Well, do that.

The next trick would be to replace it with a chosen sound or ‘pattern’ so the track length remained the same.

I wonder if there are any programmers willing to go down that path? Or at least have a look at it?

Much easier, you’d think, than the clever removal of frequencies all down the track, as in removing noise.

“Pattern matching” is something that the human brain is exceptionally good at. We do it instinctively. We can even do it when there isn’t actually a matching pattern (such as seeing animals and faces in clouds and the Rorschach Test).
For a computer it is much more difficult, as computers have no “instinct” for pattern matching; they rely on carefully designed algorithms that have to be programmed through measurement and logic.

Well I don’t think that’s quite right, is it?

Computers are made for simple tasks, and what we’re talking about here is simple. Much simpler than, say, face recognition.

There’s no question here of ‘nearly the same’. Here we’ve got ‘exactly the same’.

The track is a visual picture made by telling the video chip to display a pixel here and a pixel there, each pixel defined exactly by x,y co-ordinates.

I’ve forgotten the name for it, but something like the video memory has the complete VDU ‘picture’ in it, ready to throw out to the screen.

Even in BASIC you could write to the screen by specifying locations like that. And write to video ram or whatever it was called.

So let’s sketch out what the prog to do this would have to do:

It has to keep a record of the whole track as visually represented, with all the x,y commands. Locations.

Well, it does that already. We know that because we can travel backwards and forwards along the track, and I’ll bet it is not recalculating for every movement. It has processed the audio and created this ‘video file’, I’ll call it.

Then it has to know where the different locations are. Well, it already does that too: it is indexed by time for the full length.

So we tell it that the section we are interested in is between this time and that time.

So it finds that alright.

And it takes a note of that ‘pattern’.

In particular the value of the first ‘x’.

And then it runs along the track doing if/then/else all the way.

Find the first x that matches the reference x.

Mark that location as the Start.

Move to the next x. If it is the same as the next reference x, mark it as Start + 1.

If not, then mark it as the new Start.

See? You increment from Start until you reach End and then delete that section.

And start again.

Every series of tests that is not the pattern you want will fail before reaching End.

That’s the basic idea of this ‘pattern matching’ for a computer.
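The steps above could be sketched like this. This is a hypothetical Python illustration: `columns` stands for whatever per-position values the track is reduced to, and `match_and_delete` is a made-up name, not anything in Audacity:

```python
def match_and_delete(columns, ref):
    """Walk along the track, extending a candidate match one position
    at a time; once the full reference pattern has been matched
    (Start has reached End), delete that section and start again."""
    out = []
    i = 0
    while i < len(columns):
        k = 0
        # try to extend a match starting at position i
        while (k < len(ref) and i + k < len(columns)
               and columns[i + k] == ref[k]):
            k += 1
        if k == len(ref):
            i += len(ref)  # reached End: delete the section
        else:
            out.append(columns[i])  # keep this position, move on
            i += 1
    return out
```

Any candidate series that is not the wanted pattern fails before reaching End, exactly as described, and the matched sections simply close up.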

The waveform is symmetric about the horizontal axis, so it only needs one side to be tested.

Couldn’t be much simpler, I think. What do you think?

No, it’s not a “simple” task.

“Exactly the same” would be a special case that would hardly ever happen in real life. Even for “identical” waveforms in a lossless audio format, the sample values will probably not be identical, because the (analog) waveform will usually not be aligned to the (discrete) sample positions. On top of that, there may be numeric differences due to dither, and on top of that there will be further differences if the audio was not from a lossless format file (for example, if it was an MP3). Just because two things sound identical does not mean that they are identical.
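The alignment point can be shown with a small sketch, assuming plain Python and a pure sine tone (`sampled_sine` is a made-up helper): the “same” analog waveform, delayed by half a sample period, produces measurably different sample values even though it would sound identical.

```python
import math

def sampled_sine(freq_hz, rate_hz, n, delay_s=0.0):
    """Sample an analog sine wave, optionally delayed by a
    fraction of a sample period."""
    return [math.sin(2 * math.pi * freq_hz * (i / rate_hz - delay_s))
            for i in range(n)]

a = sampled_sine(440.0, 44100.0, 100)
# The "same" tone shifted by half a sample period: it sounds the same,
# but the analog waveform now falls between the original sample positions.
b = sampled_sine(440.0, 44100.0, 100, delay_s=0.5 / 44100.0)
max_diff = max(abs(x - y) for x, y in zip(a, b))
```

For a 440 Hz tone at 44100 Hz, `max_diff` comes out at around 0.03 of full scale, so a sample-for-sample equality test would report “different” for two takes of the identical sound.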

Perhaps you mean “frame”.

Audacity creates a number of image representations of the waveform that are cached in RAM for more efficient access. The images are recreated when the associated section of audio becomes “dirty” (modified).

The image is an approximation of the audio data in graphic form, based on peak and RMS measurements of blocks of samples (called “summaries”).
If the block alignment of two identical sections of audio is the same, and the mapping of that data to pixel values is the same, then the graphic representation will be the same. If the block alignment is different, or the mapping of that data to pixels is different, then the graphic representation will be different.

Audacity calculates these “summaries” at several zoom levels. At the lowest level there are the individual sample values. At higher levels they are sequences of samples, with the largest being a “block file” (the “.au” files in an Audacity project). Depending on the zoom level, there could be multiple blocks of data contributing to one pixel, or multiple pixels representing one block of data.
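The idea of a summary can be sketched roughly like this (hypothetical Python; the real summary format, block sizes and file layout in Audacity differ):

```python
import math

def summarize(samples, block_size=256):
    """Reduce raw samples to per-block (peak, rms) pairs: roughly the
    kind of cached data used to draw a waveform quickly at low zoom,
    where one block (or several) maps to a pixel column."""
    summaries = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        peak = max(abs(s) for s in block)
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        summaries.append((peak, rms))
    return summaries
```

Note that the summary depends on where the block boundaries fall: shift the same audio by a few samples and different samples land in different blocks, so two identical sounds can produce slightly different summaries, and hence slightly different pictures.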

The problem is far more complex than you suggest, especially as I expect that you want to match sections that “sound” the same. A very simple example is that a 1000 Hz tone with a specified amplitude will look identical to a 2000 Hz tone of the same amplitude (assuming that you are not zoomed in really close), but sounds completely different. More subtle examples are that a cow-bell will look much like a snare drum, and a fog horn will look much like a flute. Even if we take some account of frequencies, a group of people talking could look very much like traffic noise, or a waterfall.
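The 1000 Hz / 2000 Hz point can be checked numerically, assuming that peak and RMS measurements stand in for the “look” of the waveform at low zoom (`tone` and `peak_rms` are made-up helpers for this sketch):

```python
import math

def tone(freq_hz, rate_hz=44100.0, n=4410):
    """0.1 s of a pure sine tone at full amplitude."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]

def peak_rms(samples):
    """Peak and RMS: roughly what a low-zoom waveform display shows."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms

p1, r1 = peak_rms(tone(1000.0))
p2, r2 = peak_rms(tone(2000.0))
```

The two tones have completely different sample data, yet `p1` is essentially equal to `p2` and `r1` to `r2`, so at low zoom the two waveforms are drawn essentially identically while sounding an octave apart.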

Thanks for your response but no, that’s not what I am talking about.

I am talking specifically of finding visual waveforms that look the same - that are the same - on one specific current representation, one specific ‘timeline’, i.e. the one right before me now.

You say the problem is far more complex than I suggest, and you say that on the basis of making it far more complex yourself.

That’s not valid.

And part of your objection relates to inaccuracy, I think: saying that a representation here will not be the same as a representation there, so that an identical thing will not be displayed identically. Right?

Well then you just build that level of tolerance into the prog.

How well the thing would operate remains to be seen. It would doubtless operate better in some circumstances than in others. To find circumstances where it can be forecast not to perform well is no basis for rejecting the idea.
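One common way to build that kind of tolerance in is to use a similarity score rather than an exact comparison, for example normalized cross-correlation. A hypothetical Python sketch (the names are made up, and real matchers often work on spectral features rather than raw samples):

```python
import math

def correlation_score(a, b):
    """Normalized cross-correlation: 1.0 for identical shapes
    (even if one is scaled), near 0 for unrelated ones."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def find_similar(track, pattern, threshold=0.95):
    """Return start indices where the track locally resembles the
    pattern, with `threshold` acting as the tolerance setting."""
    n = len(pattern)
    return [i for i in range(len(track) - n + 1)
            if correlation_score(track[i:i + n], pattern) >= threshold]
```

Lowering `threshold` makes the matcher more forgiving of dither, sub-sample misalignment and lossy-format differences, at the cost of more false positives, which is exactly the trade-off any such tool would have to expose to the user.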