Is there a way to compare and segment two audio files based on content similarity?

I have two audio tracks, let’s say audio track a and audio track b where track a has the length Ta and track b has the length Tb.

Track a is composed of segments like
Sa1Ta1 + Da2Ta2 + Sa3Ta3 + Da4Ta4…

Track b is composed of segments like
Sb1Tb1 + Db2Tb2 + Sb3Tb3 + Db4Tb4…

Where segment Sa1Ta1 has the duration Ta1, segment Sb1Tb1 has the duration Tb1, segment Da2Ta2 has the duration Ta2, and so on.

Segment Sa1Ta1 sounds similar in content to segment Sb1Tb1 to a human listener, but Ta1 is not equal to Tb1. In other words, the two segments differ in speed.

Segment Da2Ta2 differs in content from segment Db2Tb2, and the duration Ta2 differs from Tb2 too.

(Abbreviations: S for similar, D for different, T for time, a for track a, and b for track b)

And so on.

Now I want to compare the two audio tracks and split them into segments: Sa1Ta1, Da2Ta2, Sa3Ta3… for track a and Sb1Tb1, Db2Tb2, Sb3Tb3… for track b.
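One technique often used to match content that differs only in speed is dynamic time warping (DTW). The post does not name any method, so DTW is an assumption here, and the function name `dtw_path` is made up for illustration. The sketch below aligns two short lists of numbers standing in for per-frame audio features. Real audio would first need to be turned into such features, and this says nothing about how well it would cope with a full movie soundtrack.

```python
def dtw_path(x, y):
    """Align two 1-D feature sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    saying that x[i] was matched with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Walk back from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy data (hypothetical): seg_b is seg_a played at half speed.
seg_a = [0, 1, 2, 3, 2, 1, 0]
seg_b = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
similar_cost, similar_path = dtw_path(seg_a, seg_b)

# Unrelated content gives a clearly higher alignment cost.
different_cost, _ = dtw_path(seg_a, [5] * 14)
```

A low cost suggests an S-type pair and a high cost a D-type pair. The threshold between the two would have to be chosen by experiment.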

After that I will build a third track, track c, compiled from the segments Sb1Tb1 + Da2Tb2 + Sb3Tb3 + Da4Tb4…, where Da2Tb2 is the segment Da2Ta2 stretched to the length Tb2.

The result will be a track c whose audio content is similar to track a but which is synced in time with track b.
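The assembly step described above can be sketched as follows, assuming the segment boundaries are already known (finding them is the hard part). The names `stretch` and `build_track_c` and the tuple layout are hypothetical. The stretch here is plain linear-interpolation resampling, which changes pitch along with duration; a pitch-preserving time stretch would need a dedicated algorithm.

```python
def stretch(samples, new_len):
    """Resample a list of samples to new_len by linear interpolation.

    Naive approach: duration and pitch change together.
    """
    if new_len <= 0 or not samples:
        return []
    if len(samples) == 1 or new_len == 1:
        return [samples[0]] * new_len
    scale = (len(samples) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * scale
        i = min(int(pos), len(samples) - 2)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def build_track_c(track_a, track_b, segments):
    """Assemble track c from matched segments.

    segments: list of (kind, a_start, a_end, b_start, b_end) in samples,
    where kind is "S" (similar) or "D" (different).
    """
    track_c = []
    for kind, a0, a1, b0, b1 in segments:
        if kind == "S":
            # Similar content: take the segment from track b as-is.
            track_c.extend(track_b[b0:b1])
        else:
            # Different content: take track a, stretched to track b's duration.
            track_c.extend(stretch(track_a[a0:a1], b1 - b0))
    return track_c


# Toy data (hypothetical): 10 samples in track a, 17 in track b.
track_a = [float(v) for v in range(10)]
track_b = [100.0] * 17
segments = [("S", 0, 4, 0, 6), ("D", 4, 10, 6, 17)]
track_c = build_track_c(track_a, track_b, segments)
```

Because every segment is forced to the duration of its counterpart in track b, track c ends up exactly as long as track b.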

Here are the two audio files for track a and track b. The first file is the audio descriptive track of the movie. The second is the audio from the movie video. The two tracks differ greatly in timing. I want to build a third track from the audio descriptive track so that it is synced in time with the movie video.

Is there a way to do that automatically? I’m tired of manually marking, cutting, stretching and joining in Audacity.

Thank you for your time.


Track a

Track b

Audacity can’t analyze content. It can do comparisons, but the two samples have to be surgically perfect and the available work never is. For example, we can’t compare two MP3 files because MP3’s job is to make files smaller by adding distortion. Any difference at all kills the job.
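A minimal sketch of why a sample-by-sample comparison is so fragile, assuming the comparison works by subtracting one signal from the other and measuring what is left. The helper `residual_rms` and the test tone are made up for illustration and are not taken from Audacity.

```python
import math


def residual_rms(x, y):
    """RMS of the sample-by-sample difference between two signals."""
    n = min(len(x), len(y))
    return math.sqrt(sum((x[k] - y[k]) ** 2 for k in range(n)) / n)


rate = 8000  # samples per second (assumed for the example)
tone = [math.sin(2 * math.pi * 440 * k / rate) for k in range(rate)]

identical = list(tone)           # exact copy
shifted = tone[5:] + tone[:5]    # same sound, offset by only 5 samples

rms_identical = residual_rms(tone, identical)  # exact copy cancels completely
rms_shifted = residual_rms(tone, shifted)      # tiny offset leaves a large residual
```

The shifted copy sounds the same to a listener, yet the subtraction leaves a residual as loud as the tone itself.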


Can you guys just write an Audacity plug-in or some Windows software to do this, using audio fingerprint technology or something? I don’t know how, but I really need it. It takes me too much time to fix these audio files manually.
Hope some available programmers can help.

I think you are massively underestimating the difficulty and the amount of development time that would be required. Sound recognition is something that has developed in humans and animals over millions of years, but it is still a massively difficult technological task for computers. It is still an area of active research at companies like Google and at universities.