What feature do you want to examine precisely?
I am asking because a simple subtraction may not tell you very much about the difference between the two files (especially with regard to the harmonic content).
All it does is remove the content that is 100% correlated.
If the two tracks were aligned correctly, you could simply use the Voice Removal effect (simple mode). The preliminary step would be to import the two files and then (if they are mono) combine them into one stereo track (just “Make Stereo Track” from the track drop-down menu of the upper track).
The result after the voice removal is the difference between the two channels, scaled by 0.5.
You can split the new stereo track (both channels are identical) and amplify one of them by 6 dB to get the real difference.
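If you don’t mind a one-liner, the same result (without the 0.5 scaling) can also be produced from the Nyquist Prompt. This is only a sketch, assuming the two mono files have already been combined into one stereo track as described above; the returned mono sound is normally written back into both channels:

;; true, unscaled difference of the two channels (left minus right)
(diff (aref s 0) (aref s 1))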
You can also try this plug-in: https://forum.audacityteam.org/t/karaoke-rotation-panning-more/30112/1
The preliminary steps are the same but you’ll end up with the rest of the first wave file in the left channel and the rest of the second file in the right channel.
What’s more, there’s a control that lets you shift one of the two channels by an exact number of samples (“delay”).
But we now have to face the fact that this offset is not known in the first place.
You’ve already proposed a method that works with the slope of the signal.
In Nyquist, that’s really easy:
(slope s)
returns this value (multiplied by the sound’s sample rate). s is the global variable that holds the input sound passed from Audacity, either mono or stereo.
For the log you can either use ‘(s-log s)’ or directly convert to dB with ‘(linear-to-db s)’.
Quantization is also available with ‘(quantize s steps)’.
However, I don’t want to turn this into a Nyquist introduction, but we may well end up with some code that you can try directly from the Nyquist Prompt.
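For instance, something along these lines could be pasted into the Nyquist Prompt. It’s only a sketch, assuming a mono track (the selection arrives in the variable s); the -120 dB floor and the 0.5 dB quantization step are my own arbitrary choices, and only the last expression is returned:

;; first derivative of the signal, scaled by the sample rate
(slope s)

;; slope magnitude in dB (floored at -120 dB to avoid the log of zero)
(linear-to-db (s-max (s-abs (slope s)) 0.000001))

;; the same, quantized to 0.5 dB steps
(quantize (linear-to-db (s-max (s-abs (slope s)) 0.000001)) 2)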
I am not sure how you want to find the exact alignment from the values obtained so far.
The question is how much one signal differs from the other. You’re lost if any phase shifts or the like are introduced. We currently assume that both recordings start with a slope that is practically identical.
There are several approaches that seem to work better.
Firstly, I would work with the integral of the absolute sample values of a small portion of the two sounds. This gives a wavy, ascending line.
You can compare the two lines with a least-squares calculation. This gives you two values:
A, the y-axis intercept, and B, the slope, i.e. Left-Channel = A + B * Right-Channel.
If the two are aligned correctly, you should get the values 0 and 1.
This can be repeated with different delays until the error reaches its minimum (or the correlation its maximum).
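To make this concrete, here is a rough sketch for the Nyquist Prompt, assuming the two recordings sit in one stereo track (file 1 = left, file 2 = right) and the selection is at least 100 ms long. The window length, the helper names and the printout are my own choices; the delay loop is not included yet, it only reports A and B for the current alignment:

;; analysis window in seconds
(setf *portion* 0.1)

(defun abs-integral (snd dur)
  ;; running integral of |snd| over the first dur seconds
  (integrate (s-abs (extract-abs 0 dur snd))))

(defun least-squares (xs ys n)
  ;; fit ys = A + B * xs over the first n array elements; returns (A B)
  (let ((sx 0.0) (sy 0.0) (sxx 0.0) (sxy 0.0))
    (dotimes (i n)
      (let ((x (aref xs i))
            (y (aref ys i)))
        (setf sx  (+ sx x))
        (setf sy  (+ sy y))
        (setf sxx (+ sxx (* x x)))
        (setf sxy (+ sxy (* x y)))))
    (let* ((b (/ (- (* n sxy) (* sx sy))
                 (- (* n sxx) (* sx sx))))
           (a (/ (- sy (* b sx)) n)))
      (list a b))))

(let* ((n (truncate (* *portion* (snd-srate (aref s 0)))))
       (left  (snd-samples (abs-integral (aref s 0) *portion*) n))
       (right (snd-samples (abs-integral (aref s 1) *portion*) n))
       (coeffs (least-squares right left n)))
  (format nil "A = ~a    B = ~a" (car coeffs) (cadr coeffs)))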
We can shorten this process by remapping the values so that the signal value, instead of the timeline, serves as the x-axis.
In this case, the A-coefficient will tell us how much latency offset we have.
That’s like shifting our ascending line left/right instead of up/down until the error is minimal.
(That’s why we must integrate the signal; the recursive method would work with the pure signals.)
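Again just a sketch of how that remapping could look, with made-up helper names and an arbitrary number of levels: for a set of integral values it records the sample index at which each channel first reaches that value and fits left-index = A + B * right-index, so A comes out directly as the offset in samples:

(setf *portion* 0.1)   ; analysis window in seconds
(setf *levels* 50)     ; number of integral levels to compare

(defun abs-integral-samples (snd dur n)
  ;; first n samples of the running integral of |snd|
  (snd-samples (integrate (s-abs (extract-abs 0 dur snd))) n))

(defun crossing-index (samples n threshold)
  ;; first index at which the monotone integral reaches threshold, or nil
  (let ((idx nil))
    (dotimes (i n)
      (if (and (null idx) (>= (aref samples i) threshold))
          (setf idx i)))
    idx))

(defun least-squares (xs ys)
  ;; fit ys = A + B * xs over two equal-length lists; returns (A B)
  (let* ((n (float (length xs)))
         (sx (apply #'+ xs))
         (sy (apply #'+ ys))
         (sxx (apply #'+ (mapcar #'* xs xs)))
         (sxy (apply #'+ (mapcar #'* xs ys)))
         (b (/ (- (* n sxy) (* sx sy)) (- (* n sxx) (* sx sx))))
         (a (/ (- sy (* b sx)) n)))
    (list a b)))

(let* ((n (truncate (* *portion* (snd-srate (aref s 0)))))
       (left  (abs-integral-samples (aref s 0) *portion* n))
       (right (abs-integral-samples (aref s 1) *portion* n))
       ;; only use levels that both integrals actually reach
       (top (min (aref left (- n 1)) (aref right (- n 1))))
       (xs nil)
       (ys nil))
  (dotimes (k *levels*)
    (let* ((thr (* top (/ (+ k 1.0) *levels*)))
           (ir (crossing-index right n thr))
           (il (crossing-index left n thr)))
      (if (and ir il)
          (progn (setf xs (cons (float ir) xs))
                 (setf ys (cons (float il) ys))))))
  (let ((coeffs (least-squares xs ys)))
    (format nil "Offset A = ~a samples    B = ~a" (car coeffs) (cadr coeffs))))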
The final algorithm really depends on how similar the two signals are and on whether the beginning contains noise or pure digital silence.