How to compare two audio files?

The problem is that computers have no idea what things “sound” like. They can store and manipulate audio samples, but they cannot “hear”. Specialist speech recognition software can apply complex pattern-matching algorithms to calculate the probability that a sound is a particular word, but a computer really has no idea whether a sound is an angel singing or a dump truck applying its brakes.

I’m not sure what you are referring to there. Do you mean the “Audio Diff” proposal? (See “Missing features - Audacity Support”.)

The way I would approach it is to run specific tests with synthetic audio samples that give easily measurable results. For example, send a 1 kHz sine tone and check whether the result still has a frequency of 1 kHz, how much harmonic distortion there is, and how much noise there is. Try sending silence: does it come back as silence, or is there added noise, and if so, how much? Try sending pulses of pink noise and see what comes back. After those tests, do some listening tests with real speech.
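As a rough illustration of the sine and silence tests, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for the sake of the example: the 48 kHz sample rate, the function names, and especially `hard_clip`, which is just a hypothetical stand-in for whatever processing or transmission the audio actually goes through. In a real test you would replace it with the samples that came back. The pink noise test is not covered here.

```python
import math

SAMPLE_RATE = 48000  # Hz; an assumed rate for this sketch


def sine_tone(freq_hz, seconds, amplitude=0.5, rate=SAMPLE_RATE):
    """Generate a synthetic sine test tone as a list of floats in [-1, 1]."""
    count = int(round(seconds * rate))
    step = 2 * math.pi * freq_hz / rate
    return [amplitude * math.sin(step * i) for i in range(count)]


def hard_clip(samples, limit):
    """Hypothetical stand-in for a lossy round trip: clip peaks at +/- limit."""
    return [max(-limit, min(limit, s)) for s in samples]


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for digital silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_frequency(samples, rate=SAMPLE_RATE):
    """Estimate frequency from linearly interpolated upward zero crossings."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0 <= b:
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return 0.0
    return (len(crossings) - 1) * rate / (crossings[-1] - crossings[0])


def tone_amplitude(samples, freq_hz, rate=SAMPLE_RATE):
    """Amplitude of a single frequency component (one-bin DFT)."""
    step = 2 * math.pi * freq_hz / rate
    re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


def thd_percent(samples, fundamental_hz, harmonics=5, rate=SAMPLE_RATE):
    """Total harmonic distortion: harmonics 2..N relative to the fundamental."""
    fundamental = tone_amplitude(samples, fundamental_hz, rate)
    if fundamental == 0:
        return float("inf")
    harmonic_power = sum(
        tone_amplitude(samples, fundamental_hz * k, rate) ** 2
        for k in range(2, harmonics + 1)
    )
    return 100 * math.sqrt(harmonic_power) / fundamental


if __name__ == "__main__":
    sent = sine_tone(1000, 0.1)
    returned = hard_clip(sent, 0.4)  # replace with the real returned samples
    print("frequency (Hz):", estimate_frequency(returned))
    print("THD (%):", thd_percent(returned, 1000))
    print("level (dBFS):", rms_dbfs(returned))
    print("silence level (dBFS):", rms_dbfs([0.0] * SAMPLE_RATE))
```

The zero-crossing frequency estimate is only reasonable for a clean single tone. For noisy or complex signals, a proper spectrum analysis (for example Audacity's own Plot Spectrum) is the better tool. The THD figure here only counts harmonics 2 to 5 and assumes the tone length is a whole number of cycles.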