Thanks for the sound file. I did some testing with it. First I opened it in iTunes (so as not to see the waveforms) and gave each section of sound a volume rating from 1 to 5. The first sound in each version was rated 3, and I then raised or lowered the rating depending on whether each following sound was louder or softer. Then I opened the file in Audacity to see whether my volume ratings correlated with the visible "density" of the waveform, and they did, to a reasonable approximation.
In what follows, I have labelled the “hissing” sounds A, B, C, D, and the tones T1, T2, T3.
Here are my volume ratings (waveform unseen) for the three versions:
Sound:       A   B   C    D   T1   T2   T3
Version 1:   3   5   4    3   3    4    2
Version 2:   3   3   3.5  4   3    4    2
Version 3:   3   2   3    4   3    4    2
For Sound A it was tricky to estimate the loudness just by looking at the waveform (because of a superimposed very-low-frequency component), so I have not included it in the images below. The first image shows how B, C and D looked in Audacity (first version). If I were asked to estimate the volume of B, C and D just by looking at the waveform, I would say B is loudest, followed by C, then D, which is how I rated them in iTunes (5, 4, 3).

For the second version, my ratings were 3, 3.5 and 4, and in Audacity they look like this:

There’s not much between B and C, but C does "look" slightly louder because of its peaks.
For the third version, my ratings were 2, 3 and 4. Here they are in Audacity:

Again, B and C look pretty similar, but on close inspection, C “looks” louder.
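Sound A (left out of the images above) hints at a pitfall for any numerical "density" measure: a superimposed very-low-frequency component can inflate an average of the rectified waveform, just as it confuses the eye, because the slow offset adds to the absolute value of the samples. One possible workaround, an assumption rather than anything Audacity or Wave Stats is known to do, is to high-pass filter the audio before measuring. The sketch below uses a simple first-order filter; the function name and the 20 Hz default cutoff are illustrative choices only.

```python
import math
from typing import List, Sequence


def remove_low_frequency(samples: Sequence[float], sample_rate: int,
                         cutoff_hz: float = 20.0) -> List[float]:
    """First-order high-pass filter (a 'DC blocker'), hypothetical helper.

    Implements y[n] = x[n] - x[n-1] + r * y[n-1], with
    r = exp(-2*pi*cutoff_hz/sample_rate). Content well below cutoff_hz
    (including a constant offset) is attenuated; content well above it
    passes almost unchanged.
    """
    r = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y  # difference removes offset, feedback keeps audible content
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

For example, a 1 kHz sine of amplitude 0.5 riding on a constant 0.5 offset has an average rectified value of 0.5; after this filter it comes out close to the sine's own value of 0.5 × 2/π ≈ 0.32.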
I’m intrigued by how closely the volume of some sounds can be estimated just by looking at the density of the waveform as displayed in Audacity. It seems to give a better approximation to perceived loudness than any method other than actually listening. Would it be possible to add a calculation to the Wave Stats plug-in that mimics this "density"? Could it be as simple as the average of the rectified waveform? If it can be coded, I’ll test it on a wide range of music to see how well it works as a measure of perceived volume.
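As a rough sketch of what "the average of the rectified waveform" would compute, here is a minimal Python version. It assumes the audio is already available as floating-point samples scaled to ±1.0 (full scale); reading the file is left out, and all function names are hypothetical, not part of Wave Stats or Audacity. RMS is included only for comparison, since it weights peaks more heavily than the rectified average does.

```python
import math
from typing import List, Sequence, Tuple


def average_rectified(samples: Sequence[float]) -> float:
    """Mean of |x|: the 'average of the rectified waveform'."""
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    return sum(abs(x) for x in samples) / len(samples)


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level, for comparison (weights peaks more heavily)."""
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_dbfs(level: float) -> float:
    """Convert a linear level (full scale = 1.0) to dB relative to full scale."""
    return 20.0 * math.log10(level) if level > 0 else float("-inf")


def segment_levels(samples: Sequence[float], sample_rate: int,
                   segments: Sequence[Tuple[float, float]]) -> List[float]:
    """Average rectified level (dBFS) of each (start, end) segment, in seconds."""
    levels = []
    for start_s, end_s in segments:
        chunk = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
        levels.append(to_dbfs(average_rectified(chunk)))
    return levels
```

For a full-scale sine wave the average rectified value is 2/π ≈ 0.637 (about −3.9 dBFS), while its RMS is 1/√2 ≈ 0.707 (about −3.0 dBFS). Passing the start and end times of sounds such as B, C and D to `segment_levels` would give one number per sound to set against the iTunes ratings; whether it tracks perceived loudness on real music is what the proposed test would show.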