Help with Science Fair Project Involving Audio Compression

moonlitknight · February 6, 2014, 8:07pm

I’m conducting a scientific investigation in which I test which factors affect the audio quality of certain genres of music. I need some advice and feedback regarding a method of testing I have explained below. Thank you.

So far, I’ve tried out the mathematical difference test on one of my favorite Genesis songs. I ripped the song from the CD as a WAV and as a 320 kbps MP3 and imported them both into Audacity. I aligned the tracks (MP3s have gaps at the beginning and end) and used the “invert” effect on the MP3 track. I exported the two tracks (inverted MP3 + WAV) so that when they were mixed together, the inverted MP3 track would cancel out the content it shared with the WAV track. What I was left with was the “difference,” or what the MP3 was missing compared to the WAV. There didn’t appear to be much data compared to the MP3, but it was definitely noticeable on playback and on the waveform graph. My next thought was to look at the file sizes of the “difference” track, which I expected to be very small, and the original WAV file, which I expected to be very large. But, for some reason, the difference track has a file size of 25,225,324 bytes and the original, unmodified WAV has 25,225,444 bytes. A difference of only 120 bytes does not seem reasonable based on the results I’ve gathered. Am I assuming something incorrectly about how file sizes are calculated or am I doing something wrong in Audacity?
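For reference, the invert-and-mix step described above amounts to a sample-by-sample subtraction. Here is a minimal Python sketch of that idea, assuming both tracks are already decoded to aligned lists of 16-bit sample values. The function names and the sample values are made up for illustration; this is not how Audacity implements it.

```python
import math

def null_test(original, decoded):
    """Invert `decoded` and mix it with `original`, sample by sample."""
    n = min(len(original), len(decoded))
    return [original[i] - decoded[i] for i in range(n)]

def rms_dbfs(samples, full_scale=32768.0):
    """RMS level in dB relative to 16-bit full scale (assumed scale)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / full_scale)

# Hypothetical sample values, not taken from any real recording.
wav_samples = [1000, -2000, 3000, -4000]
mp3_samples = [990, -2010, 3020, -3990]

residual = null_test(wav_samples, mp3_samples)
print(residual)                            # [10, 10, -20, -10]
print(len(residual) == len(wav_samples))   # True: same number of samples
print(rms_dbfs(residual) < rms_dbfs(wav_samples))
```

Note that the residual has exactly as many samples as the inputs, however quiet it is.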

Also, what sort of programming could be done to help me with this project? I’m mainly focusing on how compression factors and bit rate affect audio quality.

kozikowski · February 6, 2014, 11:56pm

There are a lot of problems. A 320 kbps MP3 should have been indistinguishable from the original. That’s way beyond the minimum needed to produce an almost perfect copy. However, the listening machine has different pathways to play a WAV and an MP3. Your two pathways are not sonically identical. Also, your playback software may have “real” Fraunhofer MP3 decoding, but Audacity does not use Fraunhofer for generation. Fraunhofer is paid software, so Audacity uses Lame, which is an open-source alternative. It’s very uncommon, but people have posted that they like iTunes MP3 generation better than Lame. iTunes uses real Fraunhofer.

The size of the music on an Audio CD or in a WAV file does not change with content, and there is no compression. A 1.2 MB sound file is going to stay 1.2 MB no matter what filters, effects or changes are made to the work or how many flutes are in the orchestra. WAV doesn’t care. Please note that Audacity makes new files every time it works on something, so 120 bytes could be something as exciting as a different filename or small changes in file housekeeping.
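To put numbers on that, here is a rough Python sketch of how the size of an uncompressed PCM WAV follows from the sample count alone. It assumes 16-bit stereo at 44100 Hz and a minimal 44-byte header; real files can carry extra header chunks, so treat the header size as an assumption.

```python
HEADER_BYTES = 44      # assumed minimal WAV header; real files may differ
SAMPLE_RATE = 44100    # CD sample rate

def pcm_wav_bytes(frames, channels=2, bits_per_sample=16):
    """File size of uncompressed PCM: header plus a fixed cost per frame."""
    return HEADER_BYTES + frames * channels * (bits_per_sample // 8)

def frames_from_size(size_bytes, channels=2, bits_per_sample=16):
    """Invert the formula above to recover the number of frames."""
    return (size_bytes - HEADER_BYTES) // (channels * (bits_per_sample // 8))

original = frames_from_size(25_225_444)     # sizes reported in the first post
difference = frames_from_size(25_225_324)

print(original, difference)                 # 6306350 6306320
print(original - difference)                # 30
print((original - difference) / SAMPLE_RATE * 1000)  # gap in milliseconds
```

Under those assumptions, 120 bytes would be 30 stereo frames, well under a millisecond of audio. That is only one possible reading, but either way the size tracks the length of the track and not how loud or busy it is.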

This is the reason an Audio CD will hold 78 (80) minutes of show and it doesn’t matter how much you mess with compression or processing. It’s WAV format and it does not change.

Audacity exports add a tiny amount of dithering noise to avoid aliasing and sampling errors (having sample bits and holes line up in very unfortunate ways). Scientists who try to use Audacity for super-accurate bit-level analysis always run into the same problem. Audacity’s internal sound format is very probably different from both the import and the export. Photoshop works the same way. Pictures inside Photoshop are not in “GIF” or “JPEG.” They’re in ultra-high quality “L-a-b.” You can turn dithering off in Audacity Preferences if your input and output files are in an identical format. Bad idea if they’re not.

And just to cut to the chase, you may get analysis results that don’t make sense. MP3 (part of a video format, if your research hasn’t gotten there yet) does the neat trick of scrambling the waveforms in sometimes very surprising ways in order to hide the compression damage it’s causing. Peak tips in the blue waves are frequently higher after MP3 compression than before, producing clipping and overload damage where the original show didn’t have any. It can also change the manufacturer of a violin (one reason artists hate it). A trio of two ordinary violins and a Stradivarius before compression turns into three ordinary violins after. No more Stradivarius special varnish and finely cured woods. All gone.

The other thing you’re going to get is two identical sounding performances with different waveforms (if you get your monitoring problems solved).

You sure you don’t want to change your project?

Koz

steve · February 7, 2014, 2:26am

To try and clarify a few points:

Do you mean that it is “very uncommon” for people to post saying that they prefer iTunes MP3 to LAME MP3? If not then what are you saying, and on what basis are you alleging it?

I thought that by default iTunes used AAC encoding.

Audacity applies dither when reducing the bit depth (bit format). This is to avoid harmonic distortion which would otherwise occur due to quantization errors.
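A small Python sketch of the idea, assuming triangular (TPDF) dither and a signal level expressed in units of one least significant bit (LSB). This is a generic illustration, not Audacity’s actual dither code. Without dither, a level of 0.3 LSB always rounds to 0, so the error is locked to the signal. With dither, the error turns into noise and the average output tracks the true level.

```python
import math
import random

def quantize(value_lsb, rng=None):
    """Round a value given in LSB units to the nearest integer step.

    If `rng` is supplied, TPDF dither (difference of two uniform random
    numbers, spanning -1 to +1 LSB) is added before rounding.
    """
    dither = (rng.random() - rng.random()) if rng is not None else 0.0
    return math.floor(value_lsb + dither + 0.5)

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # hypothetical signal level, in LSB
count = 20000

plain = [quantize(level) for _ in range(count)]
dithered = [quantize(level, rng) for _ in range(count)]

print(sum(plain) / count)      # 0.0: the 0.3 LSB detail is simply lost
print(sum(dithered) / count)   # close to 0.3: detail survives as an average
```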

“Aliasing” is a form of distortion that can occur when attempting to represent frequencies that are more than half the sample frequency. Anti-alias filters are used when reducing the sample rate (not bit depth) to prevent aliasing.
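As a rough illustration in Python (the helper name is made up for this example), a tone above half the sample rate produces exactly the same samples as a lower “alias” tone, which is why it has to be filtered out before the sample rate is reduced:

```python
import math

def alias_frequency(f_hz, sample_rate_hz):
    """Frequency that a sampled tone at f_hz is indistinguishable from."""
    nyquist = sample_rate_hz / 2
    f = f_hz % sample_rate_hz
    return sample_rate_hz - f if f > nyquist else f

rate = 44100
print(alias_frequency(1000, rate))    # 1000: below half the rate, unchanged
print(alias_frequency(30000, rate))   # 14100: folds back into the audible band

# The sampled cosines at 30000 Hz and 14100 Hz are the same numbers.
high = [math.cos(2 * math.pi * 30000 * n / rate) for n in range(50)]
low = [math.cos(2 * math.pi * 14100 * n / rate) for n in range(50)]
print(max(abs(h - l) for h, l in zip(high, low)) < 1e-9)   # True
```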

Regarding the question:

Unfortunately that procedure is flawed.
MP3 compression (and particularly the LAME implementation) is not attempting to produce identical waveforms. The intention is to reduce the file size whilst retaining the “sound” as closely as possible.

As an illustration of the distinction, here are two short WAV files that sound virtually identical, yet the waveforms look very different, and “subtracting one from the other” will indicate a considerable difference:
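The same point can be sketched in Python with a synthetic tone. This is an assumed example, not the files referred to above: two signals contain the same two harmonics at the same amplitudes, and only the phase of the second harmonic differs. For a steady tone like this, that kind of phase change is generally hard to hear, yet the subtraction leaves a large residual.

```python
import cmath
import math

RATE = 44100
FREQ = 441.0   # assumed test tone; one cycle is exactly 100 samples
N = 4400       # 44 whole cycles, so the sums below come out clean

def tone(second_harmonic_phase):
    """Fundamental plus a half-amplitude second harmonic at a given phase."""
    return [
        math.sin(2 * math.pi * FREQ * n / RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * FREQ * n / RATE + second_harmonic_phase)
        for n in range(N)
    ]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def amplitude_at(x, freq):
    """Amplitude of one frequency component (a single-bin DFT)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * n / RATE) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

a = tone(0.0)
b = tone(math.pi / 2)                       # same harmonics, shifted phase
difference = [p - q for p, q in zip(a, b)]

print(amplitude_at(a, FREQ), amplitude_at(b, FREQ))          # both about 1.0
print(amplitude_at(a, 2 * FREQ), amplitude_at(b, 2 * FREQ))  # both about 0.5
print(rms(a), rms(b))       # equal levels
print(rms(difference))      # about 0.5, far from silence
```

So a large “difference” track does not by itself show that anything audible was lost.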

MP3 encoding applies a psychoacoustic model (see “Psychoacoustics” on Wikipedia) to determine what information can be discarded with minimal impact on the sound quality.

As Koz implied, Audacity does not operate directly on files. When you “import” an audio file, Audacity makes a high quality copy of the audio data from the file. Internally Audacity works in “32-bit float PCM format”, and by default Audacity copies MP3 files in this ‘high definition’ format. When you “export” a file, Audacity copies the audio data from the current project to create a brand new file (in whatever format has been selected).

kozikowski · February 7, 2014, 4:28am

kozikowski wrote:
“It’s very uncommon, but people have posted that they like iTunes MP3 generation better than Lame.”
steve wrote:
“Do you mean that it is “very uncommon” for people to post saying that they prefer iTunes MP3 to LAME MP3? If not then what are you saying, and on what basis are you alleging it?”

It has been posted more than once that an identical MP3 quality value in iTunes will (according to them) produce a slightly better product – and certainly different. I know of no postings that claimed the reverse (and it would be almost impossible to find the postings again).
I haven’t tried the test. It’s better than even chance I won’t be able to hear the difference even if there was one.

kozikowski wrote:
“iTunes uses real Fraunhofer.”
steve wrote:
“I thought that by default iTunes used AAC encoding.”

It does. But MP3 is one of the native selections (Illustration). There used to be more. It used to support everything QuickTime Services supported which was quite a list. I think iTunes 10 in its Grand Simplification Move took care of that.

Koz
(Attached illustration: Screen shot 2014-02-06 at 8.16.43 PM.png)

steve · February 7, 2014, 5:36am

The most recent ABX listening test results that I’ve been able to find (more than one listener) date back to November 2008.
http://listening-tests.hydrogenaudio.org/sebastian/mp3-128-1/index.htm

These are the final results:

Information about interpreting the results (emphasis from original document):

How to interpret the plots: Each plot is drawn with six codecs on the X axis and the rating given (1.0 to 5.0) on the Y axis. The number of listeners used to compute the means (average ratings) and 95% confidence intervals are given on each plot. The mean rating given to each codec is indicated by the middle point of each vertical line segment and the value is printed next to it. Each vertical line segment represents the 95% confidence interval (using ANOVA analysis) for each codec.
This analysis is identical to the one used in Roberto Amorim’s listening tests.

One codec can be said to be better than another with 95% confidence if the bottom of its segment is at or above the top of the competing codec’s line segment.

Important note: These plots represent group preferences (for the particular group of people who participated in the test). Individual preferences vary somewhat. The best codec for a person is dependent on his own preferences and the type of music he prefers.
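The “better with 95% confidence” rule quoted above can be written as a simple interval comparison. A minimal Python sketch follows; the interval values are hypothetical placeholders chosen for illustration and are not the results of the listening test.

```python
def better_with_95_confidence(ci_a, ci_b):
    """True if the bottom of interval A is at or above the top of interval B.

    Each interval is a (low, high) pair of ratings on the 1.0 to 5.0 scale.
    """
    low_a, _ = ci_a
    _, high_b = ci_b
    return low_a >= high_b

def tied(ci_a, ci_b):
    """Neither codec can be said to be better than the other."""
    return (not better_with_95_confidence(ci_a, ci_b)
            and not better_with_95_confidence(ci_b, ci_a))

# Hypothetical intervals, NOT the published results.
codec_x = (4.4, 4.7)
codec_y = (4.3, 4.6)
low_anchor = (1.5, 1.9)

print(tied(codec_x, codec_y))                          # True: intervals overlap
print(better_with_95_confidence(codec_x, low_anchor))  # True
```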

and the result summary:

They show that all encoders are tied on first place, except l3enc which of course comes out last being the low anchor.
What is interesting to see is how the MP3 codec actually evolved since its first days (l3enc was the first MP3 software encoder back in 1994 when it was released) and how it is still competitive with newer formats like AAC or Ogg Vorbis.
Another very interesting thing, which was also one of the goals for this test, is that Fraunhofer and especially Helix, which both outperform LAME in terms of encoding speed, are still very competitive. While statistically being tied to LAME on first place, Helix actually even received a higher rating than LAME 3.98.2 - and this at 90x encoding speed! Even FhG received a slightly higher score at least against LAME 3.97 which was the recommended encoder by the Hydrogenaudio community for a long time. But again, statistically, they are all tied so there is no quality winner.