Determining bit depth of audio on a .MP4 file


I have a video file in .MP4 format that I want to extract the audio from. I was trying to determine what quality the audio is before I did anything to it. So I looked at the properties from the right click menu and it shows a bit rate of 253kbps, stereo, and a sample rate of 48000 Hz (below). It doesn’t show bit-depth.

I think the formula for bit rate is sample rate * 2 (stereo) * bit depth. So if I solve for bit depth using the properties from the original file I get 253000 / 2 / 48000 = 2.6354. Shouldn’t I be getting something like 8, 16, 24, or 32?

When I import the video file into Audacity it comes in as 48000Hz and 32 bit float. If I then export it from Audacity at 32-bit float, it shows up with a 3,072,000 bit rate. And doing the math like above, I get a bit depth of 32, i.e., (3,072,000 / 2 / 48,000 = 32). So that makes sense.

What is wrong with my approach where I am getting 2.6354 for a bit depth? Is the audio that is connected with the video compressed? And even if it was, shouldn’t I still be getting 8, 16, 24, or 32 for an answer?

I’m confused.




MP3 and MP4 files don’t have a traditional bit depth because they don’t store individual samples. The bitrate is loosely related to “quality”: the lower the bitrate, the more data has to be thrown away.

A 253 kbps file is likely transparent: it should sound like the original, no matter what the original format or resolution was.

By default, Audacity works in 32-bit floating point. Floating point is better for processing/editing.

32-bit floating point.

More specifically, 32-float doesn’t overload. If you apply an editing filter that causes the sound to go over 100% by accident, it doesn’t trash the show. The sound is still up there waiting for you. You just have to run another effect to bring the volume back down within normal range and everything is good.

Contrast that with regular bit depths where if you overload the sound channel, that’s the end of the show. If you don’t have safety backups, you’ll be recording it over.
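To make the overload point concrete, here is a minimal sketch, not Audacity’s actual processing code. It assumes samples are plain numbers in a nominal -1.0 to 1.0 range, uses Python’s built-in float as a stand-in for 32-bit float, and the helper names (apply_gain, clip_to_int16, int16_to_float) are made up for illustration.

```python
def apply_gain(sample, gain):
    """Scale one sample by a linear gain factor."""
    return sample * gain


def clip_to_int16(sample):
    """Store a float sample as a 16-bit integer; anything past full scale is clipped."""
    scaled = round(sample * 32767)
    return max(-32768, min(32767, scaled))


def int16_to_float(value):
    """Convert a 16-bit integer sample back to the nominal -1.0..1.0 range."""
    return value / 32767


original = 0.8  # hypothetical sample at 80% of full scale

# Floating-point path: boost past 100%, then bring it back down.
float_boosted = apply_gain(original, 2.0)         # over full scale, but still stored
float_restored = apply_gain(float_boosted, 0.5)   # back to the original value

# Integer path: the boosted value has to fit in 16 bits, so it is clipped.
int_boosted = clip_to_int16(apply_gain(original, 2.0))
int_restored = int16_to_float(int_boosted) * 0.5  # the overshoot is gone for good

print("float path:", float_restored)
print("int16 path:", int_restored)
```

In the float path the sample comes back unchanged. In the integer path it comes back at half of full scale, because everything above 100% was cut off when it was stored.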

You would think that 32-float should be the normal way to produce digital audio. It would be, except that every computer and sound system on earth knows the other standards, and fewer accept 32-float. You can’t send a 32-float show to a client, or post one on-line, and automatically have everything be OK.


What is wrong with my approach

I don’t think you can do that. The encoded and compressed formats play serious games to get apparent good quality with very small files. They don’t lend themselves to static analysis like that.

Here are two common, simple examples. Some formats watch the left and right channels and, if they’re the same or close enough, turn them into mono, saving half the bits, but only during that time. The encoder switches back to stereo when it needs to.

MP3 rips the sound apart into individual overtones and harmonics and deletes the ones you’re not likely to notice. This, when overused, can turn a Stradivarius into a children’s toy.

They all work on perceived damage: how much damage can we cause without anybody noticing?

Newer technologies with higher processing power are much better at hiding it.


The audio stream in MP4 files is often (but not always) AAC, which is a “lossy compressed” audio format. Unlike PCM encoded digital formats (such as WAV and AIFF), which have a fixed bit depth (usually 8, 16, 24, or 32-bit integer, or floating point), lossy compressed formats may have a different bit depth from one sample to the next.
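As a rough sketch of why the arithmetic works for the Audacity export but not for the MP4, the formula sample rate * channels * bit depth only describes uncompressed PCM. The function names below are invented for illustration; the numbers are the ones quoted in the question.

```python
def pcm_bit_rate(sample_rate_hz, channels, bit_depth):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * channels * bit_depth


def average_bits_per_sample(bit_rate, sample_rate_hz, channels):
    """Average bits spent per sample per channel; only equals bit depth for PCM."""
    return bit_rate / (sample_rate_hz * channels)


# The 32-bit float export from Audacity is PCM, so the formula holds exactly.
exported_rate = pcm_bit_rate(48000, 2, 32)

# The MP4's audio is compressed, so this is just an average, not a bit depth.
mp4_average = average_bits_per_sample(253000, 48000, 2)

print(exported_rate)          # 3072000
print(round(mp4_average, 4))  # 2.6354
```

The 2.6354 figure is best read as the average number of bits the encoder spends per sample per channel after compression, which is why it doesn’t land on 8, 16, 24, or 32.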


Bit depth

Along with sample rate, there is also bit depth to consider. The bit depth is the number of digital bits of information used to encode each sample. In simple terms, bit depth measures “precision”. The higher the bit depth, the more accurately a signal can communicate the amplitude of the actual analog sound source. With the lowest possible bit depth, we only have two choices to measure the precision of sound: 0 for complete silence and 1 for full volume. The higher the bit depth, the more precision one has over their encoded audio. As an example: CD quality audio is a standard 16-bit, which gives 2^16 (or 65,536) volumes to choose from.
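The relationship between bit depth and the number of available volume steps can be checked with a short sketch. It assumes integer PCM, where each extra bit doubles the number of levels; the function name is made up for illustration.

```python
def quantization_levels(bit_depth):
    """Number of distinct amplitude values an integer PCM sample can take."""
    return 2 ** bit_depth


# 1 bit gives the two-choice case described above; 16 bits is CD quality.
for bits in (1, 8, 16, 24):
    print(f"{bits}-bit: {quantization_levels(bits):,} levels")
```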

Bit depth is fixed for PCM encoding, but for lossy compression codecs (like MP3 and AAC) it is calculated during encoding and can vary from sample to sample.

Thanks DVDdoug, Koz, and Steve - that’s very helpful - I appreciate your feedback.