set mp3 kbps for mono to half of stereo to get same quality?

dev · April 29, 2014, 6:24am

Hi,

after reading a discussion on itunes mp3 conversion vs audacity mp3 export:

My understanding is that kbps means kilobits per second, no matter what happens in that second. If it is a stereo recording with two channels and I export as mp3 at 96 kbps, I could say that each channel gets 48 kbps. If I convert that file to mono first, I need to export at 48 kbps to get the same quality. Is that so? If so, would the itunes way of defining the mp3 conversion bitrate as “96 kbps if file is stereo/48 kbps if itunes finds that the file is mono” then be “mp3 conversion for dummies”?

thanks

edit to add: Audacity 2.0.5 with LAME 3.98.2 on Macbook Pros running either 10.8.last or 10.9.2.

dev · April 29, 2014, 6:36am

Along the same line, if exiftool says the things below for the original recording, it means that it is 96 kbps for both channels and one could convert it to mono and export as mp3 at 48 kbps, and it is the same quality, minus the deterioration for one mp3 compression.

(Joint stereo might also mean that both channels are the same and thus the stereo is redundant and one might be better off recording as mono, but that is another story and I may not be able to change that)

http://www.sno.phy.queensu.ca/~phil/exiftool/

File Size : 22 MB
File Permissions : rw-r--r--
File Type : MP3
MIME Type : audio/mpeg
MPEG Audio Version : 1
Audio Layer : 3
Audio Bitrate : 96 kbps
Sample Rate : 44100
Channel Mode : Joint Stereo
MS Stereo : On
Intensity Stereo : Off
Copyright Flag : False
Original Media : False
Emphasis : None
Duration : 0:31:36 (approx)
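
As a rough cross-check of the readout above, the size of a constant bit rate file can be estimated from bit rate and duration alone. This is only a sketch: it assumes CBR, takes "kbps" as 1000 bits per second, and ignores tags and frame overhead. The function name is made up for illustration.

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a CBR stream; ignores tags and other overhead."""
    return bitrate_kbps * 1000 / 8 * duration_s


duration_s = 31 * 60 + 36  # 0:31:36 from the exiftool readout

stereo_96 = cbr_size_bytes(96, duration_s)  # the original recording
mono_48 = cbr_size_bytes(48, duration_s)    # a 48 kbps export of the same length

print(f"96 kbps: {stereo_96 / 1e6:.1f} MB")
print(f"48 kbps: {mono_48 / 1e6:.1f} MB")
```

The 96 kbps estimate comes out close to the 22 MB that exiftool reports, and at CBR the 48 kbps export is half that size whether the audio is mono or stereo.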

Gale_Andrews · April 29, 2014, 9:17pm

Yes for constant bit rate (CBR) and for stereo, not joint stereo. The relationship may be less exact for other bit rate modes and for joint stereo.

Is that a feature request to Audacity? We can’t do it like that because our MP3 options include all valid bit rates but some of those don’t produce a valid bit rate if halved.
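
To illustrate the point about halving, here is a small sketch that checks which bit rates still land on a valid bit rate when halved. It assumes the valid set is the union of the MPEG-1 and MPEG-2 Layer III bit rate tables; whether that matches the exact list in Audacity's MP3 options is an assumption.

```python
# Layer III bit rates in kbps (MPEG-1 and MPEG-2 tables).
MPEG1_L3 = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
MPEG2_L3 = [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160]
VALID = sorted(set(MPEG1_L3) | set(MPEG2_L3))


def halved_is_valid(kbps: int) -> bool:
    """True if half of this bit rate is itself in the valid set."""
    return kbps % 2 == 0 and kbps // 2 in VALID


no_valid_half = [b for b in VALID if not halved_is_valid(b)]
print("No valid half:", no_valid_half)
```

Under that assumption, 8, 24, 40, 56 and 144 kbps have no valid half, so an automatic "halve it for mono" rule could not cover every option.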

However we may at some future time offer a button in the MP3 options to force encoding to mono (at the chosen bit rate) and a choice that is the same as LAME’s “automatic” mode if no bit rate or bit rate mode is specified (128 kbps CBR for a stereo track, and 64 kbps CBR for a mono track).

“Joint stereo” means that the encoder can switch between “Stereo” (which just encodes the left and right channels independently) and Mid/Side stereo. Mid/Side stereo processes the Left and Right channels as two different signals: a “Mid” or “Sum” channel (Left plus Right, mono) and a “Side” or “Difference” channel (the difference between the two channels, Left minus Right). In tracks with little stereo separation, this can reduce bit rate when encoding with average or variable bit rate.

If you choose “stereo” then the Left and Right are always processed independently.

LAME does not support the “intensity stereo” method of joint stereo encoding ( Joint encoding - Wikipedia ) which effectively makes lower frequencies mono.
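
The Mid/Side idea described above can be sketched in a few lines. This is an illustration of the sum/difference transform only, not LAME's actual code; the divide-by-two scaling and the function names are assumptions made for the example.

```python
def to_mid_side(left, right):
    """Turn Left/Right samples into Mid (sum) and Side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover Left/Right from Mid/Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A "dual mono" track: both channels identical.
left = [0.5, -0.25, 0.75]
right = [0.5, -0.25, 0.75]
mid, side = to_mid_side(left, right)
print("side channel:", side)  # all zeros, so there is almost nothing to encode

# A track with real stereo content round-trips without loss.
wide_left = [0.5, -0.25, 0.75]
wide_right = [-0.5, 0.25, 0.25]
restored = from_mid_side(*to_mid_side(wide_left, wide_right))
```

When the channels are identical the Side signal is silent, which is why joint stereo costs so little extra for tracks with little stereo separation.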

Gale

steve · April 30, 2014, 12:21am

kbps does indeed mean “kilobits per second”

For “2 channel stereo”, yes, but as Gale wrote, stereo MP3 is often encoded as “joint stereo” Joint stereo - Hydrogenaudio Knowledgebase

When using VBR (variable bit rate) “joint stereo” mode, where both channels are the same, the file size will be not much more (perhaps 10 to 15% larger) than if it were a mono track.
When using VBR “stereo” mode, where both channels are the same, the file size will be almost double (perhaps 90% bigger) than if it were a mono track.
(it doesn’t work out exactly because there are various other overheads in addition to the actual data).

Unlike CBR (constant bit rate), VBR aims for a “quality” level rather than a strict bit rate. The “kbps” for each VBR setting is intended as a rough guide to the expected sound quality rather than a strict specification of the actual bit rate.

That could be one interpretation

Personally I don’t see that the iTunes interpretation makes any sense other than as marketing hype (“our files are smaller than your files”). Logically, 96 kbps could mean that the file is “96 kilobits per second” (regardless of how many channels), or at a stretch it could mean “96 kilobits per second per channel”, but that is the opposite of what they mean. Their interpretation is like trying to avoid a speeding ticket by saying that you thought the 160 km/h on your speedometer meant that you were actually doing 40 km/h because your car has 4 wheels.

dev · April 30, 2014, 6:13am

okay, so in principle it is like that.

The recorder puts Joint Stereo into the metadata (it seems to use MS, mid/side, and not intensity stereo, which is good for quality according to your links). The metadata do not say whether it uses variable or constant bit rate, but I assume the bit rate is constant, because when I just open the file, use Tracks > Stereo Track to Mono, and export at 48 kbps constant bit rate, I get roughly half the size. If the bit rate were variable, the difference would not be that large, from what I understand.

it’s what I did for a while - for the longest time I was thinking kilobits per second and ear, kbpse, and wondering why if I convert to Mono and export at the same bit rate the file size does not change, it should be half. I’m not sure how many people have that going on, if it is worth making the manual longer. But no, not a feature request. Now, the Audacity way makes sense and the itunes way doesn’t make sense to me any more.

Gale Andrews:

dev:

Joint stereo might also mean that both channels are the same and thus the stereo is redundant

“Joint stereo” means that the encoder can switch between “Stereo” (which just encodes the left and right channels independently) and Mid/Side stereo. Mid/Side stereo processes the Left and Right channels as two different signals: a “Mid” or “Sum” channel (Left plus Right, mono) and a “Side” or “Difference” channel (the difference between the two channels, Left minus Right). In tracks with little stereo separation, this can reduce bit rate when encoding with average or variable bit rate.

If you choose “stereo” then the Left and Right are always processed independently.

LAME does not support the “intensity stereo” method of joint stereo encoding ( Joint encoding - Wikipedia ) which effectively makes lower frequencies mono.

Thanks for the information. I thought Joint Stereo means a mono recording doubled to make it stereo. (That’s how new I am to this. Mono recordings do come out on both speakers : )

Where the question came from was not only the mono/stereo kilobits per second and channel confusion but also the quality of the recording. From what I find here, 96 kbps stereo should not sound half as good as these recordings do. So I thought, what if the recordings are really 96 kbps per channel. Maybe they are good because the mic is quite good (AKG perception series with this spider-web rubber band suspension)

deva

steve · April 30, 2014, 6:36am

Probably best not to assume
I’m not on Mac so I don’t know what tools are available to you. Perhaps if you have QuickTime, that may give you more information about the format.
If you’d like to post a very short test file to the forum I can check the format for you.

Keep in mind that most MP3s on the Internet are really badly encoded, or, very often have been re-encoded several times. Even at quite a low bit rate, MP3s can sound quite reasonable - it depends a lot on how demanding the audio is and how well it has been encoded.

If you intend to do any editing / processing of the file, it should be in the best quality possible, ideally in WAV format. Audacity can only work with uncompressed audio, so when an MP3 is imported, Audacity must decode it. If you then export from Audacity in MP3 format, Audacity (or LAME, to be precise) re-encodes it and in so doing adds a bit more damage.

MP3 encoding damage cannot be undone or repaired. Some of the audio information is lost each time it is encoded. Best to use a “lossless” (uncompressed) format such as WAV throughout, then only encode at the end (if MP3 is the format you need).

dev · April 30, 2014, 7:01am

I don’t get along with quicktime, but thought that exiftool would be the best to dig out metadata? It’s available for windows too, command line. The second post in this topic here is the exiftool readout. ExifTool by Phil Harvey

okay: to make it short but keep the metadata, I started an ftp download and interrupted it right away. hope this works. thanks! maybe I should add that the content is what I would cut away, but it has a voice piece on the quieter side.

it’s not from files posted on the internet, I ftp the original recordings and prepare them for posting. trying to be as undestructive as possible. going through wav or aiff always, mp3 only for the final export. Played with bit rates, 32 does not work, but 48 is fine so far.

yes, recording them as wav would generate too large files, I get that they are recorded as mp3s. mp3s are 80 MB anyway because of the length.

thanks
deva

dev · April 30, 2014, 7:11am

maybe I should consider exporting with VBR. I thought it would destroy the quiet parts, but it seems to be good for them, according to an ancient post

I thought VBR would destroy the quiet parts by disregarding it as “almost nothing”.

steve · April 30, 2014, 12:48pm

Here’s the file info:

First file:

Format                                   : MPEG Audio
File size                                : 520 KiB
Duration                                 : 44s 355ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 96.0 Kbps

Audio
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 3
Mode                                     : Joint stereo
Mode extension                           : MS Stereo
Duration                                 : 44s 434ms
Bit rate mode                            : Constant
Bit rate                                 : 96.0 Kbps
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 KHz
Compression mode                         : Lossy
Stream size                              : 520 KiB (100%)

Second file:

Format                                   : MPEG Audio
File size                                : 265 KiB
Duration                                 : 22s 569ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 96.0 Kbps

Audio
Format                                   : MPEG Audio
Format version                           : Version 1
Format profile                           : Layer 3
Mode                                     : Joint stereo
Mode extension                           : MS Stereo
Duration                                 : 22s 595ms
Bit rate mode                            : Constant
Bit rate                                 : 96.0 Kbps
Channel(s)                               : 2 channels
Sampling rate                            : 44.1 KHz
Compression mode                         : Lossy
Stream size                              : 264 KiB (100%)
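
One way to double-check the "Constant / 96.0 Kbps" lines is to work the average bit rate back out of file size and duration. A minimal sketch, assuming KiB means 1024 bytes and kbps means 1000 bits per second; the function name is made up for illustration.

```python
def average_kbps(size_kib: float, duration_s: float) -> float:
    """Average bit rate implied by file size and duration."""
    return size_kib * 1024 * 8 / duration_s / 1000


# (file size in KiB, overall duration in seconds) from the two readouts
files = {"first": (520, 44.355), "second": (265, 22.569)}

for name, (size_kib, duration_s) in files.items():
    print(f"{name}: {average_kbps(size_kib, duration_s):.1f} kbps")
```

Both files come out within a fraction of a kbps of 96, which is what you would expect from 96 kbps CBR with a little rounding in the reported sizes.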

Yes I get that. Just commenting that MP3 often has a worse reputation than it deserves due to the abundance of dreadful examples that are commonplace on the Internet.

dev · April 30, 2014, 4:28pm

true, that I forgot to mention. I really thought 48 kbps must sound close to a cell phone, because some people recommend 128 mono for a podcast. Good to know.

thanks for the metadata, they don’t say whether constant or variable bit rate, do they. But next time I find a critical piece I try export with VBR vs. CBR and see if VBR can save some more space.
(what were you using to get them? I use win occasionally, but don’t know enough to use a command line tool)

thanks for helping out, both Gale and Steve!
deva

steve · April 30, 2014, 5:06pm

I’m on Linux (Debian).
I used “MediaInfo”, which is a cross platform program (MediaInfo - Download). I don’t often recommend it these days because the Windows installer includes “bundleware” (MediaInfo). The Linux version doesn’t, and it’s a good program. I don’t know about the Mac version.

Gale_Andrews · April 30, 2014, 7:09pm

So I think that means that if you halve the bit rate for a “joint stereo” file that you made mono, you will almost certainly make it sound worse.

On the other hand, if you have a stereo track and export it as stereo or joint stereo at a given bit rate or quality, then make the track mono and export that at the same bit rate or quality, the mono version will have better quality than the stereo or joint stereo export.

In my experience, Koz’s statement (with reference to file size reduction) and your concern (about quality) are both correct. VBR is not kind to extremely quiet but complex sounds like pianissimo strings unless you use a high quality setting. If you only want a small to moderate size file, CBR may actually be better for classical material with extreme dynamic range.

Yes, “Overall bit rate mode = Constant”.

Gale

steve · April 30, 2014, 11:46pm

Yes, though how noticeable that will be depends on the audio and what other settings were used.

For a given bit-rate yes.
For a given VBR quality setting, converting a stereo track to mono will reduce the bit-rate and hence the file size but the sound quality should be about the same.

That’s an interesting point.

VBR will reduce the bit-rate for very quiet audio and will increase the bit-rate for loud complex audio. The assumption is that very quiet audio is less audible and so slight deterioration of the sound quality should not be very noticeable, whereas deterioration of loud audio will be more noticeable.

I’ve just tried a test.
Starting with a piece of music that contains a wide range of frequencies and plenty of sharp attack (making it more difficult for the encoder), I first converted to mono so as to disregard any effect that stereo / joint stereo might have. The music was taken from a classical piece and I selected a part where the loudness did not vary too much so that I could control the overall level more effectively.

Export as MP3 VBR “Standard” quality
Normalize to 0 dB then export again (same settings)
Normalize to -30 dB (peak) then export again.
Normalize to -40 dB then export again.

The actual bit-rate for each was:

85.7 Kbps
92.0 Kbps
76.8 Kbps
64.9 Kbps

Even for the very quiet sample, the sound quality is no worse than the Audacity default.
For the very loud sample, the bit-rate was 40% higher than for the very quiet sample.
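
For reference, the percentages can be worked out directly from the four measured bit rates. A small sketch using only the figures listed above; the labels are just descriptive names for the four exports.

```python
# Measured average bit rates (kbps) from the four exports, in order.
measured = {
    "as exported": 85.7,
    "normalized to 0 dB": 92.0,
    "normalized to -30 dB": 76.8,
    "normalized to -40 dB": 64.9,
}

loudest = measured["normalized to 0 dB"]
quietest = measured["normalized to -40 dB"]

# How much higher the loudest bit rate is than the quietest, in percent.
increase_pct = (loudest / quietest - 1) * 100
print(f"0 dB vs -40 dB: {increase_pct:.1f}% higher")

# Each export relative to the 0 dB version.
for label, kbps in measured.items():
    print(f"{label}: {kbps / loudest:.0%} of the 0 dB bit rate")
```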

I then re-imported the 0 dB sample and checked the peak level. The Amplify effect reported the peak level to be 0.0 dB. Nyquist reported the peak level as -0.039 dB. Zooming in and carefully inspecting the highest peaks showed no sign of clipping.
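
The peak check can be sketched as follows, assuming samples are floats where full scale is 1.0. This is not how Audacity or Nyquist compute it internally, just the standard conversion between a linear peak and dB.

```python
import math


def peak_db(samples) -> float:
    """Peak level in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def db_to_amplitude(db: float) -> float:
    """Linear amplitude for a level given in dB."""
    return 10 ** (db / 20)


# A peak of -0.039 dB is still just under full scale, so no clipping.
print(f"-0.039 dB -> {db_to_amplitude(-0.039):.4f} of full scale")

# The -30 dB and -40 dB normalization targets as linear peaks.
print(f"-30 dB -> {db_to_amplitude(-30):.4f}")
print(f"-40 dB -> {db_to_amplitude(-40):.4f}")
```

Re-importing the exported MP3 and running a check like this on the decoded samples shows whether encoding pushed any peak above 0 dB.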

The conclusion that I would draw from this is that for classical music with a wide dynamic range, the audio should be normalized close to 0 dB before encoding, and use a quality setting of at least “Standard” (-V 2).

Gale_Andrews · May 1, 2014, 6:29pm

The normalize to 0 dB might help but if you don’t want that (for example you are applying Replay Gain) then you might still prefer “insane” 320 kbps (that is, CBR).

If there are wide variations in amplitude within a piece then under VBR this translates to variations in quality within the piece. The variations tend to be noticeable.

Gale

Robert_J_H · May 1, 2014, 7:58pm

I would definitely not go for 0 dB as this often results in a higher peak in the mp3, especially when mode is VBR and the average is 96 kbps.
One should at least reload the created mp3 to be sure.
The quiet passages could probably be encoded with a higher min kbps value. Audacity’s standard seems to be 32, no matter what the quality is.
The exact command for mono with V2 is:

Encoding settings                        : -m m -V 2 -q 3 -lowpass 18.6 --vbr-old -b 32

I’m not sure why the old VBR mode is chosen. I seem to have some mp3s with the new setting, perhaps a question of mono or stereo.
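
To try a higher minimum bit rate outside Audacity, the settings above could be passed to the LAME command line encoder, where -b sets the minimum allowed bit rate in VBR mode. The sketch below only builds and prints the command, it does not run it. It assumes a `lame` executable is installed; the file names, the helper name and the 64 kbps value are placeholders, not recommendations.

```python
import shlex


def lame_vbr_mono_command(src, dst, quality=2, min_kbps=32):
    """Build a LAME command line for mono VBR with a minimum bit rate."""
    return [
        "lame",
        "-m", "m",             # mono
        "-V", str(quality),    # VBR quality, 2 is "Standard"
        "--vbr-old",           # same VBR routine as in the readout above
        "-b", str(min_kbps),   # minimum allowed bit rate for VBR
        src,
        dst,
    ]


# Hypothetical file names; 64 is just an example of a raised minimum.
cmd = lame_vbr_mono_command("input.wav", "output.mp3", min_kbps=64)
print(shlex.join(cmd))
```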

steve · May 2, 2014, 12:14am

Personally I’d not trust normalizing to 0 dB, but “close to” 0 dB definitely helps to get the best out of VBR.

The issue with Lame encoded files having a higher peak level than the original is most evident when dealing with audio that has been heavily compressed / limited. For audio that has a large dynamic range (ie. not compressed or limited) I see little evidence of the issue.

I don’t see that using Replay Gain is a reason to not normalize to a fairly high level. It should not make any difference to Replay Gain.

kozikowski · May 2, 2014, 6:12am

Those of us who used and loved QuickTime 7 were relieved when, in the face of forcing everyone to use QuickTime X, Apple quietly made QuickTime 7 available as a Utility.

Enclosed, the same music file in the two different media inspectors. The darker one is QuickTime X. Please also note that QT7 has barely visible spectral sound meters to the right.

Koz
Screen shot 2014-05-01 at 11.10.03 PM.png
Screen shot 2014-05-01 at 11.05.33 PM.png

steve · May 2, 2014, 8:40am

Why the different “data size”?

Gale_Andrews · May 2, 2014, 11:32am

Yes. I don’t work with “compressed to death” tracks (or I “fix” them first - why I like your Expander / Compressor ).

I agree “close to 0 dB” is a better general recommendation.

The only point I meant about ReplayGain was just if you actually normalize to the analyzed ReplayGain level rather than use a ReplayGain tool to encode the gain to the metadata. You might do that if you still have a player that is not ReplayGain capable.

Gale

steve · May 2, 2014, 11:59am

Ah yes, I see, but for music with a large dynamic range, the Replay Gain level will be fairly high, so that should not pose a problem.