big lo-fi audio files for wide audience: what codec?

I’d like to encode a bunch of long (~1-2 hours) files to make available over the internet to a diverse audience. Diverse as in, including people with 10-year-old PCs who think Internet Explorer 6 is “the internet”. The files are low-fidelity speech. I’m guessing I don’t have much choice but MP3; am I right about that? Can I use variable bit rate, or should I stick with constant? Thanks! --Allen

Some of the other guys with more experience might be able to be more comprehensive, but I would recommend first converting the audio to mono, which will cut the file size in half. You can use variable bit-rate, but for speech I usually use constant 64kbps. You can play around and see if you find the results acceptable. Chances are, if you, the audio producer, can’t tell that it is a low bitrate, then your audience won’t be able to either. Do keep your ears open for the “garbly” sounds resulting from bad/too much compression.

Make sure you have the latest version of LAME if you are encoding with Audacity, or it will sound trashy no matter what the setting. If you can clean up the audio before encoding (using low/high pass filters, noise reduction) to eliminate extraneous audio, then the chances of losing valuable audio during compression will be reduced.

Pretty much as orgelquaeler said.

MP3 is probably the best choice for a broad and diverse audience as it is widely supported.
“Speex” would give you much better compression, but is not widely known about (perhaps you could offer the downloads in both formats?)

I would go for VBR rather than CBR. Quality setting at 9 (smallest file size) should be good enough. You will probably be able to get smaller file sizes with low CBR settings, but the quality will be lower - you will have to experiment to find the optimum trade-off between file size and quality.

Running the audio through a high pass filter (around 100Hz) and a low pass filter (around 4 to 6 kHz) will help reduce “gargling” type distortion and produce more intelligible mp3s with high compression ratios, and if using VBR will probably also reduce the file size a bit.

Definitely convert to mono.

Thanks, this is great advice. I’m confused about the mono/stereo thing, though. I’m actually working with the files in mono, but was thinking I should convert back to stereo because Koz told me last year that some software chokes on mono files. So earlier today I was testing… I encoded a mono snippet to MP3 CBR, and then I doubled the track to make a stereo file and encoded it to MP3 CBR, once with Joint Stereo selected and once with Stereo selected. To my surprise, all resulting files were the exact same size. I thought the stereo files would be double size. --Allen

That is not usually an issue with mp3 players. Koz was probably talking about CD burning software, some of which require files to be stereo (the data on audio CDs is always 16bit 44100Hz stereo).

That’s because you are using CBR (constant bit rate).

If you export at, say, 128kbps (128 kilobits per second) then the resulting file will have 128 kilobits for every second of audio. If this is a mono file then there is only one audio channel, so all 128 kilobits can be used to encode each second of that sound. If it is a stereo file, then the 128 kilobits are shared between the two audio channels. With a stereo file, when you select 128kbps CBR stereo, then effectively each audio channel is being compressed to 64kbps. With joint stereo there are some savings as the encoder partially combines the two channels to conserve bandwidth. The result is that a mono file at a particular bit rate will be of the same quality as a stereo file at twice the bit rate (and about the same as joint stereo at 1.5 times the bit rate).
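To make the CBR arithmetic concrete, here is a small Python sketch. The function names are made up for illustration, and it ignores ID3 tags and frame padding, so real files will differ by a little.

```python
def cbr_file_size_bytes(bitrate_kbps, duration_s):
    """Approximate CBR MP3 size: bit rate times duration.

    The channel count does not appear anywhere, which is why the mono
    and stereo exports came out the same size.
    """
    return bitrate_kbps * 1000 * duration_s // 8


def per_channel_kbps(bitrate_kbps, channels):
    """Bits per second available to each channel (plain stereo or mono)."""
    return bitrate_kbps / channels


one_hour = 60 * 60  # seconds
print("1 hour at 128kbps:", cbr_file_size_bytes(128, one_hour), "bytes")
print("mono gets", per_channel_kbps(128, 1), "kbps per channel")
print("stereo gets", per_channel_kbps(128, 2), "kbps per channel")
```

Joint stereo is left out on purpose, since how much it saves depends on how similar the two channels are.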

If you are using VBR, then the encoder is able to change the bit rate according to the complexity of the audio. Since mono files only have half as much data as stereo files, VBR is able to reduce the bit rate without affecting the quality.

Using a low pass and high pass filter may also help to simplify the data for the encoder, so with VBR the file size can be reduced a bit more without affecting the quality. Note that this is not always the case - with some audio, filtering may actually increase the file size, but this is rare.

A more significant decrease in file size of VBR files can be achieved by reducing the sample rate. For voice you can probably take it down as far as 11025 Hz. Going lower than this will make the voice sound quite muffled, as the sound will automatically be low-pass filtered at a frequency a little below half the sample rate.
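The reason is the Nyquist limit: sampled audio can only carry frequencies below half its sample rate. A quick Python sketch (the function name is just for illustration) shows the ceiling for a few common rates:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


for rate in (44100, 22050, 11025, 8000):
    print(rate, "Hz sample rate -> audio up to about", nyquist_hz(rate), "Hz")
```

At 11025 Hz the ceiling is roughly 5.5 kHz, which sits inside the 4 to 6 kHz low-pass range suggested earlier for speech.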

When using CBR, very low bit rates require that the sample rate is quite low and you may get prompted to select a new sample rate.

The highest compression for intelligible voice is probably using mono files at 16kbps and 12000Hz sample rate. It will sound a bit warbly but still be relatively clear. (No need to pre-filter, as downsampling to 12kHz will do that.) This will achieve compression of almost 11:1.
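If you want to work out ratios like this for other settings, here is a rough Python sketch. It assumes the source is uncompressed 16-bit PCM, and the function names are hypothetical. The ratio depends heavily on what you compare against, so treat any single figure as approximate.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps, mp3_kbps):
    """How many times smaller the MP3 is than the uncompressed source."""
    return source_kbps / mp3_kbps


# 16kbps MP3 compared with 16-bit mono PCM at two different sample rates
print(compression_ratio(pcm_kbps(12000), 16))
print(compression_ratio(pcm_kbps(44100), 16))
```

Measured against the original 44100 Hz recording, the ratio is considerably higher than when measured against audio that has already been downsampled.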

For better voice quality you could use VBR quality setting 9 (optionally downsampling to 22kHz). This will give you a similar file size to 40kbps CBR but marginally better quality. (Much better quality than 16kbps, but a larger file size.)
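For batch work on lots of files, the same two options can be expressed as LAME command lines. This is only a sketch under some assumptions: it assumes the standalone `lame` command-line encoder rather than Audacity's export dialog, the helper name and file names are hypothetical, and the flags (`-m m` for mono, `-V` for VBR quality, `-b` for bit rate, `--resample` in kHz) are worth checking against `lame --help` for your version. The code only builds the argument list; it does not run anything.

```python
def lame_args(src, dst, vbr_quality=None, cbr_kbps=None, resample_khz=None):
    """Build an argument list for the lame CLI (mono output).

    Exactly one of vbr_quality (0-9, 9 = smallest) or cbr_kbps must be given.
    """
    if (vbr_quality is None) == (cbr_kbps is None):
        raise ValueError("choose exactly one of vbr_quality or cbr_kbps")
    args = ["lame", "-m", "m"]                 # -m m: encode as mono
    if vbr_quality is not None:
        args += ["-V", str(vbr_quality)]       # VBR quality setting
    else:
        args += ["-b", str(cbr_kbps)]          # fixed bit rate in kbps
    if resample_khz is not None:
        args += ["--resample", str(resample_khz)]  # output sample rate, kHz
    return args + [src, dst]


# The two variants discussed above, with hypothetical file names
print(" ".join(lame_args("talk.wav", "talk-vbr.mp3", vbr_quality=9, resample_khz=22.05)))
print(" ".join(lame_args("talk.wav", "talk-tiny.mp3", cbr_kbps=16, resample_khz=12)))
```

To actually encode, the returned list could be passed to `subprocess.run`, once you have confirmed the flags on your own install.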

This is great advice, thanks; I hadn’t thought of decreasing sample rate. --Allen

Exactly that. AM radio top audio frequency is 5 kHz. The transmission can actually go way above that, but the radio receivers usually whack it off at 5 kHz to avoid interference with the next radio channels up and down. Radio licenses in each city are chosen to avoid collisions, but at night, you can hear radio stations from four time zones each direction.

My license for Veritas CD burner demands stereo audio files. Your mileage may vary. Consult your local listings. I actually produce everything in mono, but produce two-channel mono (stereo-ish) for burning.

A common error is to try and “help” the lame compressor out by pre-compressing the sound ahead of lame. Really Bad Idea. You should approach lame with the highest possible quality sound; the only exception might be to restrict the frequency response to 5 kHz… maybe not.

MP3 and other audio (and video) compressors are designed to make maximum use of the existing quality. If the performance is the highest possible quality, then lame will produce an MP3 file preserving as much of that quality as it possibly can. If you messed with the show and created muffled trash ahead of time, then lame will try to preserve as much muffled trash as possible.