Peak normalization for M4A (ALAC lossless) files

Joshua277456 · September 26, 2014, 4:30pm

Hello,

First of all, I’m using Audacity 2.0.5 on Windows 7 Professional 64-bit.

What I’m actually trying to do is just normalize the peaks of several M4A lossless files. Or, in other words, amplify the waveform until the highest peak of each track is just under clipping. I want to do this because some of the songs in my music collection are just so much quieter than other songs.

Now I know how to do this by selecting the track, clicking Effect > Amplify, and then deselecting “Allow clipping”.
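For readers who want to see the arithmetic behind peak normalization, here is a minimal sketch. It is not Audacity's code; the function names, the sample values, and the -0.1 dB target are assumptions made for the example, and samples are taken to be floats where 1.0 is full scale.

```python
import math

def peak_normalize_gain_db(samples, target_db=-0.1):
    """Gain in dB that moves the highest peak to target_db (dB relative to full scale)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 0.0  # pure silence: nothing to amplify
    peak_db = 20 * math.log10(peak)
    return target_db - peak_db

def apply_gain(samples, gain_db):
    """Multiply every sample by the linear equivalent of gain_db."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Hypothetical quiet track whose highest peak is at half of full scale.
quiet_track = [0.0, 0.25, -0.5, 0.1]
gain = peak_normalize_gain_db(quiet_track)
louder = apply_gain(quiet_track, gain)
new_peak = max(abs(s) for s in louder)
print(f"gain: {gain:+.2f} dB, new peak: {new_peak:.4f}")
```

Every sample is multiplied by the same factor, so the shape of the waveform is unchanged; only its scale differs.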

But the thing is, I was reading about the different effects that can be achieved by selecting different dithering options under “Edit > Preferences > Quality”, and people were saying that “square dithering” will add a tiny amount of noise to the completely silent (zero amplitude) portions of the track but will also help with harmonic distortion and dynamics.

I don’t want to do any of that. I don’t want to add any noise, improve dynamics, or help THD. I don’t want to “re-dither” the songs; I just want to normalize the peaks by amplifying the waveforms without any sound quality degradation.

My question is, what do I set the dithering options to if I don’t want to add or remove any noise or change the dynamics of anything? I just want to amplify the waveform. Also, I read that you can export the tracks to M4A lossless through an external program by selecting “File > Export Selected”, choosing “external program”, and then typing in ffmpeg -i - -acodec alac "%f. Will doing this using the command prompt affect sound quality or anything like that?

I just want to import M4A lossless, amplify without clipping, and then export M4A lossless without changing the waveform and without sound quality degradation.

steve · September 27, 2014, 1:54am

OK, that’s clear.

“Allow Clipping” is deselected by default. Leave it deselected.

Strictly speaking (and by that I mean mathematical exactitude rather than anything necessarily audible), amplifying with no degradation at all is not actually possible unless you are amplifying by a power of 2 (amplifying by x 2, or x 4, or x 8 … or x 1/2, or x 1/4, or x 1/8, …).

The reason is that the precision of digital audio is limited by the bit-depth of the format. For 16 bit ALAC lossless, sample values have a range from −32,768 to 32,767, whole numbers only. Let’s say that you amplify your original audio by 1.5 dB, that’s amplifying by 1.1885022274370 on a linear scale. That means that each sample value will be multiplied by 1.1885022274370. Let’s say that one of the original samples has a numerical value of 100. After amplification by 1.5 dB, the numerical value should be 118.85022274370, but you are then exporting as 16 bit ALAC, which only supports integer values, so that will be rounded up to 119.

Now let’s say that another sample in your audio has a numerical value of 999. After amplifying by 1.5 dB, the sample value should be 1187.31372521, but again we are saving in a 16 bit integer format, so that will be rounded down to 1187.

The result of these rounding errors is “noise”.
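The two worked values above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of how Audacity processes audio internally; the function name is made up for the example.

```python
def amplify_and_store(sample, gain_db):
    """Amplify an integer sample by gain_db, then round to a whole number
    as an integer format such as 16 bit ALAC requires."""
    factor = 10 ** (gain_db / 20)      # dB to linear: about 1.1885 for 1.5 dB
    exact = sample * factor
    stored = int(round(exact))
    return exact, stored

for original in (100, 999):
    exact, stored = amplify_and_store(original, 1.5)
    print(f"{original} -> exact {exact:.6f}, stored {stored}, "
          f"rounding error {stored - exact:+.6f}")
```

The rounding error is different for every sample, and it depends on the sample value itself, which is why it is related to the signal rather than being random.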

The question “to dither or not to dither” is not a question of whether to introduce noise or not introduce noise, it is a question of “what sort of noise do we prefer”.

The noise created by simple rounding (not using dither) includes a large proportion of “harmonic noise” (arithmetically related to the frequency of the original audio).
The noise created by “rounding with dither” is randomised so as to avoid harmonic patterns.
The noise created by “rounding with shaped dither” is randomised so as to avoid harmonic patterns, and to shift most of the noise into the high frequency range where it is least audible at normal listening levels.
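A small experiment makes the difference between plain rounding and dithered rounding concrete. This is a sketch under stated assumptions: the dither is simple triangular noise spanning plus or minus one quantization step, which is a textbook choice and not necessarily identical to any of Audacity's dither options, and the test signal is just a constant level of 0.3 of a step.

```python
import math
import random

def quantize(x):
    """Plain rounding to the nearest whole number (no dither)."""
    return math.floor(x + 0.5)

def quantize_with_dither(x, rng):
    """Add triangular noise (sum of two uniform values) before rounding."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return math.floor(x + dither + 0.5)

rng = random.Random(0)   # fixed seed so the run is repeatable
level = 0.3              # a signal smaller than one quantization step
n = 100_000

plain = [quantize(level) for _ in range(n)]
dithered = [quantize_with_dither(level, rng) for _ in range(n)]

plain_mean = sum(plain) / n
dithered_mean = sum(dithered) / n
print(f"plain rounding keeps an average of {plain_mean:.3f}")
print(f"dithered rounding keeps an average of {dithered_mean:.3f}")
```

Plain rounding turns every sample into 0, so the error is completely determined by the signal. With dither, individual samples are noisier, but on average the original level survives and the error no longer follows the signal.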

Attached are two files, one that contains “rounding noise” produced by amplifying a 440 Hz sine wave by 1.5 dB. The other contains shaped dither from the same process with the same 440 Hz tone. The original 440 Hz tone has been removed from each. Note that both files are in 32 bit float format so as to avoid adding more noise.

If you wish to turn dither off in Audacity, go to “Edit > Preferences > Quality” and set “High quality conversion → Dither” to “None”. As described above, processing audio will then produce rounding errors (quantization noise) rather than dither noise.

You’ve missed off the final quote ("). The command is:

ffmpeg -i - -acodec alac "%f"
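Outside Audacity, the same encoder options can be used on a file that has already been exported as WAV. The sketch below only builds and prints the command; the file names and the helper function are hypothetical, and it assumes ffmpeg is installed and on the PATH if you choose to run it.

```python
import shlex

def alac_command(source, destination):
    """Build an ffmpeg argument list that encodes source as ALAC."""
    return ["ffmpeg", "-i", source, "-acodec", "alac", destination]

cmd = alac_command("normalized.wav", "normalized.m4a")
print(shlex.join(cmd))

# To actually run the conversion:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

In the Audacity version of the command, "-i -" tells ffmpeg to read the audio that Audacity pipes to it rather than a named file.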

kozikowski · September 27, 2014, 2:09am

I just want to normalize the peaks by amplifying the waveforms without any sound quality degradation

Maybe not in Audacity. Audacity is not a “Wave Editor.” It doesn’t work on the original music. It converts your song to its own very high quality sound format and edits that.

The problem comes when you have to make a new song with your corrections in it. Audacity has to make a whole new sound file and reduce the bit depth to get it. Some of the super high quality digital samples are going to go into the cracks between the new coarser, less-accurate samples of the new file and create distortion. That’s what we’re preventing with the dithering process. Dither makes it unlikely that any of the errors accumulate/line up. You can certainly turn it off in Preferences.

Audacity > Edit > Preferences > Quality > High Quality Conversion.

You’re not likely to get away with doing this anyway. I’m betting M4A-alac plays tricks to get there. I wonder what happens if you try to alac an alac file. Most compression systems depend on a perfect show going in. You will not have a perfect show going in.

Did you try and export an M4A-alac file just to see if you could?

And last step. If you allow either Amplify or Normalize to clip, that will create distortion. Flat Guaranteed. Whatever sound wants to clip (because it got too loud) will acquire a harsh, grittiness to it. That’s what clipping sounds like.

Whatever music you have that doesn’t match is likely to be produced that way; one song will be denser than the other, not just louder. You can’t easily make up the difference by just turning up the volume on the quiet ones. You might get away with reducing the louder ones. That might work. I think that’s what iTunes does.

Koz

DVDdoug · September 27, 2014, 4:02am

Amplifying the file digitally is no worse than analog amplification. Either way you are going to turn-up existing noise & distortion along with turning up the signal. If you have a noisy original recording the noise may become noticeably worse.

You are not going to hear the rounding errors, and at 16-bits or better you are not going to hear dither or the effects of dither (or the lack of dither) under any reasonable-normal listening conditions anyway. If it was me, I’d leave the dither turned-off.

I want to do this because some of the songs in my music collection are just so much quieter than other songs.

Normalizing may not work… It may work with a small selection of songs, but normalizing your entire music library won’t make all of the songs equally loud. It will, of course, make all of your files as loud as they can be without clipping.

Koz was trying to address that… I know what Koz was trying to say but I’m not sure if it was clear. So, I’ll take a shot at it… The peaks don’t correlate well with loudness. Many quiet-sounding songs have normalized/maximized 0dB peaks. That means to match the volumes of your files, you generally have to reduce the loud files since you can’t boost the quiet ones.
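To illustrate why peaks and loudness are different things, here is a sketch that builds two artificial signals with the same peak level. It uses the RMS (average) level as a crude stand-in for loudness, which is an assumption of the example; real loudness measurement also accounts for how hearing responds to different frequencies.

```python
import math

def peak_db(samples):
    """Highest absolute sample value in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """Average (RMS) level in dB, used here as a rough loudness proxy."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

rate = 44100
tone = [math.sin(2 * math.pi * 441 * i / rate) for i in range(rate)]

dense = tone                          # at full scale the whole time
sparse = [0.1 * s for s in tone]      # mostly quiet...
sparse[1000] = 1.0                    # ...with a single full-scale peak

for name, signal in (("dense", dense), ("sparse", sparse)):
    print(f"{name}: peak {peak_db(signal):+.2f} dB, RMS {rms_db(signal):+.2f} dB")
```

Both signals are already peak normalized, yet one is far quieter on average, and it cannot be amplified any further without clipping its single peak.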

Something else you might try -
WaveGain is a variation of ReplayGain that works on WAV files to match the loudness. It has an advanced algorithm to analyze the “loudness” and adjusts your files to a standardized volume. You’d have to convert your files to WAV and then back to ALAC.

As I mentioned above, in order to match volumes you have to reduce the loud tracks. ReplayGain, WaveGain, and MP3Gain will reduce the volume of many (maybe most) of your files. That’s the only way to do it without using dynamic compression or otherwise monkeying with the sound.
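The reasoning can be sketched in a few lines. This is not the ReplayGain or WaveGain algorithm: it assumes RMS level as the loudness measure, and the track names and levels are invented for the example.

```python
def matching_gains(tracks):
    """tracks maps a name to (peak_db, rms_db), both in dB relative to full scale.

    Returns the gain in dB for each track so that all tracks end up at the
    same RMS level and no peak goes above 0 dB.
    """
    # The loudest RMS level each track could reach before its peak clips.
    ceilings = {name: rms - peak for name, (peak, rms) in tracks.items()}
    # The common target cannot be higher than the most restrictive track allows.
    target = min(ceilings.values())
    return {name: target - rms for name, (peak, rms) in tracks.items()}

# Hypothetical measurements.
tracks = {
    "dense_pop": (0.0, -8.0),
    "dynamic_classical": (0.0, -20.0),
    "quiet_master": (-6.0, -18.0),
}
gains = matching_gains(tracks)
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} dB")
```

The most dynamic track sets the target, so every other track is turned down to meet it, including the one whose peaks were well below full scale.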

WaveGain and MP3Gain change actual files (similar to what you’d do manually in Audacity). The advantage to these is that you are changing the volume of the file so you don’t need a special player or any particular player software for them to work.

The “original” ReplayGain version works differently and it doesn’t touch the audio data. It just adds a tag to the file, and if your music player software supports ReplayGain, the volume will be adjusted up or down automatically at playback-time. Apple’s Sound Check is similar to ReplayGain.

Joshua277456 · September 27, 2014, 7:38pm

The noise created by simple rounding (not using dither) includes a large proportion of “harmonic noise” (arithmetically related to the frequency of the original audio).
The noise created by “rounding with dither” is randomised so as to avoid harmonic patterns.
The noise created by “rounding with shaped dither” is randomised so as to avoid harmonic patterns, and to shift most of the noise into the high frequency range where it is least audible at normal listening levels.

I’m under the impression that the amount of noise added when converting, amplifying, exporting, dithering, etc. is basically inaudible at lower volume levels when listening. Is this correct?

So what dithering and conversion option(s) would I choose to add the smallest amount of audible noise and keep the waveform as close as possible to the original (after amplification/conversion)? Are you saying that there is a way to do all this and pretty much never hear the difference after what I am trying to do?

Thanks for your help

kozikowski · September 28, 2014, 6:05am

Are you saying that there is a way to do all this and pretty much never hear the difference after what I am trying to do?

You will hear the difference. The song will be louder.

But you won’t hear the corrections, no. The Audacity default dither is the one least likely to be observed in action (attached). It’s slowest because it takes the most care in application.

We may be struggling to get past the words. It’s called “Dither Noise” because it’s a very precise signal that is intentionally random. Random signals are called “noise” by convention.

Koz
Screen Shot 2014-09-27 at 22.57.29.png

steve · September 28, 2014, 12:23pm

Download the files that I posted here: Peak normalization for M4A (ALAC lossless) files - #2 by steve
The files can be played in Audacity.
Do they sound “silent” when you play them at normal listening levels?

“Shaped” dither is generally considered to be the least obtrusive (and is the default in Audacity).

Joshua277456 · September 30, 2014, 12:20am

Download the files that I posted here: Peak normalization for M4A (ALAC lossless) files - #2 by steve
The files can be played in Audacity.
Do they sound “silent” when you play them at normal listening levels?

“Shaped” dither is generally considered to be the least obtrusive (and is the default in Audacity).

The shaped dither is completely silent at normal listening levels and is just a bit lower than the noise floor of my headphone amp anyway, and much lower than the noise floor of my music.

You say that shaped dither is the least obtrusive. Do you mean that it adds the least amount of noise, or that, after conversion/amplification, it keeps the waveform as close as possible to the original even though it will never be exactly the same?

steve · September 30, 2014, 12:43am

If by “least amount of noise” you mean “sounds quietest”, then yes.

Technically, it gets a bit complicated. The “peak level” of shaped dither is a little higher than other types of dither (such as triangle or rectangle dither), but that is because it is “shaped” to move noise frequencies away from the 3 - 4 kHz range, which is where hearing is most sensitive to low level noise. The noise is shifted so that it mostly occurs above 12 kHz, which is where hearing is least sensitive to low level noise. Thus the resulting audio is more “truthful” to the original sound (it sounds most like the original - so much so that in “double blind trials” listening to high quality music, no difference is detectable).
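The trade-off described here (slightly more noise overall, but moved to where it is less audible) can be demonstrated with the simplest possible noise shaper. This is only an illustration: it uses first-order error feedback, which is not the filter Audacity uses for its shaped dither, and the helper names, seed, and test tone are assumptions of the sketch.

```python
import math
import random

def tpdf(rng):
    """Triangular dither spanning plus or minus one quantization step."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def quantize_dithered(samples, rng):
    """Unshaped dither: add noise, then round to a whole number."""
    return [math.floor(x + tpdf(rng) + 0.5) for x in samples]

def quantize_shaped(samples, rng):
    """First-order error feedback: each sample is corrected by the previous
    sample's quantization error, which pushes the noise toward high frequencies."""
    out, prev_error = [], 0.0
    for x in samples:
        wanted = x - prev_error
        y = math.floor(wanted + tpdf(rng) + 0.5)
        prev_error = y - wanted
        out.append(y)
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def low_frequency_rms(errors, block=100):
    """RMS of block-averaged error; averaging removes the high frequencies."""
    means = [sum(errors[i:i + block]) / block
             for i in range(0, len(errors) - block + 1, block)]
    return rms(means)

rate, n = 44100, 88200
factor = 10 ** (1.5 / 20)   # the 1.5 dB gain used earlier in the thread
signal = [1000 * factor * math.sin(2 * math.pi * 440 * i / rate) for i in range(n)]

rng = random.Random(1)
err_plain = [y - x for x, y in zip(signal, quantize_dithered(signal, rng))]
err_shaped = [y - x for x, y in zip(signal, quantize_shaped(signal, rng))]

print(f"total noise:         dither {rms(err_plain):.3f}, shaped {rms(err_shaped):.3f}")
print(f"low-frequency noise: dither {low_frequency_rms(err_plain):.4f}, "
      f"shaped {low_frequency_rms(err_shaped):.4f}")
```

The shaped version has more noise in total but much less of it at low frequencies. A real shaped dither uses a more elaborate filter to steer the noise away from the range where hearing is most sensitive.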

Joshua277456 · September 30, 2014, 1:50am

Thus the resulting audio is more “truthful” to the original sound (it sounds most like the original - so much so that in “double blind trials” listening to high quality music, no difference is detectable).

That is what I was trying to get at. I just really wanted to know if what I am trying to do is going to alter the original sound significantly enough to be audible. I don’t claim to have golden ears, but I do believe my ears are at least slightly more tuned than those of an average, casual music listener. I’ve been listening to music on relatively high-end audio equipment for years now: headphones costing $150+, DACs costing $100+, phono cartridges costing $100+, etc. You get the point.