FLAC size and conversion accuracy

n.kontaras · March 9, 2019, 1:57pm

Hi! I have two possible “issues” I’d like to report. Maybe you can help with them, or maybe they are simply “bugs” in Audacity.

The first one is this peculiar occurrence: If I import a FLAC file and then export it as another FLAC with the same exact settings, the new file will be larger. The particular file I used went from 265MB to 269MB. Doing this process another time results in a file of 271MB. I didn’t continue this, but my guess is that the file keeps increasing, maybe by smaller increments every time.
Observing this difference in sizes, I wondered if the conversion is indeed bit-perfect, as FLAC is supposed to be. Using the Invert->Render->Amplify method I found in this forum, I saw that the 2 FLAC files are indeed not exactly the same: an amplification of around 20dB reveals a very low-level, even-volume electronic noise. Exporting the initial FLAC file as a 16-bit WAV results in a similar phenomenon. Exporting as 32-bit WAV gives a perfect match, and it is the only export format in Audacity that I’ve found to do so.

Using other software to convert from FLAC to WAV and back to FLAC, and comparing again in Audacity, results in a perfect match too, so this is not the FLAC format itself causing this (good news!).
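For anyone who wants to reproduce the null test outside Audacity, here is a minimal Python sketch of the same idea: subtract one track from the other (the equivalent of inverting and mixing) and look at the peak of what is left. The function name `null_test` and the sample values are hypothetical, and it assumes both files have already been decoded to floats with full scale at 1.0.

```python
import math

def null_test(a, b):
    """Peak level of (a - b) in dBFS; -inf means the tracks are sample-identical."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same length")
    # Subtracting b from a is the same as inverting b and mixing it with a.
    peak = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)

# Hypothetical decoded samples (full scale = 1.0)
original = [0.0, 0.25, -0.5, 0.125]
identical = list(original)
altered = [0.0, 0.25 + 1 / 32768, -0.5, 0.125]  # one sample off by one 16-bit step

print(null_test(original, identical))          # -inf: perfect null
print(round(null_test(original, altered), 1))  # -90.3
```

A residue around -90 dBFS is one 16-bit step, which is why it only becomes audible after heavy amplification.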

I hope this helps, and hopefully you can enlighten me if there is any way to alleviate this. Thank you.

Greetings,
Nick

steve · March 9, 2019, 2:16pm

Audacity works internally in 32-bit float format.
FLAC files exported from Audacity are either 16-bit or 24-bit.

There is therefore “down sampling” (strictly speaking, a bit-depth reduction) when exporting from Audacity to FLAC: the sample values are rounded from 32-bit floating point values to 16 or 24-bit integer values.
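As a rough illustration of that rounding step, here is a small Python sketch. The helper names are hypothetical, and the scale factor of 32768 is an assumption about how float samples map to 16-bit integers; Audacity's actual conversion code may differ in detail.

```python
def float_to_int16(x):
    """Round a float sample to the nearest 16-bit integer (no dither)."""
    n = round(x * 32768)              # assumed scale: full scale 1.0 -> 32768
    return max(-32768, min(32767, n))  # clip to the int16 range

def int16_to_float(n):
    return n / 32768

sample = 0.1                      # not an exact multiple of 1/32768
n = float_to_int16(sample)
error = sample - int16_to_float(n)  # the rounding (quantization) error
print(n, error)
```

The error is never larger than half a step (0.5 / 32768), but it is generally not zero unless the sample was already an exact 16-bit value.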

By default, Audacity applies “dither” when reducing the bit format, so as to avoid “quantization noise”. This behaviour may be turned off in Preferences (“Edit menu > Preferences > Quality > High-quality conversion: Dither = None”).

More information about dither in Audacity: https://alphamanual.audacityteam.org/man/Dither

n.kontaras · March 9, 2019, 2:35pm

Hi Steve,

Thank you for the reply. I verify that this option you suggested resolved both the size difference and the noise. This brings two questions in mind:

If the Dither Shape option has such a downside, there must be a purpose for it, right? Or to put it differently, what is one losing by not setting this option to Shaped? Particularly in my case, where the purpose is simply to merge audio files together and export to MP3 or FLAC.
Is there a way to import directly to 16-bit 44.1 kHz? In this way the down-sampling would not need to occur… I did choose the 16-bit option in Preferences, but this did not resolve it, so apparently, if I follow what you say, the initial 32-bit import is inevitable in the current version.

Thank you for your help,
Nick

steve · March 9, 2019, 3:26pm

It’s a fairly small downside really. For 16-bit audio, the dither noise is around -84 dB RMS, which at normal listening levels is virtually inaudible.
Yes, there’s a purpose to it.

(I’ll refer to 16-bit for simplicity, but the same also applies to 24-bit)

For most Audacity use cases, audio is not only edited (cut / paste / delete), but also “processed” (fade in / out, amplify, normalize, effects, …). If there is any processing of any kind, then sample values will almost certainly NOT be exact 16-bit values, but will mostly lie between the 16-bit integer values. When exporting to 16-bit, the sample values have to be rounded to exact 16-bit integers. The difference between the “true” value and the rounded value is called the “quantization error”.

There are several ways that rounding could be done: rounding to nearest, rounding up, rounding down,…
The problem with all of these rounding methods is that for tonal sounds, the errors form patterns in a similar way to moiré patterns. The result is that discordant tones are produced. This “quantization” noise is pretty low level, but quite unpleasant. What “dither” does is randomise the quantization errors, so preventing these discordant tones from forming. In effect, it is replacing one kind of noise with another, less intrusive kind of noise.

The effects of quantization noise and dither noise are more obvious at low bit-depths. Taking it to an extreme, these two files are simply a 440Hz tone that is fading out, and have been reduced to extremely low bit-depth. You will notice that the dithered version has constant hiss, but the un-dithered version has very noticeable “distortion”.
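A similar experiment can be sketched in Python. This is not how the two files above were made; it is an illustration under stated assumptions: a 4-bit target, TPDF dither of +/-1 step (a common textbook choice, not necessarily Audacity's dither), and hypothetical helper names. It checks what is left of the tone in the last tenth of the fade, where the tone is smaller than half a quantization step.

```python
import math
import random

RATE = 44100              # samples per second
BITS = 4                  # deliberately extreme bit depth
SCALE = 2 ** (BITS - 1)   # 8 quantization steps per unit of amplitude

def fading_tone(freq=440.0, seconds=1.0, peak=0.5):
    """A sine tone that fades linearly from `peak` to silence."""
    n = int(RATE * seconds)
    return [peak * (1 - i / n) * math.sin(2 * math.pi * freq * i / RATE)
            for i in range(n)]

def quantize(samples, dither, rng):
    """Round to BITS-bit integers, optionally adding TPDF dither first."""
    out = []
    for x in samples:
        noise = (rng.random() - rng.random()) if dither else 0.0
        q = round(x * SCALE + noise)
        out.append(max(-SCALE, min(SCALE - 1, q)))
    return out

rng = random.Random(0)    # fixed seed so the run is repeatable
tone = fading_tone()
plain = quantize(tone, dither=False, rng=rng)
dithered = quantize(tone, dither=True, rng=rng)

tail = slice(int(0.9 * len(tone)), None)  # tone is below half a step here
tone_left_plain = sum(q * x for q, x in zip(plain[tail], tone[tail]))
tone_left_dithered = sum(q * x for q, x in zip(dithered[tail], tone[tail]))

print(set(plain[tail]))         # {0}: the undithered tail is digital silence
print(tone_left_dithered > 0)   # True: the dithered tail still follows the tone
```

Without dither the end of the fade is simply cut off, which is one form of the distortion described above. With dither the output is noisy, but on average it still tracks the original tone.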

In real life (16 or 24 bit formats), the advantages of dither are greater than in the above example, because hearing is less sensitive to high frequency hiss when it is at a very low level, compared to lower frequency tones at a similar level.

steve · March 9, 2019, 3:30pm

Yes, but don’t do it. Even if you import as 16-bit, Audacity still works internally at 32-bit float, so there is still down-sampling during export.

If you are only editing, and not doing any processing at all, then to avoid adding dither, you currently have to turn it off in Preferences.

I’m hoping that in the future there will be an option to turn off dither in the Export dialog so as to make it more convenient, though there has been some opposition to this idea.

n.kontaras · March 9, 2019, 3:56pm

Hi Steve, I appreciate the response. I will keep the Dither set to None if I don’t do any processing, and Shaped if I do. Is it easy to explain how quantization errors enter the picture when importing a 16-bit file and doing no processing at all? Because if there is no processing, the other 16 bits of the 32 bits should be zero, right? And then when exporting, nothing would need to be truncated.

steve · March 9, 2019, 4:03pm

Floating point format is a bit more complicated than that, but the basic idea is correct - when converting from 16-bit to 32-bit and then back to 16-bit (without modifying any sample values), the conversions can be done exactly without any need for rounding.
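This can be checked exhaustively, since there are only 65536 possible 16-bit values. The Python sketch below stores each value as a 32-bit float and converts it back. The helper names and the scale factor of 32768 are assumptions for illustration, not Audacity's code.

```python
import struct

def to_float32(n):
    """Convert a 16-bit integer sample to a value stored as a 32-bit float."""
    x = n / 32768
    # Pack and unpack as float32 to force 32-bit float precision.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_int16(x):
    """Convert a float sample back to a 16-bit integer by rounding."""
    return max(-32768, min(32767, round(x * 32768)))

round_trip_ok = all(to_int16(to_float32(n)) == n for n in range(-32768, 32768))
print(round_trip_ok)  # True: every 16-bit value survives unchanged
```

A 32-bit float has a 24-bit significand, so every 16-bit (and every 24-bit) integer sample is represented exactly and no rounding happens in either direction.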

n.kontaras · March 9, 2019, 4:15pm

Regarding another “option” for this, in line with the added option for exporting with Dither None as per your suggestion: it could be automatically set to None if there was no detected need for it in a Project (for example, there was no processing involved or other operations that require it, etc.). Very enlightening discussion, I am now maybe 1% less ignorant of what is going on in Audacity. Regards, Nick

steve · March 9, 2019, 4:43pm

I’ve thought about this myself, but it is difficult to detect if dither is required, without making the export process slower (which may not be acceptable for people that work on very large projects).

One difficulty is that Audacity makes use of a large number of “importers” (for importing different file types). There is no guarantee that an importer passes data to Audacity in exactly the same bit format as the file, for example, MP3 does not have a fixed bit format (MP3 is not PCM encoded).

Another difficulty is that the project may have been saved, Audacity closed, and then the project reopened, in which case Audacity would not know where the track(s) came from, or what has been done to them.

I think the only way this could work, would be for Audacity to scan all of the “blockfiles” (the .AU files in the “_data” folder) to see what format they are, prior to exporting, and if they are ALL the same or lower bit-depth than the export format, then disable dither.
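To make the cost concrete, here is a Python sketch of the brute-force variant of such a check. Note that this is not the blockfile-format scan described above; it inspects every sample value instead, which is exactly the kind of work that would slow down export on large projects. The function name `fits_bit_depth` and the sample data are hypothetical.

```python
def fits_bit_depth(samples, bits=16):
    """True if every float sample is an exact integer step at the given bit depth."""
    scale = 2 ** (bits - 1)
    return all((x * scale).is_integer() and -scale <= x * scale < scale
               for x in samples)

# Hypothetical track imported from a 16-bit file, left unprocessed
imported_16bit = [n / 32768 for n in (0, 1, -1, 12345, -32768, 32767)]
# The same track after a hypothetical gain change of 0.5
processed = [x * 0.5 for x in imported_16bit]

print(fits_bit_depth(imported_16bit))  # True: dither could be skipped
print(fits_bit_depth(processed))       # False: values fall between 16-bit steps
```

The check has to visit every sample before the export can even start, which is the trade-off mentioned above.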

n.kontaras · March 9, 2019, 9:02pm

It is certainly something to look at, and I support the initiative to somehow change or modify the Dither option such that the default Audacity settings can produce bit-perfect audio files. As an experiment, I used a Fade In and compared the results: with Dither None it is a perfect match with no noise; with Dither Shaped, the noise is there. So apparently Dither Shaped is not always necessary. It may be nice to really look at this, because Dither Shaped may not be needed by many users, and I noticed it adds up the more you import and export tracks: after a few times of doing this the noise already becomes audible. It also adds overhead to the file size when exporting with FLAC, and likely other codecs. It’s a well-made piece of software and I hope this helps improve it even further, Nick

steve · March 10, 2019, 12:46am

That’s an interesting phenomenon. The amount of “overhead” depends on the complexity of the original sound. If the original sound was highly complex and had a significant amount of high frequency content, then there would probably be very little, if any, increase in file size.
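A crude way to see this effect without a FLAC encoder is sketched below in Python. It is only a stand-in: it uses three simple fixed predictors and a single Rice parameter per signal, whereas real FLAC has more predictors and finer-grained coding. All names and test signals are hypothetical, and the dither is assumed to be TPDF of +/-1 step rather than Audacity's shaped dither.

```python
import math
import random

def rice_bits_per_sample(residual):
    """Cheapest Rice code over k = 0..15, in bits per sample."""
    unsigned = [2 * e if e >= 0 else -2 * e - 1 for e in residual]  # zigzag map
    best = min(sum((u >> k) + 1 + k for u in unsigned) for k in range(16))
    return best / len(residual)

def estimated_bits(samples):
    """Best of three fixed predictors (orders 0, 1, 2)."""
    order0 = samples
    order1 = [b - a for a, b in zip(order0, order0[1:])]
    order2 = [b - a for a, b in zip(order1, order1[1:])]
    return min(rice_bits_per_sample(r) for r in (order0, order1, order2))

def add_dither(samples, rng):
    """Re-quantize integer samples after adding TPDF dither of +/-1 step."""
    return [round(s + rng.random() - rng.random()) for s in samples]

def growth(samples, rng):
    """Relative increase in estimated size after one dithered re-export."""
    before = estimated_bits(samples)
    return (estimated_bits(add_dither(samples, rng)) - before) / before

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 20000
simple = [round(8192 * math.sin(2 * math.pi * 55 * i / 44100)) for i in range(n)]
broadband = [round(rng.gauss(0, 3000)) for _ in range(n)]  # loud, noise-like signal

simple_growth = growth(simple, rng)
broadband_growth = growth(broadband, rng)
print(f"simple tone: {simple_growth:+.1%}, broadband: {broadband_growth:+.1%}")
```

In this sketch the simple, predictable tone grows noticeably because the dither noise cannot be predicted away, while the already noise-like signal barely changes.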

n.kontaras · March 10, 2019, 1:23am

Indeed, I can share the exact file with you for examination if you like. It is certainly not “mainstream” music; it includes harmonic chanting, which might produce frequency content that is not commonly encountered. After some more investigation, it seems things are indeed not so straightforward. Namely, Dither None seems the best option when using WAV files ripped from a CD, but MP3s are a different story. In the latter case, noise appears anyway, and Dither Shaped seems the preferable option. I guess the Audacity developers knew something… You did mention that different importers create interesting phenomena. What is going on with the MP3 importer that creates this? Are bits 17-32 populated in this process? The only thing I do is import the MP3 and then export to FLAC, and there is already a difference, no matter what Dither setting I choose.

On another note, because space is not the issue (I am interested in this high quality for only a handful of albums), would 32bit WAV or 24bit Flac be valid options to play from my phone (Samsung Galaxy S8 android 9.0) or would it cause more issues than the supposed benefit of higher audio quality?

Regards,
Nick

DVDdoug · March 10, 2019, 3:13am

“would 32bit WAV or 24bit Flac be valid options to play from my phone (Samsung Galaxy S8 android 9.0) or would it cause more issues than the supposed benefit of higher audio quality?”

I assume you can play anything on an Android device, depending on what player software you’re using.

The guys who do blind ABX tests have pretty much demonstrated that there is no audible difference between a “high-resolution” original and a copy downsampled to “CD quality” (44.1kHz, 16-bit). In other words, 44.1/16 is better than human hearing.