Importing AC3 5.1 in Audacity vs FFmpeg

Hello,
I am using Audacity 2.0.0, installed through the .exe installer, on Windows 7 64-bit.
I have a multichannel 5.1 AC3 movie track that I am trying to transcode into another format through an external command-line encoder. I tried this through Audacity, and it worked fine. The output had a bitrate of 198 kbps (disregard that it's a low bitrate; that's beside the point). However, when I do the same thing through ffmpeg using the same quality settings, I get a 220 kbps output stream. Through some tests, I came to the firm conclusion that Audacity is applying some filtering to the AC3 input which leads to this discrepancy, whereas ffmpeg simply decodes the AC3 untouched. I checked my Audacity options and couldn't find anything indicating input filtering of any kind.

Also worth noting:

  • Command used in Audacity: external-encoder.exe -quality n -options - "%f"
  • Command used in ffmpeg: ffmpeg -i input.ac3 -acodec pcm_s16le -f wav - | external-encoder -quality n -options - output.xxx
  • When doing the same experiment with a 2-channel stereo track, the output files from Audacity and ffmpeg were identical.


So my question is: what is Audacity doing, and how can I prevent it from doing whatever that is when loading the AC3 5.1 audio?

OK, I got my answer in another forum, and it’s the “Dither” option.

Now I read the Audacity wiki regarding “dither”, and I still have some questions:

  • How could "dithering" lead to the input needing a lower bitrate when encoded in VBR mode, if it's harmless? Wouldn't that be the equivalent of smoothing out a picture or video before compressing it?

  • The wiki states that Audacity does its calculations in 32-bit float, so when working with 16-bit or 24-bit input, it's best to use dither for better accuracy when rounding. My question is: can I set Audacity to use 16-bit calculations internally when I know beforehand that I will be working on 16-bit input, so as to avoid this rounding in the first place? If yes, is this what the "Default sample format" option in the Quality tab is for?

  • The wiki also states that when doing simple editing (cut, paste, trim…) on 16-bit audio rather than processing (amplifying, equalizing, filtering…), it's better not to use dither when exporting. My question is: when downmixing 5.1 to stereo, I usually apply a -3 dB gain to the Center and LFE channels. Is this considered "processing", and would it thus need "dithering"?
    (This question is moot if I can set Audacity to do its internal calculations at the same bit depth as my input material, as I would prefer slightly less accurate calculations to the whole rounding-plus-random-noise business.)

Before I saw this post, I replied to your other question here: https://forum.audacityteam.org/t/audacity-appears-to-be-increasing-the-bit-rate/22513/15
However, there are a few points relating to your specific job, so this is in addition to my other reply.

See my other reply.


It’s a similar process to what is done when compressing a large uncompressed TIFF or BMP image to a smaller JPG or PNG image. So as to avoid “steps” where there should be smooth curves, the steps are “smoothed” by “dithering” the edges between colours, or in the case of audio, by smoothing the “quantize steps”. The overall impression of the image is (usually) a subjective improvement to the picture quality - similarly, the overall impression of the sound is (usually) a subjective improvement to the sound quality. However, if you zoom in close on the dithered image you will notice some “blur” along edges - similarly, if you turn up the volume really loud on the dithered audio you will notice some “blur” (hiss) near the threshold of silence.
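
If it helps to see the mechanism, here is a minimal sketch in Python/NumPy (my own illustration, not Audacity’s actual code): a tone quieter than half a 16-bit step rounds away to dead silence, while adding roughly one LSB of triangular noise before rounding lets the tone survive, under a faint noise floor.

import numpy as np

rng = np.random.default_rng(0)

# A quiet 32-bit float sine wave, about 0.3 LSB of 16-bit audio.
t = np.arange(48000) / 48000.0
signal = (0.3 / 32768.0) * np.sin(2 * np.pi * 1000.0 * t)

# Plain rounding: anything below 0.5 LSB rounds to zero - the tone is simply gone.
plain = np.round(signal * 32767.0).astype(np.int16)

# Triangular (TPDF) dither: add roughly 1 LSB of noise before rounding.
# Individual samples now jitter between -1, 0 and 1, but on average they
# still trace the sine wave - the tone survives under a faint noise floor.
noise = rng.uniform(-0.5, 0.5, t.size) + rng.uniform(-0.5, 0.5, t.size)
dithered = np.round(signal * 32767.0 + noise).astype(np.int16)

print(plain.any(), dithered.any())  # False (dead silence) vs True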

No. Audacity always performs processing calculations in 32-bit float.

Processing in 16-bit would not avoid rounding - in fact, it would make rounding more of a problem, because every step of every calculation would be restricted (rounded) to the nearest 16-bit value. This is precisely why 32-bit float is used internally. Ideally, all processing calculations should be done precisely, so that rounding is only required once at the end (when you export).
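
A toy example of the difference (plain Python, purely illustrative - this is the rounding argument, not Audacity’s internals): halve a sample, then double it again.

sample = 12345  # one 16-bit sample value

# "16-bit internal processing": round back to an integer after every step.
halved = round(sample * 0.5)    # 6172.5 rounds to 6172
restored = round(halved * 2.0)  # 12344 - one LSB is lost for good

# "32-bit float internal processing": round only once, at export.
exact = round(sample * 0.5 * 2.0)  # 12345

print(restored, exact)  # 12344 vs 12345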


The “Default sample format” just sets what format will be used by default when recording into a new project. This preference option is somewhat redundant in current versions of Audacity, as the results will invariably be better with 32-bit float (with or without dither), but it was a necessary option in Audacity 1.2.x (obsolete) because of the limitations in using multiple sample formats at the same time.


Yes, that is “processing”, so dither is likely to be beneficial. But if you want to keep absolute silence as absolute silence, you should either turn dither off (not recommended) or use “Rectangle” dither. Rectangle dither is probably a little “louder” than shaped dither (though still pretty quiet), has a softer “shhh” sound rather than the “ssss” of shaped dither, and has the advantage that absolute silence remains absolute silence.
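
To illustrate the “silence stays silent” point, a small sketch (my own Python; I’m assuming rectangle dither means uniform noise within half an LSB, and using plain triangular noise as a stand-in for the wider dithers - Audacity’s real shaped dither is more elaborate):

import numpy as np

rng = np.random.default_rng(0)
silence = np.zeros(100000)  # absolute digital silence

# Rectangle dither within +/- 0.5 LSB: every zero sample still rounds back to zero.
rect = np.round(silence + rng.uniform(-0.5, 0.5, silence.size))

# A wider (triangular, +/- 1 LSB) dither: some zeros round to +/- 1,
# which is the faint "hiss" near the threshold of silence.
tri = np.round(silence + rng.uniform(-0.5, 0.5, silence.size)
                       + rng.uniform(-0.5, 0.5, silence.size))

print(np.count_nonzero(rect), np.count_nonzero(tri))  # 0 vs roughly 25000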

Steve,
Thanks for your replies. Very informative stuff.
Though as discussed in the other thread (let’s continue here from now on), the bitrate discrepancy is not what one would expect. As I said, the lower bitrate corresponds to the dithered audio, and the higher bitrate to the untouched audio. That’s what I’m having a tough time rationalizing: how can dithered audio need fewer bits to represent when compressed?

Anyway, as I said before, I am only transcoding the 5.1 audio, with no processing - a case which the Audacity wiki suggests is the exception (it recommends turning off dither for lossless output, even for PCM audio; i.e. dithering untouched, unprocessed 16-bit audio would unnecessarily change that audio, which amounts to a lossy operation).

Also, I am using the Fraunhofer AAC encoder, not MP3. I didn’t want to mention the encoder at first, to keep things in general terms, but since the bitrate discrepancy might be a function of this particular encoder, I’m no longer sure whether it’s by design or a bug. Should I question FHG AAC, or should I still trust it?

Finally, regarding the testing I did, I still can’t understand why the stereo track yielded the same output (and the same bitrate) with and without dither. Did it have to do with 5.1 vs 2.0, or simply with the difference in the actual audio content?

You’re painting a very different picture to what your other post suggested.

How are you doing that? What is the original format, what is the final format, and how are you accomplishing the transcoding?

Steve, check my first post in this thread. I explained what I’m doing there.
But to recap, I am transcoding AC3 5.1 (an .ac3 file) to AAC 5.1 (an .m4a file) through an external encoder (piping). As I said, the commands used are in my initial post.

Also, please disregard the part about 5.1 vs stereo. Upon further testing, the same thing also happens with stereo 2.0 tracks. By “same thing” I mean that transcoding 16-bit input (AC3, WAV, MP3) to FHG AAC, without applying any modification to the audio, leads to the following:

  • Dither on = lower bitrate in output
  • Dither off = higher bitrate in output

So again, to me this means that either the FHG AAC encoder is compromised, or applying dither in Audacity where there hasn’t been any audio signal processing is “damaging” the audio.

To clarify:
My confusion is because of your question in the other topic: “I have come across a 16-bit audio track that is being compressed with a medium quality variable bitrate compression to 198 kbps with shaped dither on and 220 kbps with dither off. How can this be explained in terms of Audacity’s dither and compression encoder??” - which sounded like you were exporting from Audacity to MP3 (the other topic is about MP3 compression) with and without dither and comparing the results, which I now see is not what you are doing.

You’re missing out important steps in the description:

You import a ??? format track into Audacity,
which is imported as a 16-bit or 32-bit track ??
“Preferences > Import Export” is set to use “Custom Mix” so that you can export a 6-channel file.
You are then exporting to AAC 5.1 (.m4a file) through the external encoder option with the command

external-encoder.exe -quality n -options - "%f"

??? That is not a valid command??? What is the actual command that you are using?

Again, what is the actual command that you are using?

Note that transcoding is usually not lossless (unless transcoding to a lossless format or just changing the container without decoding / re-encoding).
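
For example (hypothetical filenames; a stream copy rather than a re-encode), a container-only change with ffmpeg would look like:

ffmpeg -i input.ac3 -acodec copy output.mka

That keeps the AC3 data bit-for-bit. Anything that decodes to PCM and re-encodes to AAC, as in your case, is lossy by nature.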

OK, I thought I had mentioned that the input was 16-bit, 6-channel AC3, but maybe that wasn’t enough, so here are some more specs: AC3 (Audio Coding 3), CBR, 448 kbps, 6 channels, 48 kHz.

Import/Export = Custom Mix (I can successfully export to 5.1 audio, no problem there).

Not sure, but it’s a 16-bit track. How can I set it to be imported as 16-bit or 32-bit? The only option I came across was “Default sample format”, which is for recording. Anyway, the track’s panel on the left says 16-bit.

Actual export commands (I didn’t know it mattered, since I’m using the same command in Audacity and ffmpeg):

  • fhgaacenc --vbr 3 --ignorelength - "%f"
  • ffmpeg -i audio.ac3 -acodec pcm_s16le -f wav - | fhgaacenc --vbr 3 --ignorelength - out.m4a
    or
    ffmpeg -i audio.ac3 -acodec pcm_s16le -ar 48000 -ac 6 -f wav - | fhgaacenc --vbr 3 --ignorelength - out.m4a

Both ffmpeg commands yield the same result, so I’m just confirming that nothing is wrong on the ffmpeg side.

I’m well aware that I am doing a lossy transcode. It’s just that I was expecting the bitrate to be higher for the dithered material, not the other way around. So I’m worried that either the FHG AAC encoder is broken, or Audacity is damaging the audio signal when applying dither (in this case unneeded dither, since the audio signal hasn’t been processed or modified).
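
Just to show what I mean by “damaging” - a quick sketch I put together (Python/NumPy, with triangular noise as a stand-in for Audacity’s shaped dither): rounding untouched 16-bit samples is a no-op, but dithering them first moves a sizeable fraction of them by one LSB.

import numpy as np

rng = np.random.default_rng(0)

# Pretend these are untouched 16-bit samples sitting in a 32-bit float track.
original = rng.integers(-32768, 32768, 100000).astype(np.float64)

# Export without dither: rounding integers changes nothing - bit-identical output.
no_dither = np.round(original)

# Export with roughly 1 LSB of triangular dither: many samples move by +/- 1.
with_dither = np.round(original + rng.uniform(-0.5, 0.5, original.size)
                                + rng.uniform(-0.5, 0.5, original.size))

print(np.count_nonzero(no_dither != original))   # 0
print(np.count_nonzero(with_dither != original)) # roughly a quarter of them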

This is a limitation of Audacity’s FFmpeg import - it is always 16-bit integer.

When you import the file, does the track say that the sample rate is 48000 Hz?

Sample rate is 48 kHz after import, and after export also (both in Audacity and ffmpeg).

I’m on Linux, so the commands on my machine are different, but the fact that you are decoding with Audacity’s FFmpeg import for one conversion and with command-line ffmpeg for the other could explain why you are getting different results.

What (text) output is displayed when you export from Audacity?