audacity resampling sounds a lot better than ffmpeg - why?

tensorfoo · July 8, 2021, 6:09am

I am using ffmpeg-python because I have a lot of audio to process and it’s very tedious to do all the operations by hand in Audacity. But the quality I get from resampling (and normalizing) with ffmpeg is disappointing compared with Audacity.

I just came across Secret Rabbit Code, which I might try wrapping if it will help.

But would it be possible to find out what Audacity uses underneath, and whether I could borrow that code instead?

steve · July 8, 2021, 8:30am

Audacity uses “libsoxr”, which is the resampling library used in SoX http://sox.sourceforge.net/

I’ve not tried it, but there is a Python wrapper available: https://pypi.org/project/sox/

tensorfoo · July 8, 2021, 8:53am

Thanks steve, I am trying the libsoxr Python binding now. I’m finding that resampling to 16 kHz clips my output, but Audacity doesn’t do that. Any ideas? It seems Audacity is doing something smarter.

Edit: oh! It turns out the clipping happens even before I resample. It happens when the data is loaded, so I need to figure out how to tell ffmpeg not to clip my data.

steve · July 8, 2021, 10:28am

How much is it clipping? Does your input file go right up to 0 dB or have you left a little headroom?

If the input goes all the way up to 0 dB, then it is very likely that there will be a tiny bit of “clipping” when resampling. (The audio may not actually be “clipped” even if Sox or Audacity detects it as clipped; it may just be “touching” the maximum / minimum values.)

Slight clipping is more likely when up-sampling to a higher rate, but may occur (with both Sox and Audacity) with any resampling if the input is at or very close to 0 dB, especially if the audio is heavily compressed (dynamic compression / limiter effect).

The solution is to allow a little headroom.
I’ve not used the python wrapper, but with the command line version of Sox you can do:

sox input.wav -r 16000 output.wav gain -1

which resamples “input.wav” to 16000 Hz, writes the result to “output.wav”, and reduces the gain by 1 dB (giving you 1 dB of headroom to avoid clipping a 0 dB input).
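For anyone working in Python rather than on the command line, here is a minimal sketch of what a gain of -1 dB means for the sample values. It assumes the audio is already loaded as floats where full scale is +/- 1.0; the function names `db_to_linear` and `apply_gain_db` are made up for this example and are not part of SoX, ffmpeg-python or any wrapper.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain_db(samples, gain_db):
    """Scale float samples (full scale = +/- 1.0) by a gain given in dB."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]


# Hypothetical full-scale input: peaks at exactly 1.0, i.e. 0 dB.
full_scale = [0.0, 1.0, 0.0, -1.0]

# Reduce the level by 1 dB to leave headroom for the resampler.
with_headroom = apply_gain_db(full_scale, -1.0)
print(max(abs(s) for s in with_headroom))  # roughly 0.891
```

So 1 dB of headroom means the peaks sit at about 89% of full scale, which leaves room for the small overshoots that resampling can add.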

This is what happens to a square wave (a “worst case” example), generated at a peak amplitude of 1 (0 dB) with a sample rate of 44100, resampled to both 16000 and 192000 Hz (without any headroom).

This is a square wave resampled to 192000 Hz using Sox (original 44100 Hz track at the top, resampled version at the bottom)

sox input.wav -r 192000 192.wav gain -1
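The overshoot on a square wave can be reproduced with a rough model in plain Python. This is only a sketch under stated assumptions: it treats the resampler as an ideal filter that keeps every harmonic below a cutoff and drops the rest, which is not exactly what Sox or Audacity do, and the 1000 Hz fundamental and 8000 Hz cutoff (half of a 16000 Hz sample rate) are values picked for illustration.

```python
import math


def band_limited_square(phase, fundamental_hz, band_limit_hz):
    """Sum the odd harmonics of a unit square wave below band_limit_hz.

    phase is the position within one cycle, from 0.0 to 1.0.
    """
    total = 0.0
    k = 1
    while k * fundamental_hz < band_limit_hz:
        total += math.sin(2.0 * math.pi * k * phase) / k
        k += 2  # a square wave only contains odd harmonics
    return 4.0 / math.pi * total


def peak_over_one_cycle(fundamental_hz, band_limit_hz, points=10000):
    """Largest absolute value found on an evenly spaced grid over one cycle."""
    return max(
        abs(band_limited_square(i / points, fundamental_hz, band_limit_hz))
        for i in range(points)
    )


# Assumed values: 1000 Hz square wave, harmonics kept below 8000 Hz.
peak = peak_over_one_cycle(1000.0, 8000.0)
print(f"peak = {peak:.3f} ({20.0 * math.log10(peak):+.2f} dB relative to full scale)")
```

The original square wave never exceeds 1.0, but once its upper harmonics are removed the waveform rings near each edge and the peaks rise above 1.0. That is why a 0 dB input can come out of a resampler above full scale even though nothing was "turned up".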

tensorfoo · July 8, 2021, 4:59pm

Yes, you nailed it. It was going to 0 dB. By the way, I still haven’t got used to the idea of using negative dB values … it seems counterintuitive to me? Thanks so much. I have managed to clean up my audio by reducing the level before resampling, as you suggested. Now it’s nice even at 16 kHz! I was going crazy for the last few days.

Will study the rest of your post but it’s a bit over my head at the moment. Appreciate it.

steve · July 8, 2021, 5:20pm

Yes, it does look a bit weird at first, but it really has to be that way when dealing with signals.

The dB scale is a logarithmic “ratio” rather than a “unit”, so dB is always measured relative to an absolute reference level. The absolute reference level is the “0 dB” level, so everything above that level is a positive value, and everything below is a negative value.

When measuring “Sound Pressure Level” (the level of a sound in air), the 0 dB level is set at “the threshold of hearing” (“20 micropascals” in SI units). Thus most “sounds” are measured with positive values (though audio labs for scientific research may be much quieter than this).

When measuring signals, there is no direct equivalent to “threshold of hearing”, but it is essential to have a reference level. The reference level that everyone uses for audio signals is “full scale”. That is, the full height of an Audacity track, or a linear value of +/- 1.0 is the 0 dB level. This scale is sometimes written as “dBFS” (dB with reference to Full Scale). As “valid” signal levels are below the 0 dB reference, they are negative.
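As a small illustration of the dBFS scale, here is a sketch that converts linear sample values to dB relative to full scale. It assumes float samples where +/- 1.0 is full scale; `to_dbfs` is a name made up for this example.

```python
import math


def to_dbfs(amplitude):
    """Level of a linear amplitude relative to full scale (1.0), in dB."""
    if amplitude == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(abs(amplitude))


# Full scale is 0 dBFS; anything smaller comes out negative.
for amplitude in (1.0, 0.5, 0.1):
    print(f"{amplitude:>4} -> {to_dbfs(amplitude):7.2f} dBFS")
```

Half of full scale is about -6 dBFS and a tenth of full scale is -20 dBFS. A value above 1.0 would give a positive result, which is exactly the region where a fixed-point file format has to clip.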