I am using ffmpeg-python because I have a lot of audio to process, and it’s very tedious to do all the operations by hand in Audacity. But the quality is disappointing compared with resampling (and normalizing) in Audacity.
I just came across Secret Rabbit Code, which I might try wrapping if it would help.
But would it be possible to find out what Audacity uses under the hood, and could I maybe borrow that code instead?
How much is it clipping? Does your input file go right up to 0 dB or have you left a little headroom?
If the input goes all the way up to 0 dB, it is very likely that there will be a tiny bit of “clipping” when resampling. (The audio may not actually be “clipped”, even if Sox or Audacity detects it as clipped; it may just be “touching” the maximum / minimum values.)
Slight clipping is more likely when up-sampling to a higher rate, but it may occur (with both Sox and Audacity) with any resampling if the input is at or very close to 0 dB, especially if the audio is heavily compressed (dynamic compression / limiter effect).
The solution is to allow a little headroom.
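To check how much headroom a file already has before resampling, here is a rough standard-library sketch. It assumes 16-bit PCM WAV input. The function names are my own, and it measures the sample peak only: the “inter-sample” peaks that resampling reveals can be a little higher.

```python
import math
import sys
import wave
from array import array


def sample_peak_dbfs(samples) -> float:
    """Peak of 16-bit sample values in dBFS (32768 = full scale = 0 dBFS)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB level
    return 20 * math.log10(peak / 32768)


def wav_peak_dbfs(path: str) -> float:
    """Sample peak of a 16-bit PCM WAV file in dBFS (all channels together)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return sample_peak_dbfs(samples)


def gain_for_headroom(peak_db: float, headroom_db: float = 1.0) -> float:
    """Gain in dB that puts the peak at -headroom_db dBFS (negative = turn down, positive = turn up)."""
    return -headroom_db - peak_db
```

For example, a file that peaks at 0 dBFS gives `gain_for_headroom(0.0) == -1.0`, which is the same 1 dB reduction used in the Sox command.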
I’ve not used the Python wrapper, but with the command-line version of Sox you can do:
sox input.wav -r 16000 output.wav gain -1
which resamples “input.wav” to 16000 Hz, writes the result to “output.wav”, and reduces the gain by 1 dB (giving you 1 dB of headroom so that a 0 dB input does not clip).
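Since you’re working with ffmpeg rather than Sox, a roughly equivalent step is ffmpeg’s “volume” filter followed by its “aresample” filter. This is an untested sketch. It builds the command as a plain list for `subprocess` rather than going through ffmpeg-python, and the helper names are my own. Also note that aresample uses ffmpeg’s own resampler by default, so the quality won’t necessarily match Sox or Audacity.

```python
import subprocess


def resample_with_headroom_cmd(src: str, dst: str, rate: int = 16000,
                               gain_db: float = -1.0) -> list[str]:
    """Build an ffmpeg command that lowers the level first, then resamples."""
    # Filters in -af run left to right: "volume" first leaves headroom,
    # then "aresample" converts to the target sample rate.
    audio_filters = f"volume={gain_db:g}dB,aresample={rate}"
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filters, dst]


def run(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)


print(" ".join(resample_with_headroom_cmd("input.wav", "output.wav")))
```

To process a batch of files, you would loop over them and call `run(resample_with_headroom_cmd(src, dst))` for each one.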
This is what happens to a square wave (a “worst case” example), generated at a peak amplitude of 1 (0 dB) with a sample rate of 44100 Hz, then resampled to both 16000 Hz and 192000 Hz (without any headroom).
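The same effect can be shown numerically. This rough sketch assumes an ideal band-limited resampler, using trigonometric interpolation via a plain DFT; that is not the actual algorithm Sox or Audacity uses. It takes a full-scale square wave sampled at 44100 Hz and evaluates the continuous waveform those samples represent at 8× the rate, which is roughly what up-sampling does. The 441 Hz test tone and the function names are my own choices.

```python
import cmath
import math


def square_wave(period: int) -> list[float]:
    """One period of a full-scale (0 dB) square wave: +1.0 then -1.0."""
    half = period // 2
    return [1.0] * half + [-1.0] * (period - half)


def dft(x: list[float]) -> list[complex]:
    """Plain O(n^2) discrete Fourier transform (fine for a short demo)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def bandlimited_value(spectrum: list[complex], t: float) -> float:
    """Ideal band-limited interpolation of one period at time t (in samples)."""
    n = len(spectrum)
    total = 0.0
    for k, c in enumerate(spectrum):
        if 2 * k == n:
            total += c.real * math.cos(math.pi * t)  # Nyquist bin (even n only)
        else:
            f = k if 2 * k < n else k - n  # signed frequency, cycles per period
            total += (c * cmath.exp(2j * math.pi * f * t / n)).real
    return total / n


def oversampled_peak(samples: list[float], factor: int = 8) -> float:
    """Largest absolute value of the band-limited waveform at factor x the rate."""
    spectrum = dft(samples)
    return max(abs(bandlimited_value(spectrum, i / factor))
               for i in range(len(samples) * factor))


wave_44k = square_wave(100)  # 441 Hz square wave at 44100 Hz
peak = oversampled_peak(wave_44k)
print(f"largest sample value:     {max(abs(s) for s in wave_44k):.3f}")
print(f"peak between the samples: {peak:.3f} ({20 * math.log10(peak):+.2f} dB)")
```

Every stored sample is exactly ±1.0, yet the waveform between the samples rises above full scale. That overshoot is what gets “clipped” when a 0 dB input is resampled without headroom.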
Yes, you nailed it. It was going up to 0 dB. By the way, I still haven’t got used to the idea of using negative dB values… it seems counterintuitive to me? Thanks so much. I managed to clean up my audio by reducing the level before resampling, as you suggested. Now it sounds nice even at 16 kHz! I’ve been going crazy for the last few days.
I’ll study the rest of your post, but it’s a bit over my head at the moment. Much appreciated.
By the way, I still haven’t got used to the idea of using negative dB values… it seems counterintuitive to me?
Yes, it does look a bit weird at first, but it really has to be that way when dealing with signals.
The dB scale expresses a logarithmic “ratio” rather than a “unit”, so a dB value is always measured relative to an absolute reference level. That reference level is the “0 dB” level: everything above it is a positive value, and everything below it is a negative value.
When measuring “Sound Pressure Level” (the level of a sound in air), the 0 dB level is set at “the threshold of hearing” (20 micropascals in SI units). Thus most “sounds” have positive values (though audio labs used for scientific research may be quieter than this, giving negative values).
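As a small illustration of “dB relative to a reference”, here is a sketch for sound pressure level with the 20 µPa reference. The constant and function names are my own. The input is taken to be an RMS pressure in pascals.

```python
import math

P_REF = 20e-6  # 20 micropascals: the 0 dB SPL reference (threshold of hearing)


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to 20 µPa, from an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)


for p in (20e-6, 1.0, 10e-6):
    print(f"{p:g} Pa -> {spl_db(p):+.1f} dB SPL")
```

The reference itself comes out at 0 dB, 1 Pa comes out at about +94 dB, and anything quieter than the reference comes out negative.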
When measuring signals, there is no direct equivalent of the “threshold of hearing”, but a reference level is still essential. The reference level that everyone uses for audio signals is “full scale”: the full height of an Audacity track, that is, a linear value of ±1.0, is the 0 dB level. This scale is sometimes written as “dBFS” (dB relative to Full Scale). Because “valid” signal levels are below the 0 dB reference, they are negative.
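For signals, the same formula is used with full scale (1.0) as the reference. Here is a minimal sketch converting between linear sample values and dBFS; the function names are my own.

```python
import math


def to_dbfs(amplitude: float) -> float:
    """Linear sample value (full scale = 1.0) to dBFS."""
    if amplitude == 0:
        return float("-inf")  # silence has no finite dB level
    return 20 * math.log10(abs(amplitude))


def from_dbfs(db: float) -> float:
    """dBFS back to a linear value (0 dBFS -> 1.0)."""
    return 10 ** (db / 20)


for a in (1.0, 0.5, 0.1):
    print(f"{a:.1f} -> {to_dbfs(a):+.2f} dBFS")
print(f"-1 dBFS -> {from_dbfs(-1):.3f}")
```

In linear terms, the 1 dB of headroom from the Sox example keeps peaks at or below about 0.891 of full scale.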