Sample rate downsampling/upsampling explanation.

This is coming from a newbie, so don’t laugh if it sounds ridiculous.

How many ways are there to reduce the sampling rate? For example: 96,000 > 41,000 (and such). Are the bits of information erased, or combined?

The way I see (the theoretical) combining of samples is: given ten samples 12 15 13 17 16 27 23 24 22 21, the first five (12, 15, 13, 17, 16) become part of new sample 1, and the last five (27, 23, 24, 22, 21) become part of new sample 2.

In other words, can a professional explain to me the mechanics of how the sampling rate is reduced? I exported a few 96,000 Hz files at CD quality, and I want to know how the program does it. And whether there is any better(?) way to do it.

Sorry if this sounds like insane gibberish to you, but this is someone who does not understand digital audio editing trying to understand digital audio editing :confused:

CD Quality is 44100 Hz, 16-bit, Stereo. Basic television sound is 48000 Hz, 16-bit, Stereo. They’re cousins of each other. Digital TV came later, so they had a little more room to work with.

The first number, the sampling rate, is basically the number of times a second that the system looks at the analog wave to see what’s there and assigns it a number. You can see this in action by magnifying the Audacity blue waves enough. It will show you the sampling points (attached).

You can never change the analog sound faster than the digital system can look at it. The sampling will simply miss part of the sound.

Somebody else can explain downsampling. Not only do you have to avoid damaging the sound too badly, but you have to take steps to keep the two sampling rates from interfering with each other.

Koz
Screen Shot 2014-10-03 at 5.10.37.png

Let’s say that we have a simple waveform like this:
1.png
When we digitize the waveform, what we are doing is “measuring” the amplitude (vertical position in the drawing) repeatedly many times per second, as indicated by the green lines:
2.png
These measurements give us a series of points, indicated by the green dots. These are called “samples” or “sample values”.
3.png
Keep in mind that in the digital format, the red line joining the dots does not actually exist. All that actually exists in the digital representation is a series of numerical values that were recorded, and will play back, at regular intervals indicated by our “sample rate” (the number of samples per second).
4.png
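
To make that concrete, here is a minimal Python sketch of what "a series of numerical values at a sample rate" means in practice (the 1 kHz tone and the 44100 Hz rate are arbitrary illustrative choices, not anything specific to Audacity):

```python
import numpy as np

# A digital recording is just a list of sample values plus the rate
# at which they were measured. Here we "measure" a 1 kHz sine wave
# 44100 times per second.
sample_rate = 44100                        # samples per second
t = np.arange(0, 0.001, 1 / sample_rate)   # the first millisecond
samples = np.sin(2 * np.pi * 1000 * t)     # the green dots

print(len(samples))    # 45 measurements in one millisecond
print(samples[:5])     # just numbers; the red line is never stored
```
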
Resampling:

Now let’s say that we want to “resample”. That is, we want to use a different sample period (as indicated by the blue lines below).
Notice that our green dots no longer match up with the sample positions that we want to use:
5.png
So what we need to do, is to calculate new sample values, from the old sample values, that match up with the new sample rate.
The blue dots represent the new samples:
6.png
So that from the new samples, played at the new sample rate, we can reconstruct the same (or very similar) waveform:
7.png
The exact way that the new samples (blue dots) are calculated from the old samples (green dots) is a complicated mathematical process called “interpolation”. There are many different ways to do this calculation. Some methods are quicker to compute than others - some are more accurate.
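
To illustrate the principle, here is a sketch using linear interpolation, about the simplest (and least accurate) of those many methods. This is not the method Audacity uses; it just shows how new blue dots can be computed from old green dots:

```python
import numpy as np

def resample_linear(samples, old_rate, new_rate):
    """Resample by linear interpolation: the crudest possible method,
    shown only to illustrate the pictures above."""
    n_out = len(samples) * new_rate // old_rate
    # Where each new (blue) sample falls on the old (green) timeline.
    old_positions = np.arange(n_out) * old_rate / new_rate
    # np.interp draws a straight line between neighbouring green dots
    # and reads off the value where each blue dot lands.
    return np.interp(old_positions, np.arange(len(samples)), samples)

green_dots = np.sin(2 * np.pi * np.arange(96) / 96)   # one cycle, 96 samples
blue_dots = resample_linear(green_dots, 96000, 41000)
print(len(blue_dots))   # 41 new samples for every 96 old ones
```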

Audacity provides 4 methods of computing the new sample values. These are described in terms of “speed to compute” and “accuracy / quality” in Preferences:

  • Low Quality (Fastest)
  • Medium Quality
  • High Quality
  • Best Quality (Slowest)

(see here in the manual for details: http://manual.audacityteam.org/o/man/quality_preferences.html)

The conversion is performed using a software library called “soxr”. More information about this library can be found on their web site: http://sourceforge.net/p/soxr/wiki/Home/
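
If you want to experiment with that same library from code, there is a third-party Python binding, python-soxr, that wraps it. Assuming it is installed (pip install soxr), usage looks roughly like this; note that how Audacity's four menu entries map onto the library's quality presets is an assumption here, not something taken from the Audacity source:

```python
import numpy as np
import soxr   # third-party binding for libsoxr: pip install soxr

rate_in, rate_out = 96000, 44100
audio = np.random.randn(rate_in).astype(np.float32)   # 1 s of noise

# 'HQ' and 'VHQ' are libsoxr's own quality presets; which one
# corresponds to Audacity's "Best Quality" is not verified here.
resampled = soxr.resample(audio, rate_in, rate_out, quality='VHQ')
print(len(resampled))   # ~44100 samples
```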

The answer is that there are dozens, if not hundreds, of different algorithms to do this.

(In your example you said 41 kHz; the standard CD rate is 44.1 kHz, which is probably what you meant, but I’ll use your 41 kHz number for the sake of argument.)

The simplest is to just discard samples to get your new sample rate. In your example you would discard every other sample, except that about every third output sample you would discard two input samples in a row. Even though this sounds horrible, for most audio recordings I expect that most people wouldn’t notice a resample done in this manner.
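
A sketch of that discard-only approach (a hypothetical helper, not code from any real resampler):

```python
import numpy as np

def resample_by_discarding(samples, old_rate, new_rate):
    """Crude downsampling: for each output instant, keep the nearest
    input sample and throw the rest away. No filtering whatsoever."""
    n_out = len(samples) * new_rate // old_rate
    keep = np.round(np.arange(n_out) * old_rate / new_rate).astype(int)
    return samples[keep]

x = np.arange(96, dtype=float)               # stand-in for input at 96 kHz
y = resample_by_discarding(x, 96000, 41000)
print(np.diff(y))   # mostly steps of 2, a step of 3 about every third sample
```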

A sampled system can only represent frequencies up to 1/2 the sample rate (this is called the “Nyquist frequency”, after one of the folks who worked out the theory). So a 96 kHz sample rate can represent up to 48 kHz, while 41 kHz can only represent up to 20.5 kHz. If your input signal has information between 20.5 kHz and 48 kHz then that must be filtered out in the resampling process, or it will “alias” and appear in the 0-20.5 kHz output (and be very evident if you used the “discarding samples” method). Filtering methods vary, but most are based on a weighted sum of the surrounding samples; if you want a “sharp” cutoff then a very wide window, possibly 100 samples or more, is needed.
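
You can compute the aliasing directly. In this sketch a deliberately extreme 30 kHz test tone is decimated with no filtering, and it folds down to a brand-new 18 kHz tone (the frequencies are chosen purely for the demonstration):

```python
import numpy as np

fs_in = 96000
t = np.arange(fs_in) / fs_in             # 1 second
tone = np.sin(2 * np.pi * 30000 * t)     # 30 kHz: inaudible, but legal at 96 kHz

# Halve the rate by discarding every other sample: no filtering.
decimated = tone[::2]                    # now nominally 48 kHz
fs_out = 48000

# 30 kHz is above the new Nyquist frequency (24 kHz), so it folds
# back ("aliases") to 48 - 30 = 18 kHz.
spectrum = np.abs(np.fft.rfft(decimated))
peak_hz = np.argmax(spectrum) * fs_out / len(decimated)
print(peak_hz)   # ~18000 Hz: an audible tone that was never recorded
```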

Once the input has been filtered, an interpolation is needed to create the new output samples. If your example had been 96 kHz to 48 kHz, then it would simply be a matter of discarding every other sample, but if you want to go from 96 kHz to 41 kHz then you have to somehow take a block of 96 samples and create 41 from it. So in general you have to create a new sample that represents a point in time somewhere in between two of your input samples. (Edit: Steve posted some marvelous pictures while I was writing this.)

In practice this interpolation is folded into the filtering so that both operations happen at the same time.
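
A common form of that combined operation is polyphase resampling. SciPy ships one such implementation, shown here purely to illustrate the technique (Audacity itself uses libsoxr, not SciPy):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 96000, 41000
x = np.random.randn(fs_in)    # 1 second of noise at 96 kHz

# resample_poly upsamples by 41, lowpass-filters, and downsamples by 96
# in a single polyphase pass: the filtering and the interpolation
# really do happen at the same time.
y = resample_poly(x, up=41, down=96)
print(len(y))                  # 41000 samples
```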

I don’t know what method Audacity uses, but a simple test indicates that its “high quality” conversion has a filter that starts to attenuate at about 90% of the new Nyquist frequency. The “low quality” conversion does not appear to have any filtering.

For further reading this web site might be helpful: Digital Audio Resampling Home Page

Thank you for explaining. Only one question left: what method does Audacity use? Talking about “best/slowest” here.

Would be nice if someone responsible for Audacity actually explained what method Audacity uses on Audacity forums.

I used that word 3 times in one sentence.

> I used that word 3 times in one sentence.

“I tell you three times.”
— Robert Heinlein

My hat says Forum Elf, not developer or programmer. We can divine how it works from testing and reactions. To get the real reading, we’ll have to distract one of The Developers. You know what happened last time we did that… :frowning:

Koz

You can always download the source code and read it for yourself :slight_smile:

I’ve already given that information.
Same thing in different words: resampling is not handled in the Audacity code but in “libsoxr”. Audacity uses 4 of the presets in libsoxr. For detailed information about libsoxr you will need to look to their web site, and if you have further questions, refer to their support channel: The SoX Resampler library / Wiki / Home (http://sourceforge.net/p/soxr/wiki/Home/)

I am sorry, there was so much text that I kind of skipped some of it, and missed your explanation. Thanks again!

Hmmm… it was mentioned above that the algorithm Audacity uses is… 90% correct, is that the correct way of putting it?

Well, I read this here thing
https://ccrma.stanford.edu/~jos/resample/What_Bandlimited_Interpolation.html

and it looks like there is a way to perfectly convert between sampling rates.

Am I comparing apples with oranges here, or are these the same things? Is the article even talking about what I was asking?

What I said was that Audacity in its “best quality” mode seems to start filtering at 90% of the Nyquist frequency. One can easily argue that this is 99% or even 100% “correct”. Consider that the typical down-conversion is from some higher rate like 48 kHz or 96 kHz to 44.1 kHz: the Nyquist frequency of the output is 22.05 kHz, and 90% of that is about 20 kHz, the generally agreed upper limit of human hearing.

If you read the rest of his paper (which I admit might be a bit tough if you don’t have an engineering background) he describes implementing the filtering using an FIR filter whose coefficients are a windowed sinc function. I suspect (but don’t know) that Audacity’s “high quality” conversion does use such a filter, as that is one of the more common methods. The question is how big to make the “window”: a “perfect” conversion requires an infinitely large window.

This “window” is the number of input samples you have to examine to compute a new output sample. It grows rapidly as you try to push that 90% closer to 100%, and the conversion gets ever slower to compute. As with most things in engineering, what is “perfect” and what is “practical” are not the same, and “practical” wins the day.
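
For the curious, computing such a filter looks roughly like this; the tap count and cutoff below are arbitrary illustrative choices, and this is a guess at the general technique rather than Audacity’s or libsoxr’s actual code:

```python
import numpy as np

def windowed_sinc_lowpass(cutoff, fs, n_taps=101):
    """FIR lowpass: an ideal sinc response truncated by a window.
    More taps = a sharper cutoff, but a slower filter to apply."""
    n = np.arange(n_taps) - (n_taps - 1) / 2   # centre the sinc
    h = np.sinc(2 * cutoff / fs * n)           # ideal lowpass, truncated
    h *= np.hamming(n_taps)                    # the "window" tames the truncation
    return h / h.sum()                         # unity gain at 0 Hz

# e.g. keep everything below 20 kHz before dropping from 96 kHz to 44.1 kHz
taps = windowed_sinc_lowpass(cutoff=20000, fs=96000)
# one output sample = weighted sum of the 101 surrounding input samples
```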

As I previously posted, if anyone really wants to know the technical details of how Audacity does resampling, they need to look to libsoxr, not to Audacity. It is libsoxr that handles resampling, and it has little to do with the Audacity code.

> and it looks like there is a way to perfectly convert between sampling rates.

There is no mathematically perfect resampling algorithm. For example, if you re-sample from 48kHz to 96kHz and back again (or vice-versa), the sample values (bits & bytes) will be different.
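
You can check that round-trip claim yourself with any resampler to hand; here is a sketch using SciPy’s polyphase resampler, chosen only because it is easy to call, not because it is what Audacity uses:

```python
import numpy as np
from scipy.signal import resample_poly

x = np.random.randn(48000)      # 1 second of noise at 48 kHz
up = resample_poly(x, 2, 1)     # 48 kHz -> 96 kHz
back = resample_poly(up, 1, 2)  # 96 kHz -> 48 kHz

# The waveform is audibly the same, but the raw sample values are not.
print(np.max(np.abs(back - x)))  # small, but not zero
```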

But any competent DSP programmer can make an audibly perfect resampler… as long as we stay above 16-bit 44.1 kHz… Of course if you downsample a CD to 8 kHz, you are going to lose some high frequency information.

I’ve never heard any difference when upsampling or downsampling between 44.1kHz (CD) and 48kHz (DVD) no matter what software I happen to be using.

Generally I’d agree, but if you are applying heavy processing after resampling, then the difference may become clearly audible.

As a (totally artificial) example: generate a 5 kHz sine tone at a 44100 Hz sample rate, then resample to 48000 Hz, then apply a notch filter with the centre frequency set to 5000 Hz (Q = 1). The result should of course be silence, but if you use Audacity’s lowest quality resampling, harmonics are clearly (though fairly quietly) audible.
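
For anyone who wants to reproduce a similar test outside Audacity, here is a rough equivalent using SciPy’s resampler and notch filter (so the residue level will not match Audacity’s exactly; only the method is the same):

```python
import numpy as np
from scipy.signal import resample_poly, iirnotch, lfilter

fs1, fs2 = 44100, 48000
t = np.arange(fs1) / fs1
tone = np.sin(2 * np.pi * 5000 * t)      # 5 kHz sine at 44100 Hz

# 44100 -> 48000 is a ratio of 160/147.
resampled = resample_poly(tone, up=160, down=147)

# Notch out the 5 kHz fundamental; whatever is left is resampling residue.
b, a = iirnotch(w0=5000, Q=1, fs=fs2)
residue = lfilter(b, a, resampled)

rms = np.sqrt(np.mean(residue[fs2 // 10:] ** 2))   # skip filter start-up
print(20 * np.log10(rms))                          # residue level in dB
```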