downsampling from 16kHz to 8kHz


I’d like to downsample WAV files created and exported with Audacity 1.3.9 from 16 kHz to 8 kHz. How can I do it?


Use “resample”. On 1.3.9 resample is on the “Tracks” menu (4th from top).

Bear in mind that by downsampling from 16 kHz to 8 kHz you will lose all frequencies above 4 kHz: it will sound like a telephone.
(On the plus side, the files will be halved in size.)
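
To put rough numbers on both points, here is a small sketch in Python. The function names are made up for illustration, and the size estimate assumes uncompressed 16-bit mono PCM and ignores the small WAV header.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def pcm_size_bytes(seconds, sample_rate_hz, bits_per_sample=16, channels=1):
    """Approximate size of uncompressed PCM audio data (header not included)."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * channels * bytes_per_sample


for rate in (16000, 8000):
    print(rate, "Hz:",
          "limit", nyquist_limit_hz(rate), "Hz,",
          "one minute is about", pcm_size_bytes(60, rate), "bytes")
```

Halving the sample rate halves both the upper frequency limit and the amount of audio data.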

Thank you!

This is exactly what I wish to accomplish. I’d like to use Asterisk together with Sphinx4: Asterisk receives the speech from the phone on the server, and Sphinx performs the automatic speech recognition. I need to create a speaker-dependent acoustic model. Previously I thought about writing an application for the mobile phone itself, which would require PocketSphinx on Symbian, but since it has not been ported to Symbian yet, I decided to have users call the server from their mobile phones via Asterisk instead. For Asterisk with Sphinx I definitely need telephone-quality audio. For the Symbian application, however, I don’t really know whether I need 16 kHz or 8 kHz.

I will need to ask many people to record samples, and I don’t want them to have to repeat the task later (once PocketSphinx has been ported to Symbian). So I will ask them to record at 16 kHz and downsample to 8 kHz myself. In a few months, when PocketSphinx is available for Symbian, I will use the original, non-downsampled recordings. (Or do I need 8 kHz for PocketSphinx as well? I think answering this doesn’t require knowledge of PocketSphinx so much as of mobile phones in general.)


A quick google found this 2006 paper …

Contemporary Speech Communication Systems
The problems resulting from limited bandwidth in speech communication are widely recognized today. As a result, speech communication systems with expanded bandwidth are in growing use. In video conferencing, for example, audio connections commonly have a bandwidth of 7 kHz, and sometimes 14 kHz or higher. FM and television carry sound with 15 kHz bandwidth. IP telephony is moving to 7 kHz bandwidth using compressed and uncompressed codec techniques as described in TIA 920-200, and even the cellular telephone network is expanding to enable 7 kHz audio via the G.722.2 speech codec as called out in 3G.

By extending telephone bandwidth to 7 kHz and beyond, it is clear that one can markedly reduce fatigue, improve concentration, and increase intelligibility.

Thank you for your answer!
In other words, are you suggesting that I downsample from 16 kHz to 7 kHz rather than to 8 kHz?
I found info about 8 kHz here: , however I also asked on the Sphinx forum, and they suggested that I use something else for integrating Sphinx with Asterisk, namely “mrcp via cairo/zanzibar”.

BTW if you are trying to simulate a telephone you’ll have to lose the frequencies below 300 Hz too
(see Fig. 3 on page 8 of the PDF I linked to in my previous post).

Audacity’s equalizer will enable you to do that.

No. The article is saying that there is a very marked increase in intelligibility if the audio bandwidth is increased from 3.3 kHz (typical in telephony) to 7 kHz. Achieving an audio bandwidth of 7 kHz requires a sample rate greater than 14 kHz (see the Nyquist theorem). In other words, intelligibility (specifically of consonants) is considerably improved by the bandwidth afforded by a 16 kHz sample rate.

Just to make sure that any frequencies > 4 kHz don’t fold back (due to the roll-off of the low-pass filter in Audacity), it is better to FIRST pass your data through a low-pass filter (bandwidth 4 kHz) and THEN do the downsampling as suggested here.

(I’m saying this because such a fold-back problem occurred for frequencies > 3.5 kHz when I did the resampling.)
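
To illustrate the "filter first, then drop samples" idea, here is a minimal sketch in plain Python. It is not how Audacity does its conversion; the windowed-sinc filter, the 3600 Hz cutoff, the 101-tap length and all function names are assumptions chosen just for the example.

```python
import math


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone shows up after sampling without filtering."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample_by_2(samples, sample_rate_hz=16000, cutoff_hz=3600):
    """Low-pass filter the signal, keeping only every second output sample."""
    taps = lowpass_taps(cutoff_hz, sample_rate_hz)
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):  # filter output is only computed where kept
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out


def tone(freq_hz, sample_rate_hz, seconds):
    """Unit-amplitude sine wave as a list of floats."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Without filtering, a 6 kHz tone would land at this frequency in an 8 kHz file:
print(alias_frequency(6000, 8000))

low = downsample_by_2(tone(1000, 16000, 0.1))   # inside the new band: kept
high = downsample_by_2(tone(6000, 16000, 0.1))  # outside the new band: removed
print(rms(low[50:-50]), rms(high[50:-50]))      # edges skipped to avoid start/end transients
```

The 1 kHz tone comes through at essentially full level, while the 6 kHz tone is strongly attenuated before decimation, so it cannot fold back into the 0 to 4 kHz band.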

That shouldn’t be necessary if you have Audacity Preferences set to use the highest quality conversion. In Audacity 1.3.12 the relevant setting is the “High Quality Conversion” setting in “Edit menu > Preferences > Quality tab”.

If you are using Audacity 1.3.12 and you find that there are problems with high frequencies folding back during conversions (with High Quality Conversion set to “High Quality Sinc Interpolation”) it would be very useful if you could provide steps to reproduce the problem as it should not happen.


Thanks for pointing it out. It’s my mistake actually.

I had a speech recording sampled at 16 kHz and I was downsampling it to 8 kHz. I compared spectrum.txt (exported from Plot Spectrum) for both files. What was actually happening was that frequencies > 3.5 kHz were greatly suppressed (by about 20 dB). I did not notice the negative sign on the magnitudes and thought that they were actually being enhanced by frequencies folding back!
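
For anyone else reading the exported levels, a small sketch of what a change in dB means as an amplitude ratio. It assumes the values are ordinary amplitude decibels (20·log10 of the ratio); the function names are made up for the example.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level change in dB."""
    return 20 * math.log10(ratio)


print(db_to_amplitude_ratio(-20))  # a 20 dB drop
print(db_to_amplitude_ratio(20))   # a 20 dB rise
```

A change of -20 dB means the amplitude is a tenth of what it was, so a more negative number in the exported spectrum means a quieter component, not a louder one.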