How to reduce WAV file, freq, etc, still understand speech

Hi, i’m not sound engineer but in my small electronic project i want to use short samples of speech (less than 1sek each) in WAV format (i can’t use MP3).

I’m looking for knowledge how to manipulate sound to:

  • reduce WAV file,
  • reduce frequency (low, middle pass filters, etc)
  • reduce noise which arises in this processes

I understand that speech will much worse then original but i’m looking for a compromise between the length of WAV file and the possibility of understanding speech. I spent a lot of time trying to achieve a satisfactory result, but noise is to big.

I used good quality IVONA (ivona.com) samples of sound and WavePad Sound Editor (nch.com.au/wavepad).

Do you mean that you want to reduce the size of the WAV file?
Why can’t you use MP3? What are the format restrictions?

We’re not quite sure of your direction. Reducing the size of the WAV file is pretty simple, but you can’t do it with audio filters. Also, what kind of noise are you making? None of the filters or file construction tools create noise like hiss, buzzing, or popping. Are you complaining about voice distortion?

What is the job?

Koz

More details:

  • mono sounds
  • 8 bits per sample
  • max time of one speech sample 0.7 sek

I have a microcontroller with small memory 32kB and i must put in more then 10 sound, 0.7sek long each.
This microcontroller can’t use MP3 formats of sound.

It is voice calculator for blind person (each digit and other functions have own 0.7sek speech file).

So, I must find a solution to reduce speech samples to ca. 2kB per 0.7sek (one speech).
I am aware that it will be compromise between the length of WAV file and the possibility of understanding speech.

I was trying many combinations, at various order, to use:

  • low pass filter (ca. 0,8-1,4 KHz)
  • reduce sample rate
  • normalize sound
  • reduce noise
  • etc,
    and still have big noise.

I’m looking for knowledge what should i do with speech samples to gain satisfy effect.

Sorry for my english :slight_smile:
(polish to Mr Kozikowski: Jeżeli mogę po polsku to z chęcią napiszę)

Click on “Options” when you save as “other uncompressed files” and you can select to save as 8-bit depth WAV …
Audacity has the 'option' to save as 8-bitWAV.png
If you want to minimize the file size you’ll need to reduce the sample rate from the default 44100Hz, to something lower like 8000Hz, e.g. attached.
As you mentioned there is a trade-off between quality and file-size, the smallest file sizes are the most distorted, (this is inevitable).

To achieve that I think you are going to have to use 4-bit depth @ 6000Hz, which will be coarse.

Maybe yes? (i will try)

Maybe i wrote not clear or my english is not enough to explain my problem.

I know how to convert sound to 8bit mono sound. There is many of programs to do this.
I’m asking for method how to prepare sound (low pass, normalization, etc.) before convertion to 8 bit.
In other words, what to do, to extract from speech unnecessary frequency (which ones), or other techniques, to gain the best sample to convert to 8bit and no more 2kB for 0,7sek long sound.

Low pass filtering isn’t necessary : when you reduce the sample rate from 44100Hz to say 8000Hz the high frequencies (above 4000Hz) will automatically be lost.
There are mobile phone codecs which optimise intelligibility of voice with low bit depth, e.g. http://en.wikipedia.org/wiki/G.726 , (Audacity has 4-bit verison of the similar G.721, but it may not be compatible with your program).

BTW attached is a 6000Hz 4-bit version, (which is 3.0Kb/s, a bit over what you specified: 2.85Kb/second ], I don’t think you can’t afford 8-bit depth and meet your specifications, 4-bit or even lower will be necessary.

You cannot fit 10 seconds of reasonable quality sound into 32 kB without some form of data compression.

Here is an example of 10 seconds recording time with a sample rate of 8 kHz and a bit format of 8 bits.

File size = 78.2 KB (80044 bytes)

As the bit depth is reduced, the noise level increases. 8 bits is about the minimum for “telephone quality”.
As the sample rate is reduced, the frequency bandwidth is reduced - the minimum frequency bandwidth for “telephone quality” speech is about 4 kHz. The sample rate required is double the frequency bandwidth, so the the minimum sample rate will be about 8 kHz.
So for telephone quality you need about 80000 bytes per second of uncompressed audio.

Although you cannot use MP3 compression, can you use any other form of compression?
For example, BZ2 or 7ZIP can compress the above file to about 34 kb.
Or could you use ADPCM?
For the best quality speech with minimum data size there is a purpose designed (open source GPL) codec called “speex” http://www.speex.org/
Here’s the same recording using speex - the .SPX extension is not allowed on the forum, so I’ve converted it back to WAV again, but you will still hear that the sound quality is much better than the 8 bit 8 kHz WAV PCM file, but the file size for the speex file was only 26.6 KB (27212 bytes)

The problem with microcontrollers is that probably it can’t handle floats. Therefore you need an implementation of a codec that will work with integers only, am I right?
Speex is very good for speech, and I think integer only implementation of it are possible (search in the speex website). It might require you some extra work to be able to make your microcontroller decode speex. That might be too much work though… Any chance you could use a different controller with more memory?

Filters won’t help you reduce the size of the wav files…

Maybe i wrote not clear or my english is not enough to explain my problem.

One of the nice things about English is you can create great damage and everyone will understand your idea.

(polish to Mr Kozikowski: Jeżeli mogę po polsku to z chęcią napiszę)

I went to school in New York City and stayed with my first generation Polish grandparents in Greenpoint, Brooklyn. I learned many words, but I only remember two: “Good Morning” and “Pickles.” So no, I do not speak Polish. Sorry.

Koz