Muffled voice: suggestions?


As an experiment, the other day I recorded a colleague with a digital camera hidden under my coat, a couple of meters away (playing spy is fun!), and I am now trying to clean up the recording (very muffled) so I can understand the words (no need for good quality beyond that). Unfortunately I haven’t managed it with the tips I have found here and there (including this forum)… I tried the equalizer, low- and high-pass filters, noise removal, even a few of these in sequence, and things improve a bit, but I still can’t understand anything. (I haven’t played much with the parameters, since for some reason Audacity is acting up today: half the time I hit Preview, or Play in the main window, nothing happens.)

I don’t really care about this piece of audio; what I want is to learn – to find out whether it’s me not doing it well enough, or the audio is simply beyond Audacity’s capabilities. Of course, if anyone gets a decent result I would love to learn how!

Thanks a lot

There are a number of serious problems in there. It’s not just muffled. The sample rate is 11025 Hz (11 kHz), which means the best you can ever do is AM-radio quality – and this is far worse than that. I hear compression artifacts and honking.

Oh, and it’s muffled, too.

There is nothing to rescue. The speech frequencies and tones are damaged or missing. If it was a very high quality sound channel and the only problem was muffling under a coat, then we’d have a fighting chance.


Oh, I didn’t notice the rate, and the other things you mention are new to me. Then the lesson to learn is not to trust my camera as an audio recording device except in the best conditions.


IMO the muffling is the worst culprit, rather than the low (11025 Hz), sounds-like-a-phone sample rate
(22050 Hz is adequate for good-quality speech).

Sounds something like “they asked what you’d like … inspecting their deeds, yeah sure you did”,
but it’s very, very muffled.

Beware: there is a phenomenon of “Rorschach audio” (sound rather than inkblots) where people see/hear meaning in random noise.

Oh! That sounds quite good. May I know how you did it?


No really it’s 'orrible.

Adding harmonics (higher-frequency copies) of whatever sound there is on the recording.

If you are on Windows there is a free plugin for Audacity which adds harmonics …

After adding harmonics remove all frequencies below 200Hz using the equaliser.
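For anyone not on Windows, the same two-step idea can be sketched in a few lines of Python with NumPy. This is a rough stand-in, not the plugin itself: rectifying the waveform is one crude way to generate harmonics at multiples of the frequencies already present, and a brick-wall FFT filter then removes everything below 200 Hz.

```python
import numpy as np

def add_harmonics(x):
    """Crude harmonic 'exciter': mixing in the rectified signal creates
    energy at integer multiples of the original frequencies."""
    return x + 0.5 * np.abs(x)

def high_pass(x, cutoff_hz, rate):
    """Brick-wall high-pass via FFT: zero every bin below cutoff_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

# Demo: one second of a 300 Hz tone, at the 11025 Hz rate of the recording.
rate = 11025
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 300 * t)
processed = high_pass(add_harmonics(tone), 200, rate)
```

Rectifying a 300 Hz tone adds components at 600 Hz, 1200 Hz, and so on; the high-pass then strips the DC offset and anything under 200 Hz, as Trebor suggests.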

However, I think you are flogging a dead horse trying to salvage something from this recording.
You need to re-record with the mic in a better position to avoid the muffling, which is the main problem.



And to be clear, there’s more than one problem with the recording. Trebor should not need to “add harmonics” to the show. Just running a high quality microphone and sound system under your coat should be a simple equalization problem, done, go home. You have a very seriously damaged sound track, plus you muffled it.

I’ve resisted ordering a digital “personal recorder” because they all make certain assumptions about my performance, and none of the assumptions deal with very low volume. I have an analog tape recorder – real tape – I hide in my jacket and even though noisy and muffled, I never have problems understanding what’s being said. It’s “just” muffled. Not damaged.


Should I frisk you if and when we meet in person? :smiley:

– Bill

Unfortunately I use SUSE. But I will keep the info for future reference. I should read about harmonics in audio editing (I do know the physical concept from a Fourier series lecture I once gave :slight_smile: )

Sure, after all I just wanted to learn how far one can go in these conditions. Of course for a serious recording I would use something better – and would probably not need to hide the recorder!

Thanks a lot to all

What does one gain by killing the lower frequencies in these types of recordings, as I think you suggest? Is it an efficient if unsophisticated way of removing some noise, or does it serve another purpose?



There is virtually no useful information in audio below 200 Hz to contribute to the intelligibility of speech. The useful sound for understanding speech is in the range (roughly) 200 Hz to 7 kHz.
Removing frequencies outside of this range will silence irrelevant audio which can help in picking out the (relevant) words.

Unfortunately your recording cuts off almost entirely at about 750 Hz (0.75 kHz), so the vast majority of relevant speech information is missing. Without the higher frequencies it is impossible (for example) to distinguish between a spoken “F” and a spoken “S”. The greater the loss of high frequencies, the less intelligible the speech. In the case of your audio sample, there are no high frequencies at all other than noise. Here is your audio sample filtered to leave only frequencies above 2000 Hz (2 kHz) and then amplified to bring it up to an audible level.
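If you want to try that filtering step yourself, here is a sketch in Python/NumPy (not the exact processing used on the attached clip – the synthetic signal stands in for your recording): keep only the FFT bins above a cutoff, then normalize whatever is left up to an audible level.

```python
import numpy as np

def band_limit(x, lo_hz, hi_hz, rate):
    """Keep only frequencies in [lo_hz, hi_hz] (brick-wall FFT filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    spectrum[(freqs < lo_hz) | (freqs > hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def amplify_to_peak(x, peak=0.9):
    """Scale so the loudest sample sits at `peak` (like Audacity's Amplify)."""
    return x * (peak / np.max(np.abs(x)))

rate = 11025
t = np.arange(rate) / rate
# A strong low "voice" component at 400 Hz plus a faint trace at 3000 Hz.
sample = 0.5 * np.sin(2 * np.pi * 400 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)
# Keep only what lies above 2 kHz, then bring it up to an audible level.
above_2k = amplify_to_peak(band_limit(sample, 2000, rate / 2, rate))
```

With a healthy voice track, `above_2k` would contain the consonant detail; with this recording, as described above, it contains essentially nothing but noise.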

Let me take it again. Digital compression systems work by “throwing away” quality in favor of long performances or storage efficiency. They depend on the show volume and initial quality being “normal” and not particularly difficult.

Compression damage can be very hard to hear and a highly compressed talk can be perfectly understandable. See: cellphones. Compression algorithms only work if the original material is perfect. This is what makes it so difficult to edit MP3 files. The compression damage is invisible in one pass (downloaded from the internet), but in multiple editing and production passes it adds up until the performance starts honking and bubbling and becomes useless.

In this case, it sounds to me like your recorder’s compression failed. It had no idea how to handle a very low, soft voice, and it reacted by doing its job and throwing away “unneeded quality.” Except in this case, all that “trash” was the higher frequencies – around 3 kHz – that give voices their intelligibility. The harmonics are not just muffled; there aren’t any. They’ve been compressed out.

That’s why the straight equalizer failed (there’s nothing at 3KHz to boost) and when logically placed harmonics are forcibly jammed back into the voice, it seems to help, but since they’re not real, you don’t get actual words.
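You can verify the “nothing there to boost” diagnosis yourself by measuring what fraction of the spectral energy sits above the speech-harmonic region. On a healthy voice track it is substantial; on a track like this it is essentially zero, so an equalizer boost has nothing to amplify. A small NumPy sketch, using a synthetic stand-in for the muffled track (tones only up to about 750 Hz):

```python
import numpy as np

def energy_above(x, cutoff_hz, rate):
    """Fraction of total spectral energy at or above cutoff_hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    return spec[freqs >= cutoff_hz].sum() / spec.sum()

rate = 11025
t = np.arange(rate) / rate
# Stand-in for the damaged track: content only below ~750 Hz.
muffled = sum(np.sin(2 * np.pi * f * t) for f in (200, 450, 700))
print(energy_above(muffled, 2000, rate))  # effectively zero above 2 kHz
```

If that number comes back near zero on a real recording too, equalization alone cannot restore intelligibility – the information simply is not there.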

It’s like a foreign actor “faking” an American accent. It can drive you crazy because it seems like you should be understanding what he’s saying – he’s perfectly loud and clear – but you just can’t… quite… do… it.

I describe digital compression like this. Good compression can turn a Stradivarius and two 7-Eleven violins into three 7-Eleven violins. All those expensive, delicate, rich overtones? Gone.

Same notes, though.


Thanks to everyone for the useful info! :bulb: I have made another recording of a colleague and me talking casually in the office, with a different device (an MP3 player – probably not much better!) so I have another example to look at those frequencies and play with Audacity. If it’s still too bad I will just grab someone’s recorder :slight_smile: I am curious now to see the interplay of those frequencies and how one can improve/change things. Thanks again!

Best regards

The sample rate and bit depth are usually adjustable on such devices; both affect the sound quality.
A sample rate of 22050 Hz and a bit depth of 8 provide good, clear speech recordings.
[Some of the codec options on such devices, like ADPCM, only have a bit depth of 4, which sacrifices recording quality for smaller file size.]
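The trade-off these devices are making is easy to put numbers on: uncompressed (PCM-style) audio costs sample rate × bit depth × channels bits per second. A quick back-of-envelope sketch (treating the 4-bit ADPCM case as a raw 4-bit payload, which is an approximation):

```python
def wav_bytes_per_minute(rate_hz, bits, channels=1):
    """Bytes of raw audio payload per minute of recording."""
    return rate_hz * bits * channels * 60 // 8

print(wav_bytes_per_minute(22050, 8))  # 8-bit, 22050 Hz mono: 1,323,000 bytes/min
print(wav_bytes_per_minute(11025, 4))  # 4-bit, 11025 Hz payload: 330,750 bytes/min
```

So the 4-bit, 11025 Hz setting stores about a quarter as much data per minute – which is exactly the quality the recorder is throwing away.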

There are standard upper frequency limits.

FM radio carries music between 20 Hz and 15 kHz, which is why FM always sounds so much clearer than AM radio, which only goes up to 5 kHz. Classic land-line telephones only go up to about 3 kHz – just barely enough to get the words through – and no more. Below 3 kHz it becomes really difficult to pull spoken words out of the recording. Go lower than that and add digital compression, and it can destroy the show.