Reducing Harshness of Booming Vocals


I’m having a problem recording an audio course. I’m in Windows 7 on a Lenovo Z570 laptop. The mic is an AT2020 USB condenser. The following file is recorded at 10% input volume with the mic 14 inches from my mouth and slightly below it. Then it was normalized and amplified to give it a normal volume. No matter what I do and how close to the mic I am, the voice comes in very harshly. It’s rather startling even at normal volume. I’m trying to record a meditation course and one person’s feedback is that the voice is “too sharp”. Is there a way to smooth out the voice so that I don’t jolt people off of their meditation chairs? Maybe there’s something I can do while recording or during later processing. By the way, the mic is in a milk crate isolation box, though this does not seem to make a difference in the harshness.

The file is at


Your recording is suffering from over-recording. Look at all those peaks on the blue waveform that are touching the top and bottom edges. You need to do one of two things: move the microphone further away from your mouth or turn down the level of the input signal. The latter is easiest to do. You should be aiming to get the initial recording level so that the peaks of the blue waveform are running at about three-quarter height.

Also, if your microphone is directly in front of you, you should move it. Imagine you are looking at a clock face with your mouth opposite the dead centre. For best results, your mic needs to be at one of 10-11, 1-2, 4-5 or 7-8 o’clock. If you are reading from a script, your tendency will be to look down at that script, so set the mic in one of the upper positions. Good mic technique is to talk past the mic and not into it. The mic, of course, should be pointing directly back at your mouth.

Thanks. Actually, the initial input volume was very low. I amplified it in Audacity to get the waves to span the entire space vertically. I just tried another test and amplified it less, and there are still some harsh qualities to the voice. Any ideas on how to process it to take the edge off of it?

Record something typical like your first published test but don’t do anything to it. Doesn’t matter what the level is. Post that somewhere.

If you throw enough technology at it we can’t tell where the problem stops and your patching starts. You can get into serious trouble with Normalize and Amplify. Koz

Gotcha. Here’s another file with no processing whatsoever. Not even noise removal.

OK, now we can hear what you really sound like. The level of the recording is a little too low (those waveforms are not quite big enough). It would be a good idea to increase the mic level a little more. You should aim to get the peaks of the waveform to about the 0.5 level at the time of capture (a little over 0.5 would be OK). Second, your voice is naturally sibilant (somewhat hissy). That you will either have to live with, or wait for someone else, with different knowledge to me, to offer advice on how to improve that in post-production.
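One way to check a raw take against the 0.5 target is to read the peak sample value directly. This is a minimal sketch, assuming the samples are floats scaled so that 1.0 is full scale (the scale on Audacity's waveform ruler). The function names and the synthetic test tone are illustrative only and are not part of Audacity.

```python
import math

def peak_level(samples):
    """Largest absolute sample value, where 1.0 is full scale."""
    return max(abs(s) for s in samples)

def to_dbfs(amplitude):
    """Convert a linear amplitude to decibels relative to full scale."""
    if amplitude <= 0:
        return float("-inf")
    return 20 * math.log10(amplitude)

def count_clipped(samples, limit=0.999):
    """Count samples at or above the limit, a rough sign of clipping."""
    return sum(1 for s in samples if abs(s) >= limit)

# Hypothetical stand-in for a recording: one second of a 220 Hz tone at half scale.
rate = 44100
take = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]

peak = peak_level(take)
print(f"peak {peak:.3f} = {to_dbfs(peak):.1f} dBFS, clipped samples: {count_clipped(take)}")
```

A peak of 0.5 on the linear scale works out to roughly -6 dBFS, which leaves headroom for the occasional hard consonant.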

Forget about Noise Removal, there’s hardly any noise to remove. Your silent sections are running at about -50dB: that’s near enough to inaudible for most purposes. You have created a recording environment that gives you a good, clean sound. All it needs is a little tweak on the mic level as I indicated in the previous post.
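For anyone who wants to put a number on their own silent sections, here is a rough sketch of an RMS level reading in dB. It assumes float samples with 1.0 as full scale; the post above does not say whether the -50dB figure was a peak or an RMS reading, so treat this as one plausible way to measure it. The "silence" here is a made-up test signal, not the poster's audio.

```python
import math

def rms_dbfs(samples):
    """RMS level of a block of samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

# Hypothetical "silence": a tiny square wave whose RMS equals its amplitude.
floor = 10 ** (-50 / 20)  # about 0.00316 of full scale
silence = [floor if n % 2 == 0 else -floor for n in range(1000)]

print(f"{rms_dbfs(silence):.1f} dBFS")
```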

OK. A little better at .5 or .6 input volume. Those settings bring the average wave to 0.5 or a bit above. The only thing I’m worried about is that there’s a very busy street a block away and a bunch of screaming kids at recess 50 yards away plus cicada season and crickets at night. That’s why I had it at 10% input volume before. It seems to keep those extraneous sounds from getting out of hand. Right now, at 2am, everything’s dead silent. My house is made out of cinderblocks, but those sounds are picked up easily on condenser microphones, especially this one. Not sure what tomorrow will bring, but at 2am, it’s quite easy to record something that doesn’t pick up crazy traffic and large obnoxious bugs.

Ah! When I Amplified (to peak 0dB) a piece of silence between your phrases I thought I detected a rhythmic sound. I thought of tree frogs but cicadas will do very nicely! I stress: these became audible only after amplifying the silence. They were not obvious in the raw recording.

This is your clip after simple, gentle processing (wait until 17 seconds).

The file is much bigger now because you can’t repeatedly compress MP3.

Amplify to -1 target (Never Amplify to 0. That can cause problems later). I used the first silent patch (thank you for that) as the Profile for noise reduction, and then applied the reduction at -12dB (very gentle) starting at 17 seconds.
So everything remaining is just you at normal volumes, levels and background noise.
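The arithmetic behind "Amplify to -1" can be sketched like this: find the loudest sample, work out how many dB it sits below the target, and scale everything by that amount. This is an illustration under the assumption of float samples with 1.0 as full scale, not Audacity's actual code, and the sample values are invented.

```python
import math

def gain_for_target_peak(samples, target_db=-1.0):
    """Gain in dB that moves the loudest sample to target_db (dBFS).

    Assumes the input is not pure digital silence (peak > 0).
    """
    peak = max(abs(s) for s in samples)
    peak_db = 20 * math.log10(peak)
    return target_db - peak_db

def apply_gain(samples, gain_db):
    """Scale every sample by the same linear factor."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Hypothetical raw take peaking at half scale (about -6 dBFS).
raw = [0.0, 0.25, -0.5, 0.1]
gain = gain_for_target_peak(raw)
louder = apply_gain(raw, gain)
print(f"gain {gain:.2f} dB, new peak {max(abs(s) for s in louder):.3f}")
```

The same gain is applied to everything, background noise included, so a noise floor that sat at -50 dB in this example would end up near -45 dB.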

Since your original posting had very serious distortion and damage, this should be enormously better, even though I didn’t directly address the booming or harshness. I would not get closer to the mic. That gives you directional microphone proximity effects and that will give you a boomy voice.

The microphone seems to give a very slightly bright voice and that can be desirable in a spoken show. I’d leave it just like that.

Now for presentation, you still have “room” in the show. I can tell you’re recording in your mom’s kitchen and not an acoustically dead room like an overstuffed living room or carpeted bedroom – or a studio. There’s no filter for that. We can’t get rid of echoes.

Styrofoam is a terrible sound absorber. Go with blankets, pillows or quilts and put something up on the wall behind you. That’s where a majority of the echoes are coming from. We have people at work that record in a closet with quilts on the walls.

A visit to the packing and shipping store might be good.

Work on pronouncing “Button.”


OK. Thanks. That one sounds pretty good. Did you just do noise removal and a normalize at -1 and that’s it? The volume on it seems almost as high as I would want it optimally. By the way, I screwed up when I said “styrofoam”. The box is lined with cardboard on all sides and cushion foam (the stuff with the sparklies in it) on four sides. Yes, it’s very much like mom’s kitchen. I threw some clothing over the box and duct taped a coat on the wall in back of me and came up with the following track. At 10 seconds, it’s normalized at -1. The amplitude is slightly lower than I’d like it to be, but some of the peaks from hard consonants are already hitting the top. I’m comparing the volume to Pandora (probably a pretty good volume standard) and another audio course I did. This is slightly lower than both in volume. After 10 seconds I did about four rounds of noise removal to get rid of all the cicadas, and at the end I kept in the car that is about 100 yards away. Anyway, now I’m a whole lot closer to resolving this and now know how to experiment to get it right. Thanks, y’all. New file:

That better?

I followed my procedure with the Compressor tool.

I did not Normalize. You need to pay attention to the order of operation because the tools can interact with each other.

From the original work.
Effect > Amplify > New Peak Amplitude > -1dB.

Effect > Noise Reduction
Get the Profile from a voiceless segment of the show.
(from the top):

Effect > Compressor
(from the top)
1.0 sec
[X] Make-up Gain

The Make Up Gain seems to stick at 0dB which I don’t like. You can leave Make Up Gain unselected and re-apply Amplify to -1 after you compress.


That’s great. I tried your approach with compression, etc and made four files: A .6 input volume both compressed and uncompressed plus the same for .7. Very hard to choose between the four, but I’ll listen to them over and over again. Noncompression gives the voice a slightly more natural sound, and it’s unclear which approach gives a softer edge. My ears aren’t well trained, but listening to them all for an hour or two should narrow it down. Thanks a lot, everyone. I also do animation on the side, and these tips will certainly come in handy for that, too. Hope this thread helps some other schmuck who upgrades to a real microphone after a lifetime of cheap mics.

You can change the Compression ratio from 5:1 to a lesser number and get less dense and intense, more natural sound. Of course, with no compression, and the other tools just helping a little bit, that should give you the most natural sound of all, but may not be able to compete in volume with other people’s voice submissions.
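To see what the ratio does, here is a sketch of the textbook static compressor curve: below the threshold nothing changes, and above it the output rises only 1 dB for every `ratio` dB of input. The -12 dB threshold is an assumption for illustration (the thread does not state the setting used), and a real compressor such as Audacity's also has attack and release behaviour that this leaves out.

```python
def compressed_level(input_db, threshold_db=-12.0, ratio=5.0):
    """Static compressor curve: above the threshold, output rises 1 dB per `ratio` dB of input."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

for ratio in (5.0, 2.0):
    out = compressed_level(-2.0, ratio=ratio)
    print(f"{ratio:.0f}:1 turns a -2 dB peak into {out:.1f} dB")
```

With these assumed numbers a 5:1 ratio pulls a -2 dB peak down to -10 dB, while 2:1 only pulls it to -7 dB, which is why the lower ratio sounds less squashed.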

It’s always a decision what’s important to you – or the client.

OK now after all that, you get to show us what an actual performance sounds like. Not the whole thing, just a 20 second or so segment of what you do – for real. As a deliverable to the client.


None of them should change the edge. That grit came from the distortion and damage of the original work. Your voice appears to have a slight natural sibilance. If you’re intent on changing that, then we need more processing. I personally would leave it just as it is. You are a product. The object is not to sound like everyone else.


Koz, I’ll send a sample. I’ll be recording this week. I studied up a bit on compressors and what each setting does. Anything that keeps the amplitude more constant without occasional extremes will probably be nice because I tend to have some high peaks followed by much lower amplitude. Playing around with the compressor might help me to up the volume a bit without occasionally smacking people in the face with sound. Other than that, I like natural and don’t care for too much processing anyway. I made a much bigger audio course with a $40 AT dynamic mic and the voice sounded great. The problem with that one was that there was plenty of noticeable hiss. This will just be cleaner and more digital sounding. Looks like .7 input volume is a nice sweet spot. Sounds pretty good with or without extra processing other than noise removal. The compressor just might help to flatten it enough to increase the volume slightly without the sudden extremes. I played a clip over some Pandora music and the voice wasn’t drowned out by the music. Every word was clear. I think it will be a good volume.

It’s good but I can hear a problem with plosive Ps and Bs …

“so I’m sPeaking about the same way I was Before on the other file”

It’s fixable with the equaliser (attached), but it’s better to prevent it in the first place with a pop shield.

[ IMO de-essing is required too; the pop shield may help with that as well ]
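The attached equaliser curve is not reproduced here. As a rough stand-in, plosive thumps are mostly low-frequency energy, so one common approach is to roll off the bottom end. The sketch below is a simple first-order high-pass filter, not the poster's actual EQ setting; the 100 Hz cutoff and the test tones are assumptions chosen only to show the effect.

```python
import math

def high_pass(samples, cutoff_hz, rate):
    """First-order high-pass filter: attenuates content below cutoff_hz.

    Assumes a non-empty list of float samples.
    """
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

def tone(freq_hz, rate, seconds=1.0):
    """Full-scale sine wave used as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(int(rate * seconds))]

rate = 44100
for freq in (50, 1000):
    filtered = high_pass(tone(freq, rate), cutoff_hz=100, rate=rate)
    settled = filtered[rate // 2:]  # skip the start-up transient
    print(f"{freq} Hz tone: peak {max(abs(s) for s in settled):.2f} after filtering")
```

The 50 Hz tone comes out at well under its original level while the 1000 Hz tone passes almost untouched. A cutoff set too high will thin out the voice, so it is worth checking by ear.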

Maybe de-essing, but the pop shield won’t do any good for the “s” sounds since they’re not plosives. As long as they don’t get gritty, I think that’s just what the performer sounds like.

Noise Removal is required before compression (in this case) because the whole object of compression is to bring the high, powerful sounds closer to the lower gentle ones. Low and gentle is where the noise lives.
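A worked example of why the order matters, using the static compressor curve and made-up numbers: a -1 dB peak, a -50 dB noise floor, an assumed -12 dB threshold and a 5:1 ratio. After compression the peaks are lower, and bringing them back up to -1 dB lifts the noise by the same amount. The function names are illustrative.

```python
def level_after_compression(input_db, threshold_db, ratio):
    """Static compressor curve: levels below the threshold pass unchanged."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def noise_floor_after_makeup(peak_db, noise_db, threshold_db=-12.0, ratio=5.0, target_db=-1.0):
    """Noise level once the compressed peak is amplified back up to target_db."""
    squashed_peak = level_after_compression(peak_db, threshold_db, ratio)
    makeup_gain = target_db - squashed_peak
    return level_after_compression(noise_db, threshold_db, ratio) + makeup_gain

new_floor = noise_floor_after_makeup(peak_db=-1.0, noise_db=-50.0)
print(f"noise floor moves from -50.0 dB to {new_floor:.1f} dB")
```

In this example the noise ends up nearly 9 dB louder than before, which is why it pays to reduce it first.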


But yes, pop shield is always good.

I bought an AKG microphone and it came with a pop shield in the case. I was surprised to find that the thing had two layers of women’s black stockings stretched over the hoop instead of only one. My other stand-alone shield only has one layer.


It’s good but I can hear a problem with plosive Ps and Bs …

How? How are you listening? Headphones? I admit I didn’t do the Worst Case Listening Experience with Computer Speakers, but I guess that’s warranted. The sibilant highs are where modern listening systems live.

Those are two Boston Acoustics CR6s driven by a BGW 250 on the floor under the table. The preamp is a Hafler HD101 and the computer is a Mac Mini. The amplified bass cabinet (also under the table) is a KLH 10".

So it’s a full-on wide band music system.

I have a set of Koss Pro4 AA headphones, although for a good test, I should probably use earbuds and a low-end speaker system.