Making voice recording sound awesome

I’m doing a lot of homegrown voiceover these days, and as wonderful as Audacity is, I can’t figure out how to take my excellent recording and process it so it sounds truly professional.

I’m running an AT4040 Cardioid Condenser mic through a Lexicon Lambda interface unit. I’m recording in my room, which is fairly noisy for an audio booth. Once I use Noise Removal and Normalize the track to get it nice and loud, I get a nice, clean recording that sounds just like me, but is several steps of polish away from sounding truly pro.

I’ve tried running an Equalization pass with the very lowest and highest sounds dropped out, and that seems to help, but I was wondering if anyone out there had any ideas or tips on how to make the recording go to the next level.

I’ve included a sample of my voice with only Normalization and Noise Removal applied, and would love to get some feedback.

(I do have to say that I have to Normalize the audio to get it up to a usable volume. I wonder if I’m recording at too low a volume. I have the Recording input slider all the way up in Audacity, and my Lexicon Mic1 level is up at about 3/4. Any higher and it starts to peak. The ‘low cut’ switch on my mic is on the ‘—’ setting (as opposed to the ‘/–’ setting) and the ‘Pad’ is at 0db (not -10db). So I think I’m recording at a high enough level, but I’m not sure because I always have to turn it up thru Normalizing or using the track volume db adjuster to get it to a usable level.)
Orc dialogue (192 KB)

Anything you can do about that?
If you have to use noise reduction, you’re putting yourself one notch behind before you’ve done anything else.

What is your background noise level metering at, and what is your peak recording level (before Normalising, or anything else)?

For voice, I would set it to /–
This will apply a gentle low frequency roll off, below the frequencies of your voice.
It should not affect the sound of your voice, but will help avoid background rumbles and other unwanted sub-bass.

What is your recording distance (microphone to mouth)?
Do you use a pop shield?

I also use the AT4040. I’ve been experimenting with distance and angle. Some recommend that you record slightly to the side with the AT4040, though I like the sound with it directly in line with my mouth.

Listening to the mp3 you provided, I could hear quite a bit of the room and not so much of your voice; the sample doesn’t have the “tight” sound that is desired for voice-over work (if reverb is desired, it can be added after the fact). If you’re not overly concerned with aesthetics, I recommend you purchase moving blankets from a place such as Lowe’s and hang them on the walls. That’s what I did, and the difference was night and day.

Thanks for the responses, guys!

I AM using a pop-shield, and I’m keeping my mouth between 6-10inches from my mic. I’ll experiment more with talking to the side of it, too. My background noise barely shows up, but I’ll see if I can get some sort of db reading on it.
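One way to get a dB reading on the background noise and on the peak level is to compute it from the raw sample values. The following is only a minimal sketch: the function names and the two synthetic signals are made up for illustration, and samples are assumed to be floats where 1.0 is full scale.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS (average) level, in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Hypothetical stand-ins for real audio: a stretch of "silence"
# (room noise only) and a spoken line, both as synthetic sine waves.
room_noise = [0.001 * math.sin(0.3 * n) for n in range(4800)]
speech = [0.25 * math.sin(0.05 * n) for n in range(4800)]

noise_floor = rms_dbfs(room_noise)
speech_peak = peak_dbfs(speech)
print(f"noise floor: {noise_floor:.1f} dB, speech peak: {speech_peak:.1f} dB")
```

With real recordings, the same two numbers come from selecting a few seconds where nobody is speaking (noise floor) and the loudest line (peak), before any Normalizing.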

And I think I’m going to move my mic setup into my closet, put up some blankets (or cleverly hang clothes) and see what it’s like. I’ll do that soon and have some more samples for you soon.

Thanks for the ideas and feedback. It’s hard to do this alone, and hearing others’ ideas and suggestions (and similar problems) is reassuring and helpful.

Something I forgot to mention is that the AT4040 picks up a lot of room noise anyway, so you may need to move in a touch closer to get less of the room.

How are you listening to your recordings? I just bought the Audio-Technica ATH-M50 headphones, and they are good at revealing details you wouldn’t hear otherwise, especially room noise (that’s how I noticed it with your sample).

Interesting that the mic is ‘known’ for certain behavior. I wasn’t aware of that. Where do you get that kind of info? I’d love to see a ‘best practices’ list for this particular mic…

I listen to the final Audacity product through some nice Creative earbuds. I may invest in some studio headphones down the road. I don’t monitor my input. I prefer to just record for a minute or so, do a bunch of different takes without worrying too much about how it sounds, then listen and adjust as necessary, occasionally re-recording.

From what you’re saying, it sounds like I’d be well served by two main things: 1) Quieting my recording environment as much as possible (duh) and 2) getting in closer to my mic (and turning down my input gain a little to compensate).

getting in closer to my mic (and turning down my input gain a little to compensate)

On input gain, read this. To sum it up, with 24-bit recording, you have more headroom on the low end and don’t need to be worried as much about recording too softly. It’s easy to normalize the track to whatever level you want.

Interesting! Does that mean it’s easy to fix clipping/distortion/peaking if I go too hot? I don’t know how to do that…

My main problem is that I peak out and distort when I get louder, which happens semi-regularly in voiceover recording. Some lines are very dynamic, and have quiet parts interspersed with louder things within 5 seconds of each other.

Where is a good level to set the input at? Should I be adjusting it for every single different recording I make, or is it better to set a limit where it peaks only when I’m almost shouting, and then just compensate for low volume with Normalization and so on? (I am pretty much just making it up as I go along, so the more basic, general recording advice you can give the better; stuff like how to set input levels. I’m ONLY doing voice recording, so that makes it easier.)

No, quite the opposite.
It means that when recording with a high bit depth (24 or 32 bit) you can afford to leave lots of headroom without sacrificing any dynamic range.
You should aim to have your analogue hardware (including the analogue inputs of your sound card) working within the range they are designed for, but when you hit the digital domain, the dynamic range of 32 bit float is so huge that if your highest peak is at -12 dB, there are still more than enough bits below that to accurately reproduce every detail.

There are really 2 parts to this subject - the design sensitivity of the analogue equipment, and then the digital realm.

In the old days of audio tape, it was necessary to try and squeeze every last dB possible onto the tape by running as close to the red as possible. That was because of the limited dynamic range of audio tape. By recording as loud as possible, the noise floor of the tape (background hiss), would be relatively quieter.
Today, with 32 bit digital recording, even if we “waste” the loudest 12 dB possible, the digital noise floor is still way below the noise floor of any of the analogue equipment that is being recorded.
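The headroom argument can be put into rough numbers. This sketch uses the standard textbook approximation for integer PCM (about 6.02 dB per bit plus 1.76 dB); it does not model 32 bit float, which behaves differently, and the function names are invented for the example.

```python
def pcm_dynamic_range_db(bits):
    """Theoretical dynamic range of N-bit integer PCM (textbook approximation)."""
    return 6.02 * bits + 1.76

def remaining_range_db(bits, headroom_db):
    """Range left below the loudest peak when headroom_db is left unused."""
    return pcm_dynamic_range_db(bits) - headroom_db

for bits in (16, 24):
    full = pcm_dynamic_range_db(bits)
    left = remaining_range_db(bits, 12)
    print(f"{bits}-bit: {full:.1f} dB in theory, {left:.1f} dB with 12 dB of headroom")
```

Under this approximation, a 24-bit recording that peaks at -12 dB still has more range below the peak than a 16-bit recording driven all the way to full scale.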

However, that does not mean that you can just record everything quietly and boost it up with Normalization and expect a top quality recording. Microphones, pre-amplifiers, mixing desks and sound card inputs all have their own noise floor. Each link in the audio chain should ideally be running within its design parameters.

Some years ago, I wanted to record the ticking of a wrist watch. At that time, the best microphone that I had was a Shure SM58 dynamic vocal microphone and a cheap Realistic mixer. Even if I recorded onto a state-of-the-art digital recorder, my results would have been poor, simply because the self noise of the microphone and mixing desk were almost as loud as the ticking watch. What I would have needed would be a very sensitive, low noise microphone, and a high gain, low noise pre-amplifier.

Similarly, if you try to record a bass drum from a rock drum kit with a highly sensitive condenser microphone, it is very likely that irrespective of what volume you set the microphone pre-amplifier at, the recording will still come out sounding distorted because the microphone is being overdriven.

When recording, I try to use the right microphone for the job, and assuming that I have it plugged directly into a mixing desk, I will adjust the input gain so that the pre-fade (input) level is peaking close to the optimum level (my mixing desk will go happily up to +12dB before clipping, so I can drive the input close to 0dB with most material). I will also adjust the output from the mixing desk so that it suits the input of the recorder, which in the case of my sound card is to peak no higher than -3dB as an absolute maximum. Finally I set the recording levels for the digital audio recorder (Audacity), and here I can finally relax a bit - with a recording level at around -16 dB, I have lots of headroom, and the (very low) self noise of the analogue equipment is still faithfully reproduced.

Excellent breakdown, thank you!

For someone like me, with my AT4040 and my lonely Lexicon Lambda interface device (no fancy mixers here!) I only have two input settings to worry about. The mic-in to the Lambda, and the recording level of Audacity, which I believe receives the direct input of the Lambda (no separate output setting).

What I’ll try doing is setting the Lambda Mic-in so it gets reasonably close to peaking during normal, energetic speaking, then setting the Audacity recording level to peak at about -12db. I’ll see what that’s like, then Normalize as needed.

How can I adjust the recording volume to increase it so that the blue wave-form is taller? In my first attempts, the blue wave form (I don’t know what else to call it) only goes to a maximum of about 0.02 and thus the recorded result has to have the playback volume turned all the way to maximum to hear it. I can get a little taller blue wave form by holding the microphone about one inch from my mouth, but that creates other problems.

So, how can I turn up the recording volume while keeping the mic at the recommended 6 to 8 inches from my mouth?
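To put the 0.02 figure in perspective, the linear waveform scale (where 1.0 is the top of the track) can be converted to dB. This is just a quick sketch; the -12 dB target is an illustrative value rather than a rule, and the function names are invented for the example.

```python
import math

def linear_to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    return 20 * math.log10(amplitude)

def db_to_linear(db):
    """Convert a level in dB back to a linear amplitude."""
    return 10 ** (db / 20)

current_peak_db = linear_to_db(0.02)           # peak height reported above
target_peak_db = -12.0                         # illustrative target (assumption)
gain_needed_db = target_peak_db - current_peak_db

print(f"peak of 0.02 is {current_peak_db:.1f} dB")
print(f"gain needed to reach {target_peak_db} dB: {gain_needed_db:.1f} dB")
print(f"{target_peak_db} dB on the linear scale: {db_to_linear(target_peak_db):.2f}")
```

A peak of 0.02 works out to roughly -34 dB, so the shortfall is large enough that it needs fixing at the input (pre-amp or sound card gain) rather than with the playback volume.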

Adjust the recording volume with your sound card’s control panel/mixer application.

If you are using a pre-amp or mixing desk, you may have the input gain too low.

I was really excited by this question too, Joe, but it turns out that on my Dell XPS420 I have a really crappy SigmaTel HD audio driver that has NO options for recording gain. I think some serious playing around with my Lexicon Lambda audio interface (in both software and hardware) is in order.

Thanks for all the help, Steve. You’re a real mensch.

Any chance you have (or would consider writing) a beginner’s guide to using some of the more advanced techniques within Audacity? =)

There are quite a few good tutorials in the Audacity wiki (link at top of page), and also many others on the internet, including some video tutorials. Unfortunately not all of the tutorials are completely accurate, and some may be a bit out of date, so you need to use a bit of common sense.

The Tutorials in the Audacity wiki seem to be pretty accurate, and there is also a very good “Tips” section which should be essential reading for anyone using Audacity.

I’ve also started building a website with tutorials and other useful bits and pieces for Audacity users that I hope to launch early in the new year. This will not be to replace the Audacity wiki, the official documentation, or this forum, but will hopefully complement them. Assuming that the content is up to par, I’m hoping that the Audacity team will allow me to announce the launch through this forum (possibly in my signature), though I’ve not actually asked them yet - but still early days yet.

I agree if you need the noise removal right at the top, you have problems, but, having listened to the clip, what happens when you put the voice into the production? If you’re going to add background music, special effects, and other dialog, I suspect what you got will be fine. When most people complain about sound quality, they sound far worse than you do. (“Wait, isn’t that noise the 108 Marina del Rey Metrobus…?”)

When was the last time you heard a top quality studio theatrical recording for an eventual mix down into a show? The individual pieces of the dialog sound an awful lot like your track. Try not viewing the track so closely that you lose sight of the show.


This is excellent news! I look forward to hearing more about this!

@Koz: Thanks for the perspective, man! It’s good to hear that I’m in pretty good shape, especially for a beginner. As I get better with this stuff, I bet I’ll solve a lot of the “problems” I’ve identified. The main “problem” seems to be the expectation that I should sound like a professional from Day 1!

Since making that recording I’ve moved into my closet and begun stringing sound-dampening materials all over the walls. I’m getting a lot less noise, and am able to get a lot more detail into my recording. I’m hoping to mix something up soon for you to listen to (and offer tips on!). Thanks again.

Hey guys! I have a sample of a recording made in my new sound booth: Download Link

It’s got no audio, and very little processing (Normalization, I think). Tell me if there’s any issues you notice.

Here’s the final file I made with it, adding music and a bit more processing (though still no EQ or Noise Removal). It sounds a teeny bit quiet to me in final mp3 form, but PERFECT in Audacity before I export it. Any thoughts? When I turn up the gain before exporting into mp3, the mp3 is a little hissy.

There is a bit of noise on the voice recording, much of which is low frequency interference, which can be “cleaned” substantially (by about 10dB) by using a high pass filter to remove subsonic noise. Also there is a noticeable peak in the frequency response in the region 4-8kHz. This is probably due to a “presence boost” by the microphone (and exaggerated by MP3 encoding), which I find to be a little excessive and would tame with Equalization. Dropping this range a little will make the voice sound a little less “sssy”. Turning it down too much will make the voice sound dull. I also find that the vocal has just a little too much bass, and would bring it down just a little (how much is a matter of taste, and I appreciate that you are aiming for a low intimate kind of sound).

After Eq’ing, I would use a bit of gentle noise reduction to drop the background hiss a little. With the Eq adjustments already made, the background noise will already be substantially lower, so not much noise removal is needed.

The overall quietness of the track is because there is a big peak at about 25 seconds which means that you cannot amplify/normalize much without clipping that peak. You could use a compressor, limiter or leveller to bring that peak down, which will then allow the entire mix to be brought up to a higher level.
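The effect of one big peak on normalizing can be shown with a toy example. This is a rough sketch only: the sample values are invented, the -0.3 dB target and -6 dB ceiling are illustrative, and the hard clip is a crude stand-in for a real limiter (a real one changes gain smoothly, whereas a plain clip like this distorts the peak).

```python
import math

def normalize(samples, target_db=-0.3):
    """Scale samples so the highest peak lands at target_db; return (samples, gain in dB)."""
    peak = max(abs(s) for s in samples)
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples], 20 * math.log10(gain)

def hard_limit(samples, ceiling_db):
    """Crude stand-in for a limiter: clip anything above the ceiling."""
    ceiling = 10 ** (ceiling_db / 20)
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# Hypothetical track: mostly moderate levels with one loud peak (0.9).
track = [0.20, -0.25, 0.22, 0.90, -0.24, 0.21]

_, gain_before_db = normalize(track)
_, gain_after_db = normalize(hard_limit(track, -6.0))

print(f"gain available as recorded:    {gain_before_db:.1f} dB")
print(f"gain available after limiting: {gain_after_db:.1f} dB")
```

The available gain is always the target level minus the highest peak, so a single loud moment decides how loud everything else can get.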

Awesome! Thanks for the wonderfully detailed response! Now help me understand what the heck it means! =)

I used the High Pass Filter in 1.3.5 on its default settings (3db Rolloff, Filter Quality .6, Cutoff Frequency 1001Hz) and it made everything sound thin, attenuated, with none of the punch I want from the voice. Is this what it’s supposed to do?

How do I do this ‘taming’ of which you speak? And what is that range? Is that the 4000-8000Hz range? Are you suggesting I go into the Equalizer and drop the levels in that range a bit? And drop them down to what? -3db? -6db? (I’m using the basic view in 1.3.5: Draw curves box ticked, Linear Frequency Scale UNticked).

The way I interpret this is to drop the lower frequencies on the Equalizer by some amount. How much should I aim for? (and yes, I am going for that intimate sound, but it’s great to learn how to do this stuff, anyway!)

Yay! I know how to do this!

And by “brought up to a higher level”, you mean Normalized up as far as possible, right?

This one has been difficult. I got the waveforms to go almost to the top by using a leveler (set at -6db) and then Normalizing (to -1db), but there’s some noticeable distortion at the parts that used to be ‘too loud’.

And I can’t get a compressor to work for me. Everything that’s NOT that peak is around -12db at the highest, so I set the compressor to Threshold -12 and keep everything else at default because it scares me (Ratio 2:1, Attack Time 0.2 seconds, Decay time 1 second), and tick the “Normalize to 0db after compressing” box. The waveform (db) view seems to get a tiny bit bigger, but there’s still that big peak there, and the overall loudness seems to barely have changed. I’m not sure if I’m using the tools incorrectly.

Use a steeper filter and a lower cut-off frequency, perhaps 12dB per octave at 40Hz. This should then only affect the frequencies that are below the voice and not the voice itself.

By the way, there is an error in the current high-pass and low-pass filters. The slope (“Roll Off” amount) is incorrect. If you select 3dB, the actual filter is 6dB per octave. Selecting 6dB is actually 12dB… each setting has been incorrectly listed as half of the actual value. This will be corrected in the next Audacity release.
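To see why the cut-off frequency matters so much, here is a small sketch that estimates how much a high-pass filter attenuates different frequencies. It assumes an idealised Butterworth response (6 dB per octave per filter order), which is a textbook model and not necessarily how Audacity’s filter is implemented; the function name is made up for the example.

```python
import math

def highpass_gain_db(freq_hz, cutoff_hz, order):
    """Gain in dB of an idealised Butterworth high-pass (slope = 6 * order dB/octave)."""
    ratio = (freq_hz / cutoff_hz) ** (2 * order)
    return 10 * math.log10(ratio / (1 + ratio))

# Suggested setting: 40 Hz cut-off, 12 dB per octave (order 2).
for f in (20, 40, 100, 200):
    print(f"40 Hz cut-off,   {f:>4} Hz: {highpass_gain_db(f, 40, 2):6.2f} dB")

# Setting that sounded thin: 1001 Hz cut-off, 6 dB per octave (order 1).
for f in (100, 200, 500):
    print(f"1001 Hz cut-off, {f:>4} Hz: {highpass_gain_db(f, 1001, 1):6.2f} dB")
```

In this model the 40 Hz filter leaves 200 Hz essentially untouched while cutting 20 Hz by about 12 dB, whereas a 1001 Hz cut-off takes more than 14 dB off 200 Hz, which is well inside the body of a voice.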

Yes, exactly so. Again, this is partly down to taste, but to me it sounds a little “zingy” with that peak in the response.

Yes, with the equalizer. How much? Not a lot. Perhaps gently rolling off the bass from about 200Hz, but with a very gentle slope so that you are down by about 3dB at 50Hz, then you can go a bit more steep below 50Hz. You will know if you have overdone it because it will start to sound “thin, …, with none of the punch I want from the voice”.

I don’t generally go “as far as possible”, but up quite a bit. I like to leave at the very least 0.1dB headroom, and more usually go for -0.3dB. This is as close to 0dB as makes no audible difference, but can avoid possible clipping which can occur when normalizing to 0dB.

While I’m working with the audio (in 32 bit format) I generally use a nominal level of around -6 to -3 dB. This allows enough head room to avoid the possibility of clipping while I’m working. If you are doing a lot of processing (which you probably will not be doing on this particular project), there is no harm in normalizing or amplifying several times to keep the level in a usable range. The precision of 32 bit audio is so high that amplifying is virtually lossless, even if you do it a dozen times or more.
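The claim about repeated amplification can be checked with a small experiment. This sketch simulates 32 bit float storage by rounding every result through a 4-byte float, then turns a test signal up and down by 6 dB a dozen times; the test signal and helper names are made up, and it does not use Audacity itself.

```python
import math
import struct

def to_float32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def amplify(samples, gain_db):
    """Apply a gain in dB, storing each result as a 32-bit float."""
    gain = 10 ** (gain_db / 20)
    return [to_float32(s * gain) for s in samples]

# Hypothetical test signal peaking at 0.5 (about -6 dB).
original = [to_float32(0.5 * math.sin(0.05 * n)) for n in range(2000)]

processed = original
for _ in range(12):
    processed = amplify(processed, +6.0)
    processed = amplify(processed, -6.0)

worst_error = max(abs(a - b) for a, b in zip(original, processed))
error_db = 20 * math.log10(worst_error) if worst_error > 0 else float("-inf")
print(f"worst-case error after 24 gain changes: {error_db:.0f} dB")
```

The accumulated rounding error ends up far below anything audible, and far below the noise floor of the analogue equipment.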

You are probably just using it a bit too aggressively. Try setting it a bit lower (higher level), say -4dB. If you set the threshold too far it will cause noticeable distortion as you describe. Try setting it at -20 dB just to see how bad it gets - that will give you a very loud recording, but it will sound terrible.

The standard compressor that is built into Audacity is not really suitable. It is a very simple dynamic compressor, but the attack time is too slow, so it misses the sudden peaks. There is a compressor called SC4 available as a plug-in which is much better (I think it is one of the Steve Harris plug-ins). With the SC4 you can set the attack much faster so that it will catch the peaks. Also, you can set it for “peak” rather than “RMS” which will be better for this as you are just wanting to stop the peaks going too high.
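For reference, the basic arithmetic of a compressor’s threshold and ratio can be sketched as below. This models only the static input/output curve and ignores attack and decay completely, so it shows the best case where the compressor reacts instantly; the function name is invented and this is not Audacity’s or SC4’s actual code.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: levels above the threshold are scaled down by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

threshold_db = -12.0   # setting mentioned earlier in the thread
for ratio in (2.0, 10.0):
    for level_db in (-20.0, -12.0, -6.0, 0.0):   # hypothetical input levels
        out = compress_db(level_db, threshold_db, ratio)
        print(f"ratio {ratio:>4}:1  in {level_db:6.1f} dB -> out {out:6.1f} dB")
```

At 2:1 with a -12 dB threshold, a peak at 0 dB only comes down to -6 dB, so it still sits well above everything else. A higher ratio behaves more like a limiter, but only if the attack is fast enough to catch the peak in the first place.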

There is another plug-in called “Fast Look Ahead Limiter” which is pretty good.