Understanding sound levels

I’m preparing to record a book I’ve written. I’m using a Blue Snowball Mic, Voicemeeter and Audacity on a Windows 10 OS. I’m using my walk-in closet (to avoid background noise) as a makeshift studio. My gut is telling me I’m okay, since I can play the sample file I’ve recorded through my desktop system set at the same volume I normally use while at my computer, and the voice quality sounds pleasant.

I’m confused by instructions in the ACX guide “Each file must measure between -23dB and -18dB RMS.” I want to be sure I’m recording close to that so it won’t end up overprocessed after I’ve put in all that effort recording 100,000 words. The problem is that I don’t know how to read what I’m seeing in Audacity as I record. I’ve attached a screenshot of a bit of the recording’s waveform on Audacity.

I’m seeing the bulk of the recording being at .25 to -.25. Spikes go no further than .5.
What does this mean relative to the ACX guide calling for -23dB and -18dB RMS?

I am new to all this jargon and trying to catch up as quickly as I can so I can get this job done. Hopefully, I don’t need a degree in audio recording to figure all this out adequately to accomplish this task.

Thanks for any help.

I don’t need a degree in audio recording to figure all this out adequately to accomplish this task.

You [tapping gently on shoulder] are the recording engineer. You don’t have to be. You can do this for real by showing up at an actual studio with a real recording engineer and walking out with an excellent voice track easily convertible to ACX specifications. People do that. You don’t need a studio big enough for the Boston Symphony Orchestra, either. A local company here has hire-on studios just big enough for what you’re doing.

A poster not too long ago was giving Home Recording a shot after doing it the legacy studio way for their books. It was a surprise. “So that’s what those people behind the glass do.” You are warned not to do experiments in the middle of a book. ACX puts great stress on chapter to chapter matching.

Less obvious is room noise and echoes. Having a good quiet room is a Really Big Deal.

Your illustration is probably fine. The goal is blue waves roughly half-way tall (0.5) or a little less. Viewing the bouncing sound meter, that works out to -6dB to -10dB. Note that the levels you record at and the levels you submit are different. Very few people record straight to submission specifications.
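For anyone who wants to see where the -6dB comes from, here is a minimal sketch of the conversion between the height of the blue waves and the meter reading. It assumes the waveform scale runs from 0.0 to 1.0 (full scale); the function name is just for illustration and is not part of Audacity.

```python
import math

def amplitude_to_db(amplitude: float) -> float:
    """Convert a linear waveform height (0.0 to 1.0) to dB relative to full scale."""
    if amplitude <= 0:
        return float("-inf")  # silence has no finite dB value
    return 20 * math.log10(amplitude)

# Heights mentioned in this thread: full scale, half-way tall, quarter height.
for height in (1.0, 0.5, 0.25):
    print(f"{height:>4} -> {amplitude_to_db(height):6.1f} dB")
```

Half height lands at about -6dB and quarter height at about -12dB, so each halving of the wave costs roughly 6dB.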

We publish both tools to change your sound to match ACX and measuring tools to see how you’re doing. And there’s always the forum if it all goes into the mud.

Make a test clip of about 20 seconds or so and publish it on the forum. It is strongly urged that you work in mono (one blue wave) and not stereo (two blue waves). There are instructions on how to get there in this post.


Room Tone is the noise the room is making with you making no sound. ACX includes Room Tone as part of its submission specs, so you have to get that right. -60dB Room Tone (from the specifications) means the background sound needs to be about 1000 times quieter than the loudest sound the system can record (0dB, full scale). That’s significant and that’s why you’re in the closet (so to speak).
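The "1000 times" figure follows from the decibel formula for amplitude, where every 20dB is a factor of 10. A small sketch, with a hypothetical helper name:

```python
def db_to_amplitude_ratio(db_difference: float) -> float:
    """Return the amplitude ratio that corresponds to a level difference in dB."""
    return 10 ** (db_difference / 20)

print(db_to_amplitude_ratio(60))  # the -60dB Room Tone figure
print(db_to_amplitude_ratio(6))   # roughly a doubling
```

A 60dB difference gives a ratio of 1000, which is why such a small-looking number is hard to hit in an ordinary room.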


I don’t want to have to get an audio recording degree, but I do already have a filmmaking degree. I would thoroughly enjoy learning some new techie biz, though, so I AM wanting to learn this. I just want to limit it to what I actually will need to learn relating to my project at hand. I also do not have the budget for anything other than DIY at this point.

I just discovered the bouncy green bar. Playing back my sample, it’s averaging around -12dB to -20dB. Is that too quiet? I also discovered the Recording Volume and Playback volume sliders. Should I experiment with the recording volume until it is in the -6dB to -10dB range? Should the playback volume then match whatever the recording volume turns out to be?

Thanks much for the response.

We call them the “meters”. Meter Toolbars - Audacity Manual

When recording, it’s the “maximum peak level” that you need to be most concerned about.

On the whole, the waveform in the picture looks to be a little bit on the low side, but I notice there are a few peaks that are much higher than the rest. See around 1.5s, 5.2s, 12.5s and 23.0s.

My guess is that the first one is from you switching on the mic, or something like that (and can probably be ignored).

The others I’d be a bit more concerned about - if I had to guess, I’d say that they are “plosives” (“P” or “B” sounds that have caused “wind blast” on the microphone). Do you hear a low pitched “thud” at those points (when listening back on good headphones)? If so, then you may need to use a “pop shield” (Pop Shields: Why You Need Them), and/or reposition your microphone.

if I had to guess, I’d say that they are “plosives” (“P” or “B” sounds that have caused “wind blast” on the microphone)

But we won’t have to guess if you post a sound sample.


Once we find out what you’re doing, we can post pages of information about how to get from where you are to where you need to go and why.

Tangent question. Why Voicemeeter?



ACX’s goal is a pleasant reading at good volume with no distractions. The RMS (loudness) value makes sure all your chapters match loudness and they match every other audiobook. The peak value keeps you out of distortion from getting too loud. That one is important because overload distortion produces harsh noises and is permanent. High Background Noise is counted as distraction.
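To make the two measurements concrete, here is a rough sketch of how RMS and peak could be computed from raw samples. This is not ACX Check or The Robot, just the underlying arithmetic. The -23dB to -18dB range is from the ACX guide quoted earlier; the -3dB peak limit is my assumption of the ACX figure and is not stated in this thread. Samples are assumed to be floats between -1.0 and 1.0.

```python
import math

def to_db(value: float) -> float:
    """Linear level (0.0 to 1.0) to dB relative to full scale."""
    return 20 * math.log10(value) if value > 0 else float("-inf")

def measure(samples):
    """Return (rms_db, peak_db) for a list of float samples."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return to_db(rms), to_db(peak)

def check_levels(samples, rms_range=(-23.0, -18.0), peak_limit=-3.0):
    """Compare a clip against an RMS window and a peak ceiling (peak_limit is an assumption)."""
    rms_db, peak_db = measure(samples)
    return {
        "rms_db": round(rms_db, 2),
        "peak_db": round(peak_db, 2),
        "rms_ok": rms_range[0] <= rms_db <= rms_range[1],
        "peak_ok": peak_db <= peak_limit,
    }

# Stand-in for a voice track: one second of a 440 Hz tone at 0.14 amplitude.
tone = [0.14 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
print(check_levels(tone))
```

RMS is an average over the whole clip, so it tracks loudness; peak only looks at the single tallest sample, so it tracks the risk of overload.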

Audiobook Mastering will only get you past the technical specifications. You have to pass theatrical qualities, too. When you submit to ACX, they send you through The Robot which works similarly to ACX Check, part of the mastering process. But then it goes on to Human Quality Control where you die if you can’t read out loud, have a terrible voice, stutter or have any other theatrical quality problem.

Everybody Knows you need to take your breathing sounds out of the performance. I’m not so sure. I don’t think ACX has ever bounced anybody for normal human sounds. The metaphor is listening to somebody tell you a fascinating story over cups of tea. They’re probably breathing as they speak. There was one forum poster who sounded clinically asthmatic. That would probably not work, it was uncomfortable to listen to, but they never posted back, so we don’t know.


This is the ACX Audiobook Mastering process.


It’s not the only one, but it seems to work more often than not. That posting is pages and pages of description of how to apply three tools: Equalizer, which is a rumble filter to get rid of very low pitch sounds that can create problems in post production; RMS Normalize, which sets the loudness; and Limiter, which gently pushes the tips of the blue waves into compliance.
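As a rough illustration of what the last two tools do to the numbers, here is a simplified sketch. It is not the Audacity implementation: the rumble filter is left out, and the limiter here is a plain hard clip standing in for Audacity's gentler soft limiter. The -20dB target is an assumed mid-point of the ACX range, and the -3.5dB ceiling is the limiter figure mentioned elsewhere in this thread.

```python
import math

def rms_normalize(samples, target_db=-20.0):
    """Scale the whole clip so its RMS lands on target_db (assumed target)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return list(samples)  # pure silence cannot be normalized
    gain = 10 ** (target_db / 20) / rms
    return [s * gain for s in samples]

def hard_limit(samples, ceiling_db=-3.5):
    """Clamp any sample above the ceiling. A crude stand-in for a soft limiter."""
    ceiling = 10 ** (ceiling_db / 20)
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# A quiet test tone goes in, a louder one with capped peaks comes out.
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
mastered = hard_limit(rms_normalize(quiet))
print(max(abs(s) for s in mastered))
```

The order matters: normalizing first sets the loudness, and the limiter then only touches the few tips that ended up too tall. If the limiter has to clamp a lot of peaks, the RMS drops a little, which is one reason to measure again afterwards.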

That’s it. If your theater was OK and you shot in a quiet room, it should be a matter of carefully editing out your fluffs, exporting a WAV safety copy, making the MP3 to the correct specifications and posting it for submission.

You have to add Lame software for Audacity to Export an MP3, so that can be a surprise.

Scroll down for extra plugins.



Okay. I’ve done a sample and learned right much along the way. You asked why Voicemeeter. I found this clip on Youtube, https://youtu.be/S9vznpXS_so, thought it was a vast improvement, so I installed it and am trying to figure the thing out. The thing I just realized is that the button called “Audibility” is actually a noise gate. I had zero background noise before I started speaking because I’d inadvertently cranked the thing way up. Now, it’s just up a tad and a buzz I can hear when using the Blue Snowball naked is gone. All the time. But when I start speaking, it seems to transition naturally. It appears to be quite useful at reducing background noise.

Using Voicemeeter also means that I have more than twice the volume than what Blue Snowball gives me by itself. My voice also sounds much richer. I’m happy with it and hoping you or ACX won’t find a multitude of technical issues to it for an Audiobook.

The sample’s attached. It’s 22 seconds, but I thought I should finish the sentence. I do have a pop filter on. The mic is on a boom, about six inches from my mouth.

I’m so pleased I got to make some progress with this on Superbowl Sunday evening. All thanks to you. Can’t follow sports to save my life. No attention span for it. Like watching paint dry. This has been so much more stimulating.

You can go over 20 seconds in a posting, but you can’t go much over 2MB digital filesize. That’s when the forum cuts you off.
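To see why about 20 seconds is the practical limit, here is a back-of-envelope size estimate for an uncompressed WAV. The 44100 Hz sample rate, 16-bit depth and 44-byte header are assumptions about a typical export, not something the forum specifies.

```python
def wav_size_bytes(seconds, sample_rate=44100, bits_per_sample=16, channels=1):
    """Estimate the size of an uncompressed PCM WAV file, including a 44-byte header."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return int(seconds * bytes_per_second) + 44

for seconds in (20, 22):
    print(seconds, "s mono:", wav_size_bytes(seconds), "bytes")
print("20 s stereo:", wav_size_bytes(20, channels=2), "bytes")
```

Under those assumptions a mono clip stays under 2MB up to a little past 22 seconds, while a stereo clip of the same length is twice the size.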

Oh, this is good. I can listen to a story in that voice.

I sent it through mastering and it still came back too noisy. I applied moderate, simple noise reduction 8, 6, 6 and it passes ACX conformance. First three readings and sentence 2/3 down.

Screen Shot 2019-02-03 at 17.36.01.png

Depending on what voicemeeter sounds like, that may be all you need. It has to be subtle enough so ACX Human Quality Control can’t find it. They hate “Processing.”

In my opinion, it’s too sharp and crisp. It’s very hissy and it’s emphasizing every little mouth tick. There are tools for that, but that’s theater and I’m less competent there. Others will post.

No, that’s not normal and I’d be reaching for a wool sock to pull over the microphone, but I know a sock won’t fit over a Snowball.

The mastered version should sound exactly like you except maybe very slightly louder. That’s the idea of the tools. They sound natural.

I’ve used a Snowball and it didn’t have a harsh, crisp sound. Something else is happening.


I made a correction called “WoolSock.” It’s a plugin setting for Effect > Equalizer.

Play the last two sound clips and see if you can find the differences. The second one should be less “essy.”

We’ll take you through how to apply all these tools and what they do.

How are you listening? I have a good quality music sound system that does tend toward gentle crispness. So getting it to sound harsh is saying something.


Are these tools you’re using accessible to me? I’m sure the sharpness is coming from the settings I have on Voicemeeter and can be adjusted, though I may have to learn how to ‘hear’ it. Next, I have to actually read that Voicemeeter manual.

I’ll shop for a mic cover. There are lots on Amazon. Should I get furry or foam? The product description for the foam is talking more about reducing plosives.

Thanks so much for the encouragement about having a voice that can tell a story. I think it will be very satisfying getting to ‘perform’ what I’ve written.

Will it be better to leave the final noise reduction for post-production?

I need to test this without the ridiculous plastic plate I’m having to wear while I wait for a dental implant to take root. That may be a lot of the problem. Damn thing makes me lisp. Or should I say, lithp?

Thanks for running the sample through that filter. I’m not hearing the change though. I don’t know if it’s something learnable, or something that’s just not happening with my hearing.

I re-recorded that sample, this time without that hard plastic plate in my mouth, and the settings on Voicemeeter set more towards a mellower bass. It’s attached.

You asked how I was listening. I’m using an AKG 52 headset I paid $40 or $50 for. It covers my ears completely and is padded.

Are these tools you’re using accessible to me?

Every one. And instructions how to use them.

By “How are you listening,” I mean do you have good headphones? Good speakers? Other?

Laptop built-in speakers need not apply. It also doesn’t count if you can hold both of your computer speakers in your hand at the same time.


Posing for that picture was the most valuable thing those speakers ever did. They sounded dreadful.

You have to resolve speakers or headphones before you start making changes. Nothing like flying blind…or deaf.


Should I get furry or foam? The product description for the foam is talking more about reducing plosives.

The furry one (widely called a Dead Cat) is for wind.

The foam one is for plosives and mouth noises, but I didn’t hear any.

This is Chase with the worst plosives on earth.

He’s doing everything wrong. Too close and popping his P sounds. I didn’t hear you doing anything like that. Steve was working from the appearance of the blue waves before you posted the actual voice. The two filters I applied, Low Rolloff for Speech and WoolSock may have taken care of those.

So you may not need your checkbook…yet.


We crossed posts, so I answered some of your questions above.

I re-recorded that sample

And you didn’t include the two-second ‘don’t move’ dead spot at the beginning. The analysis tools use that spot to measure noise.

Will it be better to leave the final noise reduction for post-production?

Darn good question.

When you finish a read—no corrections—you should export a WAV protection copy. That’s for when Audacity or the computer or both goes into the mud during editing and takes the chapter with it. Audacity Projects are not recommended for this.

I never thought this all the way through and it’s making my head hurt. (I’m not an out-loud reader)

You might take a chapter to final sound form but leave the fluffs and mistakes. Apply the correction suite including custom filters and noise reduction and then start cutting it up. That way you can listen to each edit in its final form right there on the timeline and not get surprises later.

Apparently, it’s fairly common to clap loudly at a mistake creating intentional overload and a tall blue wave. It’s a snap to go back later and look for all the thin tall waves to identify the correction points. But you can’t apply the correction suite with the tall waves there.

I don’t think…

I never did the research on this particular variation.
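For what it's worth, the "look for the thin tall waves" step can also be done by arithmetic. This is a minimal sketch, assuming float samples between -1.0 and 1.0; the 0.95 threshold and the half-second grouping gap are guesses, and the function is hypothetical rather than anything built into Audacity.

```python
def find_clap_times(samples, sample_rate, threshold=0.95, min_gap_seconds=0.5):
    """Return the start time in seconds of each burst of near-full-scale samples."""
    times = []
    last_hit = None
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            t = index / sample_rate
            # Only report a new clap if enough time has passed since the last loud sample.
            if last_hit is None or t - last_hit > min_gap_seconds:
                times.append(t)
            last_hit = t
    return times

# Five seconds of quiet "speech" at 1000 samples per second, with two claps.
clip = [0.1] * 5000
clip[1000] = 1.0
clip[1001] = -0.98
clip[3500] = 0.99
print(find_clap_times(clip, sample_rate=1000))
```

This only works while the claps are still the tallest thing in the track, so it would have to run before any limiting.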


I put the room noise back in the beginning. It had gotten too long.

I was using the clap to mark where I flubbed a word. Hasn’t anybody come up with a bright idea on how to replicate that with something that won’t mess up the filtering process?

Is my headphone, the AKG-52 adequate? It’s in a $40 - $50 price range and completely covers my ears.

Room Tone at the beginning is pretty important. If the clip goes much over 20 seconds, chop off words at the end.

Hasn’t anybody come up with a bright idea on how to replicate that with something that won’t mess up the filtering process?

It may not be that bad. This is where I dig into process and theory and try to divine what’s actually going to happen.

Unless you’re making a mistake every ten seconds, there should not be that many spikes in the waves. RMS Normalize works on broad, sloppy measurements and may not even see the claps. Limiter will certainly see them. Its job is to manage them, but if it reduces everything to quieter than -3.5dB (and it does), it may effectively erase them from being useful.

You could try it.

I think you still have to download and install RMS Normalize. The other two come with Audacity. ACX Check has to be installed, too, so you can check your work.


I’m not a Windows elf, so this part is between you and the instructions.


completely covers my ears.

Sealed headphones are important because they keep your voice from leaking back into the microphone and causing several bad sound effects because of feedback. You will not be listening to your own voice during recording because of the system delays and echoes. You can’t listen to the computer while presenting, and the Snowball doesn’t have a place to plug the headphones. So all you can do is listen to the recording later.

I have no opinion on the sound quality. I’ve never met them before. Maybe someone else will post.


There are Labels. You can place a Label in the show but on its own track either stopped or while you’re actually recording.


The problem, or at least the last time I tried to use them, is they don’t track the track. If you change the duration of the sound track, the label track won’t follow you. It throws off all the timings and intervals. So, yes, I think claps are the way to go.

As we go.


That’s what the oft misunderstood “Sync-Lock” is for (Sync-Locked Track Groups - Audacity Manual)
When Sync-Lock is enabled, changes in length or position of audio automatically drag the labels with them so that they remain synchronised.
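As a conceptual sketch only (this is not Audacity's code or API), the effect of Sync-Lock on point labels when a stretch of audio is deleted can be pictured like this. Dropping labels that sit inside the deleted region is a simplification on my part.

```python
def delete_region(labels, start, end):
    """Shift (time_seconds, text) labels as if the audio from start to end were deleted."""
    removed = end - start
    shifted = []
    for time, text in labels:
        if time < start:
            shifted.append((time, text))            # before the cut: unchanged
        elif time >= end:
            shifted.append((time - removed, text))  # after the cut: moves earlier
        # Labels inside the deleted region are dropped in this sketch.
    return shifted

labels = [(5.0, "intro"), (12.0, "flub"), (30.0, "retake")]
print(delete_region(labels, start=10.0, end=15.0))
```

Without Sync-Lock the label at 30 seconds would stay at 30 seconds while the audio it marks moves to 25, which is the drift described above.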

Hand claps are OK as well - whatever works best for you.

Yes indeed it would, but normally you would not be using the Limiter effect until much later in the process, after the hand claps had been removed.

Did you use Voicemeeter when recording that?
There are several problems with the sound quality (over-emphasised lip smacks, hum, rumble …). If you want to fix them, it’ll be easier if you are not fighting against processing done by Voicemeeter.

I’m assuming that you would like the sound quality to be as good as you can get it, in which case I’d strongly suggest abandoning Voicemeeter, at least for now.

Could you post a sample that is recorded directly from the Snowball mic (without Voicemeeter or any other digital effects, but still using your pop filter).

Could the lip smacks be from being too close to the mic? What’s the correct distance?