Help me with RMS levels and mastering for ACX

Time for a wet blanket? It’s not at all unusual for someone to read through the first half (or more) of a book, find they’ve gotten so much better with practice, and start over.


Did you write the book?


Can’t thank you enough for all the help and suggestions Koz. Feel like I’m making good progress, and hope you have more tips for me today! :slight_smile:

Sorry for the delay. I just got back to my computer (on my way to bed, it’s almost midnight in NZ) and I’ve installed RMS Normalize and the LF-rolloff for speech. One question: I record in Mono. I had read somewhere that for spoken word you shouldn’t record in stereo, because it just doubles the size of the file, and also that ACX asks that you submit in mono anyway. Am I correct about that? Anyway, even though I’m recording in mono, the RMS Normalize screen asks me to choose between Linked Stereo or Independently. What do I choose, or is this a moot point since I’m already in mono?

Oh, and to answer your question: no, I don’t write. The sample I submitted was just an audition script that happened to be on my screen at the time. I submitted some samples to my Profile on ACX, auditioned for 2 or 3 books, and the next day I got an offer. So I figured, let’s finally give this a try!

Now, having heard my voice and my sample, do you have any other suggestions for me? I know they say you should avoid plugins and things like De-essers and noise gates, but is there anything of that nature you would recommend, that you know from experience doesn’t typically cause audio to fail ACX final acceptance? I don’t want to do anything to jeopardise that, but I do want to sound as good as I can. For the past year I’ve been processing the hell out of my recordings: seriously picking through them word by word and using things like Fade In and Fade Out to manually soften all my plosives, heavy esses and breaths, because I didn’t know any other way to fix them. But I’m afraid to do that sort of thing manually now because I don’t want to be accused of over-processing, and it seriously takes FOREVER to be so anal about it all. I’ve trained myself into being a harsh perfectionist, and now every little imperfection in my voice stands out like a sore thumb to my ears. So hearing someone else’s recommendations or evaluation would really help.

Thanks and goodnight. I’ll be back in about 9 hours, and I’m going to start recording shortly after that to see if I can knock this book out today. I’ve been tinkering for about 3 days and I’m getting close to my deadline.

(Also, I do not receive any notices when I get replies to my thread, even though I’ve gone into the User Control Panel and found that this is a Watched Topic, and my email address is correct, and I have ticked Yes under Edit Posting Defaults to say Notify me upon replies by default. So I don’t know what I could be missing here…)

Is it tomorrow there, or yesterday? I can’t keep track of what happens at the date line.

you shouldn’t record in stereo

Some people have to. Some digital adapters force you to record in Stereo, and you have to take steps later to split your performance into Mono for processing and submission. You have one of the adapters that lets you work directly in Mono, and I have another, the Behringer UM2. There may be others.

ACX recommends working in Mono, but they won’t come after you with a stick if you work in Stereo. They do say that once you pick one, you should stay with it for the whole project.

RMS Normalize screen

I leave the setting at its default. It’s one of the headaches Stereo people have; on a Mono track there’s nothing to link. If you link the channels, the same correction is applied to both Left and Right. This keeps the stereo imaging correct (violins stay on the left). The other setting treats Left and Right as independent tracks. That can be handy if your digital adapter puts you on the Left and nothing on the Right, because the blank track will seriously throw off linked loudness settings.
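To make the linked-versus-independent difference concrete, here is a rough Python sketch. It assumes "linked" means one gain computed from the combined loudness of both channels and "independent" means a separate gain per channel; that is an assumption about the idea, not how the RMS Normalize plug-in is actually written. The helper names and the -20dB target are illustrative.

```python
import math


def rms(samples):
    """Root mean square of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_for(current_rms, target_db):
    """Gain that moves current_rms to target_db (dB re full scale)."""
    return 10 ** (target_db / 20) / current_rms


def normalize_linked(left, right, target_db=-20.0):
    """One gain from both channels' combined loudness, applied to both."""
    g = gain_for(rms(left + right), target_db)
    return [s * g for s in left], [s * g for s in right]


def normalize_independent(left, right, target_db=-20.0):
    """A separate gain per channel; a silent channel is left as it is."""
    def one(channel):
        level = rms(channel)
        if level == 0:
            return list(channel)
        g = gain_for(level, target_db)
        return [s * g for s in channel]
    return one(left), one(right)


# Adapter that puts the voice on Left and nothing on Right.
voice = [0.05 * math.sin(2 * math.pi * i / 100) for i in range(10_000)]
blank = [0.0] * len(voice)

linked_left, _ = normalize_linked(voice, blank)
indep_left, indep_right = normalize_independent(voice, blank)
```

With the voice on the Left and a blank Right, the linked version lands about 3dB louder than the target, because the silent channel halves the measured average power. The independent version hits the target and leaves the blank channel alone.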

it seriously takes FOREVER

Right. You should solve as many problems at the microphone as you can. We assume you’re a business, and the mantra is to supply a minimum acceptable product for the least cost and labor. Correcting a presentation word by word gets really tiring over a book-length show and creates many opportunities for new problems.

I know they say you should avoid plugins and things like De-essers and noise gates

I say that, too. Some corrections generate other problems which then need corrections…etc. I was very pleased to get you to submission quality without Noise Reduction. Tools such as that can generate Essing and that means you need a DeEsser…etc.

Harsh, gritty Essing drives me nuts, and I didn’t hear any of that in your samples. You have one of the equipment suites recommended by ACX, and it generally works very well with few corrections.

As a personal exercise, I wanted to find out what I could get away with for a reading. I nearly made it with my iPod.

So no, a very quiet room and a reasonable microphone work just fine. I did use a free downloadable recorder rather than VoiceMemo, because VoiceMemo applies its own voice environment processing.

That’s not to say you got away clean. The metaphor for a book reading is not presenting before a joint session of the Prince Albert Academy. It’s telling someone an interesting/juicy story over cups of hot tea. I’m not sure if this was intentional or not, but there are natural places for three contractions in your sample and you carefully avoided all of them.

“It is difficult,” > “It’s difficult.” The decision is trickier if you’re directly quoting published works written in stilted university-speak. This is where you pay attention to your audience, with glances at the person writing the checks.


I’m not sure if this was intentional or not, but there are natural places for three contractions in your sample and you carefully avoided all of them.

You’re right, I am probably a bit too precise when it comes to reading the script exactly as written. Sometimes when I’m reading a script I’ll rephrase something slightly because it flows off the tongue so much more easily, but then I get paranoid and say it again exactly as written. I’m hoping to relax into an easier rhythm and sound more casual, so thanks for pointing that out. I needed to hear it, and it will be fresh in my mind now.

I watched the video series on mastering on the ACX site, and I have one other silly question. They suggest recording 30 seconds of room noise and then cutting it into smaller chunks to paste at the start and end of chapters, between sentences where needed, etc. But why would you cut up a long session of room noise recording into all those pieces when you could just cut and paste the same section over and over again? I normally do about 6-10 seconds of room noise at the beginning (well, I did that because I used it for Noise Reduction, but no more!)

My next bit of tinkering is going to be rearranging my environment a bit: moving my mic so that it’s not in the centre of the room anymore, but closer to one of the walls I’ve treated with thick Autex sound panels. Right now, the way it’s situated, I have to speak sideways into it so that I can still see my screen to read copy. Until now, 90% of my recording work has been short stuff (under 10 minutes or so at a time), so it didn’t bother me much, but I can’t read an entire audiobook out of the corner of my eye. I’ll go even more blind than I already am… :nerd:

I am probably a bit too precise when it comes to reading the script exactly as written.

Neither ACX nor Audacity have any horse in the race here. This is entirely between you and the client—and by extension, the audience. My favorite shows sound like the presenter is speaking to me, not me and 300 other students/legislators.

I have heard suggestions of pretending there’s a cup of tea in front of you and you’re speaking to someone on the other side of the table. I’m not a presenter, so I’m taking their word for it.

But why would you cut up a long session of room noise recording into all those pieces when you could just cut and paste the same section over and over again?

I can answer this using SWAG technology (Scientific, Wild-Ass Guess). The mind has a remarkable ability to turn meaningless garbage into valuable information. Historically, it let us recognize hungry tigers hiding in the jungle. More recently, it’s responsible for subliminal experiments, etc. I wouldn’t be shocked to find after multiple repetitions that listeners begin to recognize that one chunk of background noise. It doesn’t have to be explicit. Just that niggling feeling that something’s not quite right. Anything that distracts from the story is to be avoided.

There are clear standards for submitting silence along with your voice work: ACX asks for 0.5 to 1 second of room tone at the head of each file and 1 to 5 seconds at the tail. It’s not just start talking and then stop. One thing you can do is intentionally record silent heads and tails, so all you have to do is trim the room tone down to the proper amount.
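One way to follow that advice without repeating the same stretch of noise is to walk through one long room-tone take and hand out consecutive, non-overlapping slices. The sketch below is a hypothetical Python helper, not an Audacity feature; samples are plain lists, and the head and tail lengths in the example fall inside ACX’s stated ranges (0.5 to 1 second head, 1 to 5 seconds tail).

```python
def room_tone_chunks(tone, lengths):
    """Cut consecutive, non-overlapping slices from one long room-tone take.

    tone    -- samples from a long take (e.g. 30 seconds of room tone)
    lengths -- number of samples needed for each gap, in order
    Only when the take runs out does it wrap back to the start, so the
    same stretch of noise is reused as rarely as possible.
    """
    chunks = []
    pos = 0
    for n in lengths:
        if n > len(tone):
            raise ValueError("gap is longer than the whole room-tone take")
        if pos + n > len(tone):
            pos = 0  # take exhausted: wrap around to the beginning
        chunks.append(tone[pos:pos + n])
        pos += n
    return chunks


# Example: a 30-second take at 44.1 kHz, a 1-second head,
# three 0.5-second pauses and a 3-second tail.
rate = 44_100
take = [0.0] * (30 * rate)  # stand-in for real room-tone samples
gaps = [rate, rate // 2, rate // 2, rate // 2, 3 * rate]
pieces = room_tone_chunks(take, gaps)
```

The design choice is deliberately simple: stepping through the take in order guarantees no two pieces overlap until the whole take has been used once.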

rearrange my environment

Reading sideways out of the corner of your eye isn’t recommended for book-length work. Fatigue also contributes to the beginning and end of a chapter not sounding the same.

I would not have picked the middle of the room. There are some actual scientific reasons not to do that.

You don’t always need expensive sound paneling. I did some perfectly fine presentation recording in a storage room surrounded with boxes of statements, billing and archival paper records. Given you have a quiet neighborhood, a stuffed garage can work for the same reason. Echoes don’t have a chance in the presence of all that acoustically dead, oddly shaped and positioned cardboard. The goal is to avoid sounding like you’re recording in a bathroom.

I recorded people in bare, live conference rooms using furniture moving blankets.


By the way, how exactly do you change that? I’d like to make the numbers easier to read on my weak eyes!

I’d like to make the numbers easier to read on my weak eyes!

Oddly, my change doesn’t do that. I extend the range of the sound meters with Preferences > Interface > Meter dB range: 96dB.

Then I grab the meter edges and push and pull until they fill the frame left to right. The other panels should skooch out of the way when you do that. I think you used to be able to make the meters taller… I don’t remember that one. You can undock them and push them around your screen whether or not they’re in the Audacity Window.

But that’s not why I do it. Nobody is going to be looking at -96dB sound levels; that’s down where vibrating atoms make noise. I would, however, like to know what’s going on just below -60dB. That’s the area where turning off your noisy CFL desk lamp can make a beneficial difference, such as reducing noise from -65dB to -68dB. If your meters stopped at -60dB, you would never know.

Doing this dance to the meters is visibility-neutral: because you widen the meters as you extend the range, the area around the -6dB colours doesn’t change size.
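A quick way to see why the meter range matters: a meter can’t show anything below the bottom of its scale. In this small Python sketch (function names are illustrative), the lamp-on and lamp-off noise levels from the example read identically on a -60dB meter but separate on a -96dB one. The same 3dB drop works out to roughly 29% less noise amplitude.

```python
def meter_reading(level_db, meter_floor_db):
    """Level a meter displays; anything below its floor just reads as the floor."""
    return max(level_db, meter_floor_db)


def db_to_ratio(db):
    """Convert a dB difference to an amplitude ratio."""
    return 10 ** (db / 20)


lamp_on, lamp_off = -65.0, -68.0

# Meter that stops at -60dB: both noise levels are off the bottom of the scale.
short_meter = (meter_reading(lamp_on, -60.0), meter_reading(lamp_off, -60.0))

# Meter extended to -96dB: the improvement is visible.
long_meter = (meter_reading(lamp_on, -96.0), meter_reading(lamp_off, -96.0))

# A 3dB drop leaves about 71% of the noise amplitude.
remaining = db_to_ratio(lamp_off - lamp_on)
```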


Ahh ok, well at least there is a broader color meter to catch my eye if things go awry! I often record very vibrant dialogues and scripts that go from near whispering intimacy to loud arguing/yelling and that is where I fear I am going to have the most trouble trying to get everything to meet requirements.

Also, just another question… I’m doing great with just RMS Normalise and the Speech Roll-off now that I have my gain turned up so high. However, in a few samples I’ve recorded and run through the ACX check, I’m noticing that my Peak level now exceeds the limit. What is the best way to fix this without making other changes I shouldn’t? For example, my ACX audition for that childhood obesity book I was working on is ready now, but the ACX check shows:

Peak level: 0.911381 (-0.8 dB) << Exceeds ACX -3 dB max
RMS level: 0.101131 (-19.9 dB) … Passes ACX
NoiseFloor: 0.000263 (-71.6 dB) … Passes ACX

If this were my recording to be submitted to ACX for approval, I’d need to know the best way to correct that. Should I Normalize? Use Limiter?

If it helps to see the recording, you can access it here:

Oh, and the good news? I got my 2nd offer today. I’m over the moon, and so glad that you’ve helped me streamline this processing chain down to a science. It feels much less daunting to think of doing such long recording projects now that I feel more in control of my mastering skills :slight_smile:
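For reference, the dB figures in that report are just 20·log10 of the linear readings (full scale = 1.0). This Python sketch converts the three readings and compares them with ACX’s published limits, assumed here to be: peak no higher than -3dB, RMS between -23dB and -18dB, noise floor no higher than -60dB. It illustrates the arithmetic only; it is not the ACX-Check plug-in’s code.

```python
import math


def to_db(linear):
    """Linear amplitude (1.0 = full scale) to dB."""
    return 20 * math.log10(linear)


# ACX limits in dB (assumed from ACX's published audio requirements).
PEAK_MAX = -3.0
RMS_MIN, RMS_MAX = -23.0, -18.0
NOISE_MAX = -60.0


def acx_verdict(peak, rms, noise_floor):
    """Return pass/fail for each measurement, given linear readings."""
    return {
        "peak": to_db(peak) <= PEAK_MAX,
        "rms": RMS_MIN <= to_db(rms) <= RMS_MAX,
        "noise": to_db(noise_floor) <= NOISE_MAX,
    }


# The readings from the audition report quoted above.
report = acx_verdict(0.911381, 0.101131, 0.000263)
```

Only the peak fails, so the fix belongs to a peak tool rather than to another loudness adjustment.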

You didn’t follow the rules. The chain is Effect > RMS Normalize, then Effect > Limiter, then Effect > Equalization: LF Rolloff.


Oops! I totally missed the Limiter instructions… going back to find those now!

found it!

Effect > Limiter: Type Soft Limit, Input Gain (Left) 0, Input Gain (Right) 0, Limit to -3.5 dB, Hold 10 ms, Apply Make-up Gain: No > OK.
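Here’s a rough Python sketch of why the order is loudness first, then peaks. It uses a memoryless tanh soft clipper as a stand-in for the Limiter; Audacity’s Limiter also has a Hold time and works differently, so treat this as an illustration of the idea only. The -20dB RMS target and -3.5dB ceiling come from this thread; the test signal (a quiet tone with a few sharp spikes, a bit like plosives) is made up, and the LF Rolloff step is left out.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_db(linear):
    return 20 * math.log10(linear)


def rms_normalize(samples, target_db=-20.0):
    """Scale everything so the RMS lands on target_db."""
    gain = 10 ** (target_db / 20) / rms(samples)
    return [s * gain for s in samples]


def soft_limit(samples, limit_db=-3.5):
    """tanh soft clipper: nearly linear when quiet, never reaches the ceiling."""
    ceiling = 10 ** (limit_db / 20)
    return [ceiling * math.tanh(s / ceiling) for s in samples]


# Made-up reading: a quiet tone with five sharp spikes.
raw = [0.05 * math.sin(2 * math.pi * i / 100) for i in range(5000)]
for i in range(0, 5000, 1000):
    raw[i] = 0.3

loud = rms_normalize(raw)   # RMS now -20dB, but the spikes peak near -1.7dB
final = soft_limit(loud)    # spikes pulled under -3.5dB
```

Normalizing first and limiting second lets the peaks pass while the loudness stays in range; in this example the RMS drops only a fraction of a dB when the spikes are limited.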

Do you have any words of wisdom about how to go about these sorts of chapters? Will the settings you’ve already given me be enough under these conditions? I know that I need to avoid clipping (red on the meter) at all costs, but should I start at a lower gain setting from the beginning to leave headroom? Or get further away from the mic during the louder segments? I’m just worried that the normal speaking parts have to match the other sections and chapters, and how do I keep from getting too loud in the screaming/theatrical bits?

No rush on this answer if others need help… I just know that I’ll have to do this eventually, but not this week…

That’s it.

We never used to be able to set RMS (Loudness) directly with any reliable tools, so we always had to work around the barn with indirect tools and hope to goodness everything came out right eventually. Not any more. flynwill developed a proof-of-concept RMS tool and steve developed a finished effect. Those would be SetRMS and RMS Normalize, respectively.

In English: adjust loudness, limit those occasionally troublesome blue wave high tips (peaks), and check for noise. The three ACX specifications in three tools.

Noise can be messy. That’s the step that can split depending on what you have wrong, and that’s the step that prevents us from designing a single mastering tool.

Rumble takes one tool, microphone hiss takes a tool and Yeti Curse Whine takes a third. Each one slightly damages the sound, so we can’t just throw 'em all in there. We have to find the appropriate one.

And to be clear, you reach full ACX conformance even without the rumble filter, but it’s close. Since you have a woman’s voice and a little rumble, LF-Rolloff is perfect.
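LF-Rolloff is essentially a low-frequency cut. As a rough illustration only, here is a simple first-order (6dB/octave) high-pass filter in Python with an assumed 100Hz corner; Audacity’s LF rolloff preset uses its own curve, so the numbers won’t match it. The point is the shape: deep rumble is cut hard while speech-range frequencies pass nearly untouched.

```python
import math


def high_pass(samples, cutoff_hz, rate):
    """First-order RC-style high-pass filter (6dB/octave slope)."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, rate, seconds=1.0):
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(int(rate * seconds))]


def kept_fraction(freq_hz, cutoff_hz=100.0, rate=16_000):
    """Fraction of a steady tone's level left after filtering."""
    x = tone(freq_hz, rate)
    y = high_pass(x, cutoff_hz, rate)
    skip = rate // 4  # ignore the filter's brief start-up transient
    return rms(y[skip:]) / rms(x[skip:])


rumble = kept_fraction(30.0)     # about 0.29, roughly -11dB
speech = kept_fraction(1000.0)   # about 0.98
```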


Do you have any words of wisdom about how to go about these sorts of chapters

Traditionally, that’s done with microphone management. You can whisper or present sotto-voce, but do it really close to the microphone so you sound close and intimate at normal speaking volume. You can yell, too, but yell off-microphone, so the timbre of your voice is yelling, but the volume doesn’t go up.
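The arithmetic behind microphone management is roughly the inverse-square law: in open air, each doubling of distance drops the level at the microphone by about 6dB. Real rooms add reflections, so treat these numbers as a rough guide. A tiny Python sketch (the distances are just examples):

```python
import math


def level_change_db(old_distance, new_distance):
    """Level change at the mic when the mouth moves (free-field inverse-square law)."""
    return 20 * math.log10(old_distance / new_distance)


# Normal reading at 15cm; lean out to 30cm for a yell.
yell_offset = level_change_db(15, 30)     # about -6dB
# Lean in to 5cm for a whisper.
whisper_offset = level_change_db(15, 5)   # about +9.5dB
```

So a yell from twice the distance has to be about 6dB louder at the mouth just to read the same on the meter, which is how you keep the timbre of a yell without the level.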

Oh, and while you’re in close, take care not to pop your P’s. Since the whisper voice is intended to be slightly unnatural anyway, talk across the microphone rather than directly into it.

Until you get good at it, you’ll need to see the volume meters and script at the same time. After a while you just know. This is also where good, real-time headphones come in handy. You can hear your volume adjustments as they happen and rely less on the meters.

You can try the compressors, but they don’t fit well with the other tools in the suite. Much better to do it at the performance step.


If you’re doing a podcast, you might try Chris’s Compressor.

Chris is a full-on, completely automatic production processor. I use it to “tame” a download podcast and make it sound very like the processed broadcast version of the same show.

I change the first setting, Compress Ratio, from its 0.5 default to a stiffer 0.77.


you might try Chris’s Compressor.

As with most automated processors, you can fake it out by accident. Make sure you have something other than dead silence for the first second of a show and try to avoid very large, unnatural volume swings. It doesn’t handle those well. That’s why it’s not a Universal, Fix-Everything solution.


Here’s a clip from one of my favorite voice performers, Molly Wood from NPR Marketplace. I couldn’t easily find a self-contained clip without the music.

We had a graphic artist who could do that. He would give talks and travelogues over lunch and play to packed houses (screening rooms). Not SRO, which had long since vanished: squatting-on-the-floor room only. The joke was that he could announce he was going to read the Burbank Phone Book and still play to a capacity crowd.


I just joined this forum and saw this string, I have the same questions. I just got back from ACX that I have 10 (out of 19) chapters too loud and 1 too soft.

I have not found the lead note in the string, but I seem to have the same questions. I think I get what normalization and compression do to the waveform (I was a physics major, so I get the math), but my questions are: how do I get the RMS value in Audacity? And if I do the 19 files separately, will they all come out at about the same level, or do I need to put all of them in one file, one big set of 19 tracks, and normalize them all at the same time? Or would that be so large a file that the computer will puke?

At least in this audio world, I’m really a total novice!

Oh, and to the author… Congrats on your second contract… I got one, a one-hour book, then kept getting lots of rejections. Blasted out 10 auditions and got 4 offers in one weekend! The first is for a 13-hour book… I just got back all of the processing notes on that one… Getting the offers is exciting… Congratulations and keep going!


RMS means Root Mean Square: the square root of the average of the squared sample values. In audio it’s interesting to derive because real signals are complex waveforms spanning 20Hz to 20000Hz. It also happens to correspond to loudness.

So when people tell you to set your RMS, they’re really talking about loudness. ACX has very specific restrictions on loudness: RMS between -23dB and -18dB. That 5dB window is not that great a loudness variation, and as long as you’re inside it, chapter-to-chapter variation should not be a problem.
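For anyone who wants the math, the definition and the ACX window look like this, with x_n the N sample values of a chapter on a scale where full scale is 1:

```latex
x_{\mathrm{rms}} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^{2}}, \qquad
L_{\mathrm{rms}} = 20\log_{10} x_{\mathrm{rms}}\ \text{dB}

-23\ \text{dB} \le L_{\mathrm{rms}} \le -18\ \text{dB}
\quad\Longrightarrow\quad
\frac{x_{\mathrm{rms,max}}}{x_{\mathrm{rms,min}}} = 10^{5/20} \approx 1.78
```

So the loudest allowed chapter has less than twice the RMS amplitude of the quietest.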

Until relatively recently, we had no good way to directly set RMS. Now we do, and the mastering process, assuming you get close at the microphone, can shrink to a sentence or two.

Every time I write a Mastering Tutorial, something happens and I have to start over. This was the last pass.

You’re good all through Comments

Custom Tools changed. We are now using RMS Normalize with a setting of 20dB (it defaults to 18) in place of SetRMS. You need to be on Audacity 2.1.3 or later!!

Use the first three sentences in Process subbing RMS Normalize (20) for SetRMS.

If you have noise, that’s when the process falls to pieces. There are different pathways depending on what your noise is.

I will eventually fix that publication…

Oh, sorry. Process chapter at a time. The new tools should keep you inside ACX conformance.
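To illustrate why chapter-at-a-time works, here’s a Python sketch with made-up "chapters" (the same test tone at three recording levels). Normalizing each one separately puts them all on the same RMS; applying one gain to the whole book, as you would get if everything were a single long track, keeps the original 20dB spread. The function names are illustrative, not Audacity’s.

```python
import math


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_db(x):
    return 20 * math.log10(x)


def rms_normalize(samples, target_db=-20.0):
    gain = 10 ** (target_db / 20) / rms(samples)
    return [s * gain for s in samples]


# Hypothetical chapters: the same test tone recorded at three
# different levels, standing in for three readings.
base = [math.sin(2 * math.pi * i / 100) for i in range(10_000)]
chapters = [[a * s for s in base] for a in (0.02, 0.05, 0.2)]

# Chapter at a time: each gets its own gain, so all land on -20dB.
separate = [rms_normalize(ch) for ch in chapters]

# All together: one gain for the whole book keeps the 20dB spread.
joined = [s for ch in chapters for s in ch]
gain = 10 ** (-20 / 20) / rms(joined)
together = [[s * gain for s in ch] for ch in chapters]
```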

But the really bad news is that you need to start with a raw reading. We can’t take effects and corrections out, so going back and fixing a badly processed reading is rough and sometimes impossible. We strongly recommend exporting your raw readings as WAV (Microsoft) 16-bit before you go on.


I have 10 (out of 19) chapters too loud

Do you know about ACX-Check? That’s an Analyze tool that automatically reads out (among other things) the three important AudioBook settings.

If you only missed loudness but hit the other two, it could be really rough to recover from that. Did you keep the original readings—before you processed them?

First reading?