My audio process for ACX

I see so many people who have problems with levels! Truthfully, it isn’t that hard. There are macros, but I think it helps to actually know what is happening instead of having just one more thing to run. I have been an audio/broadcast engineer for many years and have had the chance to play with many audio processors. You should play as well, to learn how they can be made to work for your voice.

Generally speech has a wide dynamic range, from quietest to loudest, and we have a hard limit when recording to digital. In my work I find one has to average no more than -28 to -25 db to allow for peaks. To get to -20 db we need to cleanly reduce the dynamic range. This means “compressing” it. Limiting and compression are really the same thing, just a difference in degree. The MOST important thing in processing is to have the cleanest, quietest noise floor possible before you do anything. If the PC/laptop is making noise, get it out of the booth. If the dog is breathing too hard give them a treat and get them out of the booth. Ditto any fans or other things. We are looking at a MINIMUM of 1/1000 difference between your voice and the room tone before processing. That is the difference between 0 db and -60 db. It’s got to be very quiet! I can’t overstate how important this is.
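For anyone who wants to check the 1/1000 figure, here is a minimal sketch of the arithmetic. The function name is just for illustration; the only assumption is the usual 20 * log10 rule for amplitude ratios.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to decibels (20 * log10 of the ratio)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(ratio)

# Room tone at 1/1000 of the voice amplitude sits about 60 db below it.
print(round(amplitude_ratio_to_db(1 / 1000), 2))
```

Every further factor of 10 in amplitude is another 20 db, so a -70 db floor is roughly 1/3000 of full scale.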

Here’s my processing path.

First get rid of the very low stuff. I use a gentler rolloff than the “filter curve” preset “low rolloff for speech”, but that preset works very well too. I get almost an instant 10 db better noise floor! There is no good reason not to, since wherever this audio gets played the speakers will not reproduce 50-100 hz or lower anyway, so just pull it out carefully.
Then I do use Noise Reduction, but at the recommended 6, 6, 6 settings. I may go from 6 to 8 db of reduction, but no more than that. Listen, and if you hear the slightest mechanical sound you’re using too much.
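To get a feel for how much a low rolloff takes out, here is a rough sketch of the gain of a 3rd-order high-pass at 75 hz (the settings listed later in this post). It assumes an ideal Butterworth response, which is my assumption; the post does not say which filter type is selected, and a real digital filter will differ a little.

```python
import math

def highpass_gain_db(freq_hz, cutoff_hz=75.0, order=3):
    """Gain in db of an ideal Butterworth high-pass at freq_hz (assumed filter type)."""
    if freq_hz <= 0:
        raise ValueError("freq_hz must be positive")
    x = (freq_hz / cutoff_hz) ** (2 * order)
    power_gain = x / (1.0 + x)  # |H(f)|^2 for a Butterworth high-pass
    return 10.0 * math.log10(power_gain)

# Rumble is cut hard while the voice range is left almost untouched.
for f in (25, 50, 75, 100, 200):
    print(f, "hz:", round(highpass_gain_db(f), 1), "db")
```

Under these assumptions 50 hz comes down by roughly 11 db and 200 hz by almost nothing.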

Next compression and limiting.
I used to use the compressor in Audacity, threshold -20 at no more than 2:1 ratio. This tames most of the peaks almost inaudibly. Then limit at -3.4 or -3.5.
I’m assuming all recording is done using 32 bit floating point so we can get the best work out of our DAW. This is important.
Use the “Loudness Normalize” at “RMS” and set at -20 db. Now don’t worry about the red peaks. Run the compressor. 90% or more of the peaks will go away.
Limit at -3.4 db and you are done! Export to WAV (for archive) and export to MP3. Now this will probably have brought up the noise floor but this happens. We have put the audio under a magnifying glass! But it should easily fit in the ACX spec.
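As a rough illustration of what normalizing to an RMS of -20 db does, here is a minimal sketch on a list of float samples (full scale = 1.0). This is not Audacity's code; the function names and the test tone are made up for the example.

```python
import math

def rms_db(samples):
    """RMS level in db relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def normalize_rms(samples, target_db=-20.0):
    """Scale samples so their RMS lands on target_db. Peaks are not limited here."""
    current = rms_db(samples)
    if current == float("-inf"):
        raise ValueError("cannot normalize silence")
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [s * gain for s in samples]

# Hypothetical test signal: one second of a quiet 440 hz tone at 44.1 khz.
tone = [0.05 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
normalized = normalize_rms(tone)
print(round(rms_db(tone), 2), "->", round(rms_db(normalized), 2))
```

The gain is applied to everything equally, which is why speech peaks can go over 0 db at this stage. In 32 bit float that does no harm as long as the compressor and limiter come next.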

Never ever IMHO process before editing. I at least find it much easier to edit un-processed audio and then process. Your experience may vary of course.
I routinely have a -68 to -75 db noise floor in my less than perfect booth when all done.

What I use;
I use the classic filter with a 3rd order slope set at 75 hz for low end filtering. After denoise I use the loudness normalize set at RMS and -20 db.
I found a very nice free VST compressor\limiter plugin that acts like a broadcast processor that I am familiar with;

I have it set at 2:1 compression with the threshold at -20 or -25, set the final limit at -3.4 db, and turn OFF the AGC. The AGC didn’t sound good to me, pumpy and breathy, but you might like it. Play with the settings and find what sounds good to you.
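If it helps to see the numbers, this is a sketch of the static level curve for those settings: 2:1 above a -20 db threshold, then a hard ceiling at -3.4 db. It ignores attack, release and any make-up gain (all assumptions on my part), so it only shows where a given input level ends up, not how the plugin sounds.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=2.0):
    """Static compressor curve: levels above the threshold rise at 1/ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def limit_db(level_db, ceiling_db=-3.4):
    """Hard limiter: nothing gets past the ceiling."""
    return min(level_db, ceiling_db)

# Input levels in db, including an over-0 peak left by RMS normalizing.
for level in (-30.0, -20.0, -10.0, 0.0, 6.0):
    print(level, "->", limit_db(compress_db(level)))
```

With these numbers a peak 10 db over the threshold comes out only 5 db over it, which is why most of the red peaks disappear before the limiter has to do anything.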
Good luck to you all!

Thanks for your comprehensive description.
I’ll tag it as “sticky” (stick to the top of the list) as I expect that other audiobook producers will find it useful.

Thanks. I didn’t realize it would turn into such a wall of text, and I had to clean up some typos. Thanks for this great forum!

If the dog is breathing too hard give them a treat and get them out of the booth.

Oh, no. Not the dog!

I routinely have a -68 to -75 db noise floor in my less than perfect booth

Describe the booth.

Examples/Sound Tests? The forum will let you post 20 seconds of mono WAV or 10 seconds of stereo. We usually insist you post clean, unprocessed work if we need to dig you out of a hole, but in this case I wouldn’t mind hearing a fragment of a fully mastered submission—without the pumping.

Scroll down from a forum text window > Attachments > Add Files. I can’t tell if the forum is going to let you do that or not.


Ok, I’ll give it a try. This is the intro for something I’m currently working on. I would appreciate any comments; sometimes the best and most painful tool is a mirror!
My booth is a typical PVC pipe rectangular frame 4 ft wide, 6 feet high and 7 feet deep, covered with moving blankets from home depot. My mic is a V67G by MXL into a Behringer UMC 204HD interface. Let me know what you think.

covered with moving blankets from home depot

My Home Depot didn’t have the blankets, but the Harbor Freight did.

I went looking for “furniture pads” and the signs sent me to those little sticky fabric things you put on the table’s feet so they don’t mar the polished wooden floor. No cigar.

I can listen to a story in that voice. In my opinion it has just enough breathing to prove you’re human without gasping or rasping. Good interpretation and theatrical expression. I didn’t hear any background noise problems. That doesn’t have a noise gate, right? You’re doing all this with a good studio and gentle noise reduction?

And no dogs?


You do have a little crisp Essing. Punched SS sounds. You may decide to go with it the way it is. It’s not a job killer (in my opinion).

It’s rough to see in Timeline Spectrogram view, but it’s more obvious in Analyze > Plot Spectrum.

You have the Essing Haystack.

That bump on the right is not natural and some microphones do that because it’s “more professional.” It’s not tone controls and it’s not equalization. It’s a dynamic effect, so it’s a little rough to suppress without throwing a blanket on everything else. I applied Trebor’s DeSibilator.

At these settings.

[Screenshot: DeSibilator settings]
And this is the new track.

Some microphones have gritty, harsh, piercing Essing that’s hard to listen to, but yours is probably good without any corrections.


It’s odd that effect doesn’t show up on the specifications of the microphone. They don’t have that bump. I think there’s something else going on there.

[remaining vigilant]


I don’t hear that well above 10 khz so I figure it’s better not to adjust what I can’t “see”. On home depot, they also had the clamps I used to hold the blankets up for like $1.50 each.
I was thinking the harshness may be caused by a reflection from my tablet to the microphone. That kind of path might cause some comb filtering at 5-10khz or so. Thanks very much for the feedback, it is appreciated.
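As a sanity check on the reflection idea, here is a small sketch of where comb filter notches would land for a given extra path length. The path lengths are hypothetical (nobody measured the booth) and the speed of sound is taken as 343 m/s.

```python
def comb_notch_frequencies(extra_path_m, count=3, speed_of_sound=343.0):
    """Notch frequencies in hz when a reflection travels extra_path_m farther
    than the direct sound. Notches fall at odd multiples of 1 / (2 * delay)."""
    if extra_path_m <= 0:
        raise ValueError("extra_path_m must be positive")
    delay_s = extra_path_m / speed_of_sound
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

# Hypothetical extra path lengths for a tablet sitting close to the mic.
for path_m in (0.02, 0.0343, 0.10):
    print(path_m, "m:", [round(f) for f in comb_notch_frequencies(path_m)])
```

A first notch in the 5-10 khz region needs an extra path of only about 2 to 3.5 cm, so a tablet a foot or so away would put its first notches much lower than that.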

On a back to back listening to your de-essed version and my original, it turns my ss’s into a lisp, at least on my JBL monitors and Sony headphones. Seems a little excessively de-essed to me. More like a processing artifact than a natural vocal sound. I do appreciate the thought though.
For $80 the V67G was one of the better buys I found a few years ago. While not an EV RE20 or Neumann (we use both where I work) I found it a pleasant sounding mic and for the cost the behringer seems to get the job done. The mic cable is a scrap piece of quad cable I put connectors on, since I’ve found a great many store bought mic cables are trash. I am still on the fence about USB mic’s so I got what I knew, an analog large diaphragm condenser.
Thanks again.

On home depot, they also had the clamps I used to hold the blankets up for like $1.50 each.

That’s how I did my portable, knock-down studio.

That’s one wall kit.

It’s a lot smaller than trying to do it with pipes. Assemble for a three or four wall studio. Come back in 40 minutes and we can start performing. I designed this when I got volunteered to record an important vocal performance in one of our shiny, bare-wall conference rooms.

It does bother me that the specifications of the microphone and the performance scans seem to not match. If you didn’t like the DeSibilator, you would have hated the regular DeEsser. DeEsser is notorious for FFing sibilance.

I am still on the fence about USB mic’s

Don’t be. You don’t buy a USB microphone. You buy a USB Mic and computer combination and there are combinations that hate each other. We publish a “Frying Mosquitoes” filter to get rid of the worst of the bad interaction noises.

There doesn’t seem to be a Behringer umc2404.


Excuse me, I typo’ed that. Behringer UMC204HD. Sorry.

Cool. And you did that to get the Midas preamps?

That’s a nice touch.

Did you have any trouble mounting a stereo interface as mono? Did you bother? Do you get that -6dB clipping error?

These are the kinds of forum questions we get, and you got it all to work. You’re a celebrity.

Can I have your autograph?


Yep. I heard good things about the preamps without spending a lot of cash. I don’t think I’m cheap, but frugal and I wanted a setup that could give me good audio before any cleanup.
I have no idea what you mean about the -6 db clipping error? What sort of problems would one have with stereo? Granted most of my recording is mono but I’ve done some multitrack things in audacity with stereo effects and never noticed a problem. I also have a couple of cheaper behringer mics I’ve used on a T bar for X-Y stereo and never had an issue. Can you elaborate?
I just put a mic in the left channel of the pre, and assign audacity as mono left channel for voice work.
I have done MS recording professionally and loved the flexibility but the cost of a good multipattern mic is substantial.

I just put a mic in the left channel of the pre, and assign audacity as mono left channel for voice work.

Right. But when some non-celebrities do that, the system naturally assumes a mono mixdown and reduces the volume of each channel by half to “make room.” That gives you a -6dB clipping point like this.
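The arithmetic behind that -6dB number can be sketched in a few lines. This only models an averaging mixdown of a live left channel and a dead right channel; which layer of the system actually does it is not settled in this thread.

```python
import math

def downmix_to_mono(left, right):
    """Average the two channels, the usual way to make room for both in mono."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

# Hypothetical full-scale signal on the left input, nothing plugged into the right.
left = [1.0, -1.0, 0.5, -0.5]
right = [0.0, 0.0, 0.0, 0.0]

mono = downmix_to_mono(left, right)
peak = max(abs(s) for s in mono)
peak_db = 20.0 * math.log10(peak)
print(peak, round(peak_db, 2))
```

The loudest the mono track can ever get is half of full scale, so the interface clips while the track still shows about 6dB of apparent headroom.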

Here’s one in-the-wild complaint. Note the poster mixed up -6dB with 0.5 track height, but this is the error.

Please note the Operating System seems to have caused the error.

In all cases we recommend very strongly to pay attention to the SIG and PEAK lights on the interface.

There’s some discussion on what exactly is doing this. There are phrases such as: “When the interface clips,” or “When the system reduces the volume.”

In some cases it’s possible to get driver software which prevents this reduction.

One “solution” is to go ahead and record assuming -6 is the new 0 and boost it later. This isn’t optimal because you still have noise from both preamps. Also see: Record in stereo at full volume, split to Mono, and delete the dead Right channel later. It’s more work, but it’s quieter.

Then there’s the built-in mic on my MacBook Air. In Mono, it records a mono track full volume and in Stereo it records full volume on both tracks.


I just put a mic in the left channel of the pre, and assign audacity as mono left channel for voice work.

That’s the metaphor, yes, but exactly what settings did you change?


Ok, for all I know it may clip there. I would never run my recorded audio level up that high. I always like to have headroom, from my reel to reel days. I’ll have to give it a try. I think my peaks on raw record are like -8 to -10 db.

I just checked and it DOES look like it clips at -6 db on the input. My first thought is I could make a Y cable of 2 XLR males and 1 XLR female so I drive both inputs of my preamp. That probably wouldn’t work since it’s in single channel mono in audacity. I didn’t see anything in windows (I use a win 10 PC) or in audacity to change that. Have you?

I would never use that 0-1 vertical amplitude scale, I would ALWAYS change it to db. In my mind, after years of seeing VU meters of all sorts calibrated in db, it just makes more sense and lets you actually see more of the waveform during editing.
Of course always record at 32 bit floating point.
That -6db clip point is interesting, I just hadn’t noticed it before.

I’m sure you are aware of it but others may not be: one other thing I like about the db scale instead of the others in Audacity is that it gives you an idea of what your RMS as well as peak level is. You can see the audio on this file pretty much fits the ACX spec. Especially in the dark profile it stands out well. I have seen some folks that edit in the spectrum screen and while I may try it, it looks so odd!

That -6db clip point is interesting, I just hadn’t noticed it before.

That occurred to me after I wrote that. If you’re conservative in your recording, you may never have caught it doing that.

There is yet another variation. I have a Behringer UM2 single XLR interface. It clips at -3dB/70% whether I’m in stereo or mono. This is a “mono” interface, so not having the level shifting is understandable. Its job is to mix your guitar and your microphone—not likely to ever add in volume. But I do wish it had simple, ordinary, full volume range…


I don’t like the timeline being in percent. I want it in dB, but not that dB.

It’s exactly the same waveform as before, but the labels changed to more match what the sound meter is doing.

Quick, what’s -3dB in percent?

[put down four, carry three, anti-log base-10 of the doobly-doo]
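For anyone who would rather not carry the three, here is a minimal sketch of the conversion. It assumes the percent scale is linear amplitude relative to full scale.

```python
def db_to_percent(db):
    """Convert a level in dB (relative to full scale) to percent of full scale."""
    return 100.0 * 10.0 ** (db / 20.0)

for level in (0, -3, -6, -20):
    print(level, "dB =", round(db_to_percent(level), 1), "%")
```

So -3dB is a little under 71%, -6dB is almost exactly half, and -20dB is 10%.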


I didn’t just make up that number. -3dB is supremely important to the audiobook people. That’s the point where ACX/Audible rejects your sound peaks.

it gives you an idea of what your RMS as well as peak level

The playback meters and the default timeline read in both RMS and Peak. RMS is the lighter color.
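Both readings are easy to compute from the samples, and that is all a pass/fail check needs. Below is a rough sketch, not the code of any existing tool. The -3dB peak limit and the -60dB noise floor come up in this thread; the -23 to -18dB RMS window is taken from ACX's published submission requirements and is not stated here. All names are hypothetical.

```python
import math

def peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def acx_report(voice, room_tone):
    """Check mastered voice samples and a room tone sample against the limits."""
    return {
        "peak_ok": peak_db(voice) <= -3.0,           # peaks no higher than -3dB
        "rms_ok": -23.0 <= rms_db(voice) <= -18.0,   # assumed ACX RMS window
        "noise_ok": rms_db(room_tone) <= -60.0,      # room tone at or below -60dB
    }
```

Feed it the float samples of a finished chapter plus a stretch of pure room tone, and it returns three True/False flags.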


Not to be too pedantic, but -3 db is 0.7079; the familiar 0.7071 (1/√2) is really -3.01 db.
On the db scale, everyone has their favorite wrench! I won’t argue that at all.
I think we should be able to drive to 0 db but I can certainly work with this.