ACX check robotics and RMS

Can anybody provide a link to a tutorial that walks someone step by step through creating a complicated audiobook file?

I got six chapters into a novel of mine (published back in 1970 by McFadden in New York City) that I’m trying to turn into an audiobook that will provide more than just narration. Part of the problem with audiobooks is that novels, or books of any kind rather, were meant to be read, not listened to. A script, on the other hand, IS meant to be listened to and is written expressly for that purpose. I learned that much working as a voiceover artist on a number of Discovery Channel television documentaries. I always marveled at what those broadcast engineers were able to do with not only my voice but all the music and sound effects that went with it. Those technicians were top-of-the-line and must have had doctorates in the profession. By that standard, I would not have gotten out of elementary school.

I have six chapters (out of almost 30) done to perfection, complete with music, sound effects, character voices, etc., but all this requires a dozen or more Audacity tracks to bring together. I have tried noise reduction to get rid of extraneous sounds because I cannot afford an expensive recording room that is free of them. Instead I have rigged up a very small closet with egg-crate foam lining the inside, which gives me an extremely claustrophobic environment. With my A-21 Audio-Technica microphone set to three-quarters of the meter’s maximum range, simply breathing inside that closet studio produces large green streaks on the recording meter. My pop filter doesn’t help with that, although the voice itself is picked up beautifully.
The ACX Check robot seems to think I’m going about this all wrong: it passes some portions of a five-minute tape and trashes others. It seems I am always either under -3 dB or over -3 dB RMS on portions of my tracks.
I find that applying any sort of “effect” (reverb, amplification up or down, a change of pitch) anywhere within those dozen or more tracks will often cause Audacity to ignore the “Mix and Render to New Track” command after all the tracks have been selected. It takes about two seconds to do what it calls a render and leaves me with nothing but a narration track. It didn’t use to do this. I never had a problem with Mix and Render for the opening through chapter 6 (which has always worked beautifully), but now it refuses to mix chapters 7 and 8, which have their own project file.
I can’t seem to get my chapter files much under a gigabyte of data. How is an individual file that wants more than merely pure narration supposed to come in at 170 MB or under? As I read Audible’s list of requirements, the only workaround seems to be to turn one chapter into seven, eight, nine or ten mini-sections.
I am familiar enough now with Audacity to create the sound and the effects with music and narration to get what I want, but I doubt that I can ever get my files to where they will meet those very stringent requirements. If there is a workaround for this situation and anybody out there knows what it is (and can be made to work before the whole market for audiobooks simply collapses) please tell me…

How is an individual file that wants more than merely pure narration supposed to come in at 170 MB or under?

It doesn’t matter what the content is. I just created a one-hour mono show suitable for testing. I exported it as a 192 kbps MP3 and it gave me an 86 MB sound file, generously under the 170 MB limit. It’s also only halfway to their other posted limit of no more than two hours per file.
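The arithmetic behind that 86 MB figure is simple enough to sketch. The helper names below are hypothetical. The MP3 estimate assumes a constant bitrate and ignores tag overhead, and the uncompressed estimate assumes 44.1 kHz audio in 32-bit float stereo (Audacity’s default sample format), which is roughly why a working chapter balloons toward a gigabyte while the exported MP3 stays small.

```python
def mp3_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Approximate size of a constant-bitrate MP3, in MB (1 MB = 10**6 bytes).

    Tags and frame headers are ignored; they are tiny by comparison.
    """
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


def uncompressed_size_mb(minutes: float, channels: int = 2,
                         bytes_per_sample: int = 4,
                         sample_rate: int = 44_100) -> float:
    """Size of raw audio in MB; defaults assume 32-bit float stereo."""
    return sample_rate * bytes_per_sample * channels * minutes * 60 / 1_000_000


if __name__ == "__main__":
    print(f"1 h MP3 at 192 kbps: {mp3_size_mb(192, 60):.1f} MB")
    print(f"1 h 32-bit float stereo: {uncompressed_size_mb(60):.0f} MB")
    minutes_under_limit = 170 / mp3_size_mb(192, 1)
    print(f"Longest 192 kbps file under 170 MB: {minutes_under_limit:.0f} min")
```

At 192 kbps that works out to roughly 118 minutes before a file reaches 170 MB, which lines up with the two-hour ceiling.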

a novel of mine … I’m trying to turn into an audiobook

I don’t think that’s correct. You’re trying to turn it into a theatrical radio drama. You all but said so. I’m surprised the company didn’t comment on the addition of music and sound effects. That’s normally not done.

It seems I am always either under -3 DB or over -3 DB RMS on portions of my tracks.

I think you’re misinterpreting the technical standards.
This is an English version of the three standards.

The first value, peak, has an upper limit. Peaks may never, ever get louder than -3dB (70%), but they can be quite a bit lower than that without triggering an alarm.

RMS is a fancy-pants way of measuring loudness. That one does have two limits: an upper limit of -18dB and a lower limit of -23dB.
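As a rough illustration of what those first two numbers measure, here is a minimal Python sketch. It assumes the audio is already a list of floating-point samples between -1.0 and 1.0 (the scale Audacity’s blue waves use). The function names are hypothetical and this is not the ACX-Check plugin’s code; it simply applies the peak and RMS limits quoted above to a whole clip.

```python
import math

PEAK_MAX_DB = -3.0                      # peaks may never exceed this
RMS_MIN_DB, RMS_MAX_DB = -23.0, -18.0   # RMS must land inside this window


def to_db(amplitude: float) -> float:
    """Linear amplitude (1.0 = full scale) to dB; silence becomes -inf."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")


def peak_db(samples) -> float:
    """Level of the single loudest sample."""
    return to_db(max(abs(s) for s in samples))


def rms_db(samples) -> float:
    """Root-mean-square level: square, average, square-root, then dB."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return to_db(math.sqrt(mean_square))


def acx_levels_ok(samples):
    """(peak_ok, rms_ok) for a whole clip."""
    return (peak_db(samples) <= PEAK_MAX_DB,
            RMS_MIN_DB <= rms_db(samples) <= RMS_MAX_DB)


if __name__ == "__main__":
    # One second of a 440 Hz test tone at 14% of full scale.
    tone = [0.14 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(44_100)]
    print(f"peak {peak_db(tone):.1f} dB, RMS {rms_db(tone):.1f} dB",
          acx_levels_ok(tone))
```

Even for a plain tone the RMS sits a few dB below the peak; real speech has a much bigger gap, which is why a clip can pass the peak test and still miss the RMS window.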

It is possible, and I have done it multiple times, to read passages into a microphone under good conditions, stop, gently adjust the overall volume, export an ACX-compliant sound clip and break for lunch. So while admittedly difficult, this is not rocket surgery.


I’m just now, after reading through that the fourth time, waking up to what’s happening. The ACX Robot knows the natural cadence and structure of the human presenter/announcer. I bet you’re driving it nuts with the special effects, production sounds and background music. Exactly the same thing happens when you try to put music into a conference voice system or record tunes with a default Windows sound system.

Since you’re recording on a Windows machine, it’s possible internal voice processing is causing some of your odd problems and wandering test values.

In my opinion you should create the full theatrical presentation and run it through a stiff, global compressor such as Chris’s Compressor add-on to even out the lumps and bumps, and post it yourself — or — read the plain, flat audiobook and submit that to ACX. I don’t think the full theatrical presentation will ever make it through ACX compliance.
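To make “even out the lumps and bumps” concrete: a compressor turns the loud parts down more than the quiet parts, so the gap between the peaks and the average loudness shrinks. The sketch below is a generic, simplified feed-forward compressor in Python with hypothetical parameter names (`threshold_db`, `ratio`, `release`). It is not the algorithm Chris’s Compressor uses, only an illustration of what threshold and ratio mean.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=3.0, release=0.999):
    """Simplified compressor for float samples in [-1.0, 1.0].

    An envelope follows the signal level: it jumps up instantly on a loud
    sample and decays slowly (by `release` per sample) afterwards. Whenever
    the envelope is above threshold_db, every dB over the threshold is
    reduced to 1/ratio dB. Below the threshold the signal passes unchanged.
    """
    out = []
    env = 0.0
    for s in samples:
        env = max(abs(s), env * release)          # instant attack, slow release
        env_db = 20 * math.log10(env) if env > 0 else -120.0
        over_db = env_db - threshold_db
        gain_db = -over_db * (1 - 1 / ratio) if over_db > 0 else 0.0
        out.append(s * 10 ** (gain_db / 20))
    return out
```

With a 3:1 ratio and a -20 dB threshold, a full-scale passage comes out at about -13.3 dB (20 dB over becomes about 6.7 dB over), while anything quieter than -20 dB is left alone.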


Reading it the fifth time.

produces large green streaks on the recording meter when one is simply breathing inside that studio closet.

You have help. Unwelcome help. A normal, flat audio recording system will not do that, but an automatic conferencing or chat system will. A presentation recorded like that, with volume changing by itself will never pass the third value, the ACX Noise Specification.

We wrote a thing about that. Follow this through.


Thanks Koz, it never occurred to me I was trying to reinvent the wheel; I was just trying to come up with something a little more grandiose than flat-out narration. One of the free gifts I got from Amazon was an audiobook of Game of Thrones, read by Roy Dotrice. That actor is very talented; nevertheless, his reviews for that reading were abysmal. I believe I know why… The audience must’ve seen an episode or two of the full-blown video production on HBO, and of course no audiobook can hope to compete with that, so his reviews were unfair and could damage his career in other work. I just wanted to avoid a similar situation. I hardly ever listen to audiobooks, preferring to read the novels I’ve heard good things about, but I have relatives who listen to a lot of them, and they told me that nearly every one has some sort of music or sound effects attached to it.

Thanks again for the definitions and the advice, which I am going to try to follow.

Did you go down that link to find out why Windows is messing up your voice? Your studio experience with moving sound levels is not normal.


Yes Koz, I went to all the links you sent. I’m uploading a screenshot that I took of my last ACX response. I managed to come in under the peak limit, but somehow the RMS is still too high. I tried the compressor effect, tried the maximum and minimum ratios on it, and couldn’t pass anything that way. It’s still Sanskrit to me what minus or plus 3 dB actually means. Is it something like: the lower the number, the louder it is, and the higher the number, the softer it is? Is that correct? Could using the Amplify effect to lower the volume level help?

Here’s an illustration of some of the relationships. I changed the bouncing sound meters around so they’re bigger and easier to see. You can do that, too. The little bar on the left is a position drag bar, and the little bar on the right lets you set the size.

The bouncing light sound meters are in dB and the blue waves are in percent (1.0 = 100%). Maximum is 0dB on the right, and the sound gets quieter as you go to the left. The numbers to the left are negative, so they actually get smaller. In my illustration, I made a point of having the sound meters flash at -6dB, which is the announcing goal. The meters turn yellow when you do that. Every so often, the meters should flash up there while you’re announcing. That corresponds to about 50% in the blue waves, also shown in the illustration.
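The meter-to-waveform relationship is a fixed formula: fraction of full scale = 10^(dB/20). A short Python sketch (function names are illustrative only) reproduces the numbers in this thread: -6dB is about 50%, and the ACX peak limit of -3dB is about 70%.

```python
import math


def db_to_fraction(db: float) -> float:
    """Meter reading in dB -> height in the blue waveform (1.0 = 100%)."""
    return 10 ** (db / 20)


def fraction_to_db(fraction: float) -> float:
    """Waveform height (1.0 = 100%) -> meter reading in dB."""
    return 20 * math.log10(fraction)


if __name__ == "__main__":
    for db in (0, -3, -6, -18, -23):
        print(f"{db:>4} dB = {db_to_fraction(db):6.1%} of full scale")
```

Because the dB numbers are negative, a reading closer to 0 (further right on the meter) always means louder; -3 is louder than -23.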

I found it handy to start from somewhere concrete, because if I don’t, it’s easy to chase my tail, which is what I think you may be doing.

I published a process for announcing a test clip.

Open the clip in Audacity.

– Select the whole clip or show by clicking just above MUTE.
– Effect > Normalize: [X]Remove DC, [X]Normalize to -3.5 > OK

Run ACX-Check.

What do you get?
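For the curious, the two Normalize checkboxes in that recipe amount to two simple operations: subtract the average (DC offset) from every sample, then apply one overall gain so the loudest peak lands at the target. This is a minimal Python sketch of the idea, assuming float samples in [-1.0, 1.0]; the function name is hypothetical and it is not Audacity’s actual implementation.

```python
def normalize(samples, target_db=-3.5, remove_dc=True):
    """Optional DC removal, then one gain change so the highest peak is target_db."""
    if remove_dc:
        offset = sum(samples) / len(samples)     # average value = DC offset
        samples = [s - offset for s in samples]
    peak = max(abs(s) for s in samples)
    if peak == 0:                                # pure silence: nothing to scale
        return list(samples)
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples]
```

Because a single gain value is applied to every sample, Normalize changes the overall volume only; it cannot change how far the RMS sits below the peak.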

Now post the raw clip. The forum should let you post a 20 second mono WAV file. Scroll down from an Audacity forum window > Upload attachment.


If your browser is zoomed in too close, it will cut off the right-hand edge of the graphic. Zoom out slightly.


Thanks again, Koz –

I set it all up just as you advised… It would seem to be rocket science, though. Is there any way I can set Audacity up with permanent markers on those meters to keep me within acceptable limits? I find that if I fiddle with it enough, I can pass the -3 dB and the -23 dB, but then I get killed on the noise floor or the RMS level. I find myself juuuussst over the limit on one of those every time after the ACX robot does its thing. My recording studio is always going to record my breathing while I speak, and the recording meter does go nearly all the way to the left (where all signals start and grow to the right), and if I apply any of the traditional effects built into Audacity, it blows the whole plan and all bets seem to be off. If the idea is to keep the meter between -3 dB and -23 dB throughout the whole clip, it seems an impossible task! To paraphrase Abraham Lincoln’s famous observation: “You can fool some of the metering all of the time, and all of the metering some of the time, but I can’t fool all of the metering all of the time.”

It seems to be terribly difficult to meet the parameters that ACX demands for just plain flat out narration!
Am I reading something wrong, I wonder?

We are doing a stand-alone test clip because I can feel the problems bubbling up. It doesn’t work when we start analyzing a processed performance. We can’t take processing out of a clip and there’s a terrific chance we’re going to recommend tools and filters that will mess with some of the processing you’re already doing.

That, and it’s good not to work blind.


Dueling posts.

the parameters that ACX demands for just plain flat out narration!

This isn’t a cellphone call or a Skype conference. You’re replacing a studio and recording engineer. Sometimes it works out better than others.

I like the Car Talk radio show. It’s produced in the WBUR Boston studios. The Car Talk executive producer decided to save a bunch of bucks by shooting the show themselves at home. No WBUR studio fees. They made it through two ratty, unstable network radio shows and went back to WBUR.

See if you can post that sound clip. I’ll try to give you step by step how to process it for submission.

You should have the Audacity 2.1.2 record meters set for colors, not the split green display. Forget RMS and noise for now. The goal is to make the room as quiet as you can and announce so the tips of the sound peaks turn yellow every so often as in my pix. That’s recording volume centered between overload damage and distortion from low volume. The recording sweet spot.

Nobody will come out with a gun if you’re a little low volume. We are trying to avoid the performances where the blue waves are almost a straight line and the bouncing meters are over on the left. Those are just trash as are the shows where the sound meter is red and the blue waves fill the timeline.

Crank out that test clip and I’ll talk you through the rest of it. Don’t process the clip. Export WAV and post it.


I found one of the clips. I don’t remember how I did this one, but I can see from the notes I didn’t do much past announcing and setting general ACX sound levels.

ACX recommendations are for higher volume than you can comfortably get in live recording, so it’s almost always necessary to adjust the overall volume. Maybe I pushed a pinch of Noise Reduction in there. I don’t remember. A -72dB noise floor is quieter than I usually get.
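This is also why a volume change alone sometimes can’t win: Amplify and Normalize move the peak and the RMS up or down by the same number of dB. Here is a small Python sketch (a hypothetical helper; the -20dB target RMS is an assumption, chosen as the middle of the ACX window) that computes the gain needed and refuses if that gain would push peaks past -3dB. The measurements in the usage lines are made-up examples.

```python
def gain_for_target_rms(peak_db: float, rms_db: float,
                        target_rms_db: float = -20.0,
                        peak_max_db: float = -3.0):
    """Gain in dB that moves RMS to target_rms_db, or None if it can't be done.

    A single overall volume change shifts peak and RMS by the same amount,
    so their difference (the crest factor) stays fixed. If the peaks would
    end up above peak_max_db, the clip needs compression or limiting first.
    """
    gain_db = target_rms_db - rms_db
    if peak_db + gain_db > peak_max_db:
        return None
    return gain_db


if __name__ == "__main__":
    print(gain_for_target_rms(peak_db=-12.0, rms_db=-26.0))   # 6.0
    print(gain_for_target_rms(peak_db=-9.0, rms_db=-28.0))    # None
```

In round numbers, if the loudest peak sits more than 20dB above the RMS (the distance from -3dB down to -23dB), no amount of Amplify will make the clip pass on its own.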

I write short stories. Sooner or later I’m going to set up and record one of those myself. See if I can make it pass without 52 filters and effects.

“Once upon a time…”


Dear Koz –
You really are… and thanks again. What do you mean by a standalone test? A raw file that I narrate? I have a confession to make. My brother-in-law, who is in failing health now, is a broadcast engineer in Bethesda, Maryland. We’re a close family in feeling but far apart in physical distance. He owns and runs a company called Absolute Pitch, helped me get voiceover work years back, and even got an Emmy nomination for a Discovery Channel series called “SEA WINGS”, which I had the good fortune to narrate.
It’s a complete audiovisual production house, but I hate to bug him during an election season because that’s the really busy one. He has offered to clean up some of my files after the election, though.
What I am really wondering about, after visiting and checking out all those tutorials, is whether I might be using the wrong recording software. I don’t have an Mbox preamp or a really fancy mic. It’s just an A-21 Audio-Technica with a pop filter, and I’ve had it positioned all wrong according to the tutorials. My recording space is so tiny, and even though I have most of it lined with egg-crate foam, I can see from those ACX videos that there is some sound bouncing around in there. Just breathing sends the green recording meter up to about the size of a nickel, and sometimes a quarter, and that’s not breathing heavily. I did have my mic recording volume up full, but then pulled it back to about 70% and tried it that way without it helping very much.
ACX seems to be pushing the ProTools platform almost exclusively, and I know my professional broadcast engineer brother-in-law swears by it. I have a feeling I can do all the things I originally envisioned by taking advantage of his professional expertise and processing all the way out in Bethesda. I can send him zip files of all the music and sound effects I purchased the rights to (Sounddogs and Melody Loops). He already has my opening file through chapter 6, which sounds perfect to me but will never pass ACX standards. I think he has the expertise to make it all work, and he might also be interested in coming on board at ACX as a producer in his off seasons. He has Parkinson’s and is limited in the amount of time he can devote to anything, really. So I am semi-on my own and trying to cover all the bases. You obviously have been at this longer than I have, and I am curious as to what your set up is…

what do you mean by a standalone test? A raw file that I narrate?

Yes. As it says in the document, hold your breath for two seconds, announce for about 18. Stop, export as a mono WAV and post it on the forum.
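Before posting, a clip’s basic format can be checked with Python’s standard `wave` module. This is a small sketch with hypothetical helper names; the 20-second mono limit comes from the forum advice in this thread, and it assumes a plain uncompressed PCM WAV (the kind Audacity’s WAV export produces by default), since `wave` does not read compressed formats.

```python
import io
import wave


def describe_test_clip(source, max_seconds=20.0):
    """Channel count, bit depth and length of a PCM WAV file or file object."""
    with wave.open(source, "rb") as w:
        channels = w.getnchannels()
        bits = 8 * w.getsampwidth()
        seconds = w.getnframes() / w.getframerate()
    return {
        "mono": channels == 1,
        "bits": bits,
        "seconds": round(seconds, 2),
        "fits_forum_limit": channels == 1 and seconds <= max_seconds,
    }


def make_silent_clip(seconds=20, rate=44_100):
    """Build an in-memory 16-bit mono WAV of digital silence, for trying it out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)           # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(describe_test_clip(make_silent_clip()))
    # For a real file: describe_test_clip("test_clip.wav")  (hypothetical name)
```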

That’s a grand first step to figuring out what’s gone wrong.

just breathing sends the green recording meter up

It’s supposed to. One important difference between the blue waves and the bouncing sound meter is how they handle low volume. The blue waves only show you the loudest parts of the show. If the volume of the performance falls below that, the display just flattens out and that’s that, even though the performance is still going. That happens at about -30dB, if you’re keeping score. The audio recording limit is -96dB. Quite a difference.
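Both numbers come from the same dB formula. The sketch below assumes the -96dB figure refers to 16-bit recording (full scale compared with a single one of its 2^16 steps), and it shows why a -30dB passage looks like a flat line: it is only about 3% of the waveform’s full height.

```python
import math

# 16-bit audio has 2**16 = 65,536 steps; full range vs. one step, in dB.
dynamic_range_db = 20 * math.log10(2 ** 16)

# Height of a -30 dB passage in the blue waveform, as a fraction of full scale.
minus_30_fraction = 10 ** (-30 / 20)

print(f"16-bit range: about {dynamic_range_db:.1f} dB")
print(f"-30 dB is {minus_30_fraction:.1%} of the waveform's full height")
```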

The sound meters, depending on how you have them set, move when you have almost any sound at all. So as long as it’s not carried to extremes, having the meters move when you breathe is perfectly normal. All these conditions are a matter of degree. That’s why that raw clip is a big deal. We can tell in 20 seconds how your studio is working.

I am curious as to what your set up is…

Everybody gets this one wrong.
The most valuable part of my setup is the third bedroom.

One of the original owners of the house played drums and that room has home-style soundproofing on the ceiling and walls and heavy carpet on the floor. My recording computer makes little or no noise. Past that, it almost doesn’t make any difference. I got super close to an audiobook test clip by announcing into my laptop built-in microphone and recording in Audacity.

ProTools is a delightful software package, but I didn’t have any trouble recording voices and sound effects commercially with Audacity. Unless you’re doing something dreadfully wrong, the recording software is the least of your problems.

Audio Technica doesn’t appear to make an A-21 microphone.

ACX seems to be pushing the ProTools platform almost exclusively

ACX wants to make their job easier and one way to do that is insist on top quality equipment. Note in the videos, the question isn’t whether you are going to have a sound booth, but which one. Also note the laptop isn’t a noisy Dell. It’s a quiet MacBook.

This peels off the announcers who can’t afford all the top equipment and makes it much more likely that an audiobook submission is going to pass without a lot of hand-holding. Hand-holding is expensive. ACX publishes a list of forbidden effects.

I know of few people who can deliver a performance without some of those tools, but that, too, peels off the lower quality announcers. A noisy studio requires stiff Noise Reduction and stiff Noise Reduction can be heard in the voice. Those fail in ACX’s Human Quality Control. The complaint is “Overprocessing.”

So figure out the microphone identity and forward a test clip. I used to tell people to select a clip from existing work, but I stopped doing that because the Room Tone (silent) section at the front is important and an existing segment doesn’t always tell us what we need to know. You can submit stereo (two blue waves) but stereo cuts off at 10 seconds instead of 20.


That silent portion is harder than you think. Obviously, we can hear you breathing, so hold your breath. Not so obviously, we can also hear you scratching your head, shifting in your seat and shuffling papers. Boost the volume of your silent section temporarily and listen for human noises. The goal is just the room noises (the phrase “Room Tone”) and noises the microphone itself or the computer is making.
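Both ideas in that paragraph, measuring the silent section and temporarily boosting it to listen for human noises, can be sketched in a few lines of Python. This assumes float samples in [-1.0, 1.0], a two-second Room Tone at the start of the clip, and ACX’s published noise-floor limit of -60dB RMS. The ACX-Check plugin may measure the noise floor somewhat differently, so treat this as an approximation; the helper names are hypothetical.

```python
import math

NOISE_FLOOR_MAX_DB = -60.0   # ACX noise-floor limit, RMS


def room_tone_rms_db(samples, sample_rate, seconds=2.0):
    """RMS level (dB) of the first `seconds` of a clip, i.e. the Room Tone."""
    tone = samples[:int(sample_rate * seconds)]
    mean_square = sum(s * s for s in tone) / len(tone)
    return 20 * math.log10(math.sqrt(mean_square)) if mean_square > 0 else float("-inf")


def boost_for_listening(samples, gain_db=30.0):
    """Temporary boost so rustles and breaths in the silence become audible.
    For listening only; never export the boosted version. Clips at full scale."""
    g = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * g)) for s in samples]


if __name__ == "__main__":
    rate = 44_100
    quiet_room = [0.0005, -0.0005] * rate      # 2 s stand-in for low-level hiss
    level = room_tone_rms_db(quiet_room, rate)
    print(f"room tone {level:.1f} dB, passes: {level <= NOISE_FLOOR_MAX_DB}")
```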

Some microphones (again, in the lower rank of quality) make objectionable noises by themselves. We’ll find out if you’re in that camp with your clip. ACX doesn’t use the “legal” (hazardous work environment) definition of noise. They use the definition that simply measures everything that’s not you talking. Anything. In some cases you can fail for noises only bats can hear and cats can feel. Fortunately, those can be detected and dispatched, although those represent steps you must take at each and every recording.


I’m not joshing. Attached is a clip from the built-in microphone of a 13" MacBook Air sitting on a desk in my quiet third bedroom. The first version is the raw clip, and the second has gentle noise reduction, careful compression and a volume adjustment.
I don’t necessarily recommend this because you don’t have good control of the shoot. I’m posting it only to prove that no, you don’t need to spend a million dollars on a microphone to make this work.

But a quiet room goes a very long way.


Dear Koz –
Thanks again for all your concern about this and the help. I finally went through everything and watched all the tutorials. I’ve got myself set up now with an Audient iD14 as a preamp and hooked it into my DAW. The tutorials for it only deal with ProTools, so I’m thinking about going in that direction if I find out that Audacity cannot work with my new preamp. I also purchased a new microphone, now on its way to me: it cost me about $230, a Rode NT 21 or whatever it is, with a shock mount. I’m also building a little tent structure with audio blankets around my setup, which will be complete next week, and then I can start practicing the complicated thing I set out to do.
Those ACX tutorials are great, and although the learning curve is still a monster, it is starting to make a little more sense to me.

Dear Koz –

I decided to get rid of my little closet studio. Any movement at all I made in it seemed to be magnified tremendously. My tent structure is being set up in my normal computer room, where I have more room to move around. The mic I have been using is an Audio-Technica that I picked up for about 50 bucks, but on the advice of my brother-in-law (the real pro at this stuff in my family) and the tutorials on the ACX page, I decided to start with the best equipment I can afford and go forward from there. I will be submitting that sound file as soon as my mic arrives and I get everything hooked up properly.

I am still fiddling with the Audient iD14, which I bought from Sweetwater because of their two-year free warranty and the help and advice I can expect from them when adding or upgrading later on. I am going to need all the help I can get as I move on with this project. All my stuff is fiction, at least for now, and I want it to have the best presentation it can, to entertain a bored beach population who don’t have the imagination to simply read a book, close themselves off mentally from a raucous environment and enjoy it the way the author meant them to. Music and sound effects tastefully and artfully put in place at the right moments can enhance any presentation and cause good word-of-mouth in the sales arena.

Looking forward to your first submission.

I think I pointed, further up the thread, to the basic instructions for posting a test.

It turns out, over many tries, that the most difficult part of shooting that thing is the Room Tone, the two-second silence. Freeze and hold your breath.

Room Tone can go at the beginning or the end. Doesn’t make any difference. One sister posting to this one made it through the voice part of the test perfectly and then, I joked, it sounded like they were struggling to take their pants off during the silent portion. Rustle, rustle, rumble, rumble, clunk.

That doesn’t work. It has to be the sound the room makes without you moving or breathing.

Music and sound effects tastefully and artfully put in place at the right moments can enhance any presentation and cause good word-of-mouth in the sales arena.

Did ACX say that? Music and effects can interfere with their automated quality control system and cause you to fail when the voice part of the presentation is perfectly OK.