I’m starting to work with a narrator to produce my first novel as an audiobook. I have no trouble passing the ACX Check test after applying the three effects (Filter Curve, Loudness Normalisation and Limiter), but when I listen through headphones, I’m not convinced by the quality.
I record on my MacBook Pro and when I listen back it sounds warm and clean, but when I transfer it to my Windows laptop there’s a tinniness to it and, post-edit, the sound seems a little loud and blown out to me.

Any advice would be massively welcome! I’ve included the original recording and the post-production version optimised for ACX.

Many thanks in advance.

Just so. I applied simple mastering to the raw clip and it passes ACX Test nicely. However, we are reminded you also have to pass theatrical tests before they will move to publish. You are a performance for sale. That’s the goal.

You sound uncomfortable. Your word spacing is odd and hard to listen to. You’re so uncomfortable that you missed words in the script. That’s not good news for someone who’s supposed to make it through a book with no mistakes.

Under some conditions you can fix that in post production editing, but that kind of thing is supposed to happen once or twice a chapter not several times a sentence.

Which microphone are you using? You have a very minor recording oddity which isn’t that important but it piqued my curiosity. You have non-symmetrical wave peaks, if anyone is counting. I know pro announcers who have that, so it’s not deadly.
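If anyone does want to count, here is a rough Python sketch of one way to put a number on that asymmetry. It is not anything Audacity does internally; the function name and the idea of passing in a plain list of samples scaled to -1.0..1.0 are assumptions made for illustration.

```python
import math

def peak_asymmetry_db(samples):
    """Positive peak level minus negative peak level, in dB.

    samples: a list of floats between -1.0 and 1.0 (hypothetical input;
    in practice they would come from an exported WAV file).
    A result near 0 dB means the wave is roughly symmetrical.
    """
    samples = list(samples)
    pos = max((s for s in samples if s > 0), default=0.0)
    neg = max((-s for s in samples if s < 0), default=0.0)
    if pos == 0.0 or neg == 0.0:
        raise ValueError("need both positive and negative samples")
    return 20.0 * math.log10(pos / neg)
```

A voice whose upward peaks are twice the size of its downward peaks would come out at about +6 dB. That is a quirk of the voice or the mic, not damage.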


Thanks Koz.

Luckily, I’m not the narrator who will be recording the book. I just recorded it for the forum. So you don’t have to worry about my delivery and I missed the words because I was reciting from memory so as not to have the computer too close to the mic.

I’m using a Rode NT1-A. Can the irregular waveforms affect the sound quality? Is there anything I should do to address it?

When I listen to the silent moments through headphones on my PC and crank the volume to its maximum, it sounds like a rhythmic bass sound, almost like music in the distance, but never above about -48 dB. Is that a problem?

Thanks again!

Fortunately, the first step in Audacity Audiobook Mastering is a rumble and bass filter. So all that trash vanishes.

As near as I can tell, you have very well behaved noise and it nicely fades into the background during the reading—like it’s supposed to.

Don’t go Diving For Noise. Set the playback volume for comfortable listening and then roll the show back into a silent section—and don’t touch anything. It should sound almost completely silent or really tiny rain in the trees sound shshshshsh. That’s normal.

Part of the fun is finding a clean listening system you trust. Most home systems are suspect unless you know they’re not trying to “help” you in some way—or they’re not getting help from somewhere else.

I got a PC for production at work and it wouldn’t pass sound quality tests no matter what I did. Turns out someone in Systems left Windows Cathedral Effects running on the speaker system by accident.

Do you like Skype/Zoom/Games/Chat? Those are famous for applying effects, filters, corrections and processing during operation. Most of them won’t run without it … and then they sometimes forget to turn it off when they’re done.

“My voice sounded OK last week, but now sounds honky and bubbly.” Restart the machine and the problem vanishes. Windows has setups and control panels designed to affect sound quality intentionally, both recording and playing back.

So no, this isn’t easy.

I know this isn’t helping, but you know you’re OK if you can get two different systems to more or less match. This is one of the reasons Hollywood has standard headphones for field shoots. Everybody is listening to the same thing. I have trouble listening to them for long shoots, but they’re convenient and they will show you errors before anybody else hears them. That’s why the sound people fell in love.

Sony MDR 7506

Go somewhere else for headphones to listen to a movie for enjoyment.

Most earbuds and earphones need not apply. Their quality can change depending on how you put them in.


Thanks so much, Koz. That’s pretty reassuring to be honest.

I’m not too worried about the silence now, I’ve got it sounding really good. I’m more concerned about the recorded speech. I’m finding a lot of clicks in my narrator’s voice and I’m getting a bit paranoid (this is my first audiobook, could you guess?) and ending up spending hours searching them out. I’m attaching a part that I think is particularly bad with clicks on the words “desk” “Gaze” and “neatly” and a slight buzz at the end of the word “Bob.”

Am I being too much of a perfectionist or should I be searching out such sections and re-recording them? My narrator is sipping water regularly to avoid mouth noise but these clicks and artefacts keep cropping up.

I can’t buy the Sony headphones right now but I’m borrowing some AKG K240s from a friend. Would they give a better idea of the real sound quality? I know they’re recommended by ACX along with the Sonys you suggested. I’ve been using AKG Y50BT until now, plugged in over a standard audio cable, not Bluetooth.

Many thanks again!

Fellow NT1A user here and jobbing v/o, piling in with some (non-audiobook specific) thoughts because I don’t get out much lately.

I’m assuming your narrator is using your set-up. If not and she’s just sending you the wavs, some of this may be moot.

Generally, I think you’re recording with too little gain. Currently it’s peaking at -17dB and you’d be better off aiming for a peak around -10 to -6. From your initial post I’m guessing it’s because you find it a bit harsh-sounding when turned up, in which case I sympathise. The NT1A is fantastic for the money, with very low self-noise, but it’s not the mellowest of mics and can sound aggressive in the higher frequencies. Recording with more gain will give more distance between the narrator’s volume and that of the background noise, making it easier to remove the latter if necessary without affecting the voice, plus larger waveforms make it easier to identify those pops and clicks and esses.
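To make the arithmetic concrete: going from a -17 dB peak to somewhere between -10 and -6 dB means 7 to 11 dB more gain at the interface. A small Python sketch of that sum follows; the function names are made up for illustration and the samples are assumed to be floats scaled to -1.0..1.0.

```python
import math

def peak_dbfs(samples):
    """Peak level of a clip in dB relative to full scale (0 dB = 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("silent clip has no peak level")
    return 20.0 * math.log10(peak)

def gain_needed_db(current_peak_db, target_peak_db):
    """How many dB to add so the peak lands on the target."""
    return target_peak_db - current_peak_db

def db_to_linear(db):
    """Convert a dB change into a plain amplitude multiplier."""
    return 10.0 ** (db / 20.0)

# Peaking at -17 dB, aiming for -10 to -6 dB:
low = gain_needed_db(-17.0, -10.0)   # 7 dB
high = gain_needed_db(-17.0, -6.0)   # 11 dB
```

Note this only helps if the extra level comes from the preamp gain knob while recording. Amplifying an existing quiet recording by the same amount lifts the background noise by exactly as much as the voice.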

The ‘buzz’ at the end of Bob from .7-.9sec is a pop from a slightly overenthusiastic second b. You can quick-fix it by zooming in, highlighting from the second b to the end of the pop then applying the Fade Out effect and repeating with ctrl+R (cmd+R on Mac) until vanquished to your satisfaction.
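Why repeating the fade works, as a rough Python sketch: assuming Fade Out is a straight linear ramp from full volume down to silence (my understanding of the effect, not something checked against the Audacity source), each repeat multiplies the selection by the same ramp again, so the tail of the pop drops off faster every time.

```python
def fade_out(samples):
    """Apply one linear fade: gain runs from 1.0 at the start towards 0."""
    n = len(samples)
    return [s * (1.0 - i / n) for i, s in enumerate(samples)]

def repeated_fade_out(samples, times):
    """Apply the same fade several times, like pressing ctrl+R repeatedly."""
    out = list(samples)
    for _ in range(times):
        out = fade_out(out)
    return out

# A constant-level "pop" of four samples, faded twice:
faded_twice = repeated_fade_out([1.0, 1.0, 1.0, 1.0], 2)
```

After two passes the gain at each point is the single-pass gain squared, so a sample that one fade left at half volume ends up at a quarter.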

I’m only hearing the odd mouth click tbh but if they’re really doing your head in, perhaps try Paul L’s DeClicker plugin, which has saved a great many of my takes. It’s miraculous but takes a while to do its thing and may not be suited to long audiobook recordings - prob best to apply it to a couple of minutes at a time if you’re going to use it. It’s very much not part of the Audiobook Mastering template though.

That’s really crisp. It makes my ears hurt. The damage is hard to see in most views, but not Analyze > Plot Spectrum.

Musical tones are low pitch on the left and high pitch on the right similar to a piano. Loud is up. There is no Time, so all of these tones exist in whichever chunk of the performance you selected.

That little haystack on the right is not normal. That’s the ice-pick in the ear sounds at all the “S” letters.
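If you want to measure the haystack rather than eyeball it, here is a minimal Python sketch using the Goertzel algorithm, which estimates how much of one chosen frequency is present in a selection. This is a swapped-in technique, not how Plot Spectrum works internally (that uses an FFT), and the function name is made up for illustration.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of one frequency in a clip (Goertzel).

    samples: list of floats between -1.0 and 1.0.
    The estimate is most accurate when freq_hz fits a whole number of
    cycles into the clip; otherwise treat it as a rough reading.
    """
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n
```

Comparing the reading somewhere in the haystack against a reading in the middle of the voice range, before and after any de-essing, gives a before/after number to go with what your ears tell you.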

There is a de-esser tool, but I’m not very good at applying it, and if I miss the settings, it just turns Essing into some other damage.

The good news is the basic performance passes ACX technical conformance, but could still use just the most gentle noise reduction.

Now we just have to get rid of the nuclear SS’s.

As we go.


Fellow NT1A user here

Hello fellow user. Do you have overly crisp delivery on your NT1A? Did you find a method of getting rid of it, either in software or hardware/recording technique? Say a woolen sock over the microphone? I know that doesn’t work. We tried that. It just makes everything muffled.


Thanks very much to you both.

Yes, I was uncertain about gain and, partly for the reasons suggested, I left it lower. I also heard on a YouTube video that the highest peaks should be reaching -6 dB so I was wary of going higher. But is it okay to go higher as an exception? I have one very shouty character, is it okay for that one to be hitting 0 or above if the standard waves are not higher than -6?

On anything new we record, I’ll push it higher but we’re at least 1/3rd of the way into the book now so I guess I’ll have to deal with what I’ve got. I’ve largely sorted the clicks through retakes and the plugin (I found it just after I posted last time so that saved me some sleep!) They’re not as bad as they were.

As Koz mentioned, I tried a sock over the mic but it was very muffly and we wasted a few hours recording as a result.

For the SS sound, I’ll try the de-esser. I think we’ve got a clean sound now apart from that. I’m about to travel abroad and only have 10 days to finish the project, so starting again from scratch isn’t an option. Any other tips would be gratefully received.

I have one very shouty character, is it okay for that one to be hitting 0 or above

No. Not really. You get a shouty character through acting, not volume changes. Actors back away from the microphone as they get louder, so their voices have stress without the recording actually getting louder. If the raw recording goes over 0dB, the system will start to make up new sounds like ticking and popping. That’s overload distortion and it’s usually permanent.
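A rough Python sketch of why going over 0dB is permanent: a digital recording can’t store anything bigger than full scale, so the tops of the waves get chopped flat and the original shape is gone. The tone, the levels and the function names here are all made up for illustration.

```python
import math

def sine(amplitude, freq_hz, sample_rate, n):
    """A test tone; amplitude 1.0 is full scale (0 dB)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def hard_clip(samples, limit=1.0):
    """What the recorder does to anything louder than full scale."""
    return [max(-limit, min(limit, s)) for s in samples]

def clipped_fraction(samples, limit=1.0):
    """Share of samples stuck at the limit (flat-topped)."""
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)

# A shout about 3.5 dB over full scale, as it would land in the file:
shout = hard_clip(sine(1.5, 100, 44100, 44100))
```

In this example more than half the samples end up pinned at the limit. Turning the clip down afterwards makes the flat tops quieter but doesn’t put the rounded peaks back.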

This is the approximate goal for recording.

If you record a lot lower volume than that, the system noise (ffffff) will create problems. If you go over, any loud error will make clicking and popping sounds. You are the recording engineer in spite of everyone insisting you don’t need one.

I only have 10 days to finish the project so starting again from scratch isn’t an option.

Are you posting to ACX/Audible?

You have to pass technical testing similar to the Audacity ACX Check, and you also have to pass Human Quality Control where they listen for distortions and noises and the ability to read well out loud. You need to pass everything.

As of The Sickness, ACX stopped offering their “Audition” service where they will evaluate a short reading instead of making you submit the whole book. They also posted two restrictions on what kind of work they will accept. I have to be able to buy your book on Amazon—now—and you can’t be reading reference works or something in the public domain. Scroll down.

Good luck.


Yeah, with Audible. I have the book published and it’s fiction so that’s not a problem. I did look at all the requirements before beginning.

For the last five chapters, I’ll record at a higher level obviously, but I have to at least have the recording finished by the 17th, which essentially means editing it to an almost perfect standard to be able to find the errors that need re-recording before I leave.

Is the S problem so pronounced that the book would be rejected by ACX? I can try the De-esser but I know nothing about the settings to be able to change them.

I genuinely appreciate your help. I wish I’d known all this a few weeks ago, but it’s a learning experience at least :)

I’ll record at a higher level obviously

Maybe not so obviously. ACX puts great stress on having your chapters match. Or more accurately, the chapter beginnings and ends have to match, the chapters have to match each other, and the beginning and end of the book have to match. This just kills first-time performers who start a book as a rank amateur and finish as a seasoned professional. And no, the two ends of the book do not match.

What’s supposed to happen is you let us evaluate typical works and tests under conditions you will be using. After we stomp out the problems, then you record the book. Not pause in the middle for evaluation. Roughly half of sound problems can’t be fixed in post-production processing or filtering.

I know nothing about the settings to be able to change them.

Exactly my problem which is why I wrote a note to the person who wrote the tool asking for help.

Is the S problem so pronounced that the book would be rejected by ACX?

I don’t know. I’m not a fan of the “ice pick in the ear” sound. If you read between the lines, ACX has dumped much of their pre-publication quality evaluation on associated blogs and forums (cough Audacity cough) and left it up to us to deal with this. All we know is what has worked in the past.

There are some rules. ACX hates distractions. Anything that pulls the listener out of the story is to be avoided. Remembering, of course, that the listener is paying for this.

The fuzzy goal is telling me a fascinating story in real life over cups of tea. That’s why they hate voice processing, “cell phone sound,” noise pumping, etc. etc. There is still an instructional video posted on the ACX site about using a full-on, glassed-in sound booth in your apartment to record books. Contrast that with the microphone makers who insist you can buy their microphone and crank out audiobooks from the kitchen table.

If you’re the reader, you’re playing three people. The performer, the recording engineer, and the Producer. It’s the Producer who gets to decide to put this off until they get back from their trip.


Try my user-friendly version … Updated De-Clicker and new De-esser for speech - #199 by Trebor

Thanks, Koz.

So not so obviously after all as you say.

I’ll keep going and cross my fingers then.

If I could put it off, I would. The issue is that I’m in Britain but live in Spain and have to head back soon. My narrator lives here and is great at the voice but really needs direction for the delivery so, once I leave, it will be a lot more difficult to keep recording and I have no idea when I’ll be back so it has to be now.

I marvel at the jet-set continent-hoppers. My idea of an extended journey is the Tesco Express.

I did hear back from the DeEsser developer and he gave me some clues and hints, but I have to try it for real, and then try Trebor’s version. So you’re on your own until the experiences roll in.


If I could put it off, I would.

While you’re doing that are you exporting WAV copies of all the original, raw recordings? We can’t take effects out of a show and you can’t recut MP3. So once you mess with it, that’s full stop.


Yes, I’m keeping all my files as raw as possible and have backups of everything too.

I haven’t applied any of the 3 ACX effects or noise correction, de-click, de-esser yet except in reversible tests. Once I’ve got the book edited how I want it and the files are ready to be saved as mp3, I’ll do all those final steps.

I’m British, my wife is French and we live in Spain so our daughter is the truly international one! We’re just her chauffeurs.

I really appreciate all the help you’ve given me. I wish there was a way to know how serious this essing problem is but I will have to press on as I don’t know when I’ll be back in the UK given the current situation.

Thanks to Trebor too for suggesting the simpler plugin. I’ll give it a go!

I think we hit one.

I mastered Bob and then applied the desibilator at these settings.

[Screenshot: de-esser settings]
And all the nuclear SS sounds settled down. Marks in the Trebor column.

Before and After. If you wear good headphones, this may be more obvious.

Analyze > Plot Spectrum settled down, too.


Where were we.

Oh, right, the audiobook. After mastering and desibilating, the Bob clip passes all three ACX sound specifications. It doesn’t need Noise Reduction. The ACX noise limit is -60dB and the fuzzy rule for a show is to hit -65dB. Your clip is at -64.5dB and has well-behaved gentle shshshshsh rain in the trees noise, so I’m calling it good.
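For reference, the three numbers can be sketched in Python like this. The -60dB noise limit is the one quoted above; the peak limit of -3dB and the RMS window of -23dB to -18dB are taken from ACX’s published submission requirements, not from this thread. This is not the Audacity ACX Check code: that tool hunts for the quietest stretch by itself, whereas this simplified version assumes you hand it a chunk of room tone you picked by hand.

```python
import math

ACX_PEAK_MAX_DB = -3.0             # from ACX's published requirements
ACX_RMS_RANGE_DB = (-23.0, -18.0)  # from ACX's published requirements
ACX_NOISE_MAX_DB = -60.0           # the limit quoted in this thread

def to_db(level):
    """Convert a linear level (1.0 = full scale) to dB."""
    return 20.0 * math.log10(level) if level > 0 else float("-inf")

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def acx_check(voice, silence):
    """voice: the whole clip; silence: a hand-picked stretch of room tone."""
    peak_db = to_db(max(abs(s) for s in voice))
    rms_db = to_db(rms(voice))
    noise_db = to_db(rms(silence))
    peak_ok = peak_db <= ACX_PEAK_MAX_DB
    rms_ok = ACX_RMS_RANGE_DB[0] <= rms_db <= ACX_RMS_RANGE_DB[1]
    noise_ok = noise_db <= ACX_NOISE_MAX_DB
    return {
        "peak_db": peak_db, "rms_db": rms_db, "noise_db": noise_db,
        "peak_ok": peak_ok, "rms_ok": rms_ok, "noise_ok": noise_ok,
        "passes": peak_ok and rms_ok and noise_ok,
    }
```

A noise reading of -64.5dB passes the -60dB limit with 4.5dB to spare but sits half a dB short of the fuzzy -65dB target, which is why it’s a judgment call and not a hard fail.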

The voice seems theatrically to be OK. I could listen to a story in that voice. So we’re done. I’m getting a beer. Los Angeles is in a heat wave and it’s 93, sorry, 34 out there.

I’m going to hide inside.


Note the blob just past 9000 is much lower on the right (after) without affecting anything else.

[Screenshot: Plot Spectrum, before and after]
That’s the essing going away. You would think you could do that with the equalizer tools, but I couldn’t and I did try. The equalizer correction turns everything to mud. There’s something magic about the way microphones create that crispness.


Not really. Maybe I’m naturally soggy and the crispness just evens things out, or perhaps the mic is just better suited to low voices. I’ve given it Røde’s expensive foam hat but obvs that has no effect on frequency response.

Here’s the NT1A’s frequency response, which goes some way to explain the crispness:
I’ve used the De-esser when editing gf’s stuff recorded on the same mic and find these settings treat her esses just enough (threshold dependent on RMS):
Alternatively, a filter curve with a sharp notch around 8kHz seems to take the edge off.
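For the curious, here is a minimal Python sketch of what a sharp notch does. It is not Audacity’s Filter Curve (that is a drawn EQ curve); this swaps in a standard biquad notch using the widely published Audio EQ Cookbook formulas. The 8kHz centre comes from the suggestion above; the Q of 4 and the function names are assumptions for illustration.

```python
import math

def notch_coefficients(center_hz, q, sample_rate):
    """Biquad notch coefficients (Audio EQ Cookbook), normalised so a0 = 1."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Run samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

# Hypothetical settings: notch centred on 8 kHz, Q of 4, 44.1 kHz audio.
b, a = notch_coefficients(8000.0, 4.0, 44100)
```

A higher Q makes the notch narrower, so it touches less of the voice on either side. Too wide and you are back to the mud problem mentioned earlier in the thread.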