Audible.com QC standards

I am a new audiobook producer using the ACX.com site, which sets the recording standards for posting audiobooks to Audible.com. The standards page is at https://www.acx.com/help/rules-for-audiobook-production/200485520

I get good recording quality on my recording gear, and Audacity is very good for editing and exporting in the formatting required here.

My concern is: is Audacity capable of allowing me to meet the QC mastering (quality control) standards that this page states? And if not, what kind of mastering software would this forum recommend to me?

THANKS VERY MUCH -

Pretty much what we have been recommending people do forever – except for punch-in. Audacity doesn’t do that. One alternative method not mentioned is to set a label with hot keys when you make a fluff and start the sentence over again right then. The label will tell you where to go back later to patch the work. Surgically remove the fluff. Much faster than copy/paste editing.

They even got the editing time down. Three or four times the show length to get the editing done. That burns people all the time.

They hit all the high points except room echo. Generally, sounding like you were recording in a kitchen is to be avoided as are neighborhood noises like dogs barking and running the TV. We can’t fix those in post production. You have to have a quiet room at the start. They even got the WAV/MP3 thing down. Never record directly to MP3.

I’m saving that link. I’ve only seen that list one other place and that was to set up the dual-header broadcast I shot – people interviewing each other from different cities without Skype. We didn’t have to worry about a lot of it because we had a semi-studio in LA, but I can imagine they get a lot of recording newbies.

Koz

Audacity can do everything in that list short of Punch-In. Amplify, Normalize, Compressor, the lot – but in post production. Audacity does not apply effects and filters in real time. You really do have to get the recording right without a lot of digital help.

It may be harder than you think. We’re on chapter three of a poster who wants to do a simple interview recording.

https://forum.audacityteam.org/t/newbie-needs-both-immediate-and-longer-term-help/29486/1

The longest posting on the forum, ever, was someone who wanted to simply record his acoustic guitar.

Koz

Thanks very much for the feedback - I’m a very good narrator, but new at audio mastering standards & terminology. So - once the editing work is done, (this is not difficult for me at all, since I reread phrases one after the other in the first recording, then select the one I like best) - I will Normalize at -3dB, right? But after that is done, how do I check the files to see that these other standards are met?

Here are the specific standards that my question relates to: “Your submitted files should measure between -23dB and -18dB RMS, with peaks hovering around -3dB. Your noise floor should fall between -60dB and -50dB”

I realize that the spoken word is FAR less challenging than multi track music recordings – I very much appreciate your taking the time to respond on this… :)

Yes it is, though as Koz notes, their description of “good” recording is detailed and fairly comprehensive, but not “complete”. They mention “room tone” several times, and it is important, but the “type” of room tone is also important - if it sounds like a bathroom or corridor, then (unless that is appropriate for the book, which usually it is not) you will need to do something about the recording location, such as putting up sound absorbing materials around the space or using a “vocal shield” or “vocal booth”. This is a common problem with home recording.

Some of the technical detail is a little bit “out”, for example:
“There should be exactly 500ms (0.5 seconds) at the head of each file”
The start time of an MP3 file is not “exact” due to limitations of the format - what they mean here is “There should be very close to 500ms (0.5 seconds) at the head of each file”.
Also:
“Your submitted files should measure between -23dB and -18dB RMS”
This specification is incomplete as they do not state the RMS “weighting” or “window size”. In practice the exact definition probably does not matter too much - I assume that they are just trying to give an indication that the recording should be “reasonably” loud without pushing the loudness too much - the compression / limiting steps that they describe are likely to be close enough to the right ball park figure. Comparing your recording with some of their published recordings will immediately show you if your recordings are too loud or too quiet. There are also some plug-ins that can give you an RMS and peak amplitude figure for your recording such as the “Stats” plug-in available here: "Wave Stats" plug-in
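To show why the "window size" matters, here is a rough Python sketch of my own (it is not the "Wave Stats" plug-in and not ACX's method). It assumes samples are floats where 1.0 is full scale, uses unweighted RMS, and runs on a made-up test tone; the function names are just for illustration.

```python
import math

def level_db(value):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    return 20 * math.log10(value) if value > 0 else float("-inf")

def peak_db(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    return level_db(max(abs(s) for s in samples))

def rms_db(samples):
    """Unweighted RMS over all the samples given, in dB relative to full scale."""
    return level_db(math.sqrt(sum(s * s for s in samples) / len(samples)))

def windowed_rms_db(samples, window):
    """RMS of each consecutive, non-overlapping block of `window` samples."""
    return [rms_db(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]

# Made-up test signal: one second of a louder 200 Hz tone,
# followed by one second of the same tone 20 dB quieter.
rate = 8000
loud = [0.3 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
quiet = [0.03 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
signal = loud + quiet

print("peak:", round(peak_db(signal), 1))
print("RMS, whole file:", round(rms_db(signal), 1))
print("RMS, per second:", [round(v, 1) for v in windowed_rms_db(signal, rate)])
```

The whole-file RMS comes out at about -16.4 dB, but the two seconds measured separately are about -13.5 dB and -33.5 dB. Same audio, different answer, which is why a bare "-23dB to -18dB RMS" figure leaves room for interpretation.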

Do you suppose their “noise floor” is taken with the microphone turned off? Room Tone rarely gets to -60 even in some studios.

If you’re an experienced presenter then you know all about soundproof studios, etc. It’s not impossible to do at home. I know people who crawl into their closet with quilts on the wall to avoid reflections and echoes.

A few furniture moving pads might be in order.

http://www.kozco.com/pictures/boothFinished/laptop-mic.jpg

The microphone on the right has a reflection suppressor behind it that may be of help.

http://www.kozco.com/tech/audacity/pix/JMASoundShoot.jpg

We were speaking with one presenter who admitted he got his very good environment suppression and clarity by waiting for trucks and buses to pass his house completely before recording.

Koz

You might also consider Chris’ Compressor.

http://theaudacitytopodcast.com/chriss-dynamic-compressor-plugin-for-audacity/

Chris designed this all-in-one compressor so he could listen to opera in the car without constantly turning the volume up and down. It evens out volume variations remarkably well during speaking performances. I download a voice radio show that, since it doesn’t go through the radio station compressors, is all but unlistenable in the car. “Hi, Today______ ___ HA! HA! HA!! ____ ___ mumble -__ .”

As I like to put it, Chris started out with cadenzas and arpeggios and not millisecond attack and release times. His work is very pleasing and musical even if it may not be statistically perfectly accurate.

Koz

HAHAHAHA!! Your post about the fellow listening to opera is awesome - I was a vocal major at Univ. of Houston years ago, and totally get what you’re saying here! :)

EXTREMELY helpful responses to my post - thank you very much.

There are some tutorials on the web site that deal specifically with the gear and room setup.
They did not forget this part.
To get an impression of the noise floor, you can generate some pink noise with an amplitude of about 0.007 and view the RMS value with this snippet for the Nyquist Prompt (press Debug):

(snd-display (linear-to-db (rms s)))

Although the RMS falls squarely between -50 and -60 dB, the noise is rather loud, so I think that the peak value is meant.
An amplitude of 0.003 for the pink noise should be nearer the truth.
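For anyone who wants to see where those two amplitude figures sit on the dB scale, here is a small Python sketch of the conversion. It assumes the amplitude setting of the noise generator is the peak value with 1.0 as full scale; `linear_to_db` here is my own helper that only mirrors the name used in the Nyquist snippet.

```python
import math

def linear_to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    return 20 * math.log10(amplitude)

# The two pink-noise amplitudes mentioned above, treated as peak values.
for amplitude in (0.007, 0.003):
    print(amplitude, "->", round(linear_to_db(amplitude), 1), "dB")
```

Under that assumption 0.007 is a peak of about -43.1 dB and 0.003 is about -50.5 dB, so only the quieter noise has its peak inside the -60 to -50 dB range.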

I’m getting ready to record my first audiobook and I must admit I am still flummoxed by this topic.

To review, here is the ACX guideline for mastering:

Mastering:

Your submitted files should measure between -23dB and -18dB RMS, with peaks hovering around -3dB. Your noise floor should fall between -60dB and -50dB.
To make the audiobook levels louder and more-even throughout is vital. Typically, this process is achieved by RMS normalization around -20db, or compression/limiting. Compression should be applied with a fast attack and release, around a ratio of 3:1. A hard limiter may also be used, and audiobooks are EQ’d during this time, to sweeten the sound and make it more pleasing to the ear. Often, muddled low end and mid-range is cut to make the audiobook sound more clear and smooth.
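As a rough illustration of what a 3:1 ratio means (my own sketch in Python, not ACX's processing and not how the Audacity Compressor is implemented): above the threshold, every 3 dB of extra input becomes 1 dB of extra output. The threshold of -12 dB is a made-up value for the example, and attack, release and make-up gain are ignored completely.

```python
def compressed_level_db(input_db, threshold_db, ratio=3.0):
    """Static compressor curve: levels above the threshold are scaled by 1/ratio."""
    if input_db <= threshold_db:
        return input_db  # below the threshold nothing changes
    return threshold_db + (input_db - threshold_db) / ratio

threshold_db = -12.0  # hypothetical threshold, chosen only for illustration
for level in (-20.0, -12.0, -6.0, 0.0):
    print(level, "->", compressed_level_db(level, threshold_db))
```

With those assumed values an input at -6 dB comes out at -10 dB and an input at 0 dB comes out at -8 dB, while anything at or below -12 dB is left alone.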

When I go to Audacity > Effect > Compressor, I see the following slider bars:

Threshold
Noise Floor
Ratio
Attack Time
Decay Time

Plus there are two check boxes:

Make-up gain for 0dB after compressing
Compress based on Peaks

Can anyone help me determine what the sliders should be set at and which boxes should be ticked in order to meet the ACX Mastering Guidelines above? I’ve tried contacting ACX at their general email address but have received no response as of yet.

I greatly appreciate any help offered. I can’t believe I’m the only person baffled by this.

Thanks in advance,

Kirk

Kirk,
It’s a good thing you are asking about this now. I just completed my 3rd audiobook and for the first time I got nailed on my levels. Here is what they said:

Problem: Files have not been mastered to ACX standards. Audio peaks at 0dB in places, causing distortion.

Level varies widely in this title, especially when narration is done in character. Please make sure all files measure between -23dB and -18dB RMS, with peaks hovering around -3dB. Your noise floor should fall between -60dB and -50dB.

I now have to figure out how to go back through 7.3 hours of audio and see if I can fix it.
I feel like crap now.

Let me know if you have any ideas.
Thanks
Jim

I can assure you that you are not.
Sadly their “specification” is incomplete so it is difficult for us to advise on more than the general guidelines. I have written to them again asking for clarification on their specification so that we are able to provide definitive tests to check that your work meets their specifications.

That first point is simple enough to check, but if the audio has audible distortion then it will need to be repaired by rerecording the damaged sections and editing the new takes into your production.

During recording we advise that you allow 6 dB of “headroom” so as to avoid “clipping” (distortion), as described in the Audacity Manual. For “animated / lively” recording it is often advisable to allow a little more headroom than this. Once clipping occurs and is audible it is usually too far gone to be able to correct it effectively. Provided that no clipping occurs during the recording process, and that you are working in the default “32 bit float” format, then you can ensure that the final peak level is below 0 dB by “Mix and Rendering” your project to a single (mono or stereo) track, and then Normalizing the entire mix.
See these links for additional details:
Audacity Manual
Audacity Manual

Note that their specification states “with peaks hovering around -3dB”. Also note that the peak level after encoding to MP3 format is likely to be a little higher than the peak level in Audacity (MP3 format is imprecise). If you Normalize to -3.0 dB then the actual peak level is likely to be a little over -3 dB, so I’d suggest Normalizing to a little below -3 dB (a more negative number).
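The arithmetic behind normalizing to a target peak can be sketched in Python like this. It is only an illustration of the idea, not Audacity's Normalize effect (it skips DC offset removal, for example); the sample values and the -3.5 dB target are made up, and how much margin the MP3 encoder really needs is something you would have to find by trial.

```python
import math

def normalize_gain_db(current_peak, target_peak_db):
    """Gain in dB that moves the current peak (linear, 1.0 = full scale) to the target."""
    return target_peak_db - 20 * math.log10(current_peak)

def apply_gain(samples, gain_db):
    """Scale every sample by the same factor, so the dynamics are unchanged."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

samples = [0.1, -0.4, 0.25, -0.05]  # made-up audio samples
target_db = -3.5                    # hypothetical target, a little below -3 dB

gain = normalize_gain_db(max(abs(s) for s in samples), target_db)
normalized = apply_gain(samples, gain)
new_peak_db = 20 * math.log10(max(abs(s) for s in normalized))
print(round(gain, 2), round(new_peak_db, 2))
```

The gain is the same for every sample, so normalizing changes the peak and the RMS by the same number of dB; it does not bring the two closer together.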

I’m not sure exactly what they mean by “make sure all files measure between -23dB and -18dB RMS” as they do not specify whether that is a “weighted” measurement or what the “window size” is. All I can suggest is that you aim to make the level sound “reasonably even” throughout the recording (though I presume that there is still scope for “dramatic effect” when appropriate).

Thank you Steve for the excellent response.
I will see how bad the clipping is and see if I can fix those spots. I think the Envelope tool might help. I’ll try it at least. I’m also trying to get my Focusrite Scarlett Plugins to work. The license won’t install so I don’t know if I will be able to get them to help out with cleaning it up.
One thing that I will use on my next audio book is the option in preferences for sound activated recording to see if that helps the “noise floor should fall between -60dB and -50dB.” issue. Although that might mess with the timing of the narration so I will have to experiment with it.
I also just noticed under View when you have your audio file open you can select Show Clipping which is a good way to look for problems before you call your work finished. So I will give that a try as well.
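I don't know exactly how Show Clipping decides what to mark, but the general idea of hunting for clipped stretches can be sketched in Python like this. It assumes float samples where 1.0 is full scale; the `threshold` and `min_run` parameters and the sample values are my own inventions for the example.

```python
def find_clipped_runs(samples, threshold=1.0, min_run=1):
    """Return (first_index, last_index) for each run of samples at or above the threshold."""
    runs = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i          # a new run of full-scale samples begins
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Handle a run that continues to the very end of the audio.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - 1))
    return runs

samples = [0.2, 0.9, 1.0, 1.0, 1.0, 0.7, -0.3, -1.0, -1.0, 0.1]  # made-up data
print(find_clipped_runs(samples, min_run=2))  # [(2, 4), (7, 8)]
```

Asking for at least two full-scale samples in a row is one way to ignore a single sample that merely touches the limit, though whether that is the right cut-off is a judgment call.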
Thanks again for the tips and info. Keep them coming!
Jim

You may be in a perfect position to help me :D

I’ve just had a reply back from one of the technical guys at Audible.com with some technical information about how they test. What I’m hoping to do (over a period of time) is to make some analysis plug-ins for Audacity that will help flag up potential problems (such as excessive noise floor, peak levels, rms, and so on). In themselves they will not improve your recording or recording technique, but they will hopefully help you audio book guys to spot problems before sending your work off for publishing.

The three main things that I’ll need for this are:

  1. Time to do it - I’m afraid you can’t help with that one ;)
  2. Test material that has passed Audible’s quality control - with 2 published works you can definitely help here - and anyone else that has had work published by Audible.
  3. People to test the plug-ins - again you, and anyone else creating audio books, are the perfect people to test.

I’m currently waiting for additional technical information from Audible, and Audacity 2.0.5 is due to be released quite soon, but I hope to get started on this soon after Audacity 2.0.5 is released, so for anyone interested in testing or otherwise helping with this, please reply now and get your name down.

Thanks
Steve

Steve, I would be happy to help. What specifically do you need from me? Legally I don’t think I can hand you over the entire book but I might be able to do a couple of random chapters from each one.
Thanks,
Jim

Excellent :)

I certainly won’t need the entire book, and probably not even whole chapters. Mostly it’ll just be (for example) the first couple of minutes, or the last couple of minutes, or just some random sections - perhaps a short extract that you think is particularly well recorded, or particularly badly recorded.

The other thing will be for you to try out the tools and make suggestions based on your experience - I’ve done lots of recording but not audio books, which are a bit of a specialist field, so to some extent you will be guiding me.

Measuring sound is almost as black an art as anything else having to do with sound.

I think, I’m afraid, that everyone is going to want us to generate a button that you push that makes your work ACX compliant, no matter in what condition it started. The only people who do a similar job on a regular basis are the broadcast folks – and even they get burned on occasion.

Measuring it can tell you where the work deviates from standard, but as you found, there’s absolutely no conversion from that to compression ratios, response knee shape and release times. Further, if you set them wrong, you can seriously affect the quality of your show.

As we go.

Koz

Quite right Koz.
Even if the recording meets all of the technical specifications, that will only get it past Audible’s automated test system - after that, someone will listen to it and if it sounds bad it will still be rejected even though it meets all the measurements. With the right audio processing, any old rubbish can be made to fit the specs, but as my Grandmother used to say, “you can’t make a silk purse from a pig’s ear”.

As I think I said before, testing against Audible’s QC standards will not magically fix a bad recording, but can hopefully indicate areas that need attention.

OK. I got it.

This is a hefty, unprocessed, 16 minute download of a spoken word radio show.

http://www.kozco.com/tech/audacity/clips/1339Before.flac

The levels are low and the expression is all over the map.

This is the same thing after Chris’s Compressor.

http://www.kozco.com/tech/audacity/clips/1339After.flac

The peak average is -9 to -24, the overall RMS value is -24 to -34, the noise floor is -50 or below, and the absolute peaks are -3 or at worst -2.8, but you have to really look for those. -3s are fairly common.

I used the default Chris values except I think I used 1.00 instead of 0.99 for the peak value (attached). I don’t think it makes the slightest difference.

Can we go home now?

Koz
Screen shot 2013-09-30 at 3.47.18 PM.png

Where do those numbers come from?

Does that meet Audible’s “Mastering Guidelines”?
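One way to compare figures like those with the ranges quoted from ACX is a plain range check, sketched here in Python. This is only a literal reading of the quoted numbers, not Audible's test: how they measure RMS is still unknown, the 0.5 dB allowance for "hovering around -3dB" is my own guess, and the example values are the most favourable ends of the figures given above, with the noise floor taken as -50 dB.

```python
def check_against_quoted_ranges(rms_db, peak_db, noise_floor_db, peak_tolerance_db=0.5):
    """Compare measured levels with the ranges quoted from the ACX page."""
    return {
        # "between -23dB and -18dB RMS"
        "rms": -23.0 <= rms_db <= -18.0,
        # "peaks hovering around -3dB": assumed to mean not above -3 dB plus a small allowance
        "peak": peak_db <= -3.0 + peak_tolerance_db,
        # "noise floor should fall between -60dB and -50dB", read literally
        "noise_floor": -60.0 <= noise_floor_db <= -50.0,
    }

# Most favourable ends of the figures quoted above.
result = check_against_quoted_ranges(rms_db=-24.0, peak_db=-2.8, noise_floor_db=-50.0)
print(result)  # {'rms': False, 'peak': True, 'noise_floor': True}
```

Read that way, the peaks and noise floor are in range but the overall RMS of -24 sits just below the quoted -23 to -18 window. Whether that counts as a failure depends on how the RMS is actually measured, which is the part of the specification that is still missing.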