Audible.com QC standards

That is from averaging the readings by eye, which is about as well as you can do with spoken word. Since the original show clearly didn’t come anywhere near close, I’m guessing this is a win.

I think we’re working at this too hard. All these specifications are ranges. “Somewhere between here and here and don’t go much over this and that.” If the specification violations were shooting offenses, nobody would ever get their show accepted. I also bet they expect at least some shows to meet acceptance with little or no filtering or processing. If me mum were to author a reading, she would have no idea how to set any of this stuff.

I bet the “after” show sails straight through acceptance.

Koz

We can speculate - but I’d feel more confident after comparing some examples of what passes and what fails.

I wrote them a note about the problems on the forum and the desirability of their supplying standards testing.
Koz

Here are the results from Steve’s Wave Stats plug-in for Koz’ second radio show:

Nyquist
Length of selection: 972.181 seconds.
42873173 samples at 44100 Hz.
Analysis of first 972.181 seconds:
(42873173 samples) Mono Track.
Peak Level: -2.0 dBFS
Peak Positive: -2.0 dBFS
Peak Negative: -2.0 dBFS
DC offset: 0.0 %
RMS: -22.8 dBFS
RMS (A-weighted): -24.9 dBFS

I’ve modified the plug-in in order to process audio up to one hour (limit is 30 s in the original).
I guess that the show would fail the test by a hair’s breadth, because the RMS slips further down if we amplify to meet the -3 dB specification. Steve’s Limiter should do the job.
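The arithmetic behind that guess can be sketched roughly as follows. It assumes plain amplification, which shifts peak and RMS by the same number of dB, and it uses the -23 dB to -18 dB RMS window that ACX quotes elsewhere in this thread. The function name is made up for illustration.

```python
def levels_after_gain(peak_db, rms_db, target_peak_db):
    """Plain amplification shifts peak and RMS by the same number of dB."""
    gain_db = target_peak_db - peak_db
    return gain_db, peak_db + gain_db, rms_db + gain_db

# Wave Stats readings quoted above: peak -2.0 dBFS, RMS -22.8 dBFS
gain, new_peak, new_rms = levels_after_gain(peak_db=-2.0, rms_db=-22.8,
                                            target_peak_db=-3.0)

# RMS window as quoted from ACX in this thread
RMS_LOW, RMS_HIGH = -23.0, -18.0

print(f"gain {gain:+.1f} dB, peak {new_peak:.1f} dBFS, RMS {new_rms:.1f} dBFS")
print("RMS inside window:", RMS_LOW <= new_rms <= RMS_HIGH)
```

With these readings the RMS lands at about -23.8 dBFS, just under the lower bound, which is why a limiter (bringing the peaks down without lowering everything else) looks like the better route than plain amplification.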

Interestingly enough, the exported mp3 (constant, 128 kbps, librivox standard) shows a peak of -2.5 dB after import, instead of -2 dB.
Hence, mp3 conversion does not necessarily decrease the head room (i.e. increase the peak).
The offset seems to be nearly constant:
0 dB > -0.5 dB
-3 dB > -3.5 dB
-20 dB > -20.4 dB
[what a pain, my dogs are both trying to sit on my lap while I’m writing. They can hear a thunderstorm in southern Italy or in Sweden, I fancy]

The peak keeps going down with each new mp3 conversion (something one shouldn’t do…)
0>0.5>0.9>1.2>1.8 dB.
The encoding to mp3 is pretty complicated, so I don’t know where the offset comes from. I guess that it may vary from material to material. But the error may nearly be constant for single voice recordings.
In any case, it would be desirable to be able to supply an error or tolerance factor in the analyse plug-in as well, if the target file is an mp3.
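As a rough sketch of what such a tolerance could look like: the pairs below are the 128 kbps readings listed above, and the check function with its symmetric tolerance is only an assumption about how an analysis plug-in might use them, not an existing feature.

```python
from statistics import mean

# (peak before export, peak after re-import) in dB, 128 kbps readings from above
observed = [(0.0, -0.5), (-3.0, -3.5), (-20.0, -20.4)]

# Offset introduced by one mp3 round trip for each reading
offsets = [after - before for before, after in observed]
typical_offset = mean(offsets)

def peak_within_target(measured_db, target_db=-3.0, tolerance_db=0.5):
    """Hypothetical check: is the measured peak within +/- tolerance of the target?"""
    return abs(measured_db - target_db) <= tolerance_db

print(f"typical offset: {typical_offset:.2f} dB")
print(peak_within_target(-3.5))  # peak of the re-imported mp3
print(peak_within_target(-2.0))  # a peak that is clearly too high
```

Since the offset may vary from material to material, the tolerance would probably have to be a user setting rather than a fixed constant.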

One of the things they are looking for with regard to the RMS level is how it varies over time. For the end user (what do you call an audio book user? “the reader”? “listener”?) it would be quite irritating if the middle of chapter 3 is quieter than the rest and they have to turn the volume up to hear it clearly, and then turn it down again when “it goes loud again”.

I think that a good approach to this will be: “what might the end user complain about”?
The common complaints that we get regarding sound quality are:

  • Why is my recording quieter than my other MP3s?
  • There is too much hiss?
  • It sounds distorted/fizzing?
  • It sounds like it’s underwater / metallic / bubbly / robotic.
  • It’s all echoey, like it’s in a bathroom.
  • How do I even out the levels - the part that I recorded on Wednesday is louder/quieter than the rest.
  • … (add the other ones here)…

Also, because Audible may need to reproduce the audio book in different formats, they do not want to be recalibrating their duplicating equipment for each book, so they are quite strict about peak level. Ideally they want the peak level at -3 dB. So if you take any paragraph from the book, the RMS level will be within the specified range, and the peak level will be -3 dB. They are not going to reject a book if everything else is perfect but there are a couple of peaks that hit -2 dB, but if the author has taken the time and care to ensure that everything else is right, then there is little reason that there should be one or two peaks that hit -2 dB, other than perhaps an MP3 encoding artifact.

Audible prefers Audio that is RMS-normalized at -20 dB. The peaks should “hover” around -3 dB. As you say, Steve, there may well be peaks that go up to 0 dB. Exacting people can then use a peak-limiter to strictly go for the -3 dB.
As I’ve said before, the mp3 conversion alters the overall level.

  • Librivox standard: 128 kbps: -3 dB >> -3.5 dB.
  • Audible standard: 192 kbps: -3 dB >> -3.3 dB.

There are several questions that come to my mind:

How is this “hovering around -3 dB” measured?
It seems that the average of several peaks should give this value. From a statistical perspective, extremes (“peaks”) are those values that are 2 standard deviations away from the median (i.e. 0, if we do not have DC). The average value in this range should therefore be -3 dB. Another approach is to take the peaks at different time frames, exclude 0 values and take the average.
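The second approach (peaks per time frame, zero values excluded, then averaged) might look something like this. It is only a sketch: samples are assumed to be floats in the range -1 to 1, the window length is a free parameter, and averaging the dB values rather than the linear amplitudes is my own choice. Nothing here is confirmed as the way Audible actually measures.

```python
import math

def window_peaks_db(samples, window_len):
    """Peak level in dBFS of each non-silent window (samples are floats in -1..1)."""
    peaks = []
    for start in range(0, len(samples), window_len):
        peak = max(abs(s) for s in samples[start:start + window_len])
        if peak > 0.0:  # exclude digital silence, log10(0) is undefined
            peaks.append(20.0 * math.log10(peak))
    return peaks

def mean_peak_db(samples, window_len):
    """Average of the per-window peaks in dB, or None if everything is silent."""
    peaks = window_peaks_db(samples, window_len)
    return sum(peaks) / len(peaks) if peaks else None

# Tiny made-up signal: three windows of two samples, the middle one silent
demo = [0.5, -0.25, 0.0, 0.0, 1.0, 0.1]
print(window_peaks_db(demo, 2))
print(mean_peak_db(demo, 2))
```

For real speech the window would need to be long enough (several seconds, say) that every window contains at least a few syllables.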

How is the RMS normalization done?
It’s easy to calculate the overall RMS for a single track (which is most of the time a whole chapter) but this would give too much freedom; different sections can still vary by a huge amount. Again, we can take several time frames and base the calculations on this. However, those frames should be rather long, in order to preserve dramatic and quiet passages.

I don’t know, Steve, if the people at ACX have informed you about these aspects and how detailed their specs are altogether. It would be nice if the team at Audible or ACX gave you all the necessary input to develop a semi-official plug-in whose output is reliable and in compliance with their standard (and that of others too).

Hi Robert, those are the same questions that I have been asking - I now have some answers (not yet complete, but we’re getting there).

To quote the horse’s mouth:
“What we are looking for is no significantly louder peaks than -3dBFS. Due to our need to be very cautious of dynamic levels we ideally would prefer no peaks go beyond that metric.”


This is a bit more complex. What they are looking at is “blocks of sound”. Thus, if there is continuous sound for 60 seconds, then they will measure the rms of that block. Silence at the start and end of the recording, or significant gaps (silences) in the recording are not included in the measurement. I’m not sure precisely what a “significant gap” would be, but I’d guess that, say, a 2 second pause would be significant.

The way that I was thinking of handling this in an analyze plug-in is to look for sounds (similar to “Sound Finder” but using rms rather than peak level) and then to analyze each block of sound, splitting large blocks into, say 15 second chunks. The output could either be a text file, or perhaps easier for most users, as a label track. Where a block falls outside of the specified range, the label would flag that as a warning. The user could then inspect any flagged regions and make a judgment call on whether it is OK or needs attention.
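A very rough sketch of that idea, in Python rather than Nyquist, just to make the steps concrete. The 2 second gap and the 15 second chunks are the guesses from above, the -23 dB to -18 dB window is the RMS range ACX quotes in this thread, and the -50 dB silence threshold, the 0.1 second frame and all function names are assumptions of mine. The output lines use Audacity's exported label format (start, end and text separated by tabs).

```python
import math

def rms_db(samples):
    """RMS level in dBFS of a list of float samples (-1..1)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def find_sound_blocks(samples, rate, frame_s=0.1, silence_db=-50.0, min_gap_s=2.0):
    """Return (start, end) sample indices of sounds separated by significant gaps."""
    frame = max(1, round(rate * frame_s))
    min_gap = int(min_gap_s * rate)
    blocks, start, last_loud_end = [], None, None
    for pos in range(0, len(samples), frame):
        chunk = samples[pos:pos + frame]
        if rms_db(chunk) > silence_db:
            if start is None:
                start = pos
            elif pos - last_loud_end >= min_gap:
                # The silence since the last loud frame counts as a significant gap
                blocks.append((start, last_loud_end))
                start = pos
            last_loud_end = pos + len(chunk)
    if start is not None:
        blocks.append((start, last_loud_end))
    return blocks

def label_lines(samples, rate, chunk_s=15.0, low_db=-23.0, high_db=-18.0):
    """One label per chunk of sound, flagged when the RMS is outside the range."""
    lines = []
    chunk = int(chunk_s * rate)
    for b_start, b_end in find_sound_blocks(samples, rate):
        for pos in range(b_start, b_end, chunk):
            end = min(pos + chunk, b_end)
            level = rms_db(samples[pos:end])
            flag = "" if low_db <= level <= high_db else " WARNING"
            lines.append(f"{pos / rate:.3f}\t{end / rate:.3f}\tRMS {level:.1f} dB{flag}")
    return lines

# Made-up test signal at 100 samples per second: silence, sound, a 3 s gap, quiet sound
signal = [0.0] * 100 + [0.1] * 300 + [0.0] * 300 + [0.01] * 200 + [0.0] * 100
print("\n".join(label_lines(signal, 100)))
```

Short pauses inside a block are included in that block's RMS here, which will pull the reading down a little; whether that matches what Audible does is one of the things still to be confirmed.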

I think an important aspect here is that it is not just about hitting targets - the more important thing is to provide a good listening experience for the end user. If the recording sounds amateur then the end user will complain and send it back to Amazon and ask for their money back. Audible don’t want that to happen.


I am now in touch with a guy at Audible and he has been very helpful. He is very supportive of us developing plug-ins for this as many of their authors produce their work with Audacity so he recognises the value of this. So as not to lose sight of the purpose, we’re not doing this for Audible - we’re doing this to help Audacity users produce high quality work. Guidelines that we can get from Audible are useful as they are probably the best representation of “industry standard” for audio books.

Thank you Steve, this clarifies a lot.
I’d prefer an integrated long-term compressor for these blocks, in addition to the labels, but that’s only a subjective point of view.
You’ll include the RMS/Peak values in the label text, won’t you?
The following pauses seem to be fixed for an Audible audio book:
0.5 s at the beginning
2.5 s after “Chapter …” and 3.5 s at the end of the track. Isn’t there a recent post about the inclusion of pauses in the export? That’s probably no problem since there is usually ambient noise in between speech. But you could include a label where digital silence is unintentionally inserted.
However, the narrator’s/author’s/reviewer’s ears will judge the overall quality in the end. We certainly do not want to eliminate the liveliness from the narrator’s work.

Too many tools.

The last setting down in Chris’s Compressor is Peak Level and it can easily be set for slightly lower peaks. I used values between 80 and 100 and the peaks gently drifted up and down as I did it. He specifically recommends that. Peaks shoot upward if they have significant high frequency content or distortion. When MP3 re-arranges the tones and constituents for compression, you can easily be missing the “moderating” tonal values and the peaks go nuts.

The first value, compression ratio, is set for overall sound density. Set that for the middle dB values. I use 0.77 routinely to simulate the very serious, stiff compression at the radio station. Again, the goal of the tool was to listen to opera in the car.

I used all the default values for this test and the show came in remarkably close to what we perceive to be the gold standard, starting with a free-wheeling show that was clearly not.

The goal is a one-button solution, not custom values of multiple tools for each user – although that may be needed if a performer comes in with a completely broken show.

One thing I don’t know is what happens when Chris encounters a bad show.

Noise is going to be a problem. Home recording is always an adventure.

“DEAR! Can you take fluffy out for her walk? Oh, sorry, did I interrupt you?”

Koz

That’s not my goal. I don’t believe in the tooth fairy either.

My goal is to alert the audio book creator of potential problems in their production. The solutions will vary - it may require a re-take, or just a quick pass of a limiter, or it could even be a false alarm. Hopefully not too many false alarms. I think that an analysis tool that can say “hey, look at this bit… it may be a bit too quiet/loud/noisy/…” would be an invaluable aid for audio book producers.

That’s not my goal.

You may be alone there. Nobody is itching to be provided with a list of strange tools they’ve never seen before and don’t know how to use to solve incomprehensible problems measured with college level techniques.

Nobody drives up to Mt. Wilson to readjust the audio compressors for a specific radio programme.

Some of that may be necessary, but only if the One Stop Shopping fails. If the show does fail, then the individual, custom tools may not bring it back, either. See: Overload and the Four Horses.

Koz

Absolutely.
No-one is proposing that.

That’s because there’s a team of qualified sound engineers and radio producers with years of experience behind the scenes doing that for them prior to broadcast.

The difficult situation for lone audio book authors is that, not only do they need to be the voice talent, but also the sound engineer, script editor, producer, and many other roles. Unfortunately for a lot of these roles they probably do not have the education or experience, so have to fuddle through the best that they can. The tool(s) that I’m suggesting are ones that translate the “incomprehensible problems” into simple and understandable suggestions.


If it were possible to create a plug-in that converts any old recording into a professional quality recording, don’t you think that ACX and every other audio production company would use it? There would be no need for any of this - like with your Professional Audio Filter™ the audio book author could just record the book on their smart phone on the way to work, send it off, and ACX would run it through the “Fix-Me” plug-in.

To give you the perspective of the narrator let me just say that a lot of the tools I see confuse me a bit. I am trying to finish my 3rd audiobook (7.3 hours long) and I have been asked to start on my 4th. The last step in this 3rd book is to fix the audio as I mentioned earlier in this thread. I have spent many months working on this book only to find at the end ACX has some issues with my audio. During recording I didn’t notice the issues since I do everything myself and can miss things from time to time. I’m glad ACX has a QA process, I just hope my months of work haven’t been for nothing.

I can’t speak for all narrators but at least for me I am a voice actor not an audio engineer (seems like there should be a “Damn it Jim!” in there). So simple is better. I don’t mind learning new things but I only have so many hours in a day and most of them are spent voicing a chapter and then editing out all the mouth noises, screw ups, and the occasional frustrated cursing. :laughing:
I think a lot of the narrators out there are similar to me in the sense that this is not our full-time job and we are working on royalties. This means I just spent all these months putting together an audiobook (that got rejected in the first QA session) and I might never see a return on it.

I need to fix my audio with the simplest solution and move on. I’m not asking for a single button solution but I am looking for a solution that matches ACX’s standards. If I had a tool that I could go to to set things up prior to recording that would ensure that I was within ACX’s standards that would be great. But for a dummy like me I can’t find a place that would allow me to ensure that my peaks are limited to -3 dB and my noise floor is between -60 dB and -50 dB, or that my RMS measures between -23 dB and -18 dB. Am I going to have to go back to school and take audio engineering classes just so I can narrate audiobooks? If that is the case I won’t be able to continue to narrate because working on royalties only pays the bills once in a while. Not all books are smashing successes.

I can’t afford to hire an audio engineer to master my work for me so it would be nice if I could figure this stuff out.
I know it’s hard to read tone in a post but please don’t take this as complaining. I just want people to know that some of us are way less skilled than some of you and we are just trying to keep up with the topic. :smiley:

Thanks Voodoobones. Having spoken to other narrators on this forum I think that the position you describe is common to many narrators.

Thanks, Voodoobones, for your perspective.
Don’t judge the gift by the paper it is wrapped in, however.
I know that it sometimes gets pretty technical when Steve and I exchange our opinions. Nevertheless, it is part of the abstraction and simplification process. Our goal is always the ease of use for any plug-in we write (admittedly, Steve’s programs abide by this rule more than mine).
Steve’s approach with labels aims:

  • not to treat the whole audio in its entirety. It would be easy to change the audio such that the Audible specs were met, but this would result in an unnatural feel to the whole production.
  • to let the user have the last word/decision over his work. You can decide for yourself whether to apply a noise gate or the noise removal if there’s too much background distraction going on. The same goes for compression/limiting, or leaving it altogether as it is (because the passage needs it).
  • to list only those parts that could cause problems. You won’t have to listen to the whole 7 hours - just tab to the next label/hint/warning.

You could even export the label list with your own notes, e.g. why you wanted a particular part louder or quieter than the average -20 dB. The reason may convince the people at Audible, and they won’t flatly reject your audio.

By the way, Librivox has a tool called “Checker” that also gives some hints with regard to their standard. It is mostly concerned with the correct mp3 encoding of the finished, exported audio. Their audio levels are not so constrained - they aim for a peak of 98 % (they write dB, but that’s nonsense).
The system is actually not comparable because the uploaded chapters are reviewed by other narrators. This is in principle the ideal case if the feedback is honest and of a significant amount.

Useful find Robert.
Just to make a note in this thread, it’s available here: http://www.cgjennings.ca/checker/

Yet another Audible narrator in the muck. I’m having the same issues. Get everything sounding great, try to have all the settings as per Audible’s requirements, but failing their QC again and again. I have one book that I know was recorded on a low end microphone, but the QC comes back as, "Problem: Audio has not been mastered to ACX standards. Distortion issues (example Chapter 5) Audio of uneven quality. Please also submit your files in either mono or stereo, not both.

Solution: Please master the files to ACX standards. Your submitted files should measure between -23dB and -18dB RMS, with peaks hovering around -3dB. Your noise floor should fall between -60dB and -50dB.

Please also review your audio for any instances of distortion. Chapter 5 seems to be the most distorted.

Submitted files must be either mono or stereo, not both."
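For anyone who wants the three numbers from that ACX reply in one place, here is a minimal sketch of a check against them. The ranges are the ones quoted above; treating “hovering around -3dB” as an upper limit with an optional tolerance is an assumption, as are the function name and the example readings (the -55 dB noise floor in particular is made up).

```python
# Ranges as quoted in the ACX reply above (all values in dB)
RMS_RANGE = (-23.0, -18.0)
PEAK_LIMIT = -3.0
NOISE_FLOOR_RANGE = (-60.0, -50.0)

def check_levels(rms_db, peak_db, noise_floor_db, peak_tolerance_db=0.0):
    """Return a list of plain-language problems; an empty list means no warnings."""
    problems = []
    if rms_db < RMS_RANGE[0]:
        problems.append(f"RMS {rms_db:.1f} dB is too quiet (below {RMS_RANGE[0]:.0f} dB)")
    elif rms_db > RMS_RANGE[1]:
        problems.append(f"RMS {rms_db:.1f} dB is too loud (above {RMS_RANGE[1]:.0f} dB)")
    # Assumption: peaks are treated as "must not exceed the limit plus tolerance"
    if peak_db > PEAK_LIMIT + peak_tolerance_db:
        problems.append(f"peak {peak_db:.1f} dB is above {PEAK_LIMIT:.0f} dB")
    if not NOISE_FLOOR_RANGE[0] <= noise_floor_db <= NOISE_FLOOR_RANGE[1]:
        problems.append(f"noise floor {noise_floor_db:.1f} dB is outside "
                        f"{NOISE_FLOOR_RANGE[0]:.0f} to {NOISE_FLOOR_RANGE[1]:.0f} dB")
    return problems

# Example readings: RMS and peak from the Wave Stats output earlier in the thread,
# noise floor invented for the example
for problem in check_levels(rms_db=-22.8, peak_db=-2.0, noise_floor_db=-55.0):
    print(problem)
```

This only covers the level measurements. It says nothing about distortion or the mono/stereo mix-up, which have to be checked by ear and in the export settings.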

I don’t see any clipping in any of the tracks, and I use the exact same settings for exporting each track to mp3. I’ve run normalize and compressor on all tracks, and when listening to the tracks, they appear to be reasonably close in volume. I’m flummoxed.

My questions are:
What settings should I use when exporting tracks to MP3 to ensure consistency?
Can I safely export all tracks in a batch or must each be done independently?
What settings for normalizing should be used?
What settings for compressor should be used (to achieve those parameters above)?

I know I am way out of my depth, but any simple rules of thumb would be greatly appreciated.

I also have some tracks that are distorted (not showing any clipping) and I’m sure those are due to poor equipment/settings for the original recordings, but I would love to be able to salvage the originals and get them at least within acceptable technical specifications as outlined above.

I welcome any suggestion anyone might have.

thanks

These are very helpful. So what can someone do with a “broken” show? As a narrator I want to focus on the reading itself. I’m not yet to the point of hiring engineers, but I do need a fix, I’m afraid.

What I’d suggest in this case is to start a new topic about your specific issues. ACX have been helpful in giving you some indications about where the problems lie - in particular they indicate there is some distortion in Chapter 5, so perhaps you could post a short sample of that into your new topic. See here for how to post an audio sample to the forum: https://forum.audacityteam.org/t/how-to-post-an-audio-sample/29851/1

I presume that you understand the problem there? If not, start a separate topic to ask for clarification in how to achieve that.

Audible want the format to be:
192 Kbps CBR MP3 mono OR stereo but mono is preferred, so let’s stick with mono.
If you’re unsure how to do that, start a new forum topic.

Personally I’d not be wanting to handle 7 hours or more in one go, but other narrators may be in a better position to suggest their preferred work-flow.

Probably -3 dB, but that only sets the peak level. You will probably still need to use some compression and possibly limiting. We’re still working on this part.

Unfortunately I don’t think there is a right answer for this. It depends on what material you are starting with, and of course which compressor you are using. This is another area that we need to work on.