Something else that confounds me ...

I’ve just downloaded your clip, and had a listen. It sounds fine to me. Voice quality is good. Sound floor is good.

ACXCheck shows me this on the unprocessed clip.

I applied Normalise at -3.2, and it’s now this. Passed…

If you are not happy with your ‘voice’ - it could just be that you are being too hard on yourself. Your voice will never sound to you like it sounds to other people. To my Australian/English ears, your voice sounds fine. It’s a voice I could listen to for a duration of time, which is what you are aiming for.


@DL Voice, I downloaded your clip and it failed. Please keep in mind that all of the ACX stuff is very new to me, and I only got into it after reading your post. I made a 6 min video showing how I corrected your problem and how I set things up to make my own audio pass. It is really three 2 min vids wrapped into one. Your fix was very easy to do. Let me know when you have viewed it, and I will remove it from my YT channel and edit it out of this post. It is my business channel and I try to keep it as clean as I can.

@rachalmers, I am doing the same as you. I have run close to 40 tests so far with different mics, and this is what I have found. My Shure SM 57 (dynamic) does a much better job than my AKG P120 (condenser). My AT Pro-70 lav mic does a great job, but I have to add about 4 dB on the 80 Hz EQ. Now this is what has me baffled. My noise floor is around -84 dB on all the mics and my input level is between a solid -6 and -12. I can record with any of the mics, and the only adjustment I have to make to be ACX compliant is "Amplify" to -3.1, and everything passes.

Now here is the problem: when I try to use my external compressor, set to 2:1 or 4:1, using all the numbers I just mentioned, I cannot pass the ACX check. If I move the numbers for one to pass, then the other fails, and vice versa. So, even though I don’t know why, how you master your audio does seem to make a difference. I have sent an e-mail to a friend of mine who is a mixing engineer to see if he can explain it to me. I hate not knowing! :confused:

I’m wondering why you are using a compressor at all for audiobook production? The wiki description of compression is pretty good, and this extract from it would indicate that compression changes the way a piece of work sounds - ultimately.

Audio compression reduces loud sounds which are above a certain threshold while quiet sounds remain unaffected. In the 2000s compressors became available in audio software for recording. The dedicated electronic hardware unit or audio software used to apply compression is called a compressor. In recorded and live music, compression parameters may be adjusted to change the way the effect sounds. Compression and limiting are identical in process but different in degree and perceived effect.

@DL Voice’s unedited sample, while very short, only fails the ACX check for RMS level, and then not by much. Normalise at -3.2 fixed it? So it’s pretty close to begin with. I’d be happy with that :slight_smile:
If you do a Contrast analysis of the clip, it shows -26.1 unedited, and after applying Normalise at -3.2 it measures -22.8, both for the Foreground only. So even as a quick check, if that number is -23 dB or better … you will be spot on. Then have a look at the ACXCheck; it should read the same for RMS.
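A rough sketch of the arithmetic behind those readings: Normalise (or Amplify) applies one fixed gain to every sample, so the peak and the RMS move by the same number of dB. The function names and the test tone are made up for illustration, and this is not how Audacity implements either effect.

```python
import math

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def normalise_peak(samples, target_peak_db):
    """Scale every sample by one gain so the peak lands on target_peak_db."""
    gain_db = target_peak_db - peak_db(samples)
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples], gain_db

# Made-up tone standing in for a voice clip (not silent, so log10 is safe).
tone = [0.4 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(8000)]
louder, gain_db = normalise_peak(tone, -3.2)

# Peak and RMS both rise by gain_db, so the gap between them is unchanged.
print(round(peak_db(louder), 2))
print(round(rms_db(louder) - rms_db(tone), 3), round(gain_db, 3))

# Thread figures: RMS went from -26.1 to -22.8, so the gain was about 3.3 dB.
implied_gain_db = -22.8 - (-26.1)
print(round(implied_gain_db, 1))
```

If both Contrast readings cover the same stretch of audio, a gain of about 3.3 dB implies the raw clip peaked at roughly -6.5 dB before Normalise brought it to -3.2.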

That is very good for a raw audio sample. Congratulations :slight_smile:
As several have noted, it will pass the ACX specification just by amplifying it a little, but Audacity can improve on that further.

The “ACX check” that keeps being mentioned refers to the automated tests that ACX perform before bothering to listen to a submission. Once you’re through that it goes to a human :astonished: who will listen critically to the recording. What they are looking for at the end of the day is an enjoyable listening experience - the last thing that they want is for purchasers of the book to ask for their money back.

So now we need to look and listen critically to the recording.

The first flaw that I noticed is that the track starts below the centre line and then gradually drifts up toward the centre line. This is due to the hardware (mic/pre-amp) but it is easily dealt with. An associated “problem” (not really a “problem” because it is so minor) is that there is a bit of DC offset. Both of these can be improved by applying a “high pass filter” with a low filter frequency. So that the filter does not adversely affect your voice, the filter frequency should be below the lowest tone of your voice.

Looking at “Plot Spectrum” (Audacity Manual) we can see that your voice goes down to around 125 Hz. I have changed some of the Plot Spectrum settings to show this more clearly:
window-Frequency Analysis-000.png
The filter frequency must be below this. I generally use the Equalization effect (Audacity Manual) for this. There’s a “LF roll-off for speech” preset available that works pretty well - it gives a controlled amount of reduction below 100 Hz.
LF_rolloff_for_speech.xml (299 Bytes)
See the “Equalization” page in the manual for instructions on how to import this setting.
After applying that effect there may still be a short glitch at the very start of the recording, but normally you would leave a good amount of silence at the start of the recording so that you can easily trim a bit off. (Also, ACX specify that they want so many seconds of silence (“room tone”) at the start of each chapter - I don’t recall the exact figure, but it’s in their documentation).
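To illustrate why a high-pass filter with a low cutoff removes DC offset without touching the voice, here is a minimal sketch using a textbook first-order (RC-style) high-pass filter. It is not the Equalization effect or the “LF roll-off for speech” preset; the 30 Hz cutoff, the 200 Hz test tone and the function names are assumptions chosen only for the demonstration.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC-style high-pass filter (roughly 6 dB/octave below cutoff)."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

def mean(samples):
    return sum(samples) / len(samples)

sample_rate = 8000
# Synthetic "voice" at 200 Hz riding on a constant DC offset of 0.1.
raw = [0.1 + 0.3 * math.sin(2 * math.pi * 200 * n / sample_rate)
       for n in range(sample_rate)]
filtered = high_pass(raw, sample_rate, cutoff_hz=30)

# Skip the first half second, where the filter is still settling.
settled = filtered[sample_rate // 2:]
print(round(mean(raw), 4))                     # average level before filtering
print(round(mean(settled), 4))                 # average level after filtering
print(round(max(abs(s) for s in settled), 3))  # the 200 Hz tone is barely reduced
```

The settling period at the start of the filtered signal is the same kind of start-up glitch mentioned above, which is one more reason to leave some silence at the head of a recording.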

Next thing, on listening I noticed a slight whistling throughout the recording. This can be seen in the track spectrogram view (Audacity Manual). The whistling shows up as a blue line at 8000 Hz. The picture below also shows the spectrogram settings that I’m using.

To fix this whistle I again use the Equalization effect. You could use the “Notch Filter” effect, but careful settings in the Equalization effect will do less damage to the remaining audio.
This is the 8000 Hz notch filter preset for the Equalization effect:
8knotch.xml (169 Bytes)
and this is the spectrogram after applying the filter. You will notice that there is a slight lightening of the waveform at 8 kHz (8000 Hz) - that’s the “damage”, but it’s not audible. More importantly, note that the blue line has disappeared.
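For anyone curious what a notch does to the signal, below is a small sketch of a generic second-order (biquad) notch, using the widely published "Audio EQ Cookbook" coefficient formulas. It is a stand-in for the idea only: it is neither the 8knotch.xml preset nor Audacity's “Notch Filter” code, and the Q of 30 and the test tones are assumed values.

```python
import math

def notch_coefficients(sample_rate, centre_hz, q):
    """Normalised biquad notch coefficients (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Run samples through a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

rate = 44100
count = rate // 5  # 0.2 seconds of audio
whistle = [0.05 * math.sin(2 * math.pi * 8000 * n / rate) for n in range(count)]
voice = [0.3 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(count)]

b, a = notch_coefficients(rate, centre_hz=8000, q=30)
# Look at the second half only, after the filter has settled.
whistle_left = max(abs(s) for s in biquad(whistle, b, a)[count // 2:])
voice_left = max(abs(s) for s in biquad(voice, b, a)[count // 2:])
print(whistle_left, voice_left)
```

A higher Q makes the notch narrower, which is the same trade-off as the “less damage to the remaining audio” point above.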

more to follow…

Back again… :wink:

We have now dealt with basic equalization and DC offset, so this is the stage at which you would do any editing (cut / paste / delete …)
If there is any DC offset, that should always be dealt with before editing otherwise it can cause clicks at edit points.

Noise Reduction. Because the background noise was so low to start with, and we’ve dealt with the whistle and low hum (there was a tiny bit of hum at around 70 Hz, but the low cut filter sorted that out), we hardly need to do any noise reduction. If we do noise reduction, then it is essential that the noise profile contains only the background noise. There’s not much of that in this sample, but there is a little between 3.4 and 3.5 seconds, and that is just about enough for the current version of Noise Reduction. The settings I used were 6, 6, 3.

so what’s next - ah yes “Amazon ALWAYS…”
Definitely good to have dynamics and expression in your voice, otherwise it would quickly become very tedious, but the word “ALWAYS” peaks at almost double the amplitude of most of the other words. Just a bit too much in my opinion and could do with a little toning down in post-production.

Evening out the levels is called “dynamic range compression”, or more commonly just “compression”.
There are a number of different tools for performing compression. Audacity includes 2: the “Compressor” effect and the “Limiter” effect. The Compressor effect is good for making “broad” adjustments to slow moving changes in dynamics, but that is not what we want in this case. The Limiter is good for controlling peaks, which is closer to what we want, but can be quite a “hard / harsh” effect whereas we need to use only a very gentle effect. Fortunately the Limiter effect included with Audacity is capable of very subtle limiting, and that’s what I would recommend here.

Note that when using any type of dynamic range compression, the RMS level will increase relative to the peak level. That is, if we have two copies of the same track, one which has been compressed and one that hasn’t, and we normalize (amplify) the tracks so that they both have exactly the same peak level, then the compressed track will be louder (higher RMS). Before using dynamic range compression, one should consider whether increasing the overall loudness is desirable. In this case, we know that if we normalize the uncompressed track to -3 dB, then the RMS level will be about -22.6 dB, which is toward the bottom of the range specified by ACX (-23 to -18 dB), so yes we can afford to make the overall level a little louder.
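The effect described above can be reproduced with a toy example. This is a sketch under stated assumptions: the "compressor" is a crude instantaneous gain curve with no attack or release, not Audacity's Compressor or Limiter, and the clip is a synthetic tone with one burst at double amplitude standing in for a word like “ALWAYS”.

```python
import math

def peak(samples):
    return max(abs(s) for s in samples)

def rms_db(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def normalise_peak(samples, target_db):
    """Scale so that the highest peak sits at target_db."""
    factor = 10 ** (target_db / 20) / peak(samples)
    return [s * factor for s in samples]

def squash_peaks(samples, threshold, ratio):
    """Toy compression: anything above threshold is reduced by ratio."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(math.copysign(mag, s))
    return out

rate = 8000
clip = []
for n in range(rate):
    # One loud burst between 0.4 s and 0.5 s, quiet everywhere else.
    amplitude = 0.5 if 3200 <= n < 4000 else 0.25
    clip.append(amplitude * math.sin(2 * math.pi * 200 * n / rate))

plain = normalise_peak(clip, -3.0)
squashed = normalise_peak(squash_peaks(clip, threshold=0.25, ratio=4.0), -3.0)

# Same peak level for both copies, but the squashed one has the higher RMS.
print(round(rms_db(plain), 2), round(rms_db(squashed), 2))
```

Because the loud burst no longer dictates how much gain Normalise can add, the quieter material ends up closer to the peak, which is where the extra RMS comes from.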

Compression / Limiter effects are generally easiest to use if the peak level is 0 dB. We can then see more easily the relative levels of different peaks. To do that, just amplify with the default settings (brings the peak level up to 0.0dB).
After a bit of experimenting, these are the settings that I chose:
Finally: ACX check, and listen to the result.
window-ACX Check-000.png

There’s a bit of noise right at the start, but for a real recording you would start with a bigger silent “lead in” and could just trim off (delete) the very beginning. There are a few minor clicks, such as a small “tick” at 3.544, but considering that this has not been edited at all I think with a little editing, the overall quality would sail through ACX “human” quality checks.

The only reason that the “noise floor” has failed in the above is because there isn’t a sufficiently long “noise only” gap for the version of ACX check that I’m using. The actual noise floor is below -70 dB.
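A sketch of why a short gap can make a noise floor reading fail: if the measurement looks for the quietest window of a fixed length, and every window of that length also contains speech, the true floor is never seen. I do not know the window length ACX Check actually uses, so the 0.5 second figure, the function name and the synthetic clip below are all assumptions for illustration.

```python
import math
import random

def quietest_window_db(samples, window_len):
    """RMS level (dB) of the quietest non-overlapping window of window_len samples."""
    if len(samples) < window_len:
        raise ValueError("clip is shorter than one window")
    lowest = None
    for start in range(0, len(samples) - window_len + 1, window_len):
        block = samples[start:start + window_len]
        mean_square = sum(s * s for s in block) / window_len
        if lowest is None or mean_square < lowest:
            lowest = mean_square
    return 10 * math.log10(lowest) if lowest > 0 else float("-inf")

rate = 8000
rng = random.Random(1)  # fixed seed so the demo is repeatable
clip = []
for n in range(2 * rate):
    if rate <= n < rate + rate // 10:
        # A 0.1 second gap containing only very quiet background noise.
        clip.append(rng.uniform(-0.0003, 0.0003))
    else:
        # "Speech": a steady tone, far louder than the noise.
        clip.append(0.3 * math.sin(2 * math.pi * 200 * n / rate))

short_window = quietest_window_db(clip, rate // 20)  # 0.05 s fits inside the gap
long_window = quietest_window_db(clip, rate // 2)    # 0.5 s always includes speech
print(round(short_window, 1), round(long_window, 1))
```

The short window reports the real floor; the long one reports a level dominated by speech, even though the background noise itself has not changed.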

All you need to do now is to achieve this quality consistently (easier said than done :wink:) and ensure that you keep to their guidelines about format, length of silences, and so on.

By the way, good delivery :slight_smile:

I am very new to this audio book thing. When I podcast, I always use compression. When I do phone interviews that run through my system, it really helps bring up the other participants in a very good way, plus it adds just a hair of “thickness” to the voice. I do all of this pre-production so when I get ready to mix and master, there is very little that needs to be done in post production. As far as the Audio Book and ACX checks, I have about three days experience. This may sound silly, but one of my learning methods is to find out what does not work, to figure out what does work.

I learned this long ago with audio and it has just stuck with me. I now know not to use compression for the Audio Books, and it is kinda like trying a piece of software in beta to see what makes it break, so it can be fixed. I now know that by applying compression, the RMS of -18 to -23 (I think) can never be reached at the same time that the peak level is at -3.1 (I think). Way too many numbers and now my head hurts! :stuck_out_tongue:

Here’s another sample in which I quickly edited out some of the clicks and noises.

and in ACX compliant 192 kbps CBR MP3:

@steve, your last two posts on this subject are tremendous, to say the very least! Very seldom do I add things to my “To Do” list, but I have added 2 hours this weekend to re-read your posts and implement your process into my workflow. I am not saying that I will understand everything you said, but I will take a two hour crash course and then work on it throughout the coming weeks. I just wanted to say “Thank You”. :smiley:

@Steve Brilliant. Just what I need too. Thank you for taking the time to go into that detail. Every bit I learn helps me on this most interesting journey.


Fortunately, the noise floor is fine - my hair pulling moments are coming with the RMS number. Sometimes when I normalize at -3.2 it passes, sometimes it doesn’t. I can’t get it to be consistent, which is very frustrating.

If you are not happy with your ‘voice’ - it could just be that you are being too hard on yourself. Your voice will never sound to you like it sounds to other people. To my Australian/English ears, your voice sounds fine. It’s a voice I could listen to for a duration of time, which is what you are aiming for.


Thanks so much for saying that, Robert. It’s true that I don’t like the sound of my own voice too much, but this is currently not my headache. It’s the darn numbers on the ACX check. lol

You figured that out way ahead of me. I was compressing and normalizing at -3.1 or -3.2 and so happy because nearly all the time my files would pass ACX when I did that. But I didn’t like the side effect of how it sounded right before I would speak - and how the room “silence” noise would be amplified to an annoying degree. So I just figured out that compression isn’t the friend I thought it was for ACX.

I use the Audio Technica ATR2100 (it’s a dynamic USB/XLR mic) hooked into my Scarlett Solo. I also recently purchased the Audio Technica AT2035 (condenser mic). I’ve tried both. It’s definitely easier to pass ACX Check (after normalizing at -3.2) when I use the AT2035 (even then it’s not always consistent). But it’s a totally different mic than what I started out this project with and I don’t want to switch mid stream. I’d like to finish at least this project with the ATR2100, since I’m more familiar with it and how my voice sounds on it. At this point I’m considering a return to just a straight up USB connection with the ATR2100 to see if THAT makes any difference with consistency.

So I have tracks that are running about 20 to 30 minutes each at the moment. Voice only. So far I only have to run Normalise, usually around -3.2, to get the ACX-check pass. I don’t use noise reduction, as I’ve spent weeks and weeks testing my recording environment and trying to achieve optimal - minimal - sound floor readings. When I’m good to go, it’s usually at around -75 dB to -60 dB.
My recording voice shows peaks when I’m speaking up to about -6 dB, but usually more like -11 dB. Normalise then brings that up a bit.

interesting stuff.

It is interesting indeed - but also quite frustrating. ha!

Something else that was interesting and frustrating was when I started applying ACX’s specific recommendations for mastering (high-pass and low-pass filter, normalize at -6, limiter), my tracks would pass ACX with flying colors, but they sounded horrible - like a heavy cloth was over my mouth when I talked. Crazy.

Sometimes when I normalize at -3.2 it passes, sometimes it doesn’t. I can’t get it to be consistent, which is very frustrating.

@DL Keep in mind that every single recording will have a different mix of input. So although I say I use -3.2 to Normalise, it’s never exactly that. I’ve had it out to -7.5 and up to 0!!!
But I now rarely use anything else but Normalise, and then Limiter, which I always leave at -3.2.
Steve’s excellent tutorial is good, but let’s say that’s the next step.
Just remember, you will always need to adjust that Normalise number.
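One way to see why that number has to move around: plain gain never changes the gap between peak and RMS (the crest factor), so each recording has its own range of Normalise targets that can pass. The sketch below assumes the RMS window of -23 to -18 dB quoted earlier in the thread and a peak ceiling of -3 dB; the function name is made up for illustration.

```python
def passing_normalise_range(peak_db, rms_db,
                            rms_min=-23.0, rms_max=-18.0, peak_max=-3.0):
    """Range of peak targets (dB) where gain alone meets both RMS and peak limits.

    Returns (lowest, highest), or None when no single gain can satisfy both.
    The limits are assumed values, passed in so they are easy to change.
    """
    crest = peak_db - rms_db           # unchanged by Normalise / Amplify
    lowest = rms_min + crest           # any quieter and RMS drops below rms_min
    highest = min(rms_max + crest, peak_max)
    if lowest > highest:
        return None
    return (lowest, highest)

# The clip discussed earlier: peak -3.2 dB with RMS about -22.8 dB.
print(passing_normalise_range(-3.2, -22.8))

# A more dynamic take: same style of reading, but a 21.5 dB crest factor.
print(passing_normalise_range(-3.0, -24.5))
```

For the earlier clip the crest factor is 19.6 dB, which leaves a passing range only about 0.4 dB wide, from roughly -3.4 to -3.0. A slightly more dynamic take with a crest factor above 20 dB has no passing value at all under these limits, which is where a gentle limiter comes in.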

I’m getting ready to leave the house but I’m taking my computer with me. When I get to a place where I can sit down and watch the whole thing, I’ll let you know. I was able to get that clip to pass using normalization as well, but my frustration is that I can’t always get them to pass. I just saw rachalmers’ comment about that and how you have to play around with the normalize numbers sometimes, so that’s something I need to consider now.

I just started playing around with limiter last night when I was using ACX’s recommended mastering chain, but I’m not familiar with it at all. That’s probably why the resulting audio sounded so horrible. I’m thinking that my situation probably doesn’t require all the mastering steps that ACX recommends?

@DL The only thing I use Limiter for is to trim off any peaks that get introduced with Normalise. Which I don’t always need, of course. So for example, Normalise first; if ACXCheck now shows peaks, apply the Limiter. Otherwise don’t.
If your original shows peaks, and needs the rms adjusting too, Normalise first, then apply Limiter.

OK, that makes sense. I’ll try to remember that when I’m working on the files some more today.

Dana, I’ve watched the video - thanks for that. A lot of it was just reinforcement of what I’ve already started coming to understand. I want to keep my process as simple as yours is. That’s what I’m aiming for. Maybe I’m close to that already.

I just saw Steve’s posts in their entirety on the other page - that’s a lot of info to absorb, but thank you Steve! Now I’m going to re-read them … maybe a few times … so it sinks in.

How can you determine what the 125 Hz number is? I can’t tell where the cursor would have been placed to get that (all the way at the end, where it goes to the bottom?). I’m analyzing one of my other tracks right now and can’t figure out what number represents my lowest tone.