Trying to make a good 'narrator' voice sound

kozikowski · June 28, 2009, 6:31am

http://www.kozco.com/tech/audacity/just_voice_ChrisCompressor.wav

This is Chris’s Compressor applied with the compression ratio changed from 0.5 to 0.77 (heavier compression) and the Max Amplitude reduced from 0.95 to 0.9.

The effect is most noticeable in Audacity. The playback meter spends a lot more time up at -1 than it used to and the peaks of the blue waves on the timeline all line up much more now. The show got louder, but it also got denser as Chris gently squeezed much of the volume variation out.

There is one odd problem if you apply too much of this. There are places in the script where the actor leans into his expression…and nothing happens. If anything, the volume seems to get slightly lower instead of higher. It is a little disconcerting if you’re paying attention. Reduce the compression number.

Try this tool on the track after you jammed all the announcer pieces together, but before you went in and adjusted individual word volume.

I could not get the tool to work with the OGG file. I exported as WAV and compressed that. There’s also something about the end of the file that freaks out Chris’s look-ahead modules, so the last second or so of the show has bad compression. Add a second of silence to the end of the file and cut it off later.
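The pad-then-trim workaround can be sketched in a few lines. This is only an illustration of the idea, not anything Chris's Compressor does itself; the function names, the tiny sample list, and the placeholder "compressor pass" are all made up for the example.

```python
def pad_with_silence(samples, sample_rate, seconds=1.0):
    """Return a padded copy of `samples` and the number of samples added."""
    pad = int(round(sample_rate * seconds))
    return list(samples) + [0.0] * pad, pad


def trim_padding(samples, pad):
    """Drop the `pad` trailing samples that pad_with_silence appended."""
    return list(samples[:len(samples) - pad]) if pad else list(samples)


# Stand-in for the real voice track (hypothetical values, 4 samples per second).
voice = [0.1, -0.2, 0.3, -0.1]
padded, pad = pad_with_silence(voice, sample_rate=4, seconds=1.0)

# Placeholder for the compressor pass; the real effect would run here.
processed = [s * 0.9 for s in padded]

# Cut the extra second off again so the show keeps its original length.
final = trim_padding(processed, pad)
```

The point is only that the look-ahead gets silence to chew on at the end, and the delivered file is the same length as the original.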

The echo/reverb tools I tried were dreadful.

Koz

breslin · June 28, 2009, 4:23pm

Thank you! I have now leaned more heavily on the dynamic compressor, and here is the result:

http://www.buffalo.edu/~breslin/intro_speech.ogg

It sounds better! I think I’m ready to submit the project. (Deadline: midnight today (EST).)

I tried the echo, but it seemed like either it was too low to be perceptible, or it was too much. Maybe I just didn’t hit on the magic combination of parameters. ( I tried the range around 0.5 / 0.5 )

Things I’ll try next time:

If I put one of the tracks out of phase (either voice or music), maybe I could get more separation.

I think there’s frequency combat between music and voice. I think the music track is too fat in the vocal frequencies. Maybe the way to fix this (aside from using different music) is to eq the music down in the vocal range.

I really want to develop wizard-like skills with the eq. I think that’s where a lot of the critical stuff is going to be. I think I could have made the voice sound richer using the eq… Definitely I want to be able to do a warmer and richer-sounding voice.

And I’ll get a better mic!

kozikowski · June 28, 2009, 7:12pm

<<<If I put one of the tracks out of phase (either voice or music), maybe I could get more separation.>>>

Yes, but no. Don’t do that. Straight delay on one channel will cause the voice to cancel out in mono playback systems. Leave the actor where he is.
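A small sketch of why this goes wrong in mono, assuming a pure 1 kHz test tone as a stand-in for the voice (a real voice would cancel only partially and unevenly across frequencies). The sample rate, tone, and function names are assumptions for illustration.

```python
import math


def mono_sum(left, right):
    """What a mono playback system hears: the average of both channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def peak(samples):
    return max(abs(s) for s in samples)


RATE = 8000  # samples per second, chosen so the arithmetic is easy
voice = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]

# Case 1: one channel flipped out of phase (polarity inverted).
inverted = [-s for s in voice]

# Case 2: one channel delayed by half a cycle of the tone
# (0.5 ms at 1 kHz, which is 4 samples at 8000 samples per second).
delay = 4
delayed = [0.0] * delay + voice[:-delay]

in_phase_peak = peak(mono_sum(voice, voice))
inverted_peak = peak(mono_sum(voice, inverted))
delayed_peak = peak(mono_sum(voice, delayed)[delay:])  # skip the start-up gap
```

With both channels in phase the mono sum keeps its full level. With one channel inverted, or delayed by half a cycle, the tone disappears from the mono sum.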

Jury’s out on the music, but I do know one technique. FM stereo radio does not broadcast Left and Right. They broadcast Left plus Right as a mono channel and then a helper sound channel with Left minus Right. L-R is magic. It’s also called the separation channel. It literally tells the main channel (the mono music), where the brass is left to right (usually right). Where the violins are left to right (usually left). As you decrease the influence of the separation channel, the stereo image of the show slowly collapses to straight mono. One of the things that happens when you drive through a very long tunnel is the separation channel gets noisy and the radio switches to much quieter and more robust mono. Reverse when you drive out of the tunnel.

The effect in the car is that the orchestra collapses to a small point just to the left of the GPS receiver on the dashboard and then spreads back out again.

Nowhere is it written that you can’t increase the influence of the separation channel and accentuate the “spread” of the music. This is serious channel management in Audacity and the arithmetic is making my head hurt, but that’s a way to manage separation without serious – or really any – damage.
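The arithmetic is less painful written out. A minimal sketch, assuming the usual sum/difference encoding (L+R and L-R, each halved so the round trip is exact); the `width` parameter and the function name are invented for the example and are not an Audacity feature.

```python
def widen(left, right, width=1.0):
    """Scale the separation (L-R) channel and rebuild left/right.

    width = 0.0 collapses to mono, 1.0 leaves the image alone,
    values above 1.0 accentuate the spread.
    """
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2           # the mono "main" channel
        side = (l - r) / 2 * width  # the separation channel, scaled
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right


# One stereo sample with more level on the left (hypothetical values).
mono_l, mono_r = widen([0.5], [0.25], width=0.0)
same_l, same_r = widen([0.5], [0.25], width=1.0)
wide_l, wide_r = widen([0.5], [0.25], width=2.0)
```

Because only the difference channel is scaled, the mono sum (left plus right) comes out the same at any width, which is why this does no damage on mono systems. Widening does raise the peaks on individual channels, so it is worth checking levels afterwards.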

<<<I think there’s frequency combat between music and voice. I think the music track is too fat in the vocal frequencies. Maybe the way to fix this (aside from using different music) is to eq the music down in the vocal range.>>>

That would be Fletcher-Munson, yes. The “loudness” control on your music system.

http://www.webervst.com/fm.htm

Note that most of the effect is around 3000 where your ear reacts most strongly to sounds. Here, let me run my fingernails down this blackboard…

<<<I tried the echo, but it seemed like either it was too low to be perceptible, or it was too much. >>>

Even the more advanced packages – Gverb, et al. – pretty much suck. I’m going to attribute that to apparent simplicity. You would think you could merely repeat the initial sounds with delay and decay and have it done. If you’ve ever seen the echo analysis of your bathroom, you would have to lie down with a cold towel on your head. It’s enormously complex with asymptotic decay parameters, etc. etc. One of the reasons there’s no such thing as software that can remove echoes. Same problem in reverse.

Once again we should leave the voice actor pretty much alone.

[listening to track]

You’re creeping significantly past the point where I can help. Any one of these is a pleasing, presentable product and it’s up to you to decide which you like. Enough engineering, now you need to wear your producer’s hat.

“A great work is never finished, merely abandoned.”

Koz

kozikowski · June 28, 2009, 7:19pm

<<<And I’ll get a better mic!>>>

A word there. I know several people using vacuum tube microphone preamplifiers and would rather disembowel themselves than give them up. These amplifiers wrote the book on “warm tube sound.”

The Arts people make a nice one and past that I have no ideas. One of the people at work is using a Gates broadcast amplifier built in the late forties. It’s on life support and burns your fingers, but he shows no sign at all of giving it up.

Koz

breslin · June 29, 2009, 3:25am

Thanks again Koz. I have the final version to present:

http://www.buffalo.edu/~breslin/intro_speech.ogg

It has a few innovations from the previous versions. First, I panned the voice slightly to one side, the music slightly to the other. This created a great separation between voice and music. It’s not enough to notice, but it greatly helps the ear differentiate by spatial clues.

This enabled me to raise the music levels without stepping on the voice.

As we noted, this particular musical piece has a lot of body in the vocal frequencies, which makes separation more difficult. So another part of my solution was to use an equalizer to lightly lower the music in the actor’s vocal frequency.

This enabled me to bump up the music level a little further. The music remained low enough in the critical frequencies.
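For anyone curious what "lightly lowering the music in the vocal range" amounts to, here is a rough sketch using a standard peaking-EQ biquad (the widely used "Audio EQ Cookbook" formulas). It is not what Audacity's Equalization effect does internally. The 1 kHz centre, the -6 dB cut, the Q of 1, and the 8000 Hz sample rate are all assumptions picked to keep the demo small; the real centre and depth were chosen by ear.

```python
import math


def peaking_eq_coeffs(sample_rate, center_hz, gain_db, q=1.0):
    """Biquad coefficients for a bell-shaped boost or cut around center_hz."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    # Normalise so the leading feedback coefficient is 1.
    return [x / a[0] for x in b], [x / a[0] for x in a]


def biquad(samples, b, a):
    """Run the filter over a list of samples (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, count):
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def change_db(before, after):
    """Level change in dB, measured on the second half to skip the start-up."""
    half = len(before) // 2
    return 20 * math.log10(rms(after[half:]) / rms(before[half:]))


RATE = 8000
b, a = peaking_eq_coeffs(RATE, center_hz=1000, gain_db=-6.0, q=1.0)

in_band = tone(1000, RATE, RATE)  # stand-in for music energy in the vocal range
low = tone(100, RATE, RATE)       # stand-in for music energy well below it

in_band_change = change_db(in_band, biquad(in_band, b, a))
low_change = change_db(low, biquad(low, b, a))
```

The tone at the centre frequency drops by about 6 dB while the low tone is left nearly untouched, which is the kind of gentle dip that makes room for a voice without thinning out the whole music bed.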

Also, I targeted specific parts of the music (fanfares and cymbal crashes) and raised those to create drama and emphasize certain lines in the text.

My concern was that the music was falling out of the picture, but now I think it’s pretty good. I’m a little nervous that it’s too loud, but the original Russian was much too loud, so I’m worried that they expect the music really loud. So perhaps I’m erring a little on that side. Encouragement would do my heart wonders!

Thanks again!

kozikowski · June 29, 2009, 4:31am

I guess it doesn’t matter any more because it’s past midnight - Eastern, but it starts funny. The music doesn’t duck fast enough or the first voice doesn’t punch enough. Past that, it’s good to go.

Koz

george13 · June 29, 2009, 11:31am

Hi,
I liked it. 2 things I might think of changing:

Like Koz said, the music doesn’t duck fast enough at the beginning or the voice starts too low (the word “Throughout”).
When I compare it to other commercial games I know, I find the bass a little lacking. Many games have a kind of bombastic loudness effect in the bass (for good or bad, I think it would sound more impressive with a few dB more bass in the voice).
I’m no expert though.

kozikowski · June 29, 2009, 3:12pm

Some of that could be the microphone. SM58 has low frequency droop to avoid popping and handling noises and there’s that presence peak around 5KHz. I suppose if you generated a curve to take the microphone out of the picture, the curves are published, then that might help some of these minor voice issues.

http://www.shure.com/stellent/groups/public/@gms_gmi_web_us/documents/web_resource/site_img_us_rc_sm58_large.gif

But generally, nothing valuable happens below 100 Hz anyway. Most field mixers run with the 100Hz or 80Hz filters running all the time. But this isn’t a field job.

Remembering that over the course of the thread we went from answering machine sound to commercial product. No small feat that.

Koz

kozikowski · June 29, 2009, 3:25pm

Another note. The SM58 is a directional microphone and has proximity effect. The bass goes way up when you get closer. Also called the Rock Band in a Small Club effect. The lead singer comes out and introduces the band and knocks over beer glasses in the first three tables. It’s very hard to control that and it’s one of the reasons recording in a small club is a nightmare. It’s also one unpublished effect of the pop and blast filter. It keeps the performer away from the mic.

In a studio, if you’re getting lipstick on the microphone, you’re totally doing it wrong. Again we can put a lot of these effects into a track, but we can’t take them out.

Describe your sound system.

Koz

breslin · June 29, 2009, 7:52pm

Of course, you’re right about the beginning of the voice being too low. I backed off the amplification a bit, at the very end of my process when I was checking that the total mix was not clipping. Of course, the individual tracks did not clip, but sometimes they made a combined effort to reach into the red. It was only by a hair, so probably I could have left that alone. How permissive should I be of an occasional red?

Obviously, if red is always bad, then I should have solved the problem by ducking quicker.
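The "combined effort to reach into the red" is easy to see with numbers. A small sketch with made-up sample values, where 1.0 stands for digital full scale (0 dB on the meter); the track contents and function name are assumptions for illustration.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale (1.0 = 0 dB)."""
    p = max(abs(s) for s in samples)
    return 20 * math.log10(p) if p > 0 else float("-inf")


# Hypothetical voice and music tracks; neither one clips on its own.
voice = [0.0, 0.7, -0.6, 0.2]
music = [0.1, 0.5, -0.3, -0.4]

# Mixing is sample-by-sample addition, so peaks that line up add together.
mix = [v + m for v, m in zip(voice, music)]

voice_db = peak_dbfs(voice)
music_db = peak_dbfs(music)
mix_db = peak_dbfs(mix)
```

Both tracks sit below 0 dB, but where their peaks coincide the sum goes over, so the level check has to be done on the mix rather than on the individual tracks.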

Good to know. It occurs to me that a good recording practice would be to turn the line level down and do a take with the actor closer to the mic. Then I can take advantage of proximity effect variation, when I’m pasting together the final.

Another thing I noticed about bass, at least in this project: the quieter words have much more bass. Maybe I lost bass when I compressed or normalized? Maybe the actor’s voice got a lot bassier in the quieter moments. His louder words were higher in pitch. But even if the tone is a fifth (half octave) up, it should still have those rich bassy undertones somewhere in there, right? I don’t know, I’m trying it now with my own voice, and it seems the richness drops out when the voice raises, even by a fifth. If this is indeed the case, I’ll settle for occasional richness, interrupted by shallower but more enthusiastic passages.

I wish I could pin this stuff down precisely, because I really like the occasional bass sound, which you can hear for example in the second half of the first sentence. (The first sentence is a really good case study in this.)

Anyway, I certainly agree about having more bass. I think this is part of what I’m talking about when I use the vague word “richer”.

I think there’s pretty tight limits what I can do with the EQ or the “bass boost”. It seems that taking bass out is easy, but when putting it in, you get muddy fuzz pretty quick.

kozikowski · June 29, 2009, 8:53pm

<<<How permissive should I be of an occasional red?>>>

Occasional red being a VU meter red or a digital signal meter hitting “0”? US-ANSI VU meters are supposed to gently hit red on occasion. That’s how they work. Digital red represents permanent destruction of the quality of the sound. There is no tolerance and no recovery after it happens in the Exported show.

This is why you produce the work low and then at the end, when you get it perfect, set the delivered show to the desired peak values with Amplify – not Normalize. You know “Normalize” messes up left-to-right relationships, right? Amplify doesn’t. Normalize is different in many other sound programs. It’s one of Audacity’s nasty surprises.

<<<Then I can take advantage of proximity effect variation, when I’m pasting together the final.>>>

Yes, but the other thing that does is give you the announcer three inches from your ear effect – or announcing into a telephone. Neither sounds particularly good. Many bad things happen when you’re too close to the mic – even non-directional mics which don’t have proximity effect.

That’s not to say you can’t do that for special effects. I, in my deep voice, once did a passable woman by getting really close to the mic and delivering the lines in a breathy whisper.

<<<I think there’s pretty tight limits what I can do with the EQ or the “bass boost”. It seems that taking bass out is easy, but when putting it in, you get muddy fuzz pretty quick.>>>

Are you going into overload? See: produce the show low and set final level just before delivery.

You can only boost what’s already there. The system can’t make it up on the fly – although there are software packages that people keep claiming can do that. Does the work sound like the actor? I’m guessing yes, and do keep in mind you’re using a microphone designed for live performance screaming, not studio capture. The first time you use a higher quality microphone you will be surprised at the work you don’t have to do to make the capture presentable.

One further note, Chris does have problems with extreme beginnings and ends of the performance. It’s common to add about a minute or so of silence to both ends so the software has someplace to go to catch its breath without affecting the show.

Koz

kozikowski · June 29, 2009, 11:41pm

Wait. It’s not a minute. I think it’s ten seconds. I gotta go look that up.

Koz

breslin · June 30, 2009, 10:00pm

Oh dear oh dear. I don’t even know what a “left-to-right relationship” is. I really need to get some proper training in this field! Well, I’ll pick up as fast as I can.

kozikowski · June 30, 2009, 11:28pm

It isn’t that crazy. Amplify rips through the whole show and adjusts one master volume based on the one loudest sound from either side.

Normalize adjusts the volume of left and right individually. The upshot of that can be a startling distortion of, for example, where the instruments are in your head while you’re wearing headphones. If you went to great effort to cause the trumpets on the right to be more prominent, Normalize may try to “fix” that for you and even them out.
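The difference between the two behaviours described here can be sketched as follows. This models only the description in this thread (one shared gain versus one gain per channel); it is not Audacity's actual code, and the sample values and function names are invented.

```python
def peak(samples):
    return max(abs(s) for s in samples)


def scale(samples, gain):
    return [s * gain for s in samples]


def amplify_linked(left, right, target=1.0):
    """One master gain, set by the loudest sample on either side."""
    loudest = max(peak(left), peak(right))
    gain = target / loudest if loudest else 1.0
    return scale(left, gain), scale(right, gain)


def normalize_per_channel(left, right, target=1.0):
    """A separate gain for each side, each pushed to the same peak."""
    gain_l = target / peak(left) if peak(left) else 1.0
    gain_r = target / peak(right) if peak(right) else 1.0
    return scale(left, gain_l), scale(right, gain_r)


# Hypothetical mix where the right side (the trumpets) is deliberately louder.
left = [0.25, -0.125]
right = [0.5, -0.25]

amp_l, amp_r = amplify_linked(left, right)
norm_l, norm_r = normalize_per_channel(left, right)

balance_before = peak(left) / peak(right)
balance_amplified = peak(amp_l) / peak(amp_r)
balance_normalized = peak(norm_l) / peak(norm_r)
```

The shared gain keeps the left side at half the level of the right, as it was mixed. The per-channel version brings both sides to the same peak, so the deliberate imbalance is gone.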

Koz

breslin · June 30, 2009, 11:39pm

Ah, that makes very good sense, thank you. Like a typical beginner, I will imagine mysteries in simple things. For example, I was leery of raising amplification towards the clip level, lowering it again, raising it again, lowering again… I imagined that some frequencies would be clipped out anyhow, and flipping up and down would incrementally decrease the sound quality. I now think that this might be true for analog recording, but the digital source is unaffected by this (unless you do go past the clip-point).

That said, I don’t understand why we go for -1.0 db amplification, rather than 0.0 amplification (which should be safe, according to the mathematics).

kozikowski · July 1, 2009, 5:51am

I got a minute to address the bass problem.

http://www.kozco.com/tech/audacity/VoiceCompressBassBoost.wav

I only did the first four phrases. Note the level doesn’t change.

Koz

kozikowski · July 1, 2009, 6:10am

<<<I now think that this might be true for analog recording, >>>

Analog recordings are constantly crammed between tape saturation at the top end and oxide and electronic noise at the lower. In Digital, the overload point is always right there next to you, but there effectively isn’t a noise floor. 16-bit noise floor is -96dB. The absolute quiet limit of human hearing is about -60dB and the electronic limit is 36dB below that.
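The -96dB figure falls out of simple arithmetic: each bit halves the smallest step, and halving is about 6 dB. A quick sketch of that calculation (the function name is just for the example):

```python
import math


def quantization_floor_db(bits):
    """Level of the smallest step relative to full scale, in dB."""
    return 20 * math.log10(1 / 2 ** bits)


floor_16 = quantization_floor_db(16)  # roughly -96 dB
floor_24 = quantization_floor_db(24)  # roughly -144 dB
per_bit = quantization_floor_db(1)    # roughly -6 dB per bit
```

So 16 bits times about 6 dB per bit gives the 96 dB quoted above.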

The broadcast overall average program level is -20dB (in the US) leaving a very generous amount of room for loudness increase with no distortion and still maintaining 70dB noise floor. Tape recordists would have killed for specifications like that. It is magnitudes better than you can do with the best low-noise tape on a perfectly adjusted machine.

So produce your brains out, but do it generously lower than overload, then Amplify as the last step to get the deliverable product. If you’re constantly jacking the volume up and down and smacking zero, you’re definitely doing it wrong.

<<<That said, I don’t understand why we go for -1.0 db amplification, rather than 0.0 amplification (which should be safe, according to the mathematics).>>>

In digital, yes. But sooner or later it gets converted to analog and that step is where the process can become unstable.
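For the final step, the gain needed to land the loudest sample at -1 dB is a one-line calculation. A minimal sketch, assuming samples on a scale where 1.0 is digital full scale; the mix values and function names are hypothetical, and this only mirrors what an Amplify-style effect is described as doing in this thread.

```python
import math


def db_to_linear(db):
    return 10 ** (db / 20)


def peak(samples):
    return max(abs(s) for s in samples)


def amplify_to_peak(samples, target_db=-1.0):
    """Apply one gain so the loudest sample sits at target_db below full scale."""
    p = peak(samples)
    if p == 0:
        return list(samples)  # silence: nothing to scale
    gain = db_to_linear(target_db) / p
    return [s * gain for s in samples]


# A show produced generously low, as recommended (hypothetical values).
quiet_mix = [0.05, -0.2, 0.15, -0.1]

delivered = amplify_to_peak(quiet_mix, target_db=-1.0)
delivered_peak_db = 20 * math.log10(peak(delivered))
```

A -1 dB target works out to a peak of about 0.89 of full scale, close to the 0.9 Max Amplitude used in the compressor settings earlier in the thread.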

Koz

kozikowski · July 1, 2009, 7:03am

<<<Note the level doesn’t change.>>>

We also note that listening on my laptop speakers, there is no difference between the first four phrases and the rest of the clip. On my killer sound system® there is a substantial change.

Koz

breslin · July 1, 2009, 12:40pm

Very interesting treatment of the bass issue!

I think you amplified with a level of -7 (to give you room to work with), then applied the eq. I tried to imitate it (still working on my ear!), and ended up with something that looked like this:

It still sounds like I’ve made some error in the eq, but with my untrained ears and my wimpy speakers, I’m having trouble locating the error. I need to get new speakers, and train my ears up to learn which frequencies are which. I guess the latter is just a lot of practice on the eq.

On the subject of speakers, should I be looking towards high-quality computer speakers, or (I think more likely) a reasonable power amp and a set of studio monitors?

kozikowski · July 1, 2009, 3:12pm

Yes, you really can’t hear what you’re doing unless the sound system is up to it. There was a commercial a while back of a guy cutting together his movie on his laptop on the flight back from the live shoot. I guarantee he didn’t cut the sound like that.

That’s my system…

http://www.kozco.com/mytv/mytv.html

The sound is described in the text. Nice headphones are sometimes indicated.

Somewhere on your machine is a file called EQCurves.xml. It likes to hide behind fancy-pants labels, but it’s just a text file and it will open up in TextEdit or NotePad. Paste this code into that file and save it. Make sure your editor is not set for Rich Text. Only Plain Text.

The file has many other programmed curves and it should be no problem to follow the rhythm and style of the other filters. Spaces and carriage returns do not matter in XML code, so they’re used for neatness and maintainability.

And compulsiveness. I can’t sleep until each “f” lines up.

And it should look something like this…

http://www.kozco.com/tech/audacity/endoria.jpg

The negative offset is how I got the overall volume reduction. There’s a lot more design in there than there appears. There are no sharp bends and the boost curve was created using the droop curve of your microphone as a model and starting point.

Koz