Improved "leveling" for speech

According to my sources, the main use of the Leveler effect (Audacity Manual) is to “level” out variations in amplitude (reduce dynamic range), particularly for speech recordings.
The manual describes the effect:

Leveler is a simple, combined compressor and limiter effect for reducing the dynamic range of audio. It reduces the difference between loud and soft, making the audio easier to hear in noisy environments or on small loudspeakers.

It is best suited to speech recordings but at heavier settings or used multiple times

As a test, I made a mock-up of a two-voice dialogue in a stereo recording, with one voice panned some way to the left and one toward the right. The voice on the left is a lot quieter than the voice on the right. I expected this test to be a difficult challenge for the Leveler effect.

As expected, in order to achieve a reasonable amount of “leveling” the effect had to be applied quite heavily, which caused noticeable distortion to the louder voice and the quieter voice was still noticeably quieter.

I then tried a new, simple, dynamic compression effect that I have called “Level speech”.
The default settings are intended to boost the levels of the voices so that they are both loud and clear with minimal distortion to either voice.

This is the test recording. First is the original “conversation”, followed by the same dialogue after attempting to “level” with the Leveler, followed by the same after attempting to level with the new “Level speech” effect:

The new effect has just 2 controls:

  • Leveling amount (%): 100% (default) provides maximum leveling. 0% produces no leveling, but still “limits” peaks to the maximum valid range of 0 dB (without “clipping” the audio).
  • Threshold (dB): The default is -20 dB. Sounds above this level are “compressed” into the high (loud) end of the dynamic range. This is called “upward compression”. Sounds a little below this level will still be made louder, but less so. To “level” 2 voices, the Threshold should be set a little below the level of the quieter voice.
    Tip: If background noise becomes too loud, try raising the threshold. If the quieter voice is not made loud enough, try lowering the threshold.
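For anyone who wants to see the two controls as numbers, here is a minimal Python sketch of a static "upward compression" gain curve. It is not the code in LevelSpeech.ny: the function name `upward_gain_db`, the linear blend by `amount`, and the `taper_db` fade below the threshold are all assumptions made for illustration, and the real effect follows the level over time rather than using a fixed curve.

```python
def upward_gain_db(level_db, threshold_db=-20.0, amount=1.0, taper_db=20.0):
    """Gain (dB) to apply when the current peak level is level_db.

    Assumed model, not the plug-in's actual code:
    - at or above the threshold, the level is pulled towards 0 dB by `amount`
    - below the threshold, the boost fades linearly to nothing over `taper_db`
    """
    level_db = min(level_db, 0.0)
    if level_db >= threshold_db:
        return amount * (0.0 - level_db)
    full_boost = amount * (0.0 - threshold_db)
    fade = max(0.0, 1.0 - (threshold_db - level_db) / taper_db)
    return full_boost * fade


# Input level -> output level at the default threshold and 100% amount.
for level in (-6.0, -18.0, -25.0, -40.0):
    print(level, "dB ->", level + upward_gain_db(level), "dB")
```

In this toy model, raising the threshold leaves low-level noise with less boost, and lowering it brings a quieter voice into the fully boosted range, which is the trade-off described in the tip.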

And this is the plug-in:
LevelSpeech.ny (1.99 KB)
(Old version)

Latest version is on the wiki: Missing features - Audacity Support

I think the second snippet where the current Leveler processes the sound is quite like what we want - except for the distortion. In the third section of the clip it sounds to me like we made the softer voice much louder than the one that was previously too loud and that there is now a greater dynamic range than before. To me that sounds very unsettling to listen to - like a “special effect” - not generally useful at all.

I think we want a simpler compressor for speech and music.


Gale

You mean the one with the loud distorted voice and the other quieter voice? I thought that was exactly what we wanted to avoid!
The leveler failed to “level” the vocals and added noticeable distortion. How can that possibly be desirable for the intended task :open_mouth:

If you prefer a less “leveled” effect, try setting the “Leveling amount” to less than 100%.

Forget the distortion. Neither of us wants that.

To me the middle section that I understood to be processed by Leveler sounded like the two voices were more “equally loud” than before.

I have not tried your effect, I just listened to the clip where I assumed the third section was trying to make the voices equally loud.

The third section, to me, has the voices vastly different in loudness. To hear “it wasn’t such a problem” comfortably I now must turn the volume up, then get jolted out of my chair by “to my next point”.

Also, listen in the third section to the volume changes introduced in each word. Why the volume boost on “mans” of “left over humans”? Why the volume reduction on “vivors” of “the survivors”? The emphasis on “survivors” in the original audio is “surVIVors” (which is preserved in “Leveler” and our “Compressor”). In your effect, it’s “SURvivors”.

I don’t know how any of that can be called leveling (in the vernacular). It sounds (and looks on the waveform) like the “inverted loudness” in Chris’s compressor. I don’t think we need that in a “simple effect”. If we do need it for some reason, should that not be applied by > 100%? :confused:


Gale

I tried lower leveling rates. At 50% the abrupt volume changes within the same voice seemed even more noticeable. At 10% - 20% yes I guess it was effective in “leveling” the differences between the voices without distortion but it still reduced “leveling” within each voice and made them sound jerky.

As a noob I would use Leveler at Light setting and apply three times. I know I have old ears but I can barely hear distortion if I do that, and that is much more comfortable to me than anything I can get with this effect.

Sorry, I can only give my honest reaction.


Gale

To me, the middle section (Leveler) sounds horrible. Using a less “heavy” setting provides even less “leveling”.

Note that the heavier leveler settings are just more repeats of the lighter setting. There is NO difference between applying the heaviest setting and applying the lightest setting 5 times. There is NO difference between applying the light level 3 times and the “Heavy” setting.
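A rough Python sketch of that "repeats" idea, with an invert-and-mix null test at the end. The waveshaper in `light_pass` is a placeholder (an assumption, not the curve Leveler really uses), and the pass counts for "Moderate" and "Heavier" are inferred from Light being one pass and Heaviest being five.

```python
import math

# Passes per setting; the in-between values are inferred, not read from the code.
PASSES = {"Light": 1, "Moderate": 2, "Heavy": 3, "Heavier": 4, "Heaviest": 5}


def light_pass(samples):
    # Placeholder waveshaper (assumption): NOT Leveler's actual transfer curve.
    return [math.tanh(1.5 * s) / math.tanh(1.5) for s in samples]


def leveler(samples, setting):
    # A heavier setting is modelled as the same single pass applied more times.
    for _ in range(PASSES[setting]):
        samples = light_pass(samples)
    return samples


signal = [0.0, 0.1, -0.3, 0.8, -1.0]
heavy = leveler(signal, "Heavy")
three_lights = leveler(leveler(leveler(signal, "Light"), "Light"), "Light")

# Null test: invert one result and mix it with the other.
residual = [a + (-b) for a, b in zip(heavy, three_lights)]
print(max(abs(r) for r in residual))
```

If the two routes really are the same processing, the residual is silence, whatever the actual per-pass curve is.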

In the first section, the voices are about -30 dB rms and -20 dB rms respectively. (10 dB difference).
In the second section, the voices are about -21 dB rms and -15 dB rms (6 dB difference).
In the third section, the voices are about -17 dB rms and -17 dB rms (both about the same average level).
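For reference, a small Python sketch of how an RMS level in dB can be computed, and of the voice-to-voice gaps implied by the figures above. The helper name `rms_db` and the synthetic test tone are illustrative assumptions; the dB values in `sections` are simply the ones quoted in this post.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (a sample value of 1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


# Figures quoted above: (quieter voice, louder voice) in dB rms.
sections = {
    "original": (-30, -20),
    "Leveler": (-21, -15),
    "Level Speech": (-17, -17),
}
gaps = {name: loud - quiet for name, (quiet, loud) in sections.items()}
print(gaps)

# Sanity check on a synthetic tone: 10 full cycles of a sine with peak 0.1.
tone = [0.1 * math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
print(round(rms_db(tone), 2))
```

A sine wave's RMS level sits about 3 dB below its peak level, so RMS and peak readings of the same material will not match even when they follow the same pattern.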

A very similar pattern can be observed in the peak level of the voices.

Note that the voices are panned. If you are sitting closer to one speaker than the other then one voice will sound louder than the other.

I wasn’t aware that the Leveler effect was aiming for sound quality. If that is the case then the Leveler fails even more badly. I thought that the aim was to “level”, which is exactly what the “Level Speech” effect does (even if that does not match your opinion).

The “Level Speech” effect is a fast compressor with a high compression ratio. The change that you notice within single words is because the effect reacts very rapidly to changes in peak level, thus the word “do” will tend to emphasize the “Ooh” sound because that has a lower initial peak level.

The “reaction speed” is a compromise. The faster it is, the more it will tend to change the emphasis in words, but if it is much slower then there will be a delay before the quiet voice “fades in” and the quiet voice will “fade out” just before the loud voice starts. Ideally a compressor would have controls for these two “fade times” which are the Attack and Release controls on a typical compressor.
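To make the compromise concrete, here is a minimal Python sketch of a peak follower with separate attack and release times. It is a generic one-pole design, assumed for illustration only and not taken from the plug-in; the 1000 Hz sample rate and the time constants are toy values.

```python
import math


def envelope(samples, sample_rate, attack_s, release_s):
    """One-pole peak follower: rises at the attack rate, falls at the release rate."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    level = 0.0
    out = []
    for s in samples:
        peak = abs(s)
        coeff = attack if peak > level else release
        level = coeff * level + (1.0 - coeff) * peak
        out.append(level)
    return out


# Quiet voice, loud voice, quiet voice: 200 ms each at a toy 1000 Hz sample rate.
signal = [0.1] * 200 + [1.0] * 200 + [0.1] * 200
fast = envelope(signal, 1000, attack_s=0.001, release_s=0.01)
slow = envelope(signal, 1000, attack_s=0.05, release_s=0.2)

# Detected level 10 ms after the loud voice starts and 10 ms after it stops.
for name, env in (("fast", fast), ("slow", slow)):
    print(name, round(env[210], 2), round(env[410], 2))
```

The fast follower tracks the change almost immediately, which is what alters emphasis inside words; the slow one lags behind at both transitions, which is what produces the "fade in" and "fade out" of the quiet voice.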

The 100% setting precisely matches the peak level to -0.1 dB, that is, 100% is the “level” (flat dynamics) setting.
Yes it sounds “unnatural”, but that’s because flat dynamics are unnatural, but isn’t the point of a “leveler” to “flatten” the dynamics?
The “amount” slider allows you to “flatten” the dynamics less, which will sound less unnatural.

I can adjust the Attack and Release so that the compressor reacts more slowly, or provide controls for doing so.
The default “Leveling amount” can be whatever we want it to be. I set it to 100% as that is literally “leveling” all peaks to -0.1 dB.

OK then I should have saved RSI and done “Heavy” once in Leveler. If I invert the result of “Heavy” against three “Lights” I agree I get silence. :wink: The Manual has no information about it but I thought that three “Lights” might give me something a bit less than moderate.

Any noobs I polled would I think say that a leveler should make the voices sound more (or less) equally loud, according to a setting on the effect.

The distortion you and perhaps others can hear in Leveler even at lower settings is not something I like. Ideally we want “equally loud” (whatever that means) without distortion - which I seem to be able to get with our Compressor. Is the waveshaping making it easier for Leveler to sound more “equally loud” even though statistically it may be less “equally loud” than your effect?

No I was not sitting closer to one speaker than the other. The “it wasn’t such a problem” sounds far quieter than “to my next point” to me, even if their RMS are matched over both channels (if that’s what you mean). “To my next point” sounds too loud to me even if I sit towards the right-hand speaker to counteract it and even if I turn my slightly stronger left ear towards the right-hand speaker.

I am glad you agree this effect is un-natural. No-one I am aware of says that Leveler’s or Compressor’s treatment of volume is un-natural.

Whatever value this effect may have, it isn’t IMO a replacement for Leveler if it has to be set to 10 - 20% to make the voices sound equally loud - even if you can fix the attack and release time. And Leveler (correctly for noobs) doesn’t have a Threshold control.

Can someone else with reliable ears give their view?

I still agree with myself on that, given part or even most of the objective is to help people who want to compress music or speech but can’t get on with Compressor.

Direct quote from a user I’ve found.

A bit of distorsion is better than no compression!

Gale

I spotted it in the code. The settings are the number of repetitions from once (Light) to 5 times (Heaviest).

How strange. I’ve tried listening on two different pairs of speakers and those two phrases sound to me to be virtually identical loudness. I guess we already knew that “loudness” is subjective.

I probably wouldn’t say that a voice recording playing at double speed sounded “un-natural”, though clearly it would not sound “natural”. I’d probably say that it sounded “speeded up” or “like a chipmunk”. Similarly I’d probably say that a heavily “leveled” (with the Leveler) recording sounded “distorted” rather than “un-natural”, though again I don’t know anyone that “naturally” sounds distorted.


But the Leveler only “levels” if the levels are already quite close, otherwise it distorts like crazy. Anyhow, Leveler does have a “threshold” control - it’s labelled “Noise Threshold”.

A hole in the hand is better than a hole in the head, but I’d rather avoid both. :wink:


What were the settings in “Compressor” that sounded right to you? That should give me a good idea of what you are looking for.

This compressor has the same controls as before but is somewhat less aggressive unless “Leveling amount” is pushed high.
For a subtle effect, use a low “Leveling amount”.
For a strong effect, use a high “Leveling amount”.

The “Threshold” setting, as with the current “Leveler” effect, is so that excessive amplification of low level sound can be avoided.
As with the current “Leveler” effect, this setting does not prevent low level noise from being amplified, it just amplifies it less.
LevelSpeech.ny (2.03 KB)

I listened again (using the same speakers) and now in headphones, and I feel the same. The speakers are JPW monitors worth £80 about 10 years ago. I don’t regard them as complete crap - they seem to suit the smallish room I listen in.

I can hear the distortion in Leveler more clearly if I compare it with Compressor. If I compare Leveler with your leveler’s attempt (third part of the original clip), that sounds so “un-natural” to me that it totally outweighs the distortion in Leveler.

Also in that third part, I can hear what I regard as “boominess” which is very close to “distortion” (mostly in the voice that was originally too quiet). That “boominess” sets a sort of sympathetic ringing going in my ears which is not very pleasant. The “boominess” is not there if I just take a section of the voice that was originally too quiet and amplify it to 0 dB.

Your new attempt largely fixes the changes of dynamics and emphasis within each voice (not completely at high leveling). Even at low leveling I don’t think it’s quite as smooth as it could be for the first 1.5 seconds “What does he need distraction”. My general impression is otherwise slightly less favourable, in that the “boominess” is more oppressive unless I set it to 10%. In the first version I found 20% the optimum setting, but in the new version 20% clearly gives a louder (peak and rms) result than before.

Otherwise the new version seems similar - at high settings I sense that the loudness of the voices has been “inverted” so that the softer voice is now louder. Maybe it is a legitimate effect but I don’t expect to hear that at 50% (default) setting and I doubt many current Leveler users will expect it.

OK but the difference is - whatever descriptions are given for it - empirically, noobs can mainly ignore the threshold in Leveler if there is little noise. I’ve just seen the old description you found for its Noise Threshold, but because of the distortion problem and the implementation, that setting seems to make little real difference to the audible result. This actually helps noobs.

You say the threshold in your new effect needs to be set “a little below the level of the quieter voice”. This is what I would prefer to avoid in a “simple” compressor if possible. User has to look at the vertical scale (linear) and figure out how that relates to dB. Also is this peak or rms level of the quieter voice?
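As a rough guide to reading the linear vertical scale in dB, here is a short Python sketch of the standard amplitude-to-dB conversion. The function name `linear_to_db` is just an illustrative choice; it says nothing about whether the threshold in the effect refers to peak or rms level.

```python
import math


def linear_to_db(amplitude):
    """Convert a linear amplitude (1.0 = full scale) to dB."""
    if amplitude == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(amplitude))


for amp in (1.0, 0.5, 0.25, 0.1):
    print(amp, "->", round(linear_to_db(amp), 1), "dB")
```

So a wave reaching 0.5 on the linear scale peaks at about -6 dB, 0.25 is about -12 dB, and 0.1 is -20 dB.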

My tests so far of your leveler did not change the threshold. It looks as if the quietest phrase of the quiet voice in the original is about -12 dB, so I tried -13 dB threshold at 10% leveling. That did not improve it for me or seem to make much visible difference to the waves. Then I tried -5 dB and -4 dB at 10%, which got rid of the “boominess” and gave reasonable leveling (I would have liked a bit more leveling). But both those settings are a long way from the defaults.

If I want a quite heavy leveling that sounds to me something like released Leveler without the distortion - Ratio 8:1. Threshold -35 dB. Other sliders unchanged from default. Make-up gain and “Peaks” unchecked. Then Normalize to -2 dB. Note that this reduces the stereo separation quite a lot - more than released Leveler or yours. But I think some people may think that helps “leveling” in some sense.
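A back-of-envelope Python sketch of what those numbers do to levels, using the textbook static compression curve followed by a peak normalize. This is an assumed simplification: it ignores attack, release and everything else Audacity's Compressor does internally, and the two input levels are hypothetical stand-ins for a quiet and a loud voice.

```python
def compress_db(level_db, threshold_db=-35.0, ratio=8.0):
    """Textbook static curve: above the threshold, level changes are divided by the ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def normalize_db(levels_db, target_peak_db=-2.0):
    """Shift every level by the same amount so the loudest lands on the target."""
    shift = target_peak_db - max(levels_db)
    return [level + shift for level in levels_db]


voices_in = [-20.0, -10.0]  # hypothetical levels for the quiet and loud voice
compressed = [compress_db(v) for v in voices_in]
final = normalize_db(compressed)
print(compressed, final)
```

With these made-up inputs, a 10 dB gap between the voices shrinks to 1.25 dB, which gives a feel for why an 8:1 ratio with a low threshold sounds like quite heavy leveling.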

If I want Compressor to do a lighter leveling something more like your leveler at -5 dB (retaining stereo separation) then I would set Threshold closer to its default -12 dB and perhaps not use Normalize. I like both those Compressor settings in their different ways, except that the heavier setting exposes that “unwanted fade in” problem at the start that Compressor has.

Also I tried your demo clip in the Levelator, which some users regard highly (see “Levelator Binaries and Source Code” by Doug Kaye on the Internet Archive); you can get the sources. There are no settings whatsoever. They say this combines elements of a compressor, normalizer and limiter. And they say a leveler’s purpose:

is not specifically to reduce the dynamic range of a signal like in a compressor or limiter (though that is often what happens), but to simply have an audio signal stay at roughly the same volume for an extended period of time.

Here is the Levelator’s result on your demo clip: gaclrecords.org.uk. I like this too, and it retains separation. I think it lacks a little “body” in what was originally the louder voice and has a little of the “inverted loudness” feeling I get in yours. I marginally prefer yours at -5 dB threshold.

=====

I saw your latest comment in Crew thread about this. I’ve got to go, but yes - I really don’t think Leveler was intended by Lynn as a “steamroller - flattener” effect such as the 100% setting in your effect makes. It was intended as a simple compressor for speech (as I said all along).

But given our current compressor is zanier than it was in Lynn’s day, I see lots of value in the new effect being a simple general purpose compressor for speech and music. I think it would be a missed opportunity not to try - clearly noobs try to use Leveler for music, unaware of its history.

If possible I would still like it to be an expander too. I think expansion/compression on the same control would be something noobs could get a grasp of, though less important than a simple general purpose compressor.


Gale

It occurs to me that the previous test audio is flawed in that we are comparing different words, so I’ve compiled a new test sample.
The first part has short phrases repeated with a pause between each phrase. The first, third and fifth repeats have been amplified by -6, -12 and -18 dB respectively.
The second part of the test is the same as the first but without the pauses. This part is to test how well the compressor is able to respond to rapid changes as might occur in a conversation.
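For anyone wanting to build a similar test file, here is a minimal Python sketch of the dB-to-gain arithmetic behind "amplified by -6, -12 and -18 dB". The names `db_to_gain`, `REPEAT_GAINS_DB` and `scale_repeat` are hypothetical, and leaving the second and fourth repeats unchanged is my reading of the description rather than something stated explicitly.

```python
def db_to_gain(db):
    """Convert a gain change in dB to a linear multiplier."""
    return 10.0 ** (db / 20.0)


# Repeat number -> gain change in dB, as described for the test sample.
REPEAT_GAINS_DB = {1: -6.0, 3: -12.0, 5: -18.0}


def scale_repeat(samples, repeat_number):
    # Repeats not listed above are assumed to be left at their original level.
    gain = db_to_gain(REPEAT_GAINS_DB.get(repeat_number, 0.0))
    return [s * gain for s in samples]


for repeat in range(1, 6):
    print(repeat, round(scale_repeat([1.0], repeat)[0], 3))
```

Each 6 dB step roughly halves the sample values, so the quietest repeat ends up at about one eighth of the original amplitude.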

New “extra simple” compressor and test results to follow.

Results for Audacity Leveler:

Default setting - Moderate:

My assessment:
Not much noticeable distortion, just a touch of harshness on the loudest phrases, but negligible leveling of the volume changes.


Heavy:

My assessment:
Noticeable reduction in dynamic range between the louder phrases and the -6 dB phrases, but accompanied by noticeable rasping on the louder phrases, particularly “Some words”.


Heavier:

My assessment:
Definite reduction in dynamic range. The -12 dB version sounds close to what the -6 dB was and the -18 dB version close to what the -12 dB version was. The distortion on the loud phrases is now quite unpleasant. I’d not want to listen to this for very long.


Heaviest:

My assessment:
Breaker one-nine, breaker one-nine. This is the bear in the air…

New version of “Level Speech” compressor.
LevelSpeech.ny (2.35 KB)
Default Leveling amount 30%:

My assessment:
Good amount of dynamic range reduction. The quieter versions are still noticeably quieter than the louder version, but much less difference. Louder versions sound a little “heavier” and a very slight increase in background noise. Enormously better than any of the results with the Leveler.


Leveling amount 70%:

My assessment:
Very strong leveling, probably more than would be wanted in most situations but could be useful for “maximizing loudness”. It would be interesting to test this on some uncompressed “hard house” or hip-hop recordings. Noticeable “compressed” sound to the louder phrases. Some noticeable increase in background noise, particularly after “I stood a little to the right…sss”

Slight tweak to the effect. The default setting is unchanged but the noise level is a bit lower, particularly at higher settings.
LevelSpeech.ny (2.35 KB)

Another minor tweak. Slight speed improvement when using as a 0 dB limiter (Leveling amount = 0%).
LevelSpeech.ny (2.37 KB)
This is a rather nice “advanced feature” for such a simple effect.

  1. Ensure the track is 32 bit float format.
  2. Amplify above 0 dB.
  3. Apply the effect with “Leveling amount” set to 0.
    The over 0 dB peaks are limited to 0 dB and the track amplified to a peak level of -1 dB.

Tweaked the amount scale to give finer control with subtle settings.
Default threshold (internal setting) raised from -18 to -12 dB.

Not sure if scaling down the output to leave headroom is a good idea or not, so it has been removed from this version.
LevelSpeech.ny (2.4 KB)

It certainly improves matters …
Demo of Steve's LevelSpeech,ny (2.4 KiB).gif
but how can it “know” when it is the quiet person talking (and to increase the gain accordingly),
rather than a quiet section within the loud person talking, which should be left alone?


Is this plug-in still needed or can it be done with the normal leveler?

If you want a “mild distortion effect”, then “Leveler” may be the effect that you want.
If you want to “even out” the volume of voices in a recording without distortion, then, in my opinion, the “Leveler” effect is not the effect that you want.

I would suggest that you try both the Leveler effect and this “LevelSpeech” plug-in and use whichever you prefer. Please leave comments in this topic with your opinions regarding these two effects.

I tried the Level Speech plug-in with this audio. I used the -36 threshold and the heaviest degree of leveling. I don’t like how it turns out. The louder voice gets very sharp. Any tips to make it smoother?