equal loudness curves, plot spectrum, and dB levels

Hello, first time poster here. I’ve been using Audacity fairly intensively for certain aspects of music creation for about half a year, but I just recently decided to make it my DAW, so I’m ready to invest some time in learning its capabilities more fully. I’ve been reading the manual/wiki/forums and I’m amazed by the features it offers (that I was not using), the level of knowledge that exists here, as well as the dedication of the developers. So impressive.

Anyway, I have a question about equal loudness curves and the plot spectrum as they relate to music creation, dB levels, mixing, and EQ.

I have read that a general rule of thumb is that for two occurrences of the same basic sound to vary noticeably in terms of volume/dynamics, they need to differ from each other by at least one dB. Less than a dB and it sounds completely unvaried in terms of dynamics (like the “machine gun” snare rolls in really bad EDM). This is a general rule I’ve been using when creating musical parts in Audacity from samples. So, for example, if I’m creating a percussion part (hi hats, cymbal, snare roll, kicks, etc.) I will usually make two adjacent hits at least a dB apart (say, -2 dB followed by -3 dB) so that a listener is able to perceive them as being distinct, volume-wise. If I have five different hi hat samples in a hi hat pattern, a fairly common thing I’ll do is have them set at -1, -2, -3, -4, -5 (or sometimes more like -1, -3, -5, -7, -9). This has been a decent rule of thumb so far, although lately I have been tending toward larger differences like 1.5 or 2 dB. I get a decent amount of dynamic variation this way.
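To make that concrete, here is a rough sketch of the kind of stepped-level pattern I mean, written as Python rather than done in Audacity itself (the “hihat.wav” file name, the spacing, and the numpy/soundfile libraries are just assumptions for illustration):

```python
# A minimal sketch of the idea above: take one mono hi-hat sample and place
# five copies at stepped dB levels. File name and step sizes are hypothetical.
import numpy as np
import soundfile as sf

hit, sr = sf.read("hihat.wav")          # assumed to be a mono one-shot sample
levels_db = [-1, -2, -3, -4, -5]        # the stepped levels described above
spacing = int(0.25 * sr)                # one hit every 250 ms

out = np.zeros(spacing * len(levels_db) + len(hit))
for i, db in enumerate(levels_db):
    gain = 10 ** (db / 20)              # dB -> linear amplitude factor
    start = i * spacing
    out[start:start + len(hit)] += hit * gain

sf.write("hihat_pattern.wav", out, sr)
```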

I have also read just the bare minimum about equal loudness curves, and how different frequencies need to be at different dB levels to be perceived by a listener as being equally loud. I get the basic concept, but only have a very rough understanding of how to employ this idea in music creation and mixing (although I find it helps just to be aware of it while mixing, even if you’re still getting the job done primarily through trial and error and active listening).

So my question is how these two things relate, and how to go about analyzing sounds to be able to make smart choices about it. So, for instance, let’s just say that a sound centered at 4,000 Hz only needs to be at -4 dB to be perceived as being equally loud as a different sound centered at 1,000 Hz, at -2 dB (I have no idea if this is even close to accurate – I just chose -2 and -4 to keep the math simple). Does this mean that the dB difference needed between two of “4,000 hZ sounds” would be only half of the dB difference needed between two of “1,000 hZ sounds?” Or the other way around? Or does it really not work this way at all?

Assuming that there is some kind of merit/usefulness to what I wrote above, what would be a good way to go about implementing this knowledge in music creation? I see that Audacity has a “Plot spectrum” function that gives a visual readout of the frequencies present in a given sound. And I also understand that the smaller the audio clip analyzed, the more accurate it is. Would this be a good tool to use for my purposes? Basically, could I analyze a kick drum, see where the predominant frequencies are, compare that to the equal loudness curve, and then use that to determine my minimum dB difference? If so, then using this same method to analyze a hi hat should yield a different minimum dB difference, because the frequency profile of each instrument is pretty drastically different.
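For what it’s worth, here is a rough sketch of the sort of thing I imagine Plot Spectrum doing, just to show how one might pull out a single “predominant frequency” number. The “kick.wav” file name and the numpy/soundfile libraries are assumptions; this is not Audacity’s actual implementation:

```python
# Crude "where is most of the energy" analysis of a short mono clip.
import numpy as np
import soundfile as sf

audio, sr = sf.read("kick.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)          # mix to mono

window = np.hanning(len(audio))         # reduce spectral leakage
spectrum = np.abs(np.fft.rfft(audio * window))
freqs = np.fft.rfftfreq(len(audio), 1 / sr)

peak_hz = freqs[np.argmax(spectrum)]
print(f"Strongest component at roughly {peak_hz:.0f} Hz")
```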

Assuming the plot spectrum is a good tool for this purpose, what is the best way to read it? Obviously, most non-synthesized sounds are going to have dB energy across a wide swath of the audible frequency spectrum. What kind of rule of thumb can I use to say “this sound is primarily ______ Hz?” Is it just a matter of looking for the peak, and that’s what I would use?

Or is this all just unnecessary analysis? I’m anticipating that someone will say “just use your ears, son,” which is fine – I agree that ultimately that is the best test of whether any methodology really works. But still, I’m intrigued by the possibilities of this, so I’m curious if anyone has any insight into how I might make it all work for me.

Please correct any misinformation or bad assumptions I have. I’m here to learn.

Thanks.

There are some rules of thumb. Women’s higher voices tend to be used for announcements because they tend to group in the sensitive peak of the Fletcher-Munson curves.

Also, lesser known is that Fletcher-Munson isn’t the only game in town and people have claimed much more statistically accurate results with their own curves. And we’re having a special sale on Friday. Sign here.

So Fletcher-Munson tends to be popular because of their cost. -0-

If you want to spark a spirited discussion, ask where half-volume is. Most people agree it’s not half the signal (6 dB down).

I found a lot of benefit in knowing typical instruments and where they tend to cluster on the frequency scale. Screaming Babies on Jets, for example, almost has to include energy around 3,000 Hz. Thunder, trucks and diapason organ music tend to cluster south of 100 Hz, etc. That goes a long way toward avoiding shotgunning a post production session while trying to figure out what to do to fix a performance.

Switch the Effect > Equalization tool to Graphic mode so you get the Sliders instead of trying to grapple with the graphs. Run them up and down until you get a good idea where everything is.

There are lots of oddities. Human hearing in general goes from 20 Hz to 20,000 Hz. Guess where middle C on a piano is. About 262 Hz. That oboe note at the beginning of the concert? 440 Hz. That’s why you can do a semi-reasonable job of broadcasting with AM radio, which only goes up to about 5,000 Hz.

What do you play?

Koz

Wow, lots of info in there that I’ll try to digest slowly. Thanks. I wasn’t aware that “half volume” was such a contentious issue!

I am a drummer with no access to a drum set, which is largely why I turned to creating music with a computer (but now I love using computers to make music for other reasons). So now I’m trying to make music with drum set sample sets (that I arrange myself into beats with Audacity), and all other kinds of sampled instruments (bells, vibes, Rhodes organ, piano etc.) that I arrange into riffs, chords, and melodies. Or occasionally I’ll use free sounds that I find online as source material (but less and less of that as I’m doing more recording). I also dabble in guitar and keyboards, which I record through a small tube amp. I also run different instrument plugins through FL Studio, export it as a wav, and arrange/edit it in Audacity. I also run all kinds of things through my amp and record that with a mike, mainly synthesizers lately. I also record a lot of percussion stuff myself – hand drums, hand claps, claves, castanets, shakers, tambourine, stuff like that, plus weird stuff like wine glasses or whatever piques my interest.

So, whatever I record, it has its own dynamics already. No problem there. It’s when I’m building something from scratch that I have to use rules of thumb sometimes to make sure that I’m getting some dynamic nuance in there, and that it’s actually coming through to the listener. That’s what my question is really about. How do I do that while taking frequency into account?

So, will the minimum dB difference change based on where it sits in the spectrum? Assuming that I can get a decent idea of where a sound is centered frequency-wise, will I be able to use that info to give myself a better idea of minimum level differences? Basically, I’m trying to create drum parts (as well as melodic stuff) “from scratch” from samples, or generated from plugins, where the dynamics are nil, or just not very good/real. So I have to build in the dynamics myself, which means deciding “this hit/sound will be -3, this will be -6,” etc. So I need to have a semi-reliable way of knowing that the dynamics are actually translating, that they’re actually perceivable. If they’re not, then I need to adjust my methods. Does that make more sense?

Ooh, lots and lots of questions :smiley:

Simple things first: The abbreviation for Hertz (cycles per second) is Hz (upper case “H”, lower case “z”).

Ultimately, “yes”. For music, the all-important question is “what it sounds like”, but that is not to say that the theory and science of sound is not interesting or relevant to music and music creation. In fact, the statement “just use your ears” is far more complex and “deeper” than it may first appear.

A bold statement: “There is no such thing as sound”.
What I mean by this is that “sound” is not a physical phenomenon. “Vibration” is a physical phenomenon, but “sound” is a subjective perception. “Sound” can be defined in terms of “hearing”, which is (usually) a sensory reaction to a physical stimulus of vibration, primarily through the ears. Some scientific studies have shown that sound may also be perceived through other parts of the body - in particular, very low frequencies close to the threshold of hearing may be “audible” in this way (though the sensory difference between “hearing” and “feeling” very low frequency vibration can be quite indistinct).

The extreme limits of human hearing are usually quoted as approximately 20 Hz to 20 kHz (20000 Hz), though most people are not able to hear either of these extremes. The upper frequency limit tends to deteriorate with age, so for most adults the upper frequency range is likely to be around 16 kHz or lower.

Between the upper and lower limits, the ears respond in a non-linear manner to the vibrational frequency. Generally the greatest sensitivity is at around 3,000 Hz (3 kHz). This can be seen in the “Equal Loudness” curves as the lowest point of each curve:

The curves show that below about 200 Hz, there is an increasing reduction in sensitivity, down to the point at around 20 Hz where the ears are unable to respond.
Similarly, at some point above 10 kHz the ears rapidly lose their sensitivity.
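If you want a number to play with, the A-weighting curve is one standard, rough approximation of this frequency-dependent sensitivity (it is related to, though not the same as, the equal loudness contours). A small Python sketch, purely for illustration:

```python
# Approximate relative sensitivity in dB at frequency f (Hz),
# using the standard A-weighting formula.
import math

def a_weighting_db(f):
    ra = (12194**2 * f**4) / (
        (f**2 + 20.6**2)
        * math.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
        * (f**2 + 12194**2)
    )
    return 20 * math.log10(ra) + 2.00

for f in (100, 1000, 3000, 10000):
    print(f"{f:>5} Hz: {a_weighting_db(f):+.1f} dB")
# Around 100 Hz the weighting is strongly negative (ears less sensitive),
# near 1 kHz it is ~0 dB, and around 2-4 kHz it is slightly positive.
```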

Because hearing is subjective, and varies from person to person, it is impossible to measure “loudness” exactly. We can measure the “vibrational intensity”, but any particular sound may seem louder to one person than it does to another.

So far we have only been considering simple “tones”, but the picture becomes much more complex when considering “real world” sounds. As with other human senses, hearing tends to work by “comparative” measure rather than by “absolute” measure. When in a noisy environment, a particular vibrational stimulus may “sound” fairly quiet, but if that same stimulus is experienced in a very quiet environment it will sound much louder.

“Loudness” is also subjectively different if the sound is in short bursts rather than continuous.

I’ve already written quite a lot here, so I’ll sign off for now with a link:

http://wiki.hydrogenaud.io/index.php?title=ReplayGain_1.0_specification

Thanks for your reply, Steve. I’ll set aside some time to read those links. And I like the distinction between vibration and sound. Good to keep in mind.

Possibly a dumb question here, but is 3,000 Hz right about F#7 / G7 in “standard” (A440) tuning? If so, that would seem to comport with my observation so far that sounds at the top of the typical musical range (talking about fundamental pitch, not overtones, of course), so octaves 6, 7 and 8, are REALLY easily audible at pretty low volumes in a mix. I wasn’t sure if that was just my mind playing tricks on me, or the fact that the sounds up there have so little to compete with. But now I think it must be related to the equal loudness curve.

So… I’m just spitballing here - I’m thinking that up in those higher ranges, I should play around with figuring out what the minimum dB difference needed is in that general band, and then just come up with some fairly arbitrary way to increase that as I move down the spectrum. I’ll probably just make a really simple repeating pattern with a Rhodes sample at G7 and see what’s the minimum dB difference between two notes I can get away with and still get that “up/down, up/down” volume effect. Then that will be my baseline “best case scenario” and I can build off of that. Because if human hearing (on average) is the most sensitive in that 2,000 - 4,000 Hz range, that will be the frequency range where you really need very little dB difference between sounds to get some dynamics. Once I have that fairly well established, from there it’s just kinda doing guesswork and trial and error to figure out what is needed at lower fundamentals, but it should certainly be a value greater than what is needed in that higher range.

Does that sound workable? Or is there an easier way that I’m overlooking?

Another handy link: Frequencies of Musical Notes, A4 = 440 Hz
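The formula behind that table is simple enough to compute yourself: each semitone is a factor of 2^(1/12), with A4 = 440 Hz as the reference. A small Python sketch (just illustrative, the helper name is my own):

```python
# Equal-temperament note frequencies relative to A4 = 440 Hz.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_to_hz(name, octave, a4=440.0):
    semitones_from_a4 = NOTE_NAMES.index(name) - 9 + (octave - 4) * 12
    return a4 * 2 ** (semitones_from_a4 / 12)

print(round(note_to_hz("C", 4), 1))   # ~261.6 (middle C)
print(round(note_to_hz("F#", 7), 1))  # ~2960
print(round(note_to_hz("G", 7), 1))   # ~3136, so ~3 kHz does sit around F#7/G7
```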

I was saying previously that hearing works “comparatively” rather than against a fixed “absolute” measure.
“dB” is particularly useful in audio because it is not actually a “unit”. “dB” represents a “ratio” (Decibel - Wikipedia). When we use “dB” it is always with reference to something else. For example, in audio recording, and “signal processing” in general, signal levels are frequently measured with reference to “full scale” (often written as dBFS). How this applies in Audacity is that the “full track height” is the “full scale” reference point. The “reference point” is the 0 dB level. That is why waveform amplitude is generally measured as negative numbers. The (valid) waveform is smaller than the 0 dB reference point, and is therefore a negative number to represent “below 0 dB”.

Although there are many complexities to the issue, as a reasonable guideline, throughout the range of “musical notes”, the “dB scale” works pretty well for the way that we hear. When talking about loudness, +1 dB is a small increase in volume, whether at 200 Hz or 2000 Hz. Similarly -1 dB is a small reduction. When played through a good loudspeaker system (“studio monitors”), a given number of dB change will sound like pretty much the same “amount” of change, regardless of frequency (though weird things happen at the extreme limits of hearing). +10 dB in “level” is generally quoted as “doubling the loudness” (though note the proviso that “loudness” is subjective).
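If it helps to see the arithmetic, here is a tiny Python sketch of the dB-to-amplitude relationship (nothing Audacity-specific, and the loudness comment is only the usual rule of thumb):

```python
# -6 dB is about half the signal amplitude, while roughly +/-10 dB is the
# common rule of thumb for a subjective doubling/halving of loudness.
import math

def db_to_amplitude_ratio(db):
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    return 20 * math.log10(ratio)

print(db_to_amplitude_ratio(-6))    # ~0.501 -> half the signal
print(db_to_amplitude_ratio(-10))   # ~0.316 -> but roughly "half as loud"
print(amplitude_ratio_to_db(0.5))   # ~ -6.02 dB
```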

Thunder and earthquakes enough to rattle the windows? Even if you can’t hear it, either of those can push air around enough (vibration) to move glass which does make noise. So the window in the living room is an inaudible to audible converter.

Vibrations down at the low end can be very entertaining. What does an earthquake look like? Ummmm. A wine glass falling over? OK, what does it sound like? [pause] A wine glass falling over?

I can tell you didn’t have your drum set with you, because you’re worrying about fractions of dB (deci-Bells, by the way. From Alexander Graham).

You might find value in worrying about two different things. First is a pleasant mix of musical instruments. Those tend to sound very desirable, but not particularly loud or “dense.” Which tools would you use to fix that? Even if you did scream into the microphone, it’s not going to have the same impact as the same song produced by Warner Brothers Recording.

Those are the valuable tools and effects.

Koz

Thanks for explaining that. I have seen dBFS written a bunch of places, but I just kind of glossed over it.

Good to know. So this means that a kick at -1 dB should be (keeping your proviso in mind) roughly twice as loud as a kick at -11 dB, right? With this in mind, I think I’ve been constructing a lot of parts with too narrow a dynamic range. I rarely set anything up from the get go any quieter than -11. But given this new info, I think I’ll start doing that. When playing the drums, tons of sounds are easily way less than half as loud as the loudest sound, so it would make sense to reflect that in the levels I choose.

When constructing beats from scratch, fractions of a dB can actually be noticeable in a lot of situations, although up until now I’ve tended to use 1 dB as my minimum difference. But based on what I’m learning, I think I can probably use less than that (maybe 0.75 dB) with sounds centered up in the 2,000 – 4,000 Hz range.

If you were to record someone playing a hi hat, you’d probably have a dynamic range of at least 6 dB, and often more like 12 or 18 or more (although some of that is likely to be reduced in post production with compression, or addressed on the front end, at recording). The thing about making music with samples is that you can’t rely on the natural (accidental and intentional) variations in dynamics in a performance. So if you want dynamics anywhere near approaching natural dynamics, you have to build them in by modulating the velocity/dB level. There are plenty of people making computer/electronic music who do not bother to do this (or who use subpar programs that deliver subpar results in this regard), hence the glut of crappy techno/house out there with drums that sound completely fake. So that’s why I have interest in this topic. I’m trying to study what actual drum parts look/sound like, and develop ways to emulate that with samples and careful editing with Audacity. So far, so good – I’m just trying to take it to the next level. I’ve been doing it all by educated guess so far, so I just want to get a little more methodical and scientific with it, a little more education than guess. I just thought that different frequencies might have different necessary minimums, and based on what I’m learning, I think my intuition was probably right. Now I just have to test it, first in isolation, and then thrown into a mix.

Thanks to both of you for your help and insights.

I think I’ve been constructing a lot of parts with too narrow a dynamic range.

What was the goal again? If it was to recreate the effect of a live performance, most recordists would just as soon live performances didn’t have quite so wide a dynamic range…

Koz

Can you clarify what you mean by recordist? Do you mean the performer? Or do you mean the audio tech / sound engineer?

If it’s the latter, then I agree with you. But this is one of those situations where what’s good for the engineer/producer is not necessarily what’s good for the artist/performer/song. Personally, as a performer/artist, I want to have command over a huge dynamic range. I want -0.1 down to -32 if I can manage it and make it sound right in the mix!

But I understand that an engineer doesn’t necessarily want that, because it makes it harder for them to retain audibility across different parts of the song. Hence the tendency of most modern producers to compress the holy living he11 out of everything. But I’m trying to resist that strategy, and that is part of the reason that I am pursuing creating drum parts (and other things) with samples. It’s because I can very tightly control the dynamic range, but still HAVE a dynamic range, all without having to resort to an insane amount of compression, which I think often flattens the dynamic nuance of a performance.

Do you ever listen to really early jazz recordings from like the 40s? Have you noticed that you can rarely/barely hear the kick drum? Well, people used to assume that that was because jazz drummers back in the day used the kick very sparingly. Not so. Heavy kick work and even “four on the floor” was actually very common, believe it or not, as I’ve learned through reading some interviews with guys working back in the day. You just can’t HEAR it in the recordings because the microphones of the day simply could not deal with the explosive energy of a close-miked kick. But the perception of jazz as being a genre light on the kick has persisted, even to the modern day, despite us no longer having the technical limitations that gave rise to the phenomenon. So I see the modern (over)use of compression in a similar way. It’s used so ubiquitously because it’s the easiest way to rein in an unwieldy dynamic range in a performance. And since it’s used so ubiquitously, people have come to just accept it as “how things are done.” But if you’re building parts from scratch from samples, you don’t have to do that. You make the dynamics exactly like you want them.

An important aspect of the psychoacoustic treatment of audio is masking.

It happens in both domains (frequency and time).

The frequency spectrum is divided into ~24 Bark bands. A sound can easily be masked by another sound if they exist in the same band and the loudness difference is above 6 dB.
The bands are fairly linear up to 1000 Hz (100-200, 200-300, 300-400 Hz and so on). From there on, the logarithmic scale causes the bandwidth to increase.
The mixing trick is to emphasize a sound where the other sound is not very present and to attenuate it where it would be masked by the other sound (and vice-versa). This gives each instrument enough room in the mix while not wasting “energy” in places where the sound is masked anyway. Of course, that’s all achieved with the equalizer and the underlying spectra (or the ears…).
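If you want to experiment outside your ears, something along these lines can show where two samples share critical bands. This is only a rough Python sketch under the assumptions above: the file names are hypothetical, numpy/soundfile are assumed, and the Bark conversion uses Traunmüller’s approximation:

```python
# Compare per-Bark-band energy of two samples and flag bands where
# neither clearly dominates (difference under the ~6 dB margin above).
import numpy as np
import soundfile as sf

def hz_to_bark(f):
    # Traunmueller's approximation of the Bark scale
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_band_energy_db(path, n_bands=24):
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(audio * np.hanning(len(audio)))) ** 2
    freqs = np.fft.rfftfreq(len(audio), 1 / sr)
    bands = np.clip(hz_to_bark(freqs).astype(int), 0, n_bands - 1)
    energy = np.zeros(n_bands)
    for b, e in zip(bands, spectrum):
        energy[b] += e
    return 10 * np.log10(energy + 1e-12)

kick = bark_band_energy_db("kick.wav")
bass = bark_band_energy_db("bass.wav")
for band, (a, b) in enumerate(zip(kick, bass)):
    if abs(a - b) < 6:
        print(f"Band {band}: both sounds carry similar energy here")
```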

The temporal masking is similar. A loud sound can easily mask a following quieter one if the gap between them is short, thus the quieter one has to be boosted (sometimes more than the initial one).

This is most interesting in stereo mixes where the “law of the first wave front” comes into play.
The first sound will establish the location of the source and the brain will interpret closely following events in the same place, even if their panning is different.

Another interesting phenomenon is the Moiré effect, where the brain starts to “invent” beats that are actually not there. Chopin has a piano composition with a 3/4 and a 4/4 pattern in superposition where one suddenly hears the missing notes (if the player is good, that is…).

The same applies to the frequency spectrum and is, for instance, used to create a deeper bass impression on mobile devices whose speakers cannot reproduce low frequencies under 200 Hz (it works with treble too).

You see, there are many things possible.

Robert

I have noticed this. If I left/right separate an entire track and advance one of them forward by like 5-15 ms, all of a sudden it sounds like most of the sounds are originating in the non-advanced side! And that side also sounds a lot louder, even though they’re the exact same volume! Soooooo odd that your brain does that. It makes sense from an echolocation perspective, but I don’t understand the persistent, sustained impression that the volumes are different. That is bizarre.
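For anyone who wants to reproduce the experiment, here is a rough Python sketch of the same idea (a hypothetical “loop.wav”; numpy and soundfile assumed):

```python
# Build a stereo test file with one channel delayed by ~10 ms; the image
# tends to pull toward the earlier (non-delayed) side even though both
# channels are equally loud.
import numpy as np
import soundfile as sf

audio, sr = sf.read("loop.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

delay = int(0.010 * sr)                                   # 10 ms
left = np.concatenate([audio, np.zeros(delay)])           # plays first
right = np.concatenate([np.zeros(delay), audio])          # delayed copy
sf.write("haas_test.wav", np.column_stack([left, right]), sr)
```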

Do you have a good resource for this?

I’m still trying to figure out how to go about this. Because these bark bands sound good and everything, but we both know that most sounds are a composite of a lot of different bands (unless you’re making music purely with sine tones!). So how do I determine in which bands sounds are “fighting each other?” What’s the best thing to use for that? Then once I have determined that there are two sounds in a mix that are clearly masking each other in a given band, do I just try to mix them differently, to give one prominence over the other? Or do I start attenuating and boosting? And what are the best tools for that? Notch filter?

This is related to missing fundamentals, right? If I EQ a sound to chop off the fundamental, but all or most of the overtone series is kept intact, it still sounds like the intended note. Or, like you say, if the fundamental is there, but the playback device is incapable of (or really bad at) producing that fundamental frequency, the “flavor” of the sound is still there. I didn’t really understand this phenomenon until about a month ago when I tried running bass frequencies (E2 and below) through my 8 inch guitar amp. According to the amp speaker’s stated specs, it can’t even produce notes that low. Yet it still puts out sound, and it’s still the right note, as verified through Audacity’s pitch detection (which I’m guessing must consider overtones). That threw me for a loop for a bit.
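If anyone wants to hear the missing-fundamental effect in isolation, here is a rough Python sketch that builds a test tone with harmonics but no energy at the fundamental itself (the file name, duration, and libraries are just examples):

```python
# Synthesize harmonics 2-6 of 80 Hz with nothing at 80 Hz; most listeners
# still hear a pitch of roughly 80 Hz.
import numpy as np
import soundfile as sf

sr = 44100
t = np.linspace(0, 2.0, int(sr * 2.0), endpoint=False)
fundamental = 80.0

tone = sum(np.sin(2 * np.pi * fundamental * h * t) / h for h in range(2, 7))
tone /= np.max(np.abs(tone))                  # normalize to avoid clipping
sf.write("missing_fundamental.wav", 0.8 * tone, sr)
```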