How to capture envelope from one track and apply to another

Hi,

Looking for some help or direction on creating a Nyquist plug-in, or code for the Nyquist Prompt, to do the following.

I have a stereo binaural beats track. I would like to mix it with various music files (as a 2nd track), but I want the binaural track to have an envelope that follows the music track. Part of the idea behind binaural beats (according to some) is that they should be barely audible behind the music. With the raw beats track, any fixed volume setting either completely hides it during loud music passages or makes it too noticeable during quiet passages.

I can achieve my goal (kinda) by manually applying the envelope tool to the beats track while eyeballing the music track. This is really tedious for a long (1 hr) music track, and less than perfect.

Is there an obvious way to achieve this in Audacity? If not, any suggestions on coding this in Nyquist?

John

The magic phrase is “envelope-follower”,
e.g. the second block of Nyquist-code here … https://forum.audacityteam.org/t/import-list-of-frequencies-and-levels/16343/5

Audacity has an effect called AutoDuck which does the exact opposite of what you want.

You could generate a white-noise track, AutoDuck that noise track using the music, then AutoDuck the binaural-beats track using the AutoDucked white-noise track (then discard the noise track).

Unfortunately that code is a bit tricky to use because it requires the sound to be followed to be in the left channel of a stereo track, and the sound to apply the envelope to in the right channel of the stereo track. To use that code with stereo tracks (and binaural beats are stereo by definition) requires messing around splitting and rejoining the stereo tracks several times.

Here is an alternative Nyquist script that will work with mono or stereo tracks, provided the tracks are not too long (I’ve set the max selection length to 30 minutes as that tested OK). This code requires Audacity 2.1.1 or later.

;type process
;control res "Time resolution" float "seconds" 0.1 0.01 10
;control mode "Follow peak or RMS level" choice "Peak,RMS" 0

(defun mono (sig)
  (setf sig (s-abs sig))
  (if (arrayp sig)
      (mult 0.5 (sum (aref sig 0)(aref sig 1)))
      sig))

(setf step (truncate (* res *sound-srate*)))
(setf op (- 2 mode))
    
(cond
  ((> (get-duration 1) 1800)
    (format nil "Error.~%~%Selection too long.~%~
                 Reduce the selection to 30 mins maximum"))
  ((< (length (get '*selection* 'tracks)) 2)
    (format nil "Error.~%~%This effect requires at least 2 tracks to be selected.~%~
                 The amplitude envelope is copied from the first~%~
                 selected track, then applied to subsequent tracks."))
  ((< (get-duration 0.5) res)
    (format nil "Error.~%~%The 'Time Resolution' should be considerably~%~
                 shorter than the length of the selection.~%~
                 The absolute maximum allowed 'Time Resolution' is~%~
                 half of the selection length."))
  ((= (get '*track* 'index) 1)
    (setf *scratch* 
      (snd-copy  (snd-avg (mono *track*)  step step op)))
    ; return *track* to prevent audio from being released from *scratch*.
    *track*) ;*track*)
  (T
    (let ((env *scratch*)
          (offset (* res (/ (1+ mode) 2.0)))
          (initial-amp (snd-fetch (snd-copy *scratch*))))
      ; release *scratch* when we're finished
      (if (= (get '*track* 'index)(length (get '*selection* 'tracks)))
          (setf *scratch* '*unbound*))
      (mult *track*
        (sim
          (abs-env (pwlv  0 offset initial-amp))
          (at-abs offset (cue env)))))))

The track that you want to follow must be the first selected track.
The envelope detected in the first selected track is then applied to each subsequent track.
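To make the two stages easier to follow, here is a rough Python model of the same idea working on plain lists of sample values. It is not the plug-in (which only runs inside Audacity), and it simplifies things: mono lists at one sample rate, the envelope held flat across each block instead of being offset and interpolated, and hypothetical helper names `follow_envelope` and `apply_envelope`. One detail worth noting: because the script takes absolute values before SND-AVG, its "RMS" choice averages absolute sample values, and the sketch reproduces that rather than computing a true root-mean-square.

```python
def follow_envelope(source, step, mode="peak"):
    """One level per block of `step` samples (hypothetical helper).

    "peak" takes the largest absolute value in the block; "rms" mimics the
    script's RMS choice, which averages absolute values (op-average after s-abs).
    A final partial block is measured too (an assumption of this sketch).
    """
    levels = []
    for start in range(0, len(source), step):
        block = [abs(x) for x in source[start:start + step]]
        if mode == "peak":
            levels.append(max(block))
        else:
            levels.append(sum(block) / len(block))
    return levels


def apply_envelope(target, levels, step):
    """Multiply each target sample by the level of the block it falls in."""
    return [x * levels[i // step] for i, x in enumerate(target)]


source = [0.2, -1.0, 0.5, 0.1, -0.25, 0.05]   # loud block, then quiet block
target = [1.0] * 6                             # constant-level target
levels = follow_envelope(source, step=3)
print(levels)                                  # [1.0, 0.25]
print(apply_envelope(target, levels, step=3))  # [1.0, 1.0, 1.0, 0.25, 0.25, 0.25]
```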

Steve

Thanks a lot! Perfect. I applied it with the defaults (0.1, Peak) to a 30 min sleep music file (which I mixed to mono first), and it created exactly the envelope that I wanted.

Now, I need to study your code and figure out what you did. Eventually, I’m going to learn Nyquist.

Thanks, again
John

Having done a bit of testing, I think that the code should be safe up to 200,000,000 samples (about 75 minutes at a sample rate of 44100 Hz), provided that the computer has enough available RAM.
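A quick check of that figure, assuming the 200,000,000-sample limit counts samples per channel (`SAMPLE_LIMIT` and `max_minutes` are just illustrative names):

```python
SAMPLE_LIMIT = 200_000_000  # suggested safe maximum number of samples


def max_minutes(sample_rate):
    """Longest selection, in minutes, that stays within SAMPLE_LIMIT."""
    return SAMPLE_LIMIT / sample_rate / 60


print(round(max_minutes(44_100), 1))  # 75.6
print(round(max_minutes(48_000), 1))  # 69.4
```

Note that the script itself still refuses selections longer than 1800 seconds, because of the `(> (get-duration 1) 1800)` test, so that number would have to be raised to make use of the longer limit.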

You don’t have to mix to mono for this effect. When calculating the envelope for a stereo track, the effect analyzes the average of the two channels.

Steve,

Thank you for providing this code. I have been using it in a similar way to dougherj, but with affirmations rather than binaural beats (it makes them just barely audible). Thank you again; it is just what I needed.

I have a question, though. Is there a variable for how much it reduces the signal on the target track? Currently I use a trial-and-error process of normalizing the target track and then applying the Nyquist code. I can't make out from the code how it determines how much to attenuate the target. I am trying to get the target track to be 15-18 dB (on the VU meter) below the envelope track.

Am I making sense?

Thanks,
John (MisterHSP)

For Nyquist plug-ins, the audio from the track that is currently being processed is called *track*.

The effect works in three stages:

  1. The “source track” is analyzed in short blocks (“steps”), and the peak level of each step is measured. Each of these measurements becomes one sample of a new “control signal”: a “sound” at a much lower sample rate than the original that rides the peaks of the “source track”.
  2. The amplitude envelope (control signal) created in (1) is saved to a scratch-pad variable called *scratch*, which has the special feature of surviving in Nyquist from one track to the next.
  3. When Nyquist processes the next track (the “target track”), it applies the envelope (from *scratch*) to the audio from the target track (*track*) by multiplying (same as “amplifying”) the track by the envelope.

The envelope follower is based on the SND-AVG function (documented in the Nyquist Functions reference).
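A hypothetical Python model of those three stages, including the part that ordinary Nyquist variables can't do: keeping the envelope from one track's run to the next. Here an object attribute stands in for *scratch*; the class name `EffectRun`, its `process` method, and the block-held envelope (no offset or interpolation) are all assumptions of the sketch:

```python
class EffectRun:
    """Hypothetical stand-in for one application of the effect to several tracks."""

    def __init__(self):
        self.scratch = None  # plays the role of *scratch*

    def process(self, index, n_tracks, samples, step):
        if index == 1:
            # Stages 1 and 2: measure block peaks, keep them, return the audio unchanged
            self.scratch = [max(abs(x) for x in samples[i:i + step])
                            for i in range(0, len(samples), step)]
            return samples
        env = self.scratch
        if index == n_tracks:
            self.scratch = None  # like (setf *scratch* '*unbound*) on the last track
        # Stage 3: multiply the target by the envelope (held flat per block)
        return [x * env[i // step] for i, x in enumerate(samples)]


run = EffectRun()
tracks = [[1.0, -1.0, 0.5, -0.5],   # source: full level, then half level
          [0.8, 0.8, 0.8, 0.8],     # first target
          [1.0, 1.0, 1.0, 1.0]]     # second target
out = [run.process(i + 1, len(tracks), t, step=2) for i, t in enumerate(tracks)]
print(out[1])  # [0.8, 0.8, 0.4, 0.4]
```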

Steve,

Thanks for the reply. I studied your answer, studied the code, and tried to cross-reference everything in the Nyquist functions and documentation you linked to. A couple of times, I thought I was almost there. But the thing I am searching for still eludes me.

Then I got further confused when I took 59 seconds of a music track, duplicated it, and applied the Nyquist code to both tracks. The duplicated (second) track was reduced in amplitude by about 12 dB in the quieter portion and by about 5 dB in the louder portion.

I still can't pinpoint what in the Nyquist functions is reducing the amplitude of the second track, and what it is basing the reduction on.

Can you educate me on that?

Thanks,
John (MisterHSP)

This sets the code to run as a “process” type of plug-in (an “Effect”), and creates the two controls:

;type process
;control res "Time resolution" float "seconds" 0.1 0.01 10
;control mode "Follow peak or RMS level" choice "Peak,RMS" 0

This is a function that averages the two channels of a stereo sound to create a mono sound (taking absolute values first):

(defun mono (sig)
  (setf sig (s-abs sig))
  (if (arrayp sig)
      (mult 0.5 (sum (aref sig 0)(aref sig 1)))
      sig))
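A Python rendering of what MONO computes, with a stereo sound modelled as a tuple of two lists (an assumption standing in for Nyquist's two-element array). Because S-ABS is applied before the channels are averaged, channels with opposite polarity do not cancel each other out:

```python
def mono(channels):
    """Sample-by-sample average of |left| and |right|, or |x| for a mono list.

    A stereo sound is modelled as a (left, right) tuple of equal-length lists.
    """
    if isinstance(channels, tuple):
        left, right = channels
        return [0.5 * (abs(a) + abs(b)) for a, b in zip(left, right)]
    return [abs(x) for x in channels]


left = [0.5, -1.0]
right = [-0.5, 1.0]          # opposite polarity to the left channel
print(mono((left, right)))   # [0.5, 1.0], the channels do not cancel
print(mono([-0.3, 0.2]))     # [0.3, 0.2]
```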

This sets “step” to the “Time resolution” in samples (“res” is in seconds):

(setf step (truncate (* res *sound-srate*)))

The “mode” control has a value of 0 for the first choice (Peak) or 1 for the second choice (RMS).
The SND-AVG function requires an operation value of 1 (OP-AVERAGE) or 2 (OP-PEAK), so subtracting “mode” from 2 gives the right value:

(setf op (- 2 mode))
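In Python terms, the two conversions look like this. The constants 1 and 2 are the values given above for OP-AVERAGE and OP-PEAK; the function names are hypothetical:

```python
OP_AVERAGE = 1  # values given above for snd-avg's operation argument
OP_PEAK = 2


def step_samples(res_seconds, sample_rate):
    """Block size in samples, truncated like (truncate (* res *sound-srate*))."""
    return int(res_seconds * sample_rate)


def op_for_mode(mode):
    """mode 0 ("Peak") -> OP_PEAK, mode 1 ("RMS") -> OP_AVERAGE, i.e. (- 2 mode)."""
    return 2 - mode


print(step_samples(0.1, 44100))         # 4410
print(op_for_mode(0) == OP_PEAK)        # True
print(op_for_mode(1) == OP_AVERAGE)     # True
```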

Test for some conditions (if something is true, do something):

(cond

Just basic error checking:

  ((> (get-duration 1) 1800)
    (format nil "Error.~%~%Selection too long.~%~
                 Reduce the selection to 30 mins maximum"))
  ((< (length (get '*selection* 'tracks)) 2)
    (format nil "Error.~%~%This effect requires at least 2 tracks to be selected.~%~
                 The amplitude envelope is copied from the first~%~
                 selected track, then applied to subsequent tracks."))
  ((< (get-duration 0.5) res)
    (format nil "Error.~%~%The 'Time Resolution' should be considerably~%~
                 shorter than the length of the selection.~%~
                 The absolute maximum allowed 'Time Resolution' is~%~
                 half of the selection length."))
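The three checks, rewritten as a hypothetical Python function so the conditions are easy to read (durations in seconds; `(get-duration 0.5)` is half the selection length). The messages are shortened versions of the script's:

```python
def check_selection(duration_s, n_tracks, res_s):
    """Mirror of the script's three error tests; returns a message or None."""
    if duration_s > 1800:                  # (> (get-duration 1) 1800)
        return "Selection too long."
    if n_tracks < 2:                       # fewer than 2 selected tracks
        return "This effect requires at least 2 tracks to be selected."
    if duration_s * 0.5 < res_s:           # (< (get-duration 0.5) res)
        return "The 'Time Resolution' is too long for this selection."
    return None                            # no error: go on to process


print(check_selection(600, 2, 0.1))   # None
print(check_selection(600, 1, 0.1))   # This effect requires at least 2 tracks to be selected.
```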

By this point no errors have been found, and this is the first track, so grab the envelope and jot it down on a scratch-pad. Nyquist plug-in effects process one track at a time, and nearly everything is reset at the end of processing each track. The exception is *scratch*, which can retain its value from one track to the next, so we temporarily put the envelope into *scratch*. (More about this envelope later.)

  ((= (get '*track* 'index) 1)
    (setf *scratch*
      (snd-copy  (snd-avg (mono *track*)  step step op)))
    ; return *track* to prevent audio from being released from *scratch*.
    *track*) ;*track*)

“T” means “true”. This is the final part of the COND statement, so if none of the other tests have been “true”, then do this.
Basically, what we are doing is amplifying (“multiplying”) the selected track (which is NOT the first track) by the envelope. Because our envelope does not quite start from the beginning of the track, the envelope is padded a little at the start and shifted slightly to the right so that the peaks in the envelope line up with the peaks in the first track. Then, if this is the final selected track, we delete *scratch* so that it is not hanging around in RAM.

  (T
    (let ((env *scratch*)
          (offset (* res (/ (1+ mode) 2.0)))
          (initial-amp (snd-fetch (snd-copy *scratch*))))
      ; release *scratch* when we're finished
      (if (= (get '*track* 'index)(length (get '*selection* 'tracks)))
          (setf *scratch* '*unbound*))
      (mult *track*
        (sim
          (abs-env (pwlv  0 offset initial-amp))
          (at-abs offset (cue env)))))))
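The time alignment can be modelled the same way. In this hypothetical sketch, envelope point k sits at offset + k × res, where offset is half a block for Peak and a full block for RMS (the script's `(* res (/ (1+ mode) 2.0))`), and the gain before the first point is held at the first level, as the PWLV segment does. Linear interpolation between points, and holding the last level at the end, are assumptions about how the low-rate envelope is stretched across the high-rate track, not a description of Nyquist internals:

```python
def envelope_offset(res, mode):
    """Same formula as (* res (/ (1+ mode) 2.0)): res/2 for Peak, res for RMS."""
    return res * (1 + mode) / 2.0


def gain_at(t, levels, res, mode):
    """Gain applied to the target at time t (seconds), under this sketch's assumptions."""
    offset = envelope_offset(res, mode)
    if t <= offset:
        return levels[0]              # flat lead-in, like (pwlv 0 offset initial-amp)
    pos = (t - offset) / res          # position measured in envelope points
    k = int(pos)
    if k >= len(levels) - 1:
        return levels[-1]             # assumption: hold the last level
    frac = pos - k
    return levels[k] + frac * (levels[k + 1] - levels[k])  # assumed linear interpolation


levels = [1.0, 0.5, 0.0]              # three 0.1 s blocks, Peak mode
print(gain_at(0.0, levels, 0.1, 0))   # 1.0 (lead-in)
print(gain_at(0.1, levels, 0.1, 0))   # 0.75 (halfway between points 0 and 1)
```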

Creating the envelope:

    (setf *scratch*
      (snd-copy  (snd-avg (mono *track*)  step step op)))
    ; return *track* to prevent audio from being released from *scratch*.
    *track*) ;*track*)

Looks like I corrected an error here and left a fragment behind. Anything after a semicolon is treated as a “comment” and is ignored by Nyquist, so ignore that final ;*track*). The cleaned-up version is:

    (setf *scratch*
      (snd-copy  (snd-avg (mono *track*)  step step op)))
    ; return *track* to prevent audio from being released from *scratch*.
    *track*)

We are setting *scratch* to the value of (snd-copy (snd-avg (mono *track*) step step op)).
In Nyquist, a “value” may be a number, a character, a “string” (text), a “sound”, or one of several other data types. In this case, the value of *scratch* will be a sound (though one with a very low sample rate).

*scratch* should retain its value, but due to memory management in Nyquist, if *scratch* is given a “sound” value, we have to reference that sound when Nyquist returns at the end of the track; otherwise the sound is deleted. With other data types you don't need to do that.

We use SND-COPY to prevent the track from being damaged by SND-AVG.

If, for example, “op” = 2 (the same as OP-PEAK), then (snd-avg (mono *track*) step step op) first converts *track* (the track audio) to mono, then finds the peak value of the first “step” samples and saves it to *scratch*, then moves forward by “step” samples and finds the peak level of the next “step” samples, and so on for the rest of the track. *scratch* thus ends up as a “sound” with one sample value for each “step” samples of the original track. In other words, it follows the peaks of the original track.

When we multiply the second track by *scratch*, the envelope covers the whole track: although *scratch* has a very low sample rate, and therefore far fewer samples, it is still the same length as the original selection. Where *scratch* has a value of 1 (0 dB), the second track is amplified (multiplied) by 1, so clearly it remains at its original level. Where *scratch* has a value of 0.5, the second track is amplified by 0.5 (half its original level). Thus the second track is amplified by an amount that follows the contour of the first track.

Does that help?

It helps. It confirms that I had pretty well puzzled out the functions and results they returned. Thank you.

What still puzzles me is that when the original selection has a value of 1 (0 dB), the second track does not remain at its original level. It is being reduced a bit (approximately 1/3).

In my experiment, I even rendered the original selection from stereo to mono (to eliminate the variances between channels), then duplicated it. The second (duplicated) track was then set to follow the first. My expectation was that they would still be the same (or at least roughly the same, since the sample rate of the control envelope is much lower). What I get is a track that follows the first track but at about 0.7 of the original amplitude for the passages that are at or near 0 dB, and at an even lower amplitude in the quieter passages (maybe 2/3).

Is this just a function of the functions or is there a hidden variable in the functions being called?

John (MisterHSP)

Are you using RMS rather than Peak?

No. I am using the defaults across the board. Peak, .1, etc.

John (MisterHSP)

Try this:

  1. In a new Audacity project, generate 10 seconds of sine tone with an amplitude of 1.0.
  2. Duplicate the track (Ctrl+D)
  3. Apply the Fade-In effect to the first 4 seconds (approx) of the first track.
  4. Apply the Fade-Out effect to the last 4 seconds (approx) of the first track.
  5. Select both tracks and apply the effect.

Do you get something like this:
[Image: tracks000.png]

Yes. It looks just like that.

However, if I apply the effect again, I get a curving fade in and fade out on the second track, not a straight line.

As you can see in that example, where the peak amplitude in the first track is 1.0 (0 dB), there is no attenuation of the second track. Where the amplitude of the first track is less than 1.0, the second track is attenuated by a proportion equivalent to the peak amplitude of the first track.

Yes, that’s correct.
Try applying a (linear) Fade-in effect multiple times to the same audio selection and you will see the same thing happen.
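Why the second application curves: a linear fade multiplies each sample by a gain that rises in a straight line, so applying it twice multiplies by that gain squared. A minimal sketch with an idealized fade (gain going exactly from 0 to 1 across the selection, which may not match Audacity's Fade-In sample for sample; `fade_in` is a hypothetical helper):

```python
def fade_in(samples):
    """Idealized linear fade-in: gain rises from 0 to 1 across the samples."""
    n = len(samples)
    return [x * i / (n - 1) for i, x in enumerate(samples)]


tone = [1.0] * 5
once = fade_in(tone)
twice = fade_in(once)
print(once)   # [0.0, 0.25, 0.5, 0.75, 1.0]  straight line
print(twice)  # [0.0, 0.0625, 0.25, 0.5625, 1.0]  gain squared, so a curve
```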

Okay. I still don't understand why the music is reduced in the second track when it is duplicated from the first (and the first is normalized to 0 dB before duplication). They are identical tracks before applying the effect.

It doesn’t seem to operate the same on the music as it does on a solid sound slice.

Didn't I read something about *loud* getting set/changed in one of the functions? Could something related to that be at play here?

John (MisterHSP)

I think I can see where you’re getting confused.
I think that you are expecting that the contour of track 2 will be the same as the contour of track 1 after processing. IF track 2 is a solid block of sound, then that will be the case, but if track 2 has a varying level before processing, then it will not have the same contour as track 1 - in some cases it is impossible for it to have the same contour as track 1. Let me try and explain.

“Amplification” is another name for “multiplication” - there is no difference between the two except that when talking about numbers we generally use the term “multiplication” and when talking about sound we generally use the term “amplification”. Other than that, the two terms refer to exactly the same operation. For example, if we “amplify” a sound by a factor of 2 (equivalent to +6 dB), then the amplitude at each point in the sound is doubled, that is, each audio sample value is multiplied by 2. Similarly if we “amplify” by -6 dB (a factor of 0.5), then the amplitude at each point is halved (multiplied by 0.5).
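The same relationship in numbers, using the standard conversion dB = 20 × log10(gain). The second part is a hedged illustration of the duplicated-track experiment earlier in the thread: if a track is multiplied by an envelope that follows its own peaks, a passage peaking at some level below 0 dB is attenuated by roughly that same number of dB again, so quieter passages lose more than louder ones. If the quiet and loud passages peaked at roughly -12 dB and -5 dB, this idealized arithmetic would be consistent with the 12 dB and 5 dB reductions John measured (it is arithmetic, not a measurement):

```python
import math


def gain_to_db(gain):
    """Convert an amplitude factor to decibels."""
    return 20 * math.log10(gain)


print(round(gain_to_db(2.0), 2))   # 6.02
print(round(gain_to_db(0.5), 2))   # -6.02

# Idealized self-following: a passage peaking at level_db is multiplied
# by an envelope at that same level, so its level in dB doubles.
for level_db in (-5.0, -12.0):
    gain = 10 ** (level_db / 20)
    print(level_db, round(gain_to_db(gain * gain), 1))  # -5.0 -10.0, then -12.0 -24.0
```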

Talking about the “peak” option, as that’s the simpler one to explain:

When we create the “amplitude envelope”, we create a waveform that goes up and down, riding the peaks of track 1. For example, if track 1 is a 10 second tone that starts at amplitude 1.0 and fades out linearly (in a straight line) from 1.0 down to zero, then we can easily calculate the amplitude at each 1 second interval as:

Seconds   Amplitude
  0          1.0
  1          0.9
  2          0.8
  3          0.7
  4          0.6
  5          0.5
  6          0.4
  7          0.3
  8          0.2
  9          0.1
  10         0.0

So when we make our amplitude envelope, it will do the same.

Now let’s say that track 2 is a solid block of sound (a generated tone) with a constant amplitude of 0.8. Then we can calculate what the amplitude at each 1 second interval should be as:

Seconds   Envelope    track 2   New level
  0        1.0          0.8     1.0 x 0.8 = 0.80
  1        0.9          0.8     0.9 x 0.8 = 0.72
  2        0.8          0.8     0.8 x 0.8 = 0.64
  3        0.7          0.8     0.7 x 0.8 = 0.56
  4        0.6          0.8     0.6 x 0.8 = 0.48
  5        0.5          0.8     0.5 x 0.8 = 0.40
  6        0.4          0.8     0.4 x 0.8 = 0.32
  7        0.3          0.8     0.3 x 0.8 = 0.24
  8        0.2          0.8     0.2 x 0.8 = 0.16
  9        0.1          0.8     0.1 x 0.8 = 0.08
  10       0.0          0.8     0.0 x 0.8 = 0.00
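The table can be reproduced in a few lines of Python (values rounded to two decimals to hide floating-point noise; variable names are illustrative):

```python
# Fade-out envelope from the first table: 1.0 at 0 s down to 0.0 at 10 s
envelope = [1.0 - 0.1 * s for s in range(11)]
track2 = 0.8  # constant amplitude of the generated tone

new_level = [round(e * track2, 2) for e in envelope]
print(new_level)
# [0.8, 0.72, 0.64, 0.56, 0.48, 0.4, 0.32, 0.24, 0.16, 0.08, 0.0]
```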

If we do this experiment, we see that the predicted results are pretty close:
[Image: tracks000.png]
Now let's say that, rather than having a constant-level tone in track 2, we have a tone that fades in from 0 to 1.0.
Again we can calculate the expected result as:

Seconds   Envelope    track 2   New level
  0        1.0          0.0     1.0 x 0.0 = 0.00
  1        0.9          0.1     0.9 x 0.1 = 0.09
  2        0.8          0.2     0.8 x 0.2 = 0.16
  3        0.7          0.3     0.7 x 0.3 = 0.21
  4        0.6          0.4     0.6 x 0.4 = 0.24
  5        0.5          0.5     0.5 x 0.5 = 0.25
  6        0.4          0.6     0.4 x 0.6 = 0.24
  7        0.3          0.7     0.3 x 0.7 = 0.21
  8        0.2          0.8     0.2 x 0.8 = 0.16
  9        0.1          0.9     0.1 x 0.9 = 0.09
  10       0.0          1.0     0.0 x 1.0 = 0.00
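Likewise for the fade-in case. This sketch confirms that the product peaks at 0.25 in the middle (5 seconds) and is symmetrical, a shape that matches neither input track:

```python
envelope = [1.0 - 0.1 * s for s in range(11)]  # fade-out source envelope
track2 = [0.1 * s for s in range(11)]           # fade-in target, 0 to 1.0

new_level = [e * t for e, t in zip(envelope, track2)]
peak = max(new_level)
print(new_level.index(peak), round(peak, 2))  # 5 0.25
```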

and again we can test our prediction:
[Image: tracks001.png]
So now we can see that the only way to make the second track have the same shape as the first track is for the second track to have a constant amplitude before we apply the Nyquist code. To some extent we can achieve this by applying a dynamic compression effect with a very strong setting.

What if we start with tracks like this:
[Image: tracks002.png]
A couple of applications of the Audacity “Compressor” effect with these settings can even out the level in track 2 quite substantially:
[Image: window-Compressor-000.png (Compressor settings)]
However, there is no amount of processing that we can do to make the middle section anything other than silence:
[Image: tracks003.png]
Then the result of our Nyquist script comes out like this:
[Image: tracks004.png]

Thank you for going to all the effort to educate me on this. I see what you are saying here. I can see that I may have to just work with the results it gives.

Thanks for staying with me on this and furthering my education.

Warm regards,
John (MisterHSP)