Reverse ducking

I have an existing multichannel mix with pretty big dynamic range, and a voiceover track. I’d like to mix the voiceover with the original mix and have it be understandable. Ordinarily I’d use ducking, but I’m trying to change the original mix as little as possible, so instead I’d like a sort of reverse ducking – instead of reducing the volume of the original mix when the voiceover cuts in, I’d like to increase the volume of the voiceover when the original mix gets louder. And ideally this wouldn’t be thresholded, as it is with Auto Duck, but would instead smoothly change the volume of the voiceover to match the volume of the original mix.

Is there any existing set of filters I could use for this?

As a general fuzzy process, we automate an existing manual process. How would you do it manually? Write down the steps.

I suppose I’d get a mono mix of the original mix, find its amplitude as a signal, apply a low-pass filter to that, and use the result to modulate the voiceover. Seems straightforward enough as a set of equations, but I don’t know how to do it within Audacity. I see that there’s support for arbitrary custom filters via “Nyquist Prompt”, but it also looks like that has real problems when used with long clips (mine are on the order of an hour), and I didn’t want to reinvent the wheel unnecessarily.

(something like:)


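from math import sqrt

def follow_level(control_samples, in_samples, alpha=0.001):
    # Running mean of the squared control signal (a one-pole low-pass filter)
    cur_mean_squared = control_samples[0] ** 2
    out_samples = []
    for control, sample in zip(control_samples, in_samples):
        cur_mean_squared = (1 - alpha) * cur_mean_squared + alpha * control * control
        out_samples.append(sample * sqrt(cur_mean_squared))  # scale by smoothed RMS
    return out_samples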

Here’s a quick and dirty (very dirty) hack using the Nyquist Prompt.
This code assumes mono tracks with a sample rate of 44100 Hz for both the music and vocal tracks.

Run this first on the (music) track that you want to follow:
(Memory usage will go quite high for this pass, but I think you should be good for around an hour. I’ve tested with 30 mins of audio.)

;; Envelope read hack
(setf *scratch* ())
(let ((avg (snd-sqrt (snd-avg (mult s s) 1000 1000 op-average))))
  (do ((val (snd-fetch avg)(snd-fetch avg)))
      ((not val))
    (push val *scratch*)))
(print "Done")

and then run this on the (vocal) track that you want to apply the envelope to:

(let* ((ln (length *scratch*))
      (env (make-array ln)))
  (dotimes (i ln)
    (setf (aref env (- ln (1+ i)))(nth i *scratch*)))
  (mult s 2.0 (snd-from-array 0 44.1 env)))
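
For anyone who wants to see what that second pass amounts to outside Nyquist, here is a rough Python sketch. It assumes the low-rate envelope is stretched back up to the audio rate by linear interpolation, which is my understanding of what Nyquist does when multiplying sounds of different sample rates, and it holds the last envelope value if the audio runs longer than the envelope. The names `apply_envelope`, `step` and `gain` are made up for the illustration; `gain=2.0` mirrors the 2.0 in the snippet above.

```python
def apply_envelope(samples, envelope, step=1000, gain=2.0):
    """Scale each sample by gain * envelope level.

    `envelope` has one value per `step` audio samples; levels in between
    are linearly interpolated (an assumption about the resampling).
    """
    if not envelope:
        raise ValueError("envelope is empty")
    out = []
    last = len(envelope) - 1
    for n, x in enumerate(samples):
        pos = n / step                  # position measured in envelope samples
        i = min(int(pos), last)         # envelope point at or before this sample
        j = min(i + 1, last)            # next envelope point (clamped at the end)
        frac = min(pos - i, 1.0)
        level = envelope[i] + (envelope[j] - envelope[i]) * frac
        out.append(x * gain * level)
    return out

# Tiny example: envelope ramps from 0 to 1 over two audio samples.
print(apply_envelope([1.0] * 4, [0.0, 1.0], step=2))  # → [0.0, 1.0, 2.0, 2.0]
```

The list reversal in the Nyquist version is only needed because PUSH builds the list back to front; the sketch takes the envelope already in time order.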

Thanks, I’ll give that a try. It occurs to me that I could duplicate the mix and drop its sample rate to something totally unreasonable, just to conserve memory in the scratch space.

The command (in my previous post):

(snd-sqrt (snd-avg (mult s s) 1000 1000 op-average))

multiplies “S” (the sound from the audio track) by itself, which gives the square of each sample value.
That is then passed to “SND-AVG”, which in this case takes the average of 1000 samples, then steps to the next 1000 samples, and so on. The result has a sample rate 1/1000th that of the original sound. Assuming that the original sound has a sample rate of 44100 Hz, the new sample rate is 44.1 Hz.
“SND-SQRT” then takes the square root of each sample value, so we have the RMS of the original sound with a “window” size of 1000 samples, which for 44100 Hz initial sample rate will now have a sample rate of 44.1 Hz.
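
The same square, average, square-root chain can be sketched in plain Python to make the windowing concrete. This is only an illustration of the idea, not how Nyquist implements it; `block_rms` and `window` are names invented here, and the sketch simply drops a partial block at the end.

```python
import math

def block_rms(samples, window=1000):
    """RMS of each consecutive, non-overlapping block of `window` samples."""
    envelope = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        mean_square = sum(x * x for x in block) / window   # square, then average
        envelope.append(math.sqrt(mean_square))            # then square root
    return envelope

# One second of a constant 0.5 signal at 44100 Hz gives 44 full blocks.
env = block_rms([0.5] * 44100)
print(len(env), env[0])  # → 44 0.5
```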

If you want to follow the peaks rather than the RMS, that command could be changed to:

(snd-avg s 1000 1000 op-peak)
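
As a rough picture of the peak variant, here is a Python sketch that takes the largest absolute sample value in each block of 1000 samples. That is my reading of what OP-PEAK does; `block_peak` and `window` are illustrative names, and a partial block at the end is ignored.

```python
def block_peak(samples, window=1000):
    """Largest absolute sample value in each non-overlapping block."""
    envelope = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        envelope.append(max(abs(x) for x in block))
    return envelope

# A single loud negative sample sets the level for its whole block.
signal = [0.1] * 999 + [-0.9] + [0.2] * 1000
print(block_peak(signal))  # → [0.9, 0.2]
```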

Note that the code is intended only as a demonstration of the idea, not as an example of good Nyquist code (quite bad code really, but hopefully easy to follow :wink:)

Okay, so you’re waaay ahead of me. Awesome. :wink:

Yeah, that code works great. Memory usage is a bit of a nailbiter near the end, but nothing crashes. I threw in a MAX to keep the voiceover from cutting out entirely when the main mix is silent, and the result is perfect. Thanks!
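
The MAX tweak amounts to putting a floor under the envelope before it is applied, roughly like the Python sketch below. The function name and the floor value of 0.1 are placeholders for illustration, not the values actually used.

```python
def floor_envelope(envelope, floor=0.1):
    """Keep the envelope from dropping below `floor`, so the voiceover
    stays audible when the main mix goes silent."""
    return [max(level, floor) for level in envelope]

print(floor_envelope([0.0, 0.05, 0.4]))  # → [0.1, 0.1, 0.4]
```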

I thought it might be :smiley:

If this were to be made into a releasable plug-in it may be best to put in a length check at the beginning.
Also, depending on how rapidly the volume needs to respond to changes in the music level, the “envelope” sample rate could be made even lower, which would save on memory.
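
To put numbers on that, a quick back-of-the-envelope calculation in Python shows how many envelope values have to be held for an hour of audio at different window sizes. It only counts values and makes no claim about how many bytes Nyquist uses for each one; `envelope_length` is a made-up helper.

```python
def envelope_length(duration_s, sample_rate=44100, window=1000):
    """Number of envelope values for audio of the given length."""
    return int(duration_s * sample_rate) // window

print(envelope_length(3600))               # → 158760  (1000-sample window)
print(envelope_length(3600, window=4410))  # → 36000   (0.1 second window)
```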

I realise that you may just want to get on with the job in hand (processing your recordings), but if you are interested in developing this code into an installable plug-in I’d be happy to help you do that. In the meantime, here is a better coded version (also works with stereo tracks). I’ve also provided some options at the top.

(setq action 0)         ; 0 = get envelope, 1 = apply envelope, 2 = reset
(setq mode 1)           ; 0 = RMS, 1 = Peak
(setq window-size 0.1)  ; Averaging window size in seconds

;;; Make amplitude envelope from track
(defun get-env (sig mode step)
  ;; mode 0 = rms. mode 1 = peak
  ;; step = window size in seconds
  (setf *scratch* ())
  (let* ((ws (round (* step (snd-srate sig)))))
    (if (= mode 0)
        (rms sig (/ 1.0 step))
        (snd-avg sig ws ws op-peak))))

;;; maximum of left and right channels
(defun mono-max (sig)
  (s-max (aref sig 0)(aref sig 1)))

(defun pass1 ()
  ;;; calculate a mono  envelope
  (setf env (multichan-expand #'get-env s mode window-size))
  (if (arrayp env)(setf env (mono-max env)))
  ;;; Save envelope to *scratch* property list
  (putprop '*scratch* env 'saved)
  ; The sound MUST be referenced in the return value.
  (print (if (> (snd-length env ny:all) 0)
             "Envelope saved."            ; status messages here are placeholders
             "Error. Envelope not saved.")))

(defun pass2 ()
  ;;; Apply envelope to track
  (let ((env (get '*scratch* 'saved)))
    (if (and env (soundp env))
        (mult s env)
        "Error. Invalid envelope.")))

(case action
  (0 (pass1))
  (1 (pass2))
  (T (remprop '*scratch* 'saved)
    "Envelope deleted from memory."))