Stereo tracks in Nyquist

The issue of rms for a stereo track came up recently in another forum topic, so I thought it would be worthwhile to start a topic that looked at the more general issue of handling stereo tracks in Nyquist plug-ins.


Background.

For process and analyze plug-ins, audio is passed from the Audacity track to Nyquist as the value of “S”.
“S” is a global variable.
For mono tracks the value of “S” is a sound object.
For stereo tracks the value of “S” is an array with two elements. Each element is a sound object.

Stereo tracks:
The first element of “S” may be accessed with (aref s 0) and this is the sound from the left channel.
The second element, (aref s 1), is the sound from the right channel.
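
For example, a minimal sketch (my own illustration, not from the original post) that uses both elements to mix a stereo track down to mono:

; average the left and right channels
; (scaled by 0.5 to avoid clipping when the channels are in phase)
(mult 0.5 (sum (aref s 0)
               (aref s 1)))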

To return a stereo track to Audacity, the plug-in must return an array with two elements, where each element is a sound.
Example:

; create an array with noise as the first 
; element and a 100 Hz tone as the second
(vector
   (noise)
   (hzosc 100))
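
Channel access and the array return value can be combined to process each channel independently. As a minimal sketch (my own illustration), this passes the left channel through unchanged and inverts the right channel:

; left channel unchanged, right channel inverted
(vector (aref s 0)
        (mult -1 (aref s 1)))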

Calculating rms for mono/stereo sounds.

Nyquist includes a useful function for computing the rms of a signal.

(rms sound [rate window-size])

http://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html#index654

However, the rms function only handles mono sounds.

For stereo sounds we could mix the two channels before calculating the rms, but this would give an incorrect result if there are out-of-phase components between the two channels. For example, if the two channels are exactly 180 degrees out of phase (one inverted relative to the other) then the mixed, or averaged, value is zero, but clearly we can listen to out-of-phase stereo and it is not silent (unless mixed to mono).
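
To demonstrate, a quick sketch (my own illustration) to try in the Nyquist Prompt, mixing a tone with its own inversion:

; a 100 Hz tone summed with its inversion:
; each signal alone is audible, but the mix is exact silence
(sum (hzosc 100)
     (mult -1 (hzosc 100)))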


Another approach is to simply add the rms values of the two channels.
This is commonly done when marketing stereo music systems. The method is simple to calculate and simple to understand (2 x 20 W rms = 40 W rms ???) but unfortunately it is a bit misleading. If the two channels are exactly in phase at the listening point then the heard signal will be twice as “big” as one channel alone, but for stereo music the two channels are rarely exactly in phase, so there will be some “phase cancellation”.


What we really need is some kind of average, but we know that averaging the signals themselves (sum and divide by 2) does not work.
That leaves two options (written out below):

  1. The average of the rms values
  2. Calculate the square of all samples, then find the average of that before calculating the square root.
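
In my own notation (not from the original post): for stereo channels L and R, each with N samples in the analysis window, these are:

Option 1: (rms(L) + rms(R)) / 2
Option 2: sqrt((sum of every squared sample from both channels) / (2 * N))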

Are these two options the same? No they’re not.
To be continued…

They are not, but with sound signals recorded by microphones, reality looks like this:

[1] The phase argument is largely negligible for microphone recordings, because the RMS value is usually computed to produce a very low-frequency envelope. The default rate argument of the Nyquist RMS function is 100 Hz, for example, and to produce a 180 degree phase shift at 100 Hz the stereo channels must be recorded with two microphones placed a minimum of 1.72 meters (5.5 feet) apart:

344 meters per second = approximate speed of sound in air
100 Hertz = 100 periods per second
period / 2 = 180 degree phase shift

344 / 100 / 2 = 1.72 meters = 5.5 feet
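
(In general: minimum distance for a 180 degree phase shift = speed of sound / (2 * frequency).)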

Okay, this is not an unrealistic microphone distance for symphony orchestra recordings. But people who do such recordings know that such a microphone arrangement produces “swampy” bass signals, so frequencies below approx. 300 Hz are usually either recorded with an extra mono microphone, or the bass signals of the microphones are mixed to mono in the recording mixer anyway, to “ground” the bass foundation.

[2] In a 100 Hz envelope, the phase relations of frequencies above 200 Hz in the original signal have no significant influence on the summed RMS values of both channels, while a microphone recording with heavy phase problems sounds so bad that everybody would instantly throw it away instead of trying to compute an RMS envelope from it.

[3] In practice the envelopes are computed at a maximum of 10 Hz (usually even lower) to prevent interference with the audio bass signals, which means that the microphones must be a minimum of 8.5 meters (28 feet) apart to produce a 180 degree phase difference even at 20 Hz, so frequencies in the audible range no longer have any significant influence on the summed RMS values of the stereo channels.

This means that the easiest way to compute a 10 Hz RMS envelope from a stereo microphone recording is simply to SUM the RMS values of both channels and then divide the result by 2, where (mult 0.5 x) gives the same result as (/ x 2):

(mult 0.5 (sum (rms (aref s 0) 10)
               (rms (aref s 1) 10)))

I do not want to say that this is the mathematically rigorous method of computing the mono RMS envelope of a stereo signal, but in a 10 Hz envelope, phase differences above 20 Hz do not really matter in practice. There may be edge cases like bass compressors, but the main perception range of human hearing is approx. 100 Hz to 10 kHz, and a 100 Hz stereo phase difference does not matter in a 10 Hz envelope.

Yes, two exactly 180 degree phase-shifted mono signals produce silence, but I have never found such signals in natural sounds, which usually contain a mix of many frequencies with many different phase relations and behave more like equally distributed noise. Also, a simple delay (like that produced by widely spaced microphones) does not produce a constant phase shift over the complete frequency range from 20 Hz to 20 kHz; instead it produces a frequency-dependent phase shift, like a comb filter.

Everybody who writes a filter that produces a constant 180 degree phase shift at all frequencies from 20 Hz up to 20 kHz wins a big cake!

  • edgar

I agree that the phase argument is negligible for (most) real world microphone recordings.
(I only say “most” so as to hedge my bet in case someone comes up with a scenario that contradicts the statement, but no such case comes to my mind :wink:).


:smiley:


Yes, that is the easiest way, and probably sufficient in many cases, but I suspect that it depends on what you are doing.
The point is that it will often give a different result from the rms of all samples (the root of the mean of all squares), so for analysis purposes, if an average rms value is given then it should be specified that it is the average rms and not the rms of all samples.

As you wrote, the average rms may be calculated with:

(mult 0.5 (sum (rms (aref s 0) 10)
               (rms (aref s 1) 10)))

Calculating the rms of all values for stereo tracks is a little more complex, but may be done with either of these snippets (where step and block are placeholders for the snd-avg window and step sizes in samples; the examples below use the same value for both):

(snd-sqrt
  (mult 0.5 (sum
    (snd-avg (mult (aref s 0)(aref s 0)) step block op-average)
    (snd-avg (mult (aref s 1)(aref s 1)) step block op-average))))



(snd-sqrt (mult 0.5
  (snd-avg (sum
    (mult (aref s 1)(aref s 1))
    (mult (aref s 0)(aref s 0)))
    step block op-average)))
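
The two snippets are equivalent because averaging is linear: the average of (L² + R²) over a window equals the average of L² plus the average of R².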

When applied to a “normal” stereo recording the difference will be pretty small, but what about (for example) a signal that pans from hard right to hard left? (as may occur in film music, electronic music, sound effects, electronic measurement, and other real-world situations). For a signal panned hard to one side, the difference between the two methods can easily be a couple of dB.

To directly compare the two methods with a 10 Hz rms envelope:

(mult 0.5 (sum (rms (aref s 0) 10)
               (rms (aref s 1) 10)))



(setq win (round (/ *sound-srate* 10)))
(snd-sqrt
  (mult 0.5 (sum
    (snd-avg (mult (aref s 0)(aref s 0)) win win op-average)
    (snd-avg (mult (aref s 1)(aref s 1)) win win op-average))))

You are aware that (mult (aref s 0) (aref s 1) …) aliases? I still haven’t tested in detail, but I’m afraid that this accounts for the RMS difference.

But to tell the truth, I also still haven’t understood the Nyquist RMS function in full detail:

(defun rms (s &optional (rate 100.0) window-size)
  (let (rslt step-size)
    (cond ((not (eq (type-of s) 'SOUND))
           (break "in RMS, first parameter must be a monophonic SOUND")))
    (setf step-size (round (/ (snd-srate s) rate)))
    (cond ((null window-size)
           (setf window-size step-size)))
    (setf s (prod s s))
    (setf result (snd-avg s window-size step-size OP-AVERAGE))
    ;; compute square root of average
    (s-exp (scale 0.5 (s-log result)))))

Roger first declares a local RSLT variable with LET, but then in the code he uses a RESULT variable. This is obviously a typo.

Some investigations later:

(setf s (prod s s))                 ==  (setf s (mult s s))
(s-exp (scale 0.5 (s-log result)))  ==  (snd-sqrt result)
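
The second equivalence holds because e^(0.5 · ln x) = x^0.5 = sqrt(x): scaling the log of each sample by 0.5 and then exponentiating computes a square root, sample by sample.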

But I still think that (mult s s) aliases.

I don’t see how that explains the difference.
For example, create a stereo track with a full scale 100 Hz tone in the left channel and silence in the right channel.
Am I right in thinking that no aliasing will occur?

(mult 0.5 (sum (rms (aref s 0) 10)
               (rms (aref s 1) 10)))

produces a constant -9 dB output.

(setq win (round (/ *sound-srate* 10)))
(snd-sqrt
  (mult 0.5 (sum
    (snd-avg (mult (aref s 0)(aref s 0)) win win op-average)
    (snd-avg (mult (aref s 1)(aref s 1)) win win op-average))))

produces a constant -6 dB output.
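
The arithmetic behind those two figures (my working, for clarity): a full-scale sine has an rms value of 0.707, so the average of the channel rms values is (0.707 + 0) / 2 = 0.354, about -9 dB, while the rms of all samples is sqrt((0.5 + 0) / 2) = 0.5, which is -6 dB.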


Looks like a typo to me.

Okay, here is some code to try. For the math, see Wikipedia: Root mean square > Definition

rms = (snd-sqrt (snd-avg (mult s s) win win op-average))

For mono sounds this would mean:

(defun rms-mono (snd rate)
  (let ((win (round (/ *sound-srate* rate))))
    (snd-sqrt (snd-avg (mult snd snd) win win op-average))))
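
A quick hypothetical check from the Nyquist Prompt: applied to a mono track containing a full-scale 100 Hz sine, this should produce a constant value of about 0.707 (-3 dB):

; 100 Hz rms envelope of a mono selection
(rms-mono s 100)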

For stereo sounds:

(defun rms-stereo (snd rate)
  (let ((win (round (/ *sound-srate* rate))))
    (snd-sqrt (mult 0.5 (sum (snd-avg (mult (aref s 0) (aref s 0))
                                      win win op-average)
                             (snd-avg (mult (aref s 1) (aref s 1))
                                      win win op-average))))

For stereo sounds using a DOTIMES loop:

(defun rms-stereo (snd rate)
  (let ((win (round (/ *sound-srate* rate)))
        (average-snd (s-rest)))
    (dotimes (ch 2)
      (setf average-snd
            (sum average-snd
                 (snd-avg (mult (aref snd ch) (aref snd ch))
                          win win op-average))))
    (snd-sqrt (mult 0.5 average-snd))))

For arbitrary mono or multichannel sounds:

(defun rms-multi (snd rate)
  (let ((win (round (/ *sound-srate* rate))))
    (if (arrayp s)
        ;; multichannel sound
        (let ((channels (length s))
              (average-snd (s-rest)))
          (dotimes (ch channels)
            (setf average-snd
                  (sum average-snd
                       (snd-avg (mult (aref snd ch) (aref snd ch))
                                win win op-average))))
          (snd-sqrt (mult (/ 1.0 channels) average-snd)))
        ;; mono sound
        (snd-sqrt (snd-avg (mult snd snd) win win op-average)))))

I still do not understand why this apparently works. Is this really true? Can anybody find errors?

Yes, but the fact that no aliasing occurs in one very special case doesn’t mean that it never aliases. The possible signal combinations where a signal multiplication aliases vastly outnumber the few special cases where no aliasing happens. For example, if I multiply no signal with no signal at all then this will surely not alias either. But does that prove anything?

The more important discovery is that if I take apart your code, Roger’s code, and the Wikipedia math, then they all contain a squared signal component, and I still do not understand why this does not alias. Obviously it’s me who’s wrong and not all the others.

It also appears to me that with the code from above I still do not get reasonable results with multichannel square wave signals. But I first must read further math explanations. :frowning: Please try the code from above, in particular RMS-MULTI, and improve or correct it if you find something wrong with it.

  • edgar

This is my take on the issue:

In most cases it would alias if we were at all concerned about the frequency content, but we’re not.

Thinking about it on a sample-by-sample basis, it is not a problem.
If we used (snd-fetch) to grab each sample value, then calculated the square of each value and stored it in an array, you would probably have no problem with that: it’s just numbers. The problem would only arise if we then tried to convert that array back into a sound at the original sample rate, but fortunately we are not doing that.

What we do with those numbers is add up a series of them and divide by the number of samples in the series. We are still just dealing with numbers, so the question of aliasing has not arisen. We now have the mean square of all the sample values in the “window”.

Next we calculate the square root of each of those averages; they are all numbers between 0 and 1, so no problem here.

We then construct a new waveform using each of those resulting numbers as a sample value. The only frequency at play now is the sample rate of our new sound, and no question of aliasing has arisen because throughout we were dealing with numbers, not frequencies.
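
A minimal sketch of this sample-by-sample view (my own illustration, not code from the posts above), computing the rms of one window of a mono sound with (snd-fetch):

; fetch win samples one at a time, square and accumulate,
; then take the root of the mean; no sound is rebuilt from
; the squared samples, so the aliasing question never arises
(defun window-rms (snd win)
  (let ((total 0.0))
    (dotimes (i win)
      (let ((val (snd-fetch snd)))
        (if val (setf total (+ total (* val val))))))
    (sqrt (/ total win))))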

Other than a couple of typos I think that is all correct.

In the first stereo example you’ve written (aref s 0) where it should be (aref s 1), and there are a couple of parentheses missing from the end.
[update: I see you’ve corrected the (aref s 0), though there are still a couple of parentheses missing]

The loop versions are a little peculiar because (s-rest) forces the output up to sound-srate.

Alternative code for RMS-MULTI

(defun rms-multi (snd &optional (rate 100))
  (if (not (arrayp snd))(setf snd (vector snd)))
  (let* ((win (round (/ *sound-srate* rate)))
         (channels (length snd))
         (avg-snd (force-srate rate (s-rest))))
    (dotimes (ch channels)
      (let ((sqs (mult (aref snd ch) (aref snd ch))))
        (setf avg-snd
              (sum avg-snd
                   (snd-avg sqs win win op-average)))))
    (snd-sqrt (mult (/ 1.0 channels) avg-snd))))

As with (rms) the result has a sample rate of rate. The default value of rate is 100 Hz.
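
Hypothetical usage from the Nyquist Prompt, producing a 10 Hz rms envelope of the current selection, mono or multichannel alike:

; 10 Hz rms envelope of the selection
(rms-multi s 10)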