Stereo Vocal Remover

Hi guys, the Vocal Remover only gives mono results so I came up with a fairly simple way to make it stereo. (Conversation began in this thread.)

I’m surprised that no one else seems to have used this technique before, so I will name it “Subtractive Center Removal” or “SCR”. The technique gives slightly noisy results (I have no idea why), but it does allow you to perfectly isolate what is exclusive to the left, right and center channels, which apparently was not possible before.

Here is the procedure:

  1. Make 4 copies of the waveform (these will become L+, R+, L- and R-)
  2. Invert 3 and 4
  3. Select all 4 > Nyquist Prompt “(s-max s 0)”
  4. Split all 4 from stereo to mono
  5. Of the 8 mono tracks, select 2, 3, 6 and 7, and invert
  6. Mix and render each pair, to get 4 mono tracks
  7. Select all 4 > Nyquist Prompt “(s-max s 0)”
  8. Invert 3 and 4
  9. Move track 3 up to above track 2
  10. Mix and render 1+2 and 3+4, to get two mono tracks
  11. Select both and “make stereo track”
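
For anyone who wants to check the arithmetic outside Audacity, the steps above can be sketched per sample in Python. This is only an illustration of my reading of the procedure, not Audacity code: `scr_sample` and `scr` are hypothetical names, and each channel is assumed to be a plain list of floats.

```python
def scr_sample(left, right):
    """Apply the listed steps to one pair of sample values (hypothetical helper)."""
    # Steps 1-3: positive halves (L+, R+) and inverted negative halves (L-, R-).
    l_pos, r_pos = max(left, 0.0), max(right, 0.0)
    l_neg, r_neg = max(-left, 0.0), max(-right, 0.0)
    # Steps 4-7: subtract the opposite channel from each half, then clip at zero.
    a = max(l_pos - r_pos, 0.0)  # left, positive half
    b = max(r_pos - l_pos, 0.0)  # right, positive half
    c = max(l_neg - r_neg, 0.0)  # left, negative half (still inverted)
    d = max(r_neg - l_neg, 0.0)  # right, negative half (still inverted)
    # Steps 8-11: re-invert the negative halves and recombine into left/right.
    return a - c, b - d


def scr(left, right):
    """Run scr_sample over two equal-length channels given as lists of floats."""
    pairs = [scr_sample(l, r) for l, r in zip(left, right)]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

For example, `scr_sample(1.0, 0.5)` returns `(0.5, 0.0)`: the part common to both channels is subtracted and only the excess on the left remains.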

Perhaps more useful is the fact that SCR can be used to greatly enhance the basic “vocal remover” mono. If you add the SCR result to it (at a slightly lower level), it broadens out the sound and makes it far more pleasing than the standard mono. If it’s too noisy, it can also be reverbed (e.g. room size=0, reverberance=0, wet only).

My request
I really would like to see “Subtractive Center Removal” added to the Effects menu, and possibly also “Subtractive Center Isolation” (which is just an inverted SCR added to the original sample, to isolate the center channel). They’re simple algorithms, yet long-winded to do manually.
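
As a rough sketch of what “an inverted SCR added to the original sample” means numerically: adding an inverted signal is the same as subtracting it, so the isolation step is a per-sample difference. The function name `centre_isolate` and the list-of-channels layout are assumptions made for this illustration only.

```python
def centre_isolate(original, scr_output):
    """Return original minus SCR output, channel by channel.

    Both arguments are assumed to be [left, right], where each channel
    is a list of floats of the same length.
    """
    return [
        [o - s for o, s in zip(orig_ch, scr_ch)]
        for orig_ch, scr_ch in zip(original, scr_output)
    ]
```

With an original of `[[1.0, 0.5], [0.5, 0.5]]` and an SCR result of `[[0.5, 0.0], [0.0, 0.0]]`, both output channels come out as `[0.5, 0.5]`, i.e. only the part shared by left and right is left over.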

I would also like to request that the following options be added to the Vocal Remover function:

• A slider for “SCR stereo” (none <—> loud)
• A slider for “SCR reverb” (none <—> 100% wet)

I believe this will give a much more pleasing result for removing vocals.

And if anyone can work out why the results are sometimes noisy/distorted, I would really like to know!

Did you read https://forum.audacityteam.org/t/removing-vocals-only-and-not-other-instruments/32944/1 and try the “2D stereo toolkit” and example files in that post?


Gale

Unfortunately (but as predicted) the procedure does not work :frowning:

The attached audio sample has a 3000 Hz tone in the left channel, a 440 Hz tone in the right channel, and a 440 to 1320 Hz “Chirp” panned centre.
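
A comparable test signal can be generated with a few lines of Python, in case the attachment is not available. The sample rate, duration, amplitude and the linear shape of the sweep are all assumptions here, since the post does not state them; `make_test_signal` is a hypothetical helper name.

```python
import math

SAMPLE_RATE = 44100  # assumed; not stated in the post
DURATION = 1.0       # seconds, assumed
AMPLITUDE = 0.3      # assumed; keeps the sum of two components within [-1, 1]


def make_test_signal(sample_rate=SAMPLE_RATE, duration=DURATION, amplitude=AMPLITUDE):
    """Return (left, right): 3000 Hz left, 440 Hz right, 440-1320 Hz chirp in both."""
    n = int(sample_rate * duration)
    left, right = [], []
    for i in range(n):
        t = i / sample_rate
        tone_left = math.sin(2 * math.pi * 3000 * t)
        tone_right = math.sin(2 * math.pi * 440 * t)
        # Linear sweep (assumed): phase = 2*pi * (f0*t + (f1 - f0) * t^2 / (2*T))
        chirp = math.sin(2 * math.pi * (440 * t + (1320 - 440) * t * t / (2 * duration)))
        left.append(amplitude * (tone_left + chirp))   # left-only tone + centre
        right.append(amplitude * (tone_right + chirp))  # right-only tone + centre
    return left, right
```

Because the chirp is identical in both channels, `left[i] - right[i]` contains only the two side tones, which makes it easy to see what a centre-removal method should and should not leave behind.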

You describe the purpose of the exercise as being to “perfectly isolate what is exclusive to the left, right and center channels”.
What I actually get is a mess:

Here is Steve’s test file in direct comparison.

Original → Your proposal → 2DST Center removed → 2DST Center isolated

*2DST means of course my plug-in (2D Stereo Toolkit)

Your code actually only handles correctly those samples that are positive in both channels at the same time.
Negative ones (and combinations) are scattered all over the place.

(L-wo-c / R-wo-c = left / right output without centre)
 L      R      L-wo-c   R-wo-c
 1      1        0        0
 1      0.5      0.5      0
 1      0        1        0
 1     -0.5      1.5     -0.5
 1     -1        2       -1
 0.5    1        0        0.5
 0.5    0.5      0        0
 0.5    0        0.5      0
 0.5   -0.5      1       -0.5
 0.5   -1        1.5     -1
 0      1        0        1
 0      0.5      0        0.5
 0      0        0        0
 0     -0.5      0.5     -0.5
 0     -1        1       -1
-0.5    1       -0.5      1.5
-0.5    0.5     -0.5      1
-0.5    0       -0.5      0.5
-0.5   -0.5      0        0
-0.5   -1        0.5     -0.5
-1      1       -1        2
-1      0.5     -1        1.5
-1      0       -1        1
-1     -0.5     -0.5      0.5
-1     -1        0        0

(only valid if I’ve done the step-by-step correctly…)
I’m very sorry.

By the way, Steve, does my plug-in now run faster with the increased speed on Linux?
(Bug “of-which-I-haven’t-the number”)
Just in case:
https://www.dropbox.com/s/tkonxx1njg1lzcu/rjh-stereo-tool.ny?dl=0

On my machine it runs at roughly the same speed as “Sliding Time Scale / Pitch Shift”.
I’ve got a lot of other stuff running at the moment (I’m in the middle of building the latest source code) but processing a 3 minute track takes just over 1 minute.

With the new version that I linked to?
That’s very strange.
It takes 22 s for a 3 min song on my system, whereas the sliding/scale shift effect takes over 4 min (!!!)
But I didn’t want to disturb you (I guess you’re trying Paul’s spectrum selection features, aren’t you)
Thanks anyway

My apologies. I can see that it doesn’t work perfectly, but it does work partially.

What my technique does do
If I am not mistaken, my technique does completely remove the center, whilst maintaining stereo separation and removing a lot of the L from R (and vice versa). This means you can at least inspect what is playing in the left and right (which is what I wanted it for, and I could find no other way to do it).

Limitations
However, what seems to happen is that some waveforms on the left do bleed into the right, and vice versa. This happens more when there’s more playing in the center. It’s a subtractive technique which pulls the unwanted waves down below 0 and chops them off. If there’s nothing playing in the center, the unwanted waves are successfully pulled down below 0 and eliminated. But if there are loud waveforms in the center, then the unwanted waves still protrude above the center (in addition to which you lose one half of them, making them sound distorted).
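
The effect can be seen on single sample values. The sketch below is only an illustration of the procedure's per-sample arithmetic as I understand it; `scr_sample` and `with_centre` are hypothetical helpers, and `side` stands for a value that exists only in the left channel.

```python
def _pos(value):
    """Keep the positive part of a value (what the "s-max with 0" step does)."""
    return max(value, 0.0)


def scr_sample(left, right):
    """Per-sample form of the procedure: split halves, subtract, clip, recombine."""
    out_l = _pos(_pos(left) - _pos(right)) - _pos(_pos(-left) - _pos(-right))
    out_r = _pos(_pos(right) - _pos(left)) - _pos(_pos(-right) - _pos(-left))
    return out_l, out_r


def with_centre(side, centre):
    """Left-only value `side` plus a value `centre` common to both channels."""
    return scr_sample(side + centre, centre)
```

With no centre, `with_centre(0.5, 0.0)` gives `(0.5, 0.0)`, so the left-only value survives. With a loud centre of opposite sign, `with_centre(0.5, -1.0)` gives `(0.0, -0.5)`: the left-only value disappears from the left and turns up, inverted, in the right.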

Conclusion
Sorry this is not what everyone wanted. But for me it did what I wanted, which was to inspect what is playing in a track’s left and right channels. I also found (as I said) that it gives a pleasing sense of stereo separation when mixed (at a low volume) into the “Vocal Remover” output.

To make it easier to test, here is the entire procedure in one Nyquist script that can be run from the Nyquist Prompt (not optimised, just a direct “translation” of the procedure into code):

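(let ((s1 (s-max s 0))             ; positive halves of L and R
      (s2 (s-max s 0))
      (s3 (s-max (mult -1 s) 0))   ; negative halves of L and R, inverted
      (s4 (s-max (mult -1 s) 0)))
  ;; subtract the opposite channel from each half (results are mono)
  (setf s1 (diff (aref s1 0)(aref s1 1)))   ; L+ minus R+
  (setf s2 (diff (aref s2 1)(aref s2 0)))   ; R+ minus L+
  (setf s3 (diff (aref s3 0)(aref s3 1)))   ; L- minus R-
  (setf s4 (diff (aref s4 1)(aref s4 0)))   ; R- minus L-
  ;; clip at zero again, then recombine the halves into a stereo pair
  (vector
    (diff (s-max s1 0)(s-max s3 0))
    (diff (s-max s2 0)(s-max s4 0))))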

Sorry but I think that is an overly optimistic assessment.

Taking another example:
I have a 440 Hz sine wave in the left channel and a 700 Hz tone in the right channel. There is no audio common to both left and right channels.
[Attached image: firsttrack000.png]
If this procedure successfully allows one to “inspect what is playing in a track’s left and right channels”, then we would expect the audio to be largely unchanged (because there is no “center audio”).
What we actually get is:
[Attached image: firsttrack001.png]
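
The same test can be approximated numerically. The sketch below assumes a 44100 Hz sample rate, full-scale sine waves and a 0.1 second excerpt, none of which are stated above, and uses a per-sample reading of the procedure; `scr_sample` and `max_change` are hypothetical names.

```python
import math


def scr_sample(left, right):
    """Per-sample form of the procedure: split halves, subtract, clip, recombine."""
    def pos(value):
        return max(value, 0.0)
    out_l = pos(pos(left) - pos(right)) - pos(pos(-left) - pos(-right))
    out_r = pos(pos(right) - pos(left)) - pos(pos(-right) - pos(-left))
    return out_l, out_r


def max_change(freq_left=440.0, freq_right=700.0, sample_rate=44100, n=4410):
    """Largest per-sample difference between input and output for two pure tones."""
    worst = 0.0
    for i in range(n):
        t = i / sample_rate
        l = math.sin(2 * math.pi * freq_left * t)
        r = math.sin(2 * math.pi * freq_right * t)
        out_l, out_r = scr_sample(l, r)
        worst = max(worst, abs(out_l - l), abs(out_r - r))
    return worst
```

If the audio were “largely unchanged”, `max_change()` would be close to zero. With these assumptions it comes out above 0.5, because whenever both tones have the same sign the smaller one is subtracted from the larger even though the two channels have nothing in common.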

Yes I’m using the version that you posted the link for.
No I wasn’t trying Paul’s spectral selection feature, I was testing building with the new version of wxWidgets - I’m testing Paul’s spectral selection now :wink:

Now that my computer is not under such a heavy load, processing time has gone down to around 40 seconds for a 3 minute track, but note that this is with a debug build, which is (usually) slower than a normal release build, so processing time seems to be very reasonable. By way of comparison, the standard Vocal Removal effect takes about 10 seconds to process a 3 minute track.

Are there any tutorial videos about the Stereo Vocal Remover or the 2D Stereo Toolkit?

There are no video tutorials for Robert’s Stereo Vocal Remover or 2D Stereo Toolkit that I’m aware of.