Center-panned vocal isolation

Nerd42 · May 28, 2013, 1:59am

I can’t think of any reason why the “Vocal Remover (for center panned vocals)” can’t be modified to do the opposite of what it currently does. It shouldn’t be too difficult to rewrite that plugin to use inversion cancelling to delete everything that’s center-panned, and return left and right panned sound only. A “Vocal Isolator (for center panned vocals)” or somesuch. What say you?

Robert_J_H · May 28, 2013, 8:46am

That isn’t as easy as it might appear.
The voice removal itself is limited (it produces a mono output).
We have the two channels L and R.
The voice removal eliminates the sound in the middle by subtracting R from L.
This means that everything that is dead center is removed and the sides are preserved proportionally.
The exact opposite of this procedure is L + R, which is nothing but the mono version of a stereo track.
You’ll get this by choosing “Stereo to Mono” from the Track menu.
Here’s some code that produces both versions (mid/mono on the left and side/center-less on the right).

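;; Split the stereo track (s) into its left and right channels.
(psetq l (aref s 0)
       r (aref s 1))
;; Return mid (L + R) as the left channel and side (L - R) as the right,
;; each scaled by sqrt(0.5).
(vector (mult (sqrt 0.5) (sum l r)) (mult (sqrt 0.5) (diff l r)))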

(Effects > Nyquist Prompt, paste and click OK)
The run-of-the-mill procedure for removing center-panned sound is imprecise because it discards important information.
Let’s look at it numerically:
The sample values go from -1 to 1.
When the values in both channels are equal, the sound is dead center and L - R equals zero.
But -1 - (-1) gives the same result as 1 - 1 or 0.378 - 0.378.
Clearly, the source of our zero value is lost - we can’t tell which values were there in the beginning.
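To make the point concrete, here is a small Python sketch (not part of any Audacity plug-in) that feeds the three sample pairs mentioned above through the L - R step. The variable names are only illustrative.

```python
# Sample pairs (left, right) taken from the post: all are "dead centre".
pairs = [(-1.0, -1.0), (1.0, 1.0), (0.378, 0.378)]

# The classic vocal-removal step: subtract right from left.
differences = [left - right for left, right in pairs]
print(differences)  # [0.0, 0.0, 0.0]

# Three different inputs collapse to one output, so L - R alone
# cannot be inverted to recover what was in the centre.
distinct_inputs = len(set(pairs))
distinct_outputs = len(set(differences))
print(distinct_inputs, distinct_outputs)  # 3 1
```

Whatever was in the center contributes nothing to the difference signal, which is why subtraction alone can remove the center but cannot hand it back.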
The problem is that a waveform holds different kinds of information: firstly the amplitude (the absolute value of the sample) and secondly the time-coded frequency phase.
The panorama of the sound continuously changes not only its left and right position but also its spread across the speaker base line. And worse - the width can even go beyond the true length of the base line (that’s where a mixture of positive and negative values from one channel at the same time comes into play).
In short, there’s much more to it than simple addition and subtraction.
Professional tools work with FFTs, cepstra, pattern recognition and so on.
I’ve been writing some code that extracts certain bands from the stereo field (e.g. -30 to -10 %) but it always depends on how well behaved the track is.
Sometimes the result is “twisted”, i.e. the band moves over the whole stereo image.
This is the case when additional “stereo improvements” were applied.
If the phase is out of sync, you have no chance of producing a meaningful result.

Besides, the code above can easily be used for mid/side manipulation. If you split the stereo track after running the code, you can do a lot of impressive things with the two channels, like compression, equalisation or pitch shifting (especially on the right, or side, channel).
After joining the tracks (Make Stereo), you can apply the code in the Nyquist Prompt as before and you’ll have the L/R version again (which now might sound more interesting).
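The round trip works because the sqrt(0.5) scaling makes the sum/difference step its own inverse. A minimal Python sketch of that idea (plain lists stand in for audio channels, and the function name mid_side is only illustrative):

```python
import math

SCALE = math.sqrt(0.5)

def mid_side(a, b):
    """Return (scaled sum, scaled difference) of two channels."""
    first = [SCALE * (x + y) for x, y in zip(a, b)]
    second = [SCALE * (x - y) for x, y in zip(a, b)]
    return first, second

left = [0.5, -0.25, 0.8, 0.0]
right = [0.5, 0.25, -0.2, 0.3]

# First pass: L/R -> mid/side.
mid, side = mid_side(left, right)

# Second pass with the very same function: mid/side -> L/R.
left_again, right_again = mid_side(mid, side)
```

Apart from floating-point rounding, left_again and right_again match the original channels, so anything done to mid or side between the two passes shows up in the decoded stereo track.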

steve · May 28, 2013, 9:44am

There is a Channel Mixer plug-in that allows full control of how the left/right channels are mixed, including inverting one channel to perform center-panned vocal removal. There is no combination that can produce center-panned vocal isolation.

kozikowski · May 28, 2013, 10:53am

What say you?

I’d say you only think that if you also thought that the show before and after vocal removal is the same show. It’s not. The show after vocal removal is mono, not stereo, so none of the arithmetic tools work. The proofs go on for pages, but the instruments playing in the second show have no convenient relationship to the instruments in the first.

This is in addition to the not so obvious problem that vocal removal fails more often than it succeeds. This is a test clip I made from a very high quality song, first after vocal removal and then the original song.

http://kozco.com/tech/audacity/clips/MysteryTrain.mp3

Simple Vocal Removal doesn’t work at all with this song and it’s fairly typical. It doesn’t matter which vocal removal technique you use. You will find the YouTube demos work perfectly if you use exactly the same song and the same quality that they used. The technique doesn’t directly transfer to all songs.

Koz

Gale_Andrews · June 3, 2013, 6:08am

Have you read what we say about Isolation in the Tutorials: http://manual.audacityteam.org/o/man/tutorial_vocal_removal_and_isolation.html#Case_3:_Vocal_Isolation ?

Gale

Nerd42 · June 13, 2013, 11:27pm

kn0ck0ut first subtracts the R input amplitudes from the L (SOP to remove common ie centre-panned material) then spectrally subtracts this result from the initial L input, leaving the centre of the stereo image on the L output.

That should be doable as a Nyquist plugin. I could be wrong, but I don’t think that noise removal procedure recommended in the Audacity tutorial gets results anywhere near as good as kn0ck0ut sometimes does for center-panned isolation. Also, the goal isn’t always to get the vocals isolated – it’s to get whatever happens to be in the center isolated.
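For anyone who wants to see the idea in miniature, here is a rough Python sketch of the procedure as described above. It is not kn0ck0ut's actual code: it handles a single frame with no windowing or overlap, uses a naive (slow) DFT to stay dependency-free, and the names dft, idft and isolate_centre are made up for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def isolate_centre(left, right):
    """One-frame sketch: spectrally subtract (L - R) from L."""
    side = [l - r for l, r in zip(left, right)]
    out_spec = []
    for l_bin, s_bin in zip(dft(left), dft(side)):
        # Subtract magnitudes, clamp at zero, keep the phase of the left channel.
        magnitude = max(abs(l_bin) - abs(s_bin), 0.0)
        out_spec.append(cmath.rect(magnitude, cmath.phase(l_bin)))
    return idft(out_spec)

N = 64
# A tone present in both channels (centre) and one present in the left only.
centre = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
left_only = [0.5 * math.sin(2 * math.pi * 9 * t / N) for t in range(N)]

left = [c + s for c, s in zip(centre, left_only)]
right = centre[:]

estimate = isolate_centre(left, right)
error = max(abs(e - c) for e, c in zip(estimate, centre))
```

In this idealised test the two tones sit in separate frequency bins, so the center tone comes back almost exactly. Real mixes overlap in frequency, which is one reason results vary so much with the source material.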

It is true that kn0ck0ut’s results vary widely depending heavily on how the source material was mixed, but I was able to get some great results from the original CD release of Sgt. Pepper with kn0ck0ut years ago. (unnecessary now that we have The Beatles Rock Band, but it was just an example)

steve · June 14, 2013, 1:46am

It probably is possible, but FFT is not friendly to work with in Nyquist, is poorly documented, and is quite slow. If you want to try and develop a centre panned vocal isolation effect in Nyquist, please start a new topic in the Nyquist part of the forum (Nyquist - Audacity Forum) and I’ll help as far as I am able.

Robert_J_H · June 14, 2013, 4:48am

I’ve been trying to implement such a spectral voice removal/isolation tool in Nyquist for the last three weeks or so.
It takes about equally long to process a track though…However, I am really happy with the result. Especially the stereo track without the center is quite satisfactory:

As you can hear, I’ve simply switched between the original and the center-removed version.
Drums and Bass are of course also removed. A simple low cut does not work very well (it disturbs the phase/group delay).
The algorithm creates a center channel without the orthogonal phase part. Thus the isolated center is of course mono.
Since it is optimized for center-less stereo output, you won’t perceive a great difference from a normal mono extraction (0.5 x (left + right)).
The side slopes are sometimes a few dB more attenuated than the pure mono version.

Since the code is very slow, I do usually use the preview function while processing. The playback starts after about 5 s.
It is pretty cool to listen on the fly - you can simply cancel the execution if you’re not happy with the expected result.

Now, why is it so slow?
Imagine 1:40 min of audio. This means 8820000 samples that must be calculated. The FFT snatches 8192 samples. 3/4 of it is zero padding and the rest is overlapped. Thus 8 times more samples are actually calculated.
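The numbers can be checked with a few lines of Python. The 44100 Hz sample rate, the stereo track and the 50 % overlap are assumptions inferred from the figures quoted above, not settings read from the plug-in.

```python
SAMPLE_RATE = 44100          # assumed: CD-quality audio
CHANNELS = 2                 # assumed: a stereo track
DURATION_S = 100             # 1:40 min

FRAME_SIZE = 8192            # FFT length quoted in the post
ZERO_PAD_FRACTION = 3 / 4    # share of each frame that is zero padding
OVERLAP_FRACTION = 1 / 2     # assumed: 50 % overlap, inferred from the factor of 8

total_samples = SAMPLE_RATE * CHANNELS * DURATION_S

# Real audio samples in each frame, and how far the frame advances each step.
audio_per_frame = int(FRAME_SIZE * (1 - ZERO_PAD_FRACTION))
hop = int(audio_per_frame * (1 - OVERLAP_FRACTION))

# Samples transformed per new sample of audio.
work_factor = FRAME_SIZE // hop

print(total_samples, audio_per_frame, hop, work_factor)
# 8820000 2048 1024 8
```

So each 8192-point FFT only moves the output forward by 1024 samples, which is where the factor of 8 comes from.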
The calculations are presently such that the real time output plays smooth (on my machine).
I would not try it on a x286 though…
I am now working on the controls. The first version will include such things as frame size, window type, zero padding and so on.
Maybe we can thus find optimal settings to be hard coded in the end.

steve · June 14, 2013, 11:00am

Fantastic Robert. As you say, the results for centre removal don’t sound much different to the simple invert and add method, but it’s terrific that you got this working.
There may be a slight problem with your windowing function - I can hear a slight clicking, which on close inspection is every 4096 samples. I’m guessing that corresponds to your window overlap position.

Robert_J_H · June 14, 2013, 11:58am

Maybe it is the DC and Nyquist values that make those phase distortions. I am not quite sure how to handle those - simply setting to zero?
I’ve only used an analysis window.
But this could be split into two (square root), with a synthesis window applied after the IFFT. This should further reduce possible time aliasing.
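A quick way to check the square-root idea is to overlap-add the product of the two windows and see whether it stays constant. The Python sketch below assumes a periodic Hann window and 50 % overlap purely for illustration; the plug-in's own window and overlap may differ.

```python
import math

N = 16            # frame length (small for illustration)
HOP = N // 2      # assumed 50 % overlap

# Periodic Hann window; its square root is used twice, once per stage.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
root = [math.sqrt(w) for w in hann]

# Overlap-add the product (analysis window * synthesis window) over several frames.
frames = 6
total = [0.0] * (HOP * (frames + 1))
for f in range(frames):
    start = f * HOP
    for n in range(N):
        total[start + n] += root[n] * root[n]

# Away from the first and last half frame, the windows sum to one.
steady = total[HOP:HOP * frames]
```

If the sum were not constant, the output level would ripple at the hop rate, which is one possible source of periodic artifacts.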
Pretty complicated the whole Fourier stuff.
I am currently trying to add a function that compresses/expands the stereo field in a non-linear fashion.
This could increase the narrowness of the center for isolation, and do the opposite for removal.

And Yes, the output is virtually identical to the M/S method - which is a good sign (energy preservation).

steve · June 14, 2013, 12:16pm

I’m impressed
I was wondering why we had not heard much from you for a while. I’m looking forward to seeing your code.

Robert_J_H · June 14, 2013, 7:32pm

The subject is quite absorbing. Hundreds of pages to read in order to acquire another little bit of useless knowledge - in the sense that it has no bearing on the subject at hand.
The stupid thing is that all important formulas are in a graphical format - which seldom can be read correctly by a screen reader. It is all “dead reckoning” in the end.
The code is at the current stage quite a mess - or to put it more elegantly - it still has some redundancies.
The audio example above indeed had an overlap of 4096 samples (without zero padding) and a simple triangular window.
There was in fact a little bug: the last sample was 1/4096 too low. However, I doubt that this is responsible for the slight click.
I am sure that a lot of other bugs will loom up once the first plug-in version is submitted.
So, butter your sandwiches and heat up your tea/coffee water…

Robert_J_H · July 1, 2013, 3:37pm

Ok, that’s the first attempt:
Center Removal, Isolation and more
The tool is - as it is - not specialized for center isolation (including echo cancelling and such stuff), but it attenuates at least by 3 dB (near the center).
This value increases rapidly towards the sides.
You can control which portion of the stereo field should be attenuated the most by choosing “Isolate Center (inverted)” and applying it on a duplicated track.
If you change the gain of the original or the copy, the focus that cancels some part moves gradually sidewards.
Happy experimenting.

lancehall · September 29, 2014, 11:27pm

The trick to better de-mixing of Beatles tracks is that you FIRST have to fix the azimuth.

On almost all the Beatles songs (remastered ones for sure) the left and right channels are not perfectly in sync. One channel is always 1 digital sample offset from the other channel, sometimes 2. Money is 6 samples offset.

You have to split the stereo track into separate left and right channels and then move one channel ahead of or behind the other. In my program I play the tracks with the stereo width set to 200% (which is the same as OOPSing) and when the channels are in sync the vocal will disappear. When you re-sync the channels it’ll eliminate the vocal sibilance artifacts. Then down-mix that to a new stereo mix and your resulting isolations will be far better.
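The same check can be automated in principle: try a few small offsets and keep the one where L - R is quietest. This is only a Python sketch of that idea with made-up function names, not what any particular program does, and it only handles whole-sample offsets.

```python
import math

def mean_residual(left, right, lag):
    """Mean squared value of left[n] - right[n + lag] where both exist."""
    diffs = [(left[n] - right[n + lag]) ** 2
             for n in range(len(left))
             if 0 <= n + lag < len(right)]
    return sum(diffs) / len(diffs)

def best_lag(left, right, max_lag=10):
    """Offset (in samples) at which the two channels cancel best."""
    return min(range(-max_lag, max_lag + 1),
               key=lambda lag: mean_residual(left, right, lag))

# Synthetic centre-panned material; the right channel is two samples late.
source = [math.sin(0.31 * n) + 0.5 * math.sin(0.73 * n + 1.0) for n in range(400)]
left = source
right = [0.0, 0.0] + source[:-2]

lag = best_lag(left, right)
print(lag)  # 2

# Advance the right channel by the detected lag (positive lag only here).
aligned_right = right[lag:]
aligned_left = left[:len(aligned_right)]
```

On real recordings the minimum will not be an exact zero, since only the center-panned part cancels, so listening to the result is still a sensible final check.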

Robert_J_H · September 30, 2014, 5:39am

That’s a rather old thread that you’ve posted in.
The post before yours has the link to the plug-in that does stereo vocal removal/isolation. It also has the option to shift a channel an arbitrary number of samples.
Some people find this feature intimidating.

In my program I play the tracks with the stereo width set to 200% (which is same as OOPsing) and when the channels are in sync the vocal will disappear. When you re-sync the channel it’ll eliminate the vocal sibilance artifacts. Then down-mix that to a new stereo mix and your resulting isolations will be far better.

Could you explain that in more detail? (in the plug-in thread itself would be best)
Stereo widening of 200 % normally implies that one channel has inverted polarity. It could also be seen as having mid in one channel and side in the other channel.
The plug-in also has an analysis feature that shows the correlation between the two channels.
I could add:

absolute stereo width (max or average)
temporal alignment (within 10 samples for instance)

I always appreciate new input.

lancehall · September 30, 2014, 10:55am

I’m just saying the one sample mis-alignment is something most people don’t know and it’s why isolations are not as good as they could be. Actually it’s probably slightly more or less than an exact sample.

I’ve been de-mixing and remixing the Beatles for 15 years, and it was only when I figured that out that I started getting good isolations. Learning the actual EMI EQ points also greatly improved my mixes.

lancehall · September 30, 2014, 11:09am

I flip on the live OOPsing (stereo width) just so I can instantly monitor the manual re-alignment I am doing. It has nothing to do with the rest. I’d rather do it manually because I know it’ll be right.

I use the Center Channel plugin.

The other manual step is using an inversion of the center extraction to subtract from the Left and Right channels instead of doing another plugin pass with the stereo sides only setting. That guarantees that the Left and Right isolations are perfect fits with the center channel information, with fewer artifacts.
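In code terms the manual step amounts to mixing the inverted center into each channel. A tiny Python sketch, assuming the center extraction is already available as a list of samples (the function name subtract_centre is only illustrative):

```python
def subtract_centre(left, right, centre):
    """Mix the inverted centre into each channel (i.e. subtract it)."""
    inverted = [-c for c in centre]
    left_side = [l + i for l, i in zip(left, inverted)]
    right_side = [r + i for r, i in zip(right, inverted)]
    return left_side, right_side

# Toy data: a shared centre signal plus something unique to each channel.
centre = [0.5, -0.25, 0.125]
left = [c + a for c, a in zip(centre, [0.25, 0.0, -0.5])]
right = [c + b for c, b in zip(centre, [0.0, 0.5, 0.25])]

left_side, right_side = subtract_centre(left, right, centre)

# Adding the centre back to each side restores the original channels.
left_rebuilt = [s + c for s, c in zip(left_side, centre)]
right_rebuilt = [s + c for s, c in zip(right_side, centre)]
```

Because the sides are derived from the very same center signal, sides plus center always add back up to the original channels, which is the "perfect fit" mentioned above.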