Extracting Vocals Using Instrumental (AGAIN!!!)

I’m sure this has been on here a million times, but I just can’t figure it out…

I have a regular full “band” version of a song and an instrumental version. I ultimately want to extract the vocal and make an acapella version.

This is what I have figured out that I need to do, so far:

Load both versions in
Line up the “waveforms”
Invert the instrumental

Now from what I have found on the net, this should give me my result. The parts that were exact (the music) should cancel each other out and leave the vocal as the difference. Now is it just me, or does that just seem too simple? Do I have some setting wrong??

I know that the tracks must be EXACTLY the same (and to be honest - are there any that are??) and lined up EXACTLY, but am I missing something? I just get the same original sound over and over.

I even took one of my daughter’s sing-along CDs and used those songs, to try something easy… Well it wasn’t.

If there are any experts who can help a novice like me that would be great. I have a bunch of ideas that I would love to try but just get nowhere.

Please help as I am getting too frustrated to keep trying!!

Thank you in advance.

OK - With a few more tries, I actually got close to what I wanted, and it was that easy!!

I guess my main problem is that I need to have a “cleaner” more EXACT instrumental version to work with, and that is a problem. Is there any way to improve the track, via amplifying it or something to get more “peaks and valleys” that are closer to the original version that I am trying to work with?

I know that I am sounding like a novice here (I am), but I really want to understand waveforms and all the intricacies of sound, and it appears that a lot of you out here really know your stuff. I just want to tap into that knowledge, and gain a better understanding of what can truly be done with a program like Audacity.

Thanks for all your patience.

I can explain why this seems so easy when the two tracks are identical except for the voice.

You start with 2 tracks, (instrumental) and (instrumental + voice).

Inverting a track is the same as adding a minus sign to it. So when you invert and mix these two tracks together, you get this equation:

-(instrumental) + (instrumental + voice) = voice.

As long as the two instrumental signals are exactly equal, they’ll exactly cancel out and you’ll be left with only voice. The theory really is that simple. The reality is that the only time you’ll encounter two signals that are exactly the same is if they’re from the same source material. If your two recordings aren’t from the same performance, you’ll never get anywhere. I’ll try to explain why after I answer your question.
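For anyone who would like to see that equation with actual numbers, here is a minimal sketch in Python with NumPy. The two sine tones standing in for "instrumental" and "voice" and the function name `extract_difference` are made up for illustration; this is not how Audacity does it internally, just the same arithmetic.

```python
import numpy as np

def extract_difference(full_mix, instrumental):
    """Invert the instrumental (flip its sign) and mix it with the full version."""
    inverted = -np.asarray(instrumental, dtype=np.float64)
    return np.asarray(full_mix, dtype=np.float64) + inverted

# Hypothetical test signals: one second of audio at 44,100 samples per second.
rate = 44100
t = np.arange(rate) / rate
instrumental = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for the band
voice = 0.3 * np.sin(2 * np.pi * 440 * t)          # stand-in for the singer
full_mix = instrumental + voice                    # (instrumental + voice)

recovered = extract_difference(full_mix, instrumental)

# Largest sample-by-sample error between the recovered and the true voice.
# It is only floating-point rounding, because the instrumentals match exactly.
print(np.max(np.abs(recovered - voice)))
```

This only works here because `full_mix` was built from the very same `instrumental` array, which is the "same source material" condition described above.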

I guess my main problem is that I need to have a “cleaner” more EXACT instrumental version to work with, and that is a problem. Is there any way to improve the track, via amplifying it or something to get more “peaks and valleys” that are closer to the original version that I am trying to work with?

Almost certainly not, with one exception that I’ll explain at the bottom. Even if the two instrument tracks were pulled from the same performance, they might be mixed differently. And if they’re mixed differently, you’ll get the difference between the two tracks when you subtract one from the other. It might be technically possible (though unlikely) to “backtrack” with one of the clips so that they line up more perfectly, but without knowing what the mixing engineer was doing, there’s no way you can find a process that will reverse all the differences. And even if you did know what the engineer did, it would still probably be impossible, because many multi-track processes are not reversible once you’ve mixed a signal down to 2-track stereo (such as compression, panning, individual volumes, modulation effects, reverb, etc).

Conceptually, if you have two tracks that are identical, then when you subtract them, the difference is zero at all points (along the time axis), so the final product will be zero. The more differences there are between the two signals, the more substantial the difference will be at all points, so the final product will have audible sounds in it. The reason even “small” differences between signals won’t cancel out is that the tolerances are so tight. A common digital signal has 44,100 data points every second. These all need to be identical in order to cancel two signals out by subtraction, so each note of the performance needs to land within one sample period, which is about 22.7 microseconds. Even if you were to build a robot band that could play acoustic instruments within this time tolerance, there would still be enough differences between two performances to make subtraction impossible. Acoustic instruments don’t ever respond identically to the same input in the real world due to the properties of the materials involved and acoustic interactions within the environment.
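To give a feel for how tight that tolerance is, here is a rough sketch in Python with NumPy. It subtracts a 1 kHz test tone from a copy of itself that is late by a single sample. The tone and the helper name `residual_ratio` are assumptions picked for illustration, not anything taken from a real recording.

```python
import numpy as np

rate = 44100
sample_period_us = 1e6 / rate   # one sample period in microseconds, about 22.68

# Hypothetical test signal: one second of a 1 kHz sine tone.
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 1000 * t)

def residual_ratio(signal, offset):
    """RMS of (signal minus a copy delayed by `offset` samples), relative to the signal's RMS."""
    a = signal[offset:]
    b = signal[:len(signal) - offset]
    diff = a - b
    return np.sqrt(np.mean(diff ** 2) / np.mean(a ** 2))

print(sample_period_us)
print(residual_ratio(tone, 0))   # perfectly aligned: nothing left over
print(residual_ratio(tone, 1))   # one sample late: roughly 0.14 of the tone remains
```

So even a one-sample slip leaves a clearly audible leftover at 1 kHz, and the leftover grows for higher frequencies and bigger offsets.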

So basically, if you take one instrumental mix and add vocals to it, you’ll be able to use the subtraction method. But if you take one performance and mix it twice independently (once with vocals and once without), you won’t be able to subtract one from the other very well. This is probably what’s happening to you.

On the other hand, if the only difference between the two instrument tracks is a slight volume difference, then it should be possible to get them to exactly cancel out if you can find out what the exact volume difference is. It’s possible that the track with the voice in it was turned down a tiny bit to make room for the voice. That would change the original equation to something like this:

-(instrumental) + (.95 * instrumental + voice) = (-.05 * instrumental) + voice

If that’s the case, then you’ll hear a quiet version of the instrumental track mixed with the full volume voice track. But if the difference is more than just volume, then you’ll either get a mild filtering effect or a mild echo effect.
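If volume really is the only difference, the scaling factor can be estimated from the audio itself. Below is a minimal sketch in Python with NumPy that uses a least-squares fit. The test tones, the 0.95 factor and the name `estimate_gain` are illustrative assumptions, and the estimate is only good when the voice has little in common with the instruments, which a real song will only roughly satisfy.

```python
import numpy as np

def estimate_gain(full_mix, instrumental):
    """Least-squares gain g that minimises the energy of (full_mix - g * instrumental)."""
    return np.dot(full_mix, instrumental) / np.dot(instrumental, instrumental)

# Hypothetical test signals: one second at 44,100 samples per second.
rate = 44100
t = np.arange(rate) / rate
instrumental = 0.5 * np.sin(2 * np.pi * 220 * t)
voice = 0.3 * np.sin(2 * np.pi * 330 * t)
full_mix = 0.95 * instrumental + voice      # instrumental turned down a little

# Plain invert-and-mix leaves (-.05 * instrumental) + voice, as in the equation above.
naive = full_mix - instrumental

# Scale the instrumental to match before subtracting.
gain = estimate_gain(full_mix, instrumental)
matched = full_mix - gain * instrumental

print(gain)
print(np.max(np.abs(matched - voice)))
```

In an audio editor the equivalent step would be lowering the level of the instrumental track by the matching amount before inverting and mixing.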

Someone on another part of the forum came up with this:



That is a very nice find, and thanks must go to greenie for sharing it with us :)
The demos on their website are very impressive (especially the pro version, although the standard lower quality version is attractively priced at 90 cents). http://www.elevayta.com/

As far as I can tell it seems to work like a very fancy “noise reduction” type plug-in. For Audacity users it’s a shame that it is VST.

I tried the demo of ExtraBoy Pro for a few days but got pretty poor results. Maybe I didn’t spend enough time with it, but I suspect that the examples they use on their website are very selective - as I recall it was a jazz combo of bass, piano, drums etc where each instrument was very distinctive in its frequency range and stereo placement.

I imagine that they are selective, but I’d also imagine that it would take a lot of tweaking to get good results.
This is the first vocal isolation effect that I’ve seen, so it is of interest just for that :)

<<<This is the first vocal isolation effect that I’ve seen, so it is of interest just for that>>>

Exactly that. It used to be you just couldn’t do this. Now it’s maybe possible a little bit.

Let me count the number of people who bought the Sound Soap noise reduction package and tried to use it based on the stunning demo. I’ve never actually experienced the demo, but I can tell you they carefully tuned the scene to simulate miracles.

What they actually did was doom many productions to failure. “Don’t worry about the microphone, we can clean anything up in Sound Soap!®”

They couldn’t.


It is true that it is not a fabulous tool that can get any instrument out of a rock band, but it is still a useful option that we did not use to have. Also, even when things are well separated it takes a fair amount of work. You just have to pay careful attention to the piece. Most of the time you will be beaten, but nonetheless sometimes it can be done.


I know this topic is a bit old but I wanted to ask–

I successfully removed the vocals from a track using the method outlined here: http://audacityteam.org/wiki/index.php?title=Vocal_Removal (vocals in middle). Then I tried to combine this new instrumental track with the original song to isolate the vocal, but so far, it hasn’t worked.

Shouldn’t the instrumental track and the original song be exactly the same and perfectly aligned already? I used the original song file to derive the instrumental track, so it seems to me like they should be equivalent . . . if they are, why is this not working?

An idea: do the instrumental track and original song need to be in the same file format for this method to work?

I’m working on Mac OSX so I can’t use VoiceTrap, and this is a budget operation :) so I can’t buy that program to run Windows on a Mac or anything fancy like that.

I am completely new to audio manipulation, so please forgive my ignorance.

Oh, in case you’re curious, I’m trying to isolate the vocal on R.E.M.'s “Horse to Water” from Accelerate.

thanks for any help–


No, it won’t work.

How centre pan remover works:
You have two audio channels, one is the left channel, and one is the right channel.
Anything that is panned dead centre will be identical on both left and right channels.
By turning one channel “upside down” (inverting it) and adding it (mixing it) to the other channel, the result will be the difference between the two channels.
The difference between two identical signals (anything that is dead centre of the stereo mix) is nothing (silence).
What you are left with is a mono track that is made up of the sound from one side of the stereo field, and the inverted sound from the other side of the stereo field.

If you then add in the original track, you will reinforce one side, reduce the other side, and re-introduce sounds that were in the centre.

Looking at this as a simple equation;
If “A” is the sounds on the left, “B” is the sounds on the right, and “C” is the sounds that are central -
Then the left channel of your original stereo track is A + C/2
the right channel is B + C/2

If we invert the right channel and add it to the left channel we get (A + C/2) + (-B -C/2) = A - B
(notice that C/2 - C/2 cancels out the centre pan sound).

Now if we add in one channel of the original sound, be it “A + C/2” or “B + C/2”, we are re-introducing “C/2”.

There is no way to manipulate the tracks to cancel out both “A” and “B” to leave “C”. This is why the plug-ins mentioned previously (Extra Boy and Voice Trap) resort to using complex digital processing in the attempt to produce this effect.
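The A, B and C equations above can be checked numerically. Here is a small sketch in Python with NumPy, using random noise as stand-ins for the left-only, right-only and centre sounds. The variable names simply follow the letters used above, and the random signals are an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal(n)   # sounds only on the left
B = rng.standard_normal(n)   # sounds only on the right
C = rng.standard_normal(n)   # sounds panned dead centre (e.g. the vocal)

left = A + C / 2
right = B + C / 2

# Centre pan removal: invert the right channel and add it to the left.
centre_removed = left + (-right)             # equals A - B, the C/2 terms cancel

# Adding one channel of the original back in re-introduces C/2.
readded = centre_removed + left              # equals 2A - B + C/2

# Subtracting the result from the original does not isolate C either:
# it just hands back the other original channel.
subtracted = left - centre_removed           # equals B + C/2, i.e. the right channel

print(np.allclose(centre_removed, A - B))
print(np.allclose(subtracted, right))
```

Whichever way the two channels are added or subtracted, the result is always some mix of (A + C/2) and (B + C/2), so simple mixing never ends up with "C" alone.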

I read your posts about taking the vocals from the beat by aligning the instrumental and regular mp3 of the song, but I can’t figure out how to do it in Audacity. I really don’t know how to use that program. I was able to bring in both versions, instrumental and full song, but I can’t figure out how to line them up right or invert anything. Can you email me maybe and explain this more? mysterymenace at hotmail.com I’m trying to get the acapellas for Down with the King by Run Dmc.

First, putting an email address in a public forum such as this is not a very good idea… it will be easily grabbed by spam bots that search the internet for email addresses to add to spam lists. Therefore I edited your original post and did a little “obfuscation” of your email address.

Second, we don’t provide help by email. We’ll gladly try to help, but we’ll post our answers here on the forum, so that everyone who might have the same problem can learn from it too. You can easily follow new posts on the thread by subscribing to it for email notifications, and you’ll get an email notification every time someone posts on this thread.

What recordings do you have of the song?
What exactly are you trying to achieve?