Vocal isolation with backup tracks

So I have the typical impossible task of isolating one person’s voice from a recording of 5 people. However, I have a trick up my sleeve. To make a long story short, I have a recording of 5 people talking in an interview over the internet. 4 of the 5 did me the kindness of recording their isolated vocals at home in Audacity. The 5th person did not have the ability to do this. So, I really have the 5 tracks listed below:

Track 1: Person A
Track 2: Person B
Track 3: Person C
Track 4: Person D
Track 5: All Together

I am wondering if I can sync the tracks together and, using Track 1, cancel out Person A’s voice from Track 5, then do the same for B, C and D. The end result would be that I am left with only Person E, who did not have a backup recording, thereby creating a 6th track. Does that make sense? Essentially I am hoping for a negative times a negative equals a positive type thing here.

It might work, but probably not. There will be timing differences between the different hardware, so the tracks will drift out of sync. You also need to match the amplitude of the individual track to its amplitude in the mix, or it won’t subtract completely.
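To get a feel for how fussy the subtraction is, here is a rough Python sketch. It is not something Audacity runs; the 1 kHz test tone, the 44100 Hz sample rate and the helper name `residual_db` are all assumptions made up for the illustration. It measures how much of a track is left over when the copy being subtracted is slightly too quiet or one sample late.

```python
import numpy as np

RATE = 44100  # assumed sample rate in Hz

def residual_db(mix, track, gain=1.0, offset=0):
    """Level of what remains after subtracting gain * track (delayed by
    `offset` samples) from mix, in dB relative to the track's own level."""
    delayed = np.roll(track, offset)  # circular shift; fine for a periodic test tone
    leftover = mix - gain * delayed
    rms_left = np.sqrt(np.mean(leftover ** 2))
    rms_ref = np.sqrt(np.mean(track ** 2))
    return 20 * np.log10(max(rms_left, 1e-12) / rms_ref)

t = np.arange(RATE) / RATE
tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz tone standing in for a voice

perfect_db = residual_db(tone, tone)                 # exact copy: nothing left
gain_error_db = residual_db(tone, tone, gain=0.9)    # copy is 10% too quiet
offset_error_db = residual_db(tone, tone, offset=1)  # copy is one sample late

print(f"perfect match:   {perfect_db:.1f} dB")
print(f"10% gain error:  {gain_error_db:.1f} dB")
print(f"1 sample offset: {offset_error_db:.1f} dB")
```

In this toy case a 10% level error leaves the voice only about 20 dB down, and a single sample of offset leaves a 1 kHz tone roughly 17 dB down, so both are still clearly audible. Real drift between two recorders changes the offset over time, which is worse than the fixed offset shown here.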

This kind of thing can work under “laboratory conditions” if you plan it from the beginning but it usually doesn’t work in the real world.

Essentially I am hoping for a negative times a negative equals a positive type thing here.

You can do that with the Invert effect. For example, you can open a file, then import the same file again into the same project. Select one of those tracks and invert. When you play or export they will cancel to silence.
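In numbers, that experiment looks like the sketch below. It is only an illustration in Python with NumPy, assuming that inverting means flipping the sign of every sample and that playing two tracks together means adding them sample by sample; the function names `invert` and `mix_down` are made up for the example.

```python
import numpy as np

def invert(track):
    """Flip the polarity of every sample."""
    return -np.asarray(track, dtype=np.float64)

def mix_down(*tracks):
    """Sum equal-length tracks sample by sample, as playback of a project would."""
    return np.sum(np.stack(tracks), axis=0)

rate = 44100                       # assumed sample rate in Hz
t = np.arange(rate) / rate
original = 0.5 * np.sin(2 * np.pi * 220.0 * t)   # one second of a test tone

result = mix_down(original, invert(original))
peak = float(np.max(np.abs(result)))
print(peak)  # 0.0: the two identical tracks cancel completely
```

The cancellation is total only because the two tracks are sample-for-sample identical.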

The same trick will subtract one or more parts out of a mix, but only if the levels were not adjusted during the original mixing and nothing else was changed. The part in the mix has to be exactly the same as the track you invert and subtract out.

Why do you need the individual tracks if you have the mix?

I’m betting you don’t get past A. The local microphone recording is going to be missing internet packet management, delays, and your receive processing and corrections.

Skype, Zoom, Meetings, and other services do a ton of processing to make it sound like they’re not doing a ton of processing.

Try it. Open the composite and the A voice. Apply Effect > Invert to the A track, then use the Time Shift Tool to line it up and Effect > Amplify to match its level, and see how you do.
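If you would rather let a script hunt for the shift and the level than do it by ear, the manual steps above can be sketched roughly as follows. This is a hedged illustration, not an Audacity feature: the function names (`estimate_lag`, `place`, `subtract_track`) are hypothetical, random noise stands in for the voices, and it assumes one fixed delay and one fixed gain for the whole recording. It does nothing about drift or about the processing a conferencing service applies, which are exactly the things expected to spoil the result.

```python
import numpy as np

def estimate_lag(mix, track):
    """Delay (in samples) of `track` inside `mix`, taken from the peak of
    their FFT-based cross-correlation."""
    n = len(mix) + len(track) - 1
    size = 1 << (n - 1).bit_length()   # next power of two, avoids wrap-around
    spectrum = np.fft.rfft(mix, size) * np.conj(np.fft.rfft(track, size))
    corr = np.fft.irfft(spectrum, size)
    lag = int(np.argmax(corr))
    return lag - size if lag > size // 2 else lag

def place(track, lag, length):
    """Copy `track` into a silent buffer of `length` samples, delayed by `lag`."""
    out = np.zeros(length)
    if lag >= 0:
        n = max(0, min(len(track), length - lag))
        out[lag:lag + n] = track[:n]
    else:
        n = max(0, min(len(track) + lag, length))
        out[:n] = track[-lag:-lag + n]
    return out

def subtract_track(mix, track):
    """Align `track` to `mix`, fit one least-squares gain, and subtract it."""
    lag = estimate_lag(mix, track)
    aligned = place(track, lag, len(mix))
    gain = float(np.dot(mix, aligned) / np.dot(aligned, aligned))
    return mix - gain * aligned, lag, gain

# Synthetic check: noise stands in for Person A and Person E.
rng = np.random.default_rng(0)
voice_a = rng.standard_normal(8000)
voice_e = 0.3 * rng.standard_normal(8000)
mix = 0.7 * place(voice_a, 25, 8000) + voice_e   # A is 25 samples late, at 70% level

leftover, lag, gain = subtract_track(mix, voice_a)
error_rms = float(np.sqrt(np.mean((leftover - voice_e) ** 2)))
print(lag, round(gain, 3), error_rms)
```

On this synthetic mix the leftover is close to Person E alone, because the copy of A in the mix really is just a delayed, scaled version of the isolated track. A recording that went through the internet will not meet that condition.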

These are all perfect quality WAV (Microsoft) files, right?.. right?.. None of this works with MP3. MP3 gets its small, convenient sound files by re-arranging tones and leaving some of them out.

There is a way. The Hollywood solution is to get Person E (the fifth person) to listen to the composite on headphones and re-announce his portion into a clean microphone in his quiet studio. This sounds like crazy magic, but it’s not that hard. Announce a word or two behind and, with a little practice, you can hit the timing and pitch and keep it up for a long time. It’s a remote cousin to Simultaneous Translation between languages, except they’re both in English.

His recording will be off time, but that may not make that much difference because his voice will be clear and by itself. Edit as needed.

I think that’s your only hope.


Next time you do this, tell him to wear headphones and record his voice on his phone on the desk.

That’s going to come out far better than what you’re doing. Obviously, turn off notifications and turn on Airplane Mode. The phone can’t be doing anything else.

This is from that setup.