Remove music from music+voice by FFT?

What I want to do:

I have a wave file with music + voice. I also have the music on another file without the voice. I want to get the voice only.

The standard karaoke filter method doesn’t work very well. I would like to use the clean music track to get to the voice. The problem is that the music track I have has been MP3-encoded, so simply subtracting the signals in the time domain won’t work.

I think what I need to do is:

align the two tracks
for every small window in the music track {
    obtain an FFT profile of the music in that window
    EQ-filter the corresponding window in the music+voice track, using the profile from above
}
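A minimal sketch of that loop, in Python/NumPy (an assumption; the thread names no language). It swaps in magnitude spectral subtraction as one common reading of "EQ-filter using the profile": for each window, subtract the music track's FFT magnitudes from the mix's, keep the mix's phase, and overlap-add the windows back together. The window length, hop, and the names `stft`, `istft` and `remove_music` are illustrative choices. It assumes the two tracks are already aligned to the sample and at equal level.

```python
import numpy as np

N_FFT = 1024  # window length in samples (illustrative choice)
HOP = 512     # 50% overlap between consecutive windows

def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed short-time FFT; one row per window."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=N_FFT, hop=HOP):
    """Overlap-add resynthesis; periodic Hann windows at 50% overlap sum to 1."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

def remove_music(mix, music, n_fft=N_FFT, hop=HOP):
    """Per window: subtract the music's magnitude spectrum from the mix's,
    keep the mix's phase, then overlap-add. Assumes sample-aligned tracks
    at the same level."""
    n = min(len(mix), len(music))
    mix_spec = stft(mix[:n], n_fft, hop)
    music_spec = stft(music[:n], n_fft, hop)
    mag = np.maximum(np.abs(mix_spec) - np.abs(music_spec), 0.0)
    return istft(mag * np.exp(1j * np.angle(mix_spec)), n_fft, hop)

# Synthetic test signals: a 440 Hz "music" tone and a 1500 Hz "voice" tone.
sr = 8000
t = np.arange(2 * sr) / sr
music = np.sin(2 * np.pi * 440 * t)
voice = 0.5 * np.sin(2 * np.pi * 1500 * t)
voice_est = remove_music(music + voice, music)
```

The output is slightly shorter than the input (the last partial window is dropped), and the first and last `HOP` samples are faded by the window. On two pure tones at different frequencies this recovers the voice almost exactly; real music that shares frequencies with the voice, or a reference track that is not an exact copy, will give much worse results.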

Is this possible using automation? Would it work? Has someone already done it?

My guess is that the FFT windows would need to overlap. Consecutive filtered windows would need to be cross-faded to avoid clicks between windows.
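The cross-fade described here is what overlap-add with a suitable window already does. Periodic Hann windows at 50% overlap sum to a constant, so each window fades out exactly as the next fades in, without level bumps. A small NumPy check (window size and hop are arbitrary choices; `window_sum` is a hypothetical helper):

```python
import numpy as np

def window_sum(n_fft=1024, hop=512, n_frames=8):
    """Add up overlapping periodic Hann windows the way overlap-add does."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann
    total = np.zeros(hop * (n_frames - 1) + n_fft)
    for i in range(n_frames):
        total[i * hop:i * hop + n_fft] += window
    return total

flat = window_sum()          # 50% overlap: interior sums to exactly 1
lumpy = window_sum(hop=768)  # 25% overlap: gain ripples between windows
```

Away from the two ends, `flat` is constant at 1, while `lumpy` rises and falls with every window. That ripple is an audible amplitude modulation, which is the reason the window shape and the overlap are chosen together.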

Sure would be sweet if there was a script like that or better yet if this functionality could be included in audacity. Something like “Subtract wave from file…”.

Similar things have been attempted by a couple of plug-in developers. Success with these tools is variable.

–Voice Trap
–Extra Boy

Hi, thanks for your prompt reply.

I have installed both of these plug-ins in Audacity and am sifting through the docs.

It is not obvious to me how either of these plug-ins would use information from my clean music track to deconvolve the other one.

What I think I want is a frequency-domain filter whose profile varies over time and uses my clean music track to cancel the music.

Is it really possible to do this with these plugins? Any pointers would be appreciated.

Again, thanks.

No, they don’t work exactly as you describe, but they use some similar principles. You may see what I mean once you’ve gone through their documentation and used them, or you may totally disagree that there’s any similarity, but they are the closest I’ve come across. FFT filtering alone will not produce the results you are after (even with profile matching), because “packets” of frequencies will probably be common to both profiles. The plug-in needs to take these packets in context so that it can guess whether they belong to the music or the voice. These two plug-ins attempt to do this, with quite a bit of user interaction.
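A toy model of the "shared packets" problem, not a description of how either plug-in works. It uses one FFT bin that holds a music component and a voice component at the same frequency. Subtracting the music's magnitude gives an answer that depends on the unknown phase between the two, so a magnitude profile alone cannot say how much of the bin is voice. The amplitudes and the name `recovered_voice` are illustrative assumptions.

```python
import numpy as np

def recovered_voice(music_amp, voice_amp, phase_diff):
    """One FFT bin containing music and voice at the same frequency.
    Returns (voice magnitude left after subtracting the music's magnitude,
    true voice magnitude)."""
    music = complex(music_amp)                  # music phasor at phase 0
    voice = voice_amp * np.exp(1j * phase_diff) # voice phasor, offset phase
    mix = music + voice
    return max(abs(mix) - abs(music), 0.0), abs(voice)

for phase in (0.0, np.pi / 2, np.pi):
    got, true = recovered_voice(1.0, 0.5, phase)
    print(f"phase {phase:.2f} rad: recovered {got:.3f}, actual {true:.3f}")
```

With the true voice magnitude fixed at 0.5, the recovered value is 0.5 when the two components are in phase, about 0.118 at a quarter-cycle offset, and 0 when they are opposed.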

Plus, you only get cancellation between two audio files if they are bit-for-bit identical except for the singer. Those are extraordinarily hard to come by. It doesn’t count if you got both tracks as MP3s from the internet. MP3 is lossy: it damages the sound as it encodes, and that damage ruins the cancellation.
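A synthetic illustration of why the copy has to be exact. The sketch phase-inverts the music track, sums it with the mix, and measures what survives besides the voice. The signals are made-up tones, and the "damaged" copy is simply the music requantized to steps of 1/32. That is a crude stand-in for lossy coding, not an MP3 encoder. `leftover_db` is a hypothetical helper.

```python
import numpy as np

def leftover_db(mix, music, voice):
    """Phase-invert `music`, sum it with `mix`, and report how much
    non-voice energy survives, in dB relative to the voice."""
    leftover = (mix - music) - voice
    return 10 * np.log10(np.sum(leftover ** 2) / np.sum(voice ** 2) + 1e-20)

sr = 8000
t = np.arange(sr) / sr
music = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
voice = 0.3 * np.sin(2 * np.pi * 1000 * t)
mix = music + voice

# Crude stand-in for lossy coding (NOT MP3): requantize to steps of 1/32.
damaged = np.round(music * 32) / 32

exact = leftover_db(mix, music, voice)    # identical copy: near-total cancellation
lossy = leftover_db(mix, damaged, voice)  # altered copy: music leaks through
```

The identical copy cancels down to floating-point noise. The slightly altered copy leaves residue only a few tens of dB below the voice, even though the two music signals differ by less than 1/64 per sample.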

Also, the two songs have to start at exactly the same time. It doesn’t count if somebody captured two wild tracks and posted them. You’ll never line them up accurately enough to restore the cancellation (see also: MP3 damage).
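For the "start at exactly the same time" part, one common approach is to estimate the offset from the peak of the cross-correlation between the two tracks. It is a minimal sketch, assuming NumPy and two mono arrays at the same sample rate. It only finds whole-sample offsets and cannot undo sub-sample misalignment or MP3 damage. `find_lag` is a hypothetical name.

```python
import numpy as np

def find_lag(reference, other):
    """Whole-sample delay of `other` relative to `reference`, from the peak
    of their FFT-based cross-correlation (positive = `other` starts later)."""
    n = len(reference) + len(other) - 1
    size = 1 << (n - 1).bit_length()  # power of two >= n avoids circular wrap-around
    spec = np.fft.rfft(other, size) * np.conj(np.fft.rfft(reference, size))
    corr = np.fft.irfft(spec, size)
    lag = int(np.argmax(corr))
    return lag if lag < size // 2 else lag - size  # upper half = negative lags

# Synthetic check: delay a noise signal by 37 samples and recover the delay.
rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)
other = np.concatenate([np.zeros(37), reference])[:4000]
lag = find_lag(reference, other)
aligned = other[lag:]  # drop the leading offset so the tracks line up
```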

MP3 is a delivery format, not a production format. WAV is a production format. MP3 damage takes a lot of production tools away from you.

Most of the packaged tools work by locating the singer in the stereo field. If the singer moves around, you’re dead. If the show’s in mono – even two-track mono, you’re dead.
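The basic idea behind locating the singer in the stereo field is to subtract one channel from the other, so anything mixed identically into both (a centre-panned singer) cancels. This sketch assumes the plain left-minus-right form. Real vocal removers may add band-limiting or other refinements. `remove_center` and the test tones are illustrative.

```python
import numpy as np

def remove_center(stereo):
    """Basic 'karaoke' vocal remover: left minus right, so anything mixed
    identically into both channels cancels. `stereo` has shape (n, 2)."""
    return (stereo[:, 0] - stereo[:, 1]) / 2

sr = 8000
t = np.arange(sr) / sr
singer = 0.5 * np.sin(2 * np.pi * 300 * t)  # panned dead centre
guitar = 0.5 * np.sin(2 * np.pi * 500 * t)  # panned hard left

song = np.column_stack([singer + guitar, singer])
karaoke = remove_center(song)  # singer cancels, guitar survives at half level

mono = np.column_stack([singer + guitar, singer + guitar])
silence = remove_center(mono)  # two-track mono: everything cancels
```

A singer who is not identical in both channels (panned off-centre or moving) does not cancel, and a mono or two-track-mono mix cancels completely.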


OK guys thanks for the detailed explanation.

I guess I can’t do anything here, because what I have is a song with lyrics plus some speech, and it’s the speech I want to isolate from the music+lyrics.

Again, thanks for helping me understand.