I’ve tried searching for solutions to this problem, but I don’t do a lot of sound editing, so maybe I just don’t know the right combination of terms to find my answer (if there is one). I have a recording from a memorial service where music was unfortunately left on and got recorded along with the person speaking. The music was muted for the in-room audience but not for the video recording, so it’s clear as day on the recording: not a minor background annoyance, but a loud issue. I also have the music that was played, so I have the exact waveform file. Is there any way in Audacity to line it up with the recording that has both the speaking and the song, and tell it to remove the song based on the separate waveform file? Does that make sense? Basically, take these two waveforms and remove what is the same, or take these two waveforms and isolate what isn’t the same.
There is something called Spleeter that you can try. But I wouldn’t expect miracles.
…Audacity has an effect called Vocal Reduction and Isolation, but it mostly relies on the vocals being in the center (identical in both channels) of a stereo recording. That doesn’t work with mono recordings or with “live” recordings where the left, right, and center are not well controlled.
Subtracting the known music CAN work under “laboratory conditions”. If you have a digital copy of the music and a digital mix that includes the digitally-exact data (same volume, same timing, etc.), it can be perfectly subtracted out. You simply use the Invert effect on the music and mix the inverted copy with the recording, and the common part cancels perfectly. Mixing is done by summation, so by inverting & mixing you are subtracting (adding a negative).
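To make the "invert & mix is subtraction" idea concrete, here is a minimal NumPy sketch of the laboratory case. The sine waves are made-up stand-ins for the voice and the music (not real audio), and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical test signals: one second at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
music = 0.5 * np.sin(2 * np.pi * 440 * t)   # stand-in for the song
voice = 0.3 * np.sin(2 * np.pi * 150 * t)   # stand-in for the speaker

# A "perfect digital mix": the exact music samples summed with the voice.
mix = voice + music

# Invert the music, then mix (sum) it with the recording.
inverted = -music
recovered = mix + inverted

# What is left should match the voice to within floating-point rounding.
residual = float(np.max(np.abs(recovered - voice)))
print(residual)
```

This only works because the music inside `mix` is sample-for-sample the same data as `music`. Any change in level, timing, or tone breaks the cancellation.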
You can try it, but you don’t have a “perfect digital mix” so it may not work.
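If you want to experiment outside Audacity, one possible approach (my assumption, not a built-in Audacity feature) is to find the time offset by cross-correlation, estimate the level with a least-squares fit, and then subtract. The sketch below uses synthetic noise as a stand-in for both signals, and `true_lag` and `true_gain` are invented values. A real room recording also has reverb and speaker/microphone coloration, so a single offset and gain will usually leave a lot of music behind.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions for illustration, not real audio).
music = rng.standard_normal(4000)          # the clean song file
voice = 0.2 * rng.standard_normal(6000)    # the speaker
true_lag, true_gain = 250, 0.6             # unknown in a real recording

# The recording: voice plus a delayed, quieter copy of the music.
recording = voice.copy()
recording[true_lag:true_lag + music.size] += true_gain * music

# 1. Line it up: the cross-correlation peak gives the offset in samples.
corr = np.correlate(recording, music, mode="valid")
lag = int(np.argmax(corr))

# 2. Match the volume: least-squares gain over the aligned segment.
segment = recording[lag:lag + music.size]
gain = float(np.dot(segment, music) / np.dot(music, music))

# 3. Subtract the aligned, scaled music.
cleaned = recording.copy()
cleaned[lag:lag + music.size] -= gain * music

print(lag, round(gain, 3))
```

In this idealized case the estimated offset and gain land very close to the values used to build the recording. With a real recording, expect partial reduction at best.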
Usually if you have an exact digital copy of the music, you also already have an exact digital copy of the voice so there’s rarely a need to do it. (It’s usually just something fun to play with).
…If you open a file, and then import another file they will both be open in the same project and they will mix (sum) when you play and when you export. If you have two identical files, and you invert one and mix, they will cancel to silence.
…Another example is if you have two separate recordings of yourself saying “hello”. In that case there is enough timing & phase difference between the files that subtraction sounds exactly like addition… It’s a normal mix and it sounds like you and your twin saying “hello” together.
If you make a copy of your “hello” file, subtraction gives you total silence and regular mixing (addition) simply doubles the volume.
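The two "hello" cases can be checked numerically. In this rough sketch the recordings are modelled as random noise, and the second take is simply an independent noise signal. That is a crude assumption standing in for "the same word with different timing and phase", not a model of real speech.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(x ** 2)))

rng = np.random.default_rng(1)
hello = rng.standard_normal(8000)         # first recording (stand-in)
copy_of_hello = hello.copy()              # exact digital copy
second_take = rng.standard_normal(8000)   # separate recording (stand-in)

# Exact copy: subtraction cancels completely, addition doubles the level.
silence = hello + (-copy_of_hello)
doubled = hello + copy_of_hello

# Separate take: subtracting and adding give about the same level.
sub_takes = hello + (-second_take)
add_takes = hello + second_take

print(rms(silence))                      # exactly 0.0
print(rms(doubled) / rms(hello))         # about 2
print(rms(sub_takes) / rms(add_takes))   # close to 1
```

The last ratio staying near 1 is the numeric version of "subtraction sounds exactly like addition" when the two signals are not sample-identical.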