I’ve been working on voice separation for a while, using various tools like UVR and even some self-trained models. Most of the time, the results are pretty good, but instruments like guitar and piano tend to be harder to separate. Today, I used LyRuno to separate a song and extract the guitar and drum tracks, and I was honestly amazed by how high the quality was.
The downside is that the installation package is pretty large (several GB), and without a GPU, processing can be slow. But it’s definitely worth trying.
Additionally, LyRuno groups its models by use case: extracting accompaniment, removing vocals, isolating instruments, and separating dialogue from sound effects. I haven’t had the chance to try them all yet, but it looks promising.
Yeah, I think it’s actually using the GPU for model inference, which explains the huge difference in speed.
I’ve been playing around with LyRuno for a few days as well, and I have to say the separation quality is really good—vocals, instruments, accompaniment all come out pretty clean. On my machine with a GPU, a 5-minute track takes around 2 minutes or so.
But on another PC I have without a GPU, it’s pretty much the same as what you’re seeing—almost 50 minutes to process one song, which honestly feels a bit crazy.
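To put rough numbers on that gap, here is a quick back-of-the-envelope calculation. It assumes the CPU run was on a track of about the same 5-minute length, which may not be exactly true, and the variable names are just for illustration.

```python
# Rough comparison of the timings mentioned above.
# Assumption: both runs processed a track of about 5 minutes.

def real_time_factor(processing_minutes: float, track_minutes: float) -> float:
    """Processing time divided by audio length (lower is faster)."""
    return processing_minutes / track_minutes

TRACK_MIN = 5.0   # track length in minutes
GPU_MIN = 2.0     # approximate processing time with a GPU
CPU_MIN = 50.0    # approximate processing time without a GPU

gpu_rtf = real_time_factor(GPU_MIN, TRACK_MIN)
cpu_rtf = real_time_factor(CPU_MIN, TRACK_MIN)
speedup = CPU_MIN / GPU_MIN

print(f"GPU: {gpu_rtf:.1f}x track length")
print(f"CPU: {cpu_rtf:.1f}x track length")
print(f"GPU is roughly {speedup:.0f}x faster")
```

So under that assumption the GPU machine finishes faster than real time, while the CPU-only machine needs about ten times the length of the song.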