separate out audio with two people speaking one after the other


I have an audio file which has 2-3 people speaking one after the other. Basically it is the first person speaking and the other two people translating into their own languages. There will be only one person speaking at any time.

I want to separate out each language or person speaking automatically. Can I separate it?

Thanks in advance


Not automatically.

The problem is there’s no identity information in the audio. I don’t know of any good free way to detect spoken English, for one example.

You might be able to sense the pitch and timbre of the voices and sort out which one is speaking at any one time. You could also use the cyclical pattern as error detection. If you knew that French was always the second language and you missed one, that would get you back on track. That would stop working the minute the order of the voices changed.
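The pitch idea above can be sketched outside Audacity. This is a rough, illustrative example only, not an Audacity feature: it estimates each segment’s fundamental frequency by autocorrelation and buckets segments into a “low” or “high” voice. The function names and the 170 Hz split point are assumptions you would tune for real voices.

```python
import numpy as np

def estimate_pitch(segment, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced segment
    by finding the strongest autocorrelation peak in the plausible
    speaking-pitch range fmin..fmax."""
    segment = segment - np.mean(segment)
    corr = np.correlate(segment, segment, mode="full")
    corr = corr[len(corr) // 2:]          # keep non-negative lags only
    lag_min = int(sample_rate / fmax)     # shortest plausible period
    lag_max = int(sample_rate / fmin)     # longest plausible period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

def sort_by_pitch(segments, sample_rate, split_hz=170.0):
    """Crude two-way sort: label each segment 'low' or 'high'
    depending on which side of split_hz its pitch falls."""
    return ["low" if estimate_pitch(s, sample_rate) < split_hz else "high"
            for s in segments]

if __name__ == "__main__":
    sr = 8000
    t = np.arange(2048) / sr                   # ~0.25 s per test segment
    low_voice = np.sin(2 * np.pi * 120 * t)    # stand-in for a ~120 Hz voice
    high_voice = np.sin(2 * np.pi * 220 * t)   # stand-in for a ~220 Hz voice
    print(sort_by_pitch([low_voice, high_voice, low_voice], sr))
```

Real speech is far messier than these test tones (pitch varies within one speaker, and two speakers can overlap in range), which is why this only works when the voices are clearly separated in pitch.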

That’s way beyond regular Audacity and home-baked Macro programming, and may be beyond Nyquist programming.

Maybe someone else will post.


As koz wrote, “Automatically” is the big problem.

Although Alexa, Siri, Cortana (and similar) can detect certain spoken words, that is about the extent of speech recognition in PCs in 2021. Even they don’t “understand” what is being said, and they have no idea “who” is speaking.

Audacity has no way of knowing when one person stops speaking and another starts speaking, unless there is something physical that can be measured. If, for example, there is a longer pause between people speaking than between the words of one person speaking, then you could use “Label Sounds” to add labels around each part. See:
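To show what “Label Sounds” is doing with those pauses, here is a minimal sketch of the same gap-based idea outside Audacity: it finds stretches of near-silence longer than a minimum gap and emits (start, end) regions of sound. The threshold and gap length are illustrative assumptions you would tune for the actual recording.

```python
import numpy as np

def label_sounds(audio, sample_rate, threshold=0.02, min_gap=0.3):
    """Return (start_sec, end_sec) pairs for regions louder than
    threshold, treating any quiet stretch shorter than min_gap
    seconds (e.g. between words) as part of the same region."""
    loud = np.abs(audio) > threshold
    min_gap_samples = int(min_gap * sample_rate)
    regions = []
    start = None          # sample index where the current region began
    quiet_run = 0         # consecutive quiet samples seen so far
    for i, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap_samples:
                # gap is long enough: close the region at its last loud sample
                regions.append((start / sample_rate,
                                (i - quiet_run + 1) / sample_rate))
                start = None
                quiet_run = 0
    if start is not None:  # audio ended while a region was still open
        regions.append((start / sample_rate,
                        (len(audio) - quiet_run) / sample_rate))
    return regions

if __name__ == "__main__":
    sr = 8000
    t = np.arange(int(0.5 * sr)) / sr
    tone = 0.5 * np.sin(2 * np.pi * 200 * t)   # half a second of "speech"
    silence = np.zeros(int(0.5 * sr))          # half a second of gap
    audio = np.concatenate([tone, silence, tone])
    print(label_sounds(audio, sr))             # roughly [(0.0, 0.5), (1.0, 1.5)]
```

Each returned pair corresponds to one label region in Audacity terms; this only works if, as noted above, the pause between speakers is reliably longer than the pauses within one person’s speech.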

If there are no gaps, then you could add labels manually. See:

When all the parts are labelled, you can use the “Edit menu > Labelled Audio” commands, which may be useful. See: