How to swap *only selected* audio parts between two channels

Hi all, is there an easy and quick solution in Audacity to this:

I often make a spoken word stereo recording, which consists of synced input from two mono microphones, each recording a different person (such as a speaker and translator, or an interview between two people, etc.) located near each other.
In this setup, each microphone records the voice of the person in front of it, but also the voice of the other person, talking into the other microphone - at lower volume and sound quality. So, at any given time, one of the channels contains the best quality recording, while the other channel contains a lower quality recording of the same audio.
However, each person needs to have his own mic; mic sharing is not possible. It is also not practical to switch the mics on/off during the recording.
These recordings can last more than 2 hours, and the two people constantly alternate - the first person says one sentence, the other one translates it, and so on, making numerous swaps during the recording.

The result I’m looking for is a mono output where, at any given time, only the microphone of the speaking person is heard, and where the original flow of time is perfectly preserved (the mics only alternate, but every moment of audio is captured by one mic or the other).

When I simply mix the original stereo track to mono, it gives a somewhat usable result, but the lower quality input, “overheard” by the other mic, distorts the resulting recording at any given time. I would like to be able to tell Audacity manually, at any point, which mic should be muted and which should be audible.

To get a better result, I wondered if I could manually select several parts of the stereo track (sentences), then swap the left and right channels only for those selected sentences - as a result, one channel would contain only the high quality sentences, and the second channel would contain only the low quality version of the same sound. Then I could delete the low quality channel and save the high quality one as mono.

Or, if I could add some marks to the recording, which would dynamically mute or unmute the channels as needed, at exactly the right moments.

Is there a chance that this could be done somehow in Audacity, perhaps by using some plugin, etc.? Many thanks for any suggestions.

Can we assume the “bad” sound is always lower volume than the prime voice? You may be a candidate for Noise Gate.

Split the stereo show to two mono tracks with the drop-down on the left. Select one of the two tracks and apply Noise Gate. You might be able to hit a setting where all the bad audio drops to silence. Repeat with the other track. This isn’t going to work if the performers like to be expressive or weave around a lot. But it’s worth a shot.

There is also Auto Duck. You apply that to a voice track and a music track and it’s supposed to automatically reduce the volume of the music during voice. I have to look that one up.


Auto Duck is a part of the Effect set in Audacity 2.1.2. Split into two monos. Use it twice. Once right-side up to suppress track 1 and again upside-down to suppress track 2.



An alternative approach would be to “mute” (silence) the “bad” parts, then mix to mono.

Here are two little snippets of code that can be used in the Nyquist Prompt effect:

;version 4
;type process
;; This will silence the left channel
(vector
  (s-rest 1)
  (aref *track* 1))

;version 4
;type process
;; This will silence the right channel
(vector
  (aref *track* 0)
  (s-rest 1))
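
And for completeness - if you would rather swap the left and right channels within the selection (as originally intended) instead of silencing one, the same Nyquist Prompt approach works. A minimal sketch in the same style, simply returning the two channels in reversed order:

;version 4
;type process
;; This will swap the left and right channels in the selection
(vector
  (aref *track* 1)  ; right channel becomes left
  (aref *track* 0)) ; left channel becomes right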

If you would prefer to use a plug-in, then you can do the same (or swap the channels as you originally intended) using the Channel Mixer plug-in:

See here for download/installation instructions:

Given the job is a ping-pong interview, is repeatedly invoking Nyquist easier than split-playing the presentation and Ctrl+L silencing the parts you don’t want?


Ctrl+R to repeat the last effect.
Not much difference really, except that you don’t need to split the track.

Many thanks guys, I’ll try your suggestions within a few days and let you know how it went.
I’m actually a total beginner and have no “pro” level of knowledge in anything related to sound engineering, so it might take me some time.

Regarding the question:

I’m not really sure, but most probably not. It sounds as if the volume were subjectively lower, but it really might be mostly a difference in quality, caused by the “proximity effect” of a particular mic. When I look at the waveform visualization in Audacity, the difference does not seem to be that big, as the second mic with the lower quality, distorted signal probably also picks up many other sounds, such as reflections from the room walls and other noise. Due to the recording quality requirements, I use phantom-powered condenser mics, so their pick-up capabilities are generally pretty good.

Also, sometimes the setup is one-to-many, where one supercardioid mic records the main speaker, while another cardioid mic records a group of people (a discussion). In that case, even when the main speaker is the only one talking, the sound from the secondary (group) cardioid mic might be stronger than the one from the speaker mic, but I would still want the (possibly lower volume) output from the main supercardioid mic, as its quality is much better and, due to its pick-up pattern, it does not include significant noise and wall reflections (“the sound of the room”).

This looks very interesting. I’m thinking that I could even create a special, separate control channel for use with the AutoDuck feature only (to be deleted later), by copying one of the original channels and then manually emphasizing some parts of it by making them very loud and making others totally silent. This way I could possibly improve my control over the AutoDuck behavior, as to when exactly it gets activated.
And as an additional plus, AutoDuck might nicely provide another thing I also thought about: a “micro cross-fade” (e.g. within half a second or so) between the two channels, so that the alternating changes of microphones do not feel too abrupt when listening.

I’m just wondering if there will be a way to create only one “special control channel” (with loud sound and total silence) and use it for AutoDucking both channels (one after the other, with the opposite effect), or if I will have to create two special control channels, one for each original channel. But thinking about it, perhaps I could use the first AutoDucked channel as a control channel for the second channel’s AutoDuck. :-)

Anyways, thanks a lot again, and I’ll try and see.

Any other ideas how to automate it, split or not?

Which microphones are you using? There are ways to avoid interference and noise in an interview. In your case, the “wrong person” is considered noise to the other.

That interview spacing is not an accident and they’re both wearing lavaliers.


Dueling posts.

sometimes the setup is one-to-many,

So you’re trying to record a conference. There is only one good way to shoot that given you don’t have supervisory control of the performance … or a crew. Give up trying to capture a theatrically perfect recording and use an actual conferencing system with Auto Duck, Noise Reduction and Echo Cancellation built in. There’s just nothing like a recording system that “knows” what a voice is.

It may seem like a good idea to go through a performance correcting it word by word or phrase by phrase, but if this goes beyond an interesting hobby, that will get old in a hurry.

Press Stop at the end of the performance, cut off the messy ends, adjust volume a little and post the finished show.

Drive to the next performance.

There was a recent posting from someone who was correcting his audiobook … word … by … word. That may take longer than the person who wrote the book. Not likely to last past one reading and not viable if you’re too close to retirement.


RODE M2 for supercardioid and AKG P170 for cardioid pattern. Probably best mics available where I live for the rather limited budget I have.

While sound recording is quite important for the events I record, its quality is not the top priority, so there is no possibility of adjusting the physical position of the participants based on recording requirements. I do have a chance to position my mics as well as possible, but that’s it. Other than that, the lower quality of the recording would have to be sufficient.

Not really, it just sometimes gets similar. There may be occasional questions from the audience to the speaker, and sometimes a brief discussion may spark up. But mostly, it is an interview-like recording of two people only.

That would be probably nice, but completely beyond my budget. I’m not a pro, and it is a non-profit activity.

Agree. But I do it once every few months - as a hobby, kind of. Frequently enough to think about the workflow, but not something I would do for a living.

That’s basically how I do it now. Just looking for a realistic way to further improve the result, if feasible.

Absolutely agree. But with AutoDuck, it seems to me now that what I want might be automated easily, so I plan to give it a try.

… every few months…

So you’re in the sour spot.

Auto Duck and Noise Gate famously fail if the two performances are too close together.

Post back if you get something to work.