Automatically joining clips end to end within one track?

I’m fairly new to Audacity. As part of my research, I’m separating a track that contains two speakers into two separate tracks: one for speaker A and another for speaker B. This is labor-intensive, especially as I want to be consistent about the periods of silence around the tracks without just leaving holes where the other person is speaking. I basically want a track of continuous speech with no pauses greater than 0.6 seconds for each speaker.

I’ve found a relatively easy way to accomplish this, but it leaves clips scattered across the track, with lots of intervening background that Audacity could misinterpret as actual silence. The short version of the question is this: when a number of clips are scattered on a track, is there an easy way to automatically join them all end to end without using Truncate Silence, or do they need to be copied and pasted individually? As the amount of silence within the clips is quite important to me, I don’t want to confuse that silence with the grey background the clips have been pasted on. I would rather just line the clips up end to end, preferably not manually.


I think that I mostly understand what you mean, but I got a little lost on some of the detail :wink:
Probably best if I just describe some of the Audacity features that may be useful for this job - you can then try these features and work out which ones help with your work-flow.

I don’t use Mac OS X. When I give a shortcut with Ctrl, on a Mac I think you need to use Command.

If you have multiple audio clips in the same track, you can “join” them together into one audio clip by selecting the entire track and then choosing “Edit > Clip Boundaries > Join” (shortcut Ctrl+J).

Selecting audio

  • To select an audio clip you can double click on it.
  • To select an entire track you can click on the information panel on the left end of the track.
  • To select an entire track (another way), click on the track then hold the shift key down and press Home, then End (then release the shift key)
  • To select everything in a project: Ctrl+A (“Edit > Select > All”)

“Mixing” one or more audio tracks into one continuous track
Select the track(s) then “Tracks > Mix and Render”.
The new “mixed” track appears below any other tracks.
If more than one track is in the mix, then the track name becomes “Mix”.
You don’t need to select the entire length of the tracks - just part of each track is enough and the entire track is included in the mix.

Aligning tracks (Audacity 2.0.5 required for some options)
Probably the most useful for you is the one that is new to Audacity 2.0.5: Align End To End

Labelling parts of a track
This may be useful for keeping track of who says what when.

I presume that you already know how to use the Time Shift tool.

For this type of project, if it is a long project, it is often easiest to work with fairly short sections. Save each section as a separate project until all sections are complete. Back up often. When all of the sections are complete, export each project so that you have a separate WAV file for each section, then create a “master project” and put all of the sections together by importing each section WAV file and aligning them end to end.
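If you would rather do that last joining step outside Audacity, here is a minimal sketch using Python's standard `wave` module. It assumes every section was exported as PCM WAV with the same channel count, sample width, and sample rate. The function name and file names are only illustrative.

```python
import wave

def join_wavs_end_to_end(section_paths, master_path):
    """Write the sections to master_path in order, with no gap between them."""
    if not section_paths:
        raise ValueError("no section files given")
    expected = None  # (channels, sample width in bytes, sample rate) of the first file
    with wave.open(master_path, "wb") as out:
        for path in section_paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    # Assumption: all sections share one format; stop rather than resample.
                    raise ValueError(f"{path} has format {fmt}, expected {expected}")
                out.writeframes(src.readframes(src.getnframes()))

# Hypothetical usage:
# join_wavs_end_to_end(["section01.wav", "section02.wav"], "master.wav")
```

The `wave` module only reads uncompressed PCM files, so sections exported as 32-bit float WAV would need converting first.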

Thank you for the thoughtful reply. I’m still learning Audacity lingo, and maybe can now explain the problem more clearly.

Imagine about 150 clips scattered on a single track, with varying and sometimes very large amounts of dead space between them. What I want to do is time shift the clips so that they are chained immediately one after the other on that same track. I want each clip to start immediately after the previous one, without having to manually use the time shift tool to drag each of the 150 or so clips together.

The “Join” tool creates extra silence between the clips, which I don’t want. “Compress silence” impacts silence not only between but within the clips, which I also don’t want.

Is there a way to automatically time shift all the clips in a track so that they are chained like this? It seems like the sort of thing that could be easily automated, but I haven’t figured out how to do it except by hand, clicking and dragging each clip one by one, which is time-consuming and tedious. This is a large project, and it would mean hand dragging about 10,000 clips in this fashion by the project’s end unless I can automate the process. Thanks again for the help!

How “silent” is the “silence” within the clips? Is it “absolute” silence? (generated silence rather than recorded silence?)
Are the silences that you want to end up with precise durations? All the same duration? All different?

It might be easiest if you tell us the story of where all of these audio clips with attached silence came from.
Did you attach silence to the start and end of 10,000 audio clips? If so, why?

I touched on this earlier, but the reason I am so protective of silence is that I am measuring it for research. Basically, I’m comparing aspects of speech melody between two speakers, both recorded onto one track (not my preferred method, but I have to work with what I have). I can put the speech into a semi-automated program for analysis, but the individuals’ speech must be on two different tracks first, one for each speaker. Among the measures I’m interested in is the delay from one speaker to the next: how long does it take each speaker to respond?

After running a 24 dB Noise Removal over the entire conversation to help clear out some background noise, I labeled the sound (with 0.3 seconds on each end), exported the data from the labels into a database, and cut and pasted the labeled audio clips onto a new track, thereby removing any very large pauses from the sample while documenting what I removed. This has the benefit of methodically cutting up the samples of speech, which saves a lot of manual labor but creates a lot of separate clips.

I can then go through and separate speakers within individual clips, carefully cutting at the end of the speaker’s voice so that the silent time before the next speaker (of various durations) is preserved on the subsequent clip. I then drag one speaker’s audio onto another newly created track. The end result is between 150 and 200 clips per conversation, but because there’s been so much cutting and pasting, there’s a lot of dead space between the clips, which doesn’t have a lot of research meaning. The best thing to do would just be to drag all those clips together for each speaker.

I’ll be doing this entire thing for about 60 conversations, so making this as quick and easy as possible would be wonderful. Thank you again for your help.
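
For the label bookkeeping described above, here is a rough sketch of how the exported labels might be read before they go into a database. It assumes the exported file is plain text with one label per line: start time, end time, and label text, separated by tabs. The function names and the sample data are made up for illustration.

```python
def parse_labels(text):
    """Parse label lines of the form start<TAB>end<TAB>label text."""
    labels = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2 or line.startswith("\\"):
            continue  # skip blank lines and any extra non-label lines
        start, end = float(parts[0]), float(parts[1])
        name = parts[2] if len(parts) > 2 else ""
        labels.append((start, end, name))
    return labels

def removed_gaps(labels):
    """Seconds of unlabeled audio between consecutive labels (what gets cut out)."""
    ordered = sorted(labels)
    return [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])]

# Made-up example data, not from a real recording.
example = "0.5\t2.0\tA\n3.25\t4.0\tB\n"
print(removed_gaps(parse_labels(example)))
```

Overlapping labels are reported as a gap of zero rather than a negative number.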

As long as the remaining background noise is above -75 dB, you can use “Truncate Silence”. (Truncate Silence is supposed to work down to -80 dB, but there is a bug that makes the -80 dB setting non-operational.)

Try using Truncate Silence with both Minimum silence and Maximum silence set to 1 millisecond, and the threshold set to -75 dB. As long as the background noise is over -75 dB then the proper “gaps” will be truncated to 1 millisecond and the “silences” will be protected by the background noise.

If you find that some of the “silences” are being truncated, use less Noise Removal.

The “gaps” between audio clips are read by Truncate Silence as “minus infinite dB”.
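
To illustrate why this works, here is a simplified Python sketch of the idea. It is not Audacity's actual implementation, which measures level over short windows rather than sample by sample. The function names, the `keep` parameter, and the sample values are made up for illustration. Gaps between clips are modelled as exact zeros (minus infinity dB), and the pause inside a clip as low-level noise at -60 dB.

```python
import math

def level_db(sample):
    """Level of one sample in dB relative to full scale; exact zero is minus infinity."""
    return -math.inf if sample == 0 else 20 * math.log10(abs(sample))

def truncate_quiet_runs(samples, threshold_db=-75.0, keep=1):
    """Shorten every run of below-threshold samples to at most `keep` samples."""
    out = []
    run = 0  # length of the current below-threshold run
    for s in samples:
        if level_db(s) < threshold_db:
            run += 1
            if run <= keep:
                out.append(s)
        else:
            run = 0
            out.append(s)
    return out

speech = [0.5, -0.4]
pause = [0.001, -0.001, 0.001]   # recorded "silence" at -60 dB, above the threshold
gap = [0.0] * 5                  # empty space between clips, minus infinity dB
track = speech + pause + gap + speech
print(truncate_quiet_runs(track))
```

In this toy model the five-sample gap shrinks to a single sample, while the three-sample pause inside the clip passes through untouched because its level stays above the threshold.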

Worked like a dream. Thank you!