2 audio tracks, same session, timebase mismatch HELP?!

I had to do an interview (for a radio show) with a couple instead of just one person. In order to capture all three voices I used my iPad plus RODE two-lavalier-mic setup (Hindenburg Field Recorder) for the two interviewees, and a handheld Zoom for my own (interviewer) voice. The Zoom was recording direct to mp3 and the iPad to wav.

So I thought I was being clever: I’ll just load both files into Audacity, and the strong signal from the Zoom will overlay the weak background signal from my voice coming over the lavalier mics clipped to my interviewees, so I’ll have decent quality all round. And I can adjust the relative volume levels of interviewer and interviewee voices in mixdown, because they’ll be on separate tracks. What’s not to like?

I hadn’t tried this before. It quickly became obvious that it was a Bad Idea, and in fact I’m in a bit of a bind now.

What happened – and I don’t really get it – is that the timebase for the two recording devices seems to be ever so slightly different. The mp3 track, over the same thirty minutes of wall-clock time (the first section of the interview), is slightly shorter than the same thirty minutes of the wav track. I aligned the start points of the two tracks carefully, and at first all is well: we’re in synch and it sounds great.

But after only a few minutes, I start to hear a subtle “phaser” effect, and pretty soon we have serious echo (or “pre-cho”) as the mp3 track is quite audibly ahead of the wav. FFWD to 15 or 20 minutes in, and it’s almost unlistenable because of the babble of out-of-synch voices.

So now I have a real mess. The interviewees’ voices are just barely audible in the background of the mp3 (Zoom) track. My own voice is audible in the background of the wav (iPad/Hindenburg) track – but not loud enough to use. So this out-of-synch problem cannot be ignored. I have no idea how I’m going to fix it! This was a great interview and I’m just about in despair over how to rescue it.

I’m only a shallow user of Audacity, no real expertise; usually I just edit and adjust levels, mixing down to mp3. Is there a way to establish a marker A and B on Track 1, then a marker C and D on Track 2, then tell Audacity to stretch Track 1 so that the content between A and B occupies exactly as much playback time as the content between C and D?
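
(In other words: stretch Track 1 by the factor (D − C)/(B − A). A sketch of the arithmetic, with made-up marker times:)

```python
# Made-up marker times, in seconds, just to show the idea.
A, B = 12.0, 1812.0   # markers on Track 1
C, D = 12.0, 1814.0   # the same two events on Track 2

stretch = (D - C) / (B - A)   # factor Track 1 must be stretched by
print(f"stretch factor: {stretch:.6f}")   # 1.001111
```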

Or (hoping against hope) is there any super-smart Audacity filter that aligns the amplitude envelope of Track 1 to the envelope of Track 2? The envelope is very recognisable (the same features occurring, though with different overall amplitude, on each track). I can visually identify the alignable features, and visually perceive the drift between the tracks.
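
(There’s no built-in Audacity effect for this as far as I know, but outside Audacity this kind of alignment is usually done with a cross-correlation. A minimal sketch, assuming SciPy is available and both tracks are loaded as mono arrays at the same sample rate:)

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def lag_seconds(a, b, rate):
    """Seconds by which the features in `a` occur later than the same
    features in `b` (negative = `a` is ahead), via the correlation peak."""
    xc = correlate(a, b, mode="full", method="fft")
    lags = correlation_lags(len(a), len(b), mode="full")
    return lags[np.argmax(xc)] / rate

# Measure the lag on a short window near the start and again near the
# end; the difference between the two readings is the accumulated drift.
```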

I had no idea this could even happen. I thought the clock in a digital recording device was trustworthy. I mean, the Zoom marks time down to hundredths of a second. It never occurred to me that two devices could record the same audio source and come out with different timebases – off by more than a second if you listen long enough!

You can use Change Speed so all the tracks have the same runtime.

You could use Auto Duck to squelch the weaker version(s) if there is still phasing after the speed change.

“You can use Change Speed so all the tracks have the same runtime.”

So… carefully clip the tracks so they begin and end on exactly the same moment of audio (which will make them slightly different “lengths” according to Audacity), then use “Change Speed” to make the shorter track longer, or the longer track shorter?

Can make the short one longer (slower), or the long one shorter (faster). I’d assume the WAV is correct & the MP3 is wrong.

There is a slow piece-wise way of doing it without changing the speed …

Split the shorter track into, say, minute-sized chunks (split where the join is not going to be noticeable),
then move each chunk using the Time Shift tool so it is in sync, then repair the tiny gaps which will now exist between the chunks.
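
(If the drift is roughly linear, the shift each chunk needs is easy to pre-compute; a sketch with made-up numbers:)

```python
# Made-up numbers: the short track loses drift_total seconds over the
# whole recording, so chunk i must be shifted later by a growing amount.
drift_total = 6.0      # seconds lost over the full recording
duration = 90 * 60     # total recording length, in seconds
chunk_len = 60         # one-minute chunks
rate = drift_total / duration

for i in range(4):     # first few chunks, just to show the pattern
    print(f"chunk {i}: shift later by {i * chunk_len * rate * 1000:.0f} ms")
# chunk 0: 0 ms   chunk 1: 67 ms   chunk 2: 133 ms   chunk 3: 200 ms
```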

“Split the shorter track into, say, minute-sized chunks (split where the join is not going to be noticeable),
then move each chunk using the Time Shift tool so it is in sync, then repair the tiny gaps which will now exist between the chunks.”

Yikes. We’re talking 2 hours of audio here, so that would be many, many hours of fiddly work.

I just tried converting the WAV files to mp3, trying to match the characteristics of the encode, but the problem is still there even when everything is converted to mp3. Maybe I didn’t get all the characteristics matched (sampling rate, bitrate, etc.) – worth a couple more tries.

Looks like stretching or shrinking is the only fix. The only way I can figure out to do this is to find a discrete audio event near the end of the sample – like start of sentence or someone coughing – figure out what the offset is, figure out where it “should” be on one track or the other (i.e. either moved later or earlier in time), then stretch or shrink the target track. Sure hoping I don’t get artifacts from the stretch or shrink, but the percentage change should be really tiny so fingers crossed.

I never really wanted to know this much about Audacity, but here goes…

Well, I have solved this problem; thanks for the advice. I’ll include the method here in case someone shows up with the same problem someday.

So, you have 2 tracks, recorded by 2 different devices from the same real-world signal. When you pull them into Audacity you find to your dismay that the devices disagree about the timebase, and one track is actually shorter than the other. In my case, the difference was about 6 seconds over 90 minutes of recording. So you trim and get them all synched up at the beginning, but a few minutes in you start to hear phase shift… then echo… then babble.
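
(For perspective, the clock error behind that mess is tiny; a quick back-of-the-envelope, assuming a nominal 44.1 kHz sample rate:)

```python
drift = 6.0              # seconds of disagreement between the two devices
duration = 90 * 60       # seconds of recording
skew = drift / duration
print(f"clock skew: {skew * 100:.3f}%")   # 0.111%
print(f"slip: {skew * 44100:.0f} samples/second at 44.1 kHz")  # ~49
```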

In my case, there was background audio from Source 1 in mics 2 and 3, and background from Sources 2 and 3 in mic 1, so there was no way out: any loss of synch was immediately and continuously noticeable. So let’s say Track 1 is the one that’s short relative to Track 2. [Who can say which one is “right”, without an atomic clock and maybe getting into some Heisenberg conundrum of observational relativity? But one is shorter and one is longer, and Track 1 is the shortie.]

We could either compress Track 2 into less time, or stretch Track 1 into more time. I chose to stretch Track 1. Here’s how.

  1. align both tracks carefully at the beginning, using a distinctive sound event found on both tracks to synch them up. something with a hard attack is ideal. listen closely to make sure you are perfectly aligned.

  2. now go to the end and look for another distinctive event found on both tracks. this will be your “clapper” for the fixup process. I was lucky – there was a classic impulse noise just a couple of tenths of a second from the end of the track. perfect.

  3. write down the time at which this event happens in Track 1. This is T1.

  4. write down the time at which this event happens in Track 2. This is T2.

(for grins, shift one of the tracks to line up the end clapper events perfectly and listen to the last few seconds of both tracks. if it all sounds normal, then you’ve got the right clapper.)
(but do remember to line them up at the beginning again!)

  5. OK, now subtract T1 from T2. This is DT, delta time, the correction you want to apply to Track 1. It needs to be DT sec longer than it is.

  6. select Track 1 (double click)

  7. from Effects, choose Change Tempo. A dialogue box pops up.

  8. in the dialogue box, add DT to the length in seconds that is shown on the left, and enter that new number on the right (the arithmetic is sketched just after this list). Audacity isn’t helpful here – it seems to limit the data entry to 5 chars, but the 6th char is actually there, you just have to scroll sideways to see it.

  9. choose High Quality tempo change. start the process…

  10. …and go get a cup of coffee or something. It took 17 minutes on my (pretty hot) Hackintosh to stretch 1h30m of mp3 by 6.25 seconds.

  11. go to the start of the project, and listen to both tracks again. if the start sounds weird then you made a mistake and can’t trust your results. so start over. if the start sounds good, go listen to the end.

  12. if the end also sounds normal, you just won! congratulations. but if the end is still slightly phase-shifted, don’t panic. Find your clapper events again, zoom in as far as you dare, and figure out what the remaining error is. What I did then was to Undo the time stretch, go back to step 5, adjust my previous DT by the remaining error, and so on.

  13. SAVE YOUR PROJECT. RIGHT NOW. I invested more than two hours in this fixup procedure (I had two big chunks of audio to fix, the 90 minutes described here plus another 40 min or so). I’m a perfectionist so I did it a few times. You might invest fewer minutes with shorter tracks or by being a little less fussy, but the fixup procedure is fiddly and boring, and who wants to do it all over again?
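
For reference, here is the arithmetic from steps 5 and 8 in one place, with illustrative clapper times chosen to match my rough numbers (90 minutes, 6.25 seconds of drift):

```python
# Illustrative clapper times, in seconds, read off the two tracks.
T1 = 5393.75   # clapper time on Track 1, the short track
T2 = 5400.00   # clapper time on Track 2

DT = T2 - T1                        # Track 1 must get this much longer
length_before = 5400.00             # length shown on the left of the dialog
length_after = length_before + DT   # the number to enter on the right

print(f"DT = {DT:.2f} s")                                     # 6.25 s
print(f"new length = {length_after:.2f} s")                   # 5406.25 s
print(f"relative change = {DT / length_before * 100:.3f}%")   # 0.116%
```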

Hope this helps someone someday.

Another, more expensive solution:

https://www.avshop.ca/recording-digital-recorders/zoom-h6-handy-recorder

Record your material on just one device with multiple microphone support, like this H6 with up to 6 mic inputs. 🙂

Then you won’t have the issue of multiple clocks on multiple recording devices. But I don’t think I can afford an H6 at the moment.

Change Tempo & Change Speed are different:
Change Speed is quicker and does not create artefacts the way Change Tempo can.
Change Speed will shift the pitch, but the shift is probably imperceptible.
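
(A quick sanity check on “probably imperceptible”, using the ~0.116% correction worked out earlier in the thread:)

```python
import math

speed_ratio = 5406.25 / 5400.0            # the ~0.116% correction from above
cents = 1200 * math.log2(speed_ratio)     # pitch shift caused by Change Speed
print(f"pitch shift: {cents:.1f} cents")  # ~2 cents; most listeners need ~5+
```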