Sync Drift from different Zoom Recorders

I’m using Audacity 2.0.5 and am helping with a student video interview project.

One of my lav mics was acting funny on the morning of the interview so I decided to use my trusty Zoom H2, which got great audio from the subject (no budget and all). The interviewer used the other lav hooked up to a Zoom R24. I was intending to use the R24 for both lavs, but with the primary one not working, I just kept the interviewer’s setup as it was and set up the Zoom H2 as a quick fix.

I imported both tracks into Audacity and synced them at the beginning, but by minute 30 there is significant drift, and I can’t figure out how exactly to fix it. I know that this sometimes has to do with the recording rates being off, but I checked that both devices were recording at 44.1 kHz/16-bit, and the Audacity project is set to 44.1 kHz as well. I’m certainly an amateur so any suggestions as to where I should look for a fix would be greatly appreciated. Cheers.

They were both running at 44100 according to each one’s own individual internal clock electronics, and those electronics have manufacturer’s tolerances. So now you know why that person (usually the Camera Assistant) goes out in front of the camera at the beginning of a film shoot, loudly says “Camera Mark!” and slams that clapboard shut. The slam is for the sound people and the picture of the closure is for the camera.

In your case you’re going to also need an end mark. Given that nobody went out at the end of the shot with the clapboard upside down (End Mark), you’re going to need to find a sync point on your own and use Effect > Change Speed to match the two clips up. Hope that someone dropped a pencil or something for the sync mark. Trying to do it with lip sync is rough.

There is a formula for starting with your known time offset and deriving the weird numbers that Effect > Change Speed needs. I need to look that one up.
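Here is a rough sketch of that calculation, not an official Audacity formula. It assumes the Percent Change field in Effect > Change Speed works the way I understand it: +100% doubles the speed and halves the length, so new length = old length / (1 + percent/100). The function name and the example timings are made up for illustration. Measure the time between your start sync point and your end sync point in each track, then feed both figures in.

```python
def change_speed_percent(ref_interval_s, other_interval_s):
    """Percent change to apply to the 'other' track so that the span between
    its two sync points lasts as long as the same span in the reference track.

    Assumes: new_length = old_length / (1 + percent / 100).
    """
    if ref_interval_s <= 0 or other_interval_s <= 0:
        raise ValueError("intervals must be positive")
    return (other_interval_s / ref_interval_s - 1.0) * 100.0


# Hypothetical measurements: the two sync points are 1800.000 s apart in the
# R24 track (kept as the reference) and 1800.250 s apart in the H2 track.
ref_interval = 1800.000
other_interval = 1800.250

percent = change_speed_percent(ref_interval, other_interval)
print(f"Percent Change for the H2 track: {percent:.4f}")
```

A positive result means the track is too long and has to be sped up slightly; a negative result means it has to be slowed down. Apply the change to the whole track, then re-align the start marks.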


If you have to do this again (and as you noticed, separate sound can work perfectly well and is generally preferred), somebody needs to appear on camera and clap at the beginning and the end, just like the big kids.



A note. You don’t have to find a sync point at the very end of the shoot. It’s best if you do that, but anywhere in the end quarter of the shoot should be close enough, particularly if it’s human lips. Koz

There is a plug-in that allows you to stretch the selected audio to a precise length:
(this may be easier than trying to calculate the necessary percentage change).

Remember, none of us are horrified you’re doing this. You can frequently get much better Separate Sound than you can with camera built-in sound. As long as you know there might be a sync issue, you can be ready for it (it’s the surprises that kill). You might even discover the correction is always the same number (as long as you don’t change hardware) and you barely have to think about it.

Attached is a Nagra III that the movie people used for years. Its claim to fame is its stunningly accurate motors, designed to keep up with the movie camera motors. That’s why most times we only need the clap at the front. The two will run independently of each other in frame sync for hours.

Hi Koz

Why don’t all vision and sound recording devices have a button and circuit to synch the two inputs?

A button that when pressed set a timer and made an external noise at exactly the same time would do the job brilliantly.

The same circuit could be used on all devices making it really cheap. The frequency of the noise it makes could even be outside the normal hearing range and detectable only by the device (and thus making it immune to interference from random extraneous noises) meaning the operator could resynch at will whenever necessary during a continuous recording of any significant length.

I’ve got a Zoom Q2 HD which for the few pounds it cost me does a brilliant job of filming and recording; it just doesn’t get them together making it pretty useless at anything.

That’s what a “word clock” is for:

Hi Steve

I don’t think it’s quite what I had in mind. It seems to be a means of keeping devices on a network in sync a bit like GPS does.

I was thinking of a very simple, massively common (and therefore cheap as chips, ha ha) solid state device that would be completely self-contained, requiring no contact with the outside world or external services of any kind, that would simply make a timing mark on a video recording (a flash perhaps?) and at the same time generate an audio noise which would be captured by the device’s external microphone.

Software in the device would use the measured timing difference (the lag or latency) to automatically correct the recording with no further human intervention, editing etc, required.

Like a clapper board?

You can certainly get complicated, but for five bucks you can get a clapboard you can use with any camera or any sound device. Or multiple devices. Hand off the work to any good editor and you’re done. Please note the clapboard also has show information, time and date built-in. 80% of shooting a show is bookkeeping.

You do have to know to do one at the end if you suspect there is a speed issue as in this case.

Software in the device would use the measured timing difference (the lag or latency) to automatically correct the recording with no further human intervention, editing etc, required.

Sounds like a product you should be developing.


The clapper board is fine in instances where a) there is a clapper board operator separate from the cameraman b) there is control over the recording session such that no one cares if and when the clapper board is used and c) in the event of one person operation (opo) the recorder is mounted on a tripod.

Filming a performance where you are a guest, where a clapper board would be obtrusive, or where there is no room for a tripod limits its applicability.

I was thinking of a universal sort of device triggered by a simple button on every video recorder such that opo would be easily achieved without disturbing the audience and without requiring the operator to let go of the recording device.

I’m certainly not knocking the idea. I think it would be an excellent invention, but not applicable to the here and now because as far as I’m aware it doesn’t exist yet (perhaps worth doing a Patent search).

Assuming that you are recording video on a camcorder of some sort, and recording the audio on a separate audio recorder, you may not need to have the visual timing point of the clapper board on the video. Camcorders will record sound, even if the sound quality may be poor, so it may be sufficient to just have audio cues that are picked up by both the camcorder and the audio recorder. It could be something as unobtrusive as a cough or clap, as long as it is clearly defined and you can find it later when you come to synchronise them.
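If you would rather not hunt for the cue by eye, the offset between the two recordings of the same cough or clap can be estimated by sliding one over the other and keeping the shift where they match best (a brute-force cross-correlation). This is only a sketch using plain Python lists and made-up sample data. The function name and the tiny "click" are hypothetical, and on real recordings you would want a proper numerical library, since this loop is far too slow for long clips.

```python
def best_lag(reference, other, max_lag):
    """Return the lag in samples at which `other` best matches `reference`.

    A positive result means the cue appears that many samples later in
    `other` than in `reference`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


SAMPLE_RATE = 44100  # samples per second, as in the original question

# Made-up data: the same short click, quieter and 7 samples later on the camcorder.
click = [1.0, -0.8, 0.5]
recorder = [0.0] * 50
recorder[10:13] = click
camcorder = [0.0] * 50
camcorder[17:20] = [0.3 * x for x in click]

lag = best_lag(recorder, camcorder, max_lag=20)
offset_ms = 1000.0 * lag / SAMPLE_RATE
print(f"camcorder is {lag} samples ({offset_ms:.3f} ms) behind the recorder")
```

Doing this once on a cue near the start and once on a cue near the end gives the two sync points needed to work out a speed correction.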