video soundtrack time difference

Hi,
I recorded the slides of a presentation using OBS (mkv files), each slide recorded separately.
I then extracted the audio using Audacity, mixed each track to mono, combined them into one track, then amplified it. The total track time was about 4:19.
I then used PowerDirector to combine the slides without sound, and imported the track I had just made in Audacity as the sound.
The problem is that the time shown in PowerDirector is about 4:21. So there is a difference of about 2 seconds, which causes the audio and video to be out of sync towards the end.

I thought I might have combined the tracks in Audacity wrongly, so I have done it twice, but the result is the same.
I also checked the total time of the videos in Windows File Explorer and it was 4:21, which matches what PowerDirector says. I suspect it is something with Audacity, but I am not sure.

I can’t figure out why there is a difference or how to solve it. Any ideas?

You can “solve” it with Effect > Change Speed. The effect control panel has different ways to specify the correction: start and stop times, percent, duration, etc. Pick the one you’re comfortable with.
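For reference, here is a rough sketch of the percentage Change Speed would need. It assumes the lengths quoted in the question (about 4:19 of audio against about 4:21 of video) and that the drift is spread evenly across the track. The function name is made up for illustration, and the answer is only as good as those rounded times.

```python
def speed_change_percent(current_s: float, target_s: float) -> float:
    """Percent speed change that turns current_s of audio into target_s.

    Slowing the audio down makes it longer, so a longer target
    gives a negative percentage.
    """
    if current_s <= 0 or target_s <= 0:
        raise ValueError("durations must be positive")
    return (current_s / target_s - 1.0) * 100.0


audio_s = 4 * 60 + 19  # approximate audio length from the question
video_s = 4 * 60 + 21  # approximate video length from the question

percent = speed_change_percent(audio_s, video_s)
print(f"Percent change to enter: {percent:.3f}")  # roughly -0.77
```

If the error is a fixed offset rather than an even drift, a speed change will not line things up, so it is worth checking where the sync goes wrong before applying it.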

This isn’t necessarily the error, but if you’re doing work in video land, I would do all the sampling in 48000, not 44100.


44100 is the common sample rate for Audio CDs. 48000 is the common sample rate for video.

It would be a very poor editor or processor that didn’t “know” how to convert between those two automatically, but still… If everything worked perfectly, you wouldn’t be here.

As a fuzzy rule of thumb, any time you have independent recorders, you have the possibility of lip-sync drift. This is the classic movie-set situation, where video and audio are captured by independent recorders.

You can also get time errors when the computer thinks the time is five minutes, but a five minute show when played to a stopwatch in your hand is not five minutes. That’s a computer internal timebase error.

That’s the kind of errors you get on a 7-Eleven/Tesco computer.

See if the problem changes if you change Audacity to a 48000 sample rate.

Koz

The audio is indeed 44.1 kHz. I tried changing the Audacity Project Rate to 48000 as you suggested. I have also resampled the audio to 48 kHz (Tracks->Resample), but the track length stays the same. I don’t think it is related to the audio being sampled at 44.1 kHz instead of 48 kHz, because (48000-44100)/44100 = 0.088. That means a sample rate mix-up would change the length by about 8.8%. Applied to 4 minutes of audio, that gives a difference of 240*0.088, or about 21 seconds, which is far from the 2 seconds we have (unless I miscalculated).
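To double-check that estimate, here is a small sketch of how long a 4 minute track would appear if its samples were played back at the wrong rate. The function name is hypothetical and the 240 seconds is just the rounded length from this thread.

```python
def misread_duration(true_s: float, actual_rate: int, assumed_rate: int) -> float:
    """Duration heard when audio recorded at actual_rate is played at assumed_rate."""
    # The number of samples is fixed, so only the playback rate changes the length.
    samples = true_s * actual_rate
    return samples / assumed_rate


true_s = 240.0  # about 4 minutes

as_48k = misread_duration(true_s, 44100, 48000)  # 44.1 kHz audio treated as 48 kHz
as_44k = misread_duration(true_s, 48000, 44100)  # 48 kHz audio treated as 44.1 kHz

print(f"44.1k read as 48k: {as_48k:.1f} s, off by {true_s - as_48k:.1f} s")
print(f"48k read as 44.1k: {as_44k:.1f} s, off by {as_44k - true_s:.1f} s")
```

Either way the error comes out at around 20 seconds, roughly ten times the 2 seconds seen here, so a plain 44.1/48 kHz mix-up does not look like the cause.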

I think it is something to do with Audacity opening mkv files. I experimented by importing these videos into PowerDirector, merging them and producing an mp4 file. When I import that file into Audacity, Audacity shows a length of 4 minutes and 21 seconds.
This could also happen if for some reason Audacity removes a few ms from each imported track, so that merging the soundtracks of those videos in Audacity adds up to such a time difference. I will experiment with it more to get a better idea.

According to my findings, the problem is with Audacity importing mkv files.
The experiment I did:

  1. Went to https://sample-videos.com/
  2. I downloaded an mkv video of 30mb size and mp4 video of 30mb size.
  3. I opened them with Reaper and Audacity. These are the lengths:

Length of Mkv file in Reaper: 05.11.200 (5m,11s,200ms)
Length of Mkv file in Audacity: 5m,13s, 472ms
There is a significant difference.



Length of Mp4 file in Reaper: 06.08.200 (6m,8s,200ms)
Length of Mp4 file in Audacity: 6m,8s, 192ms
Seems almost identical.

Could that be related to the FFmpeg library Audacity uses?

I guess the MKV file was this one: https://sample-videos.com/video123/mkv/240/big_buck_bunny_240p_30mb.mkv

The length of the audio in that file is 5m,13s, 472ms, but that is longer than the length of the video!
Reaper (along with VLC and MediaInfo) reports the length to be about 5m 11s, which is the length of the video.
Audacity doesn’t do video, so it just extracts the entire audio track.

If you compare the audio extracted by Audacity, with audio extracted by VLC (and probably Reaper too), you will find that the VLC version is missing the first 2.304 seconds of audio. Other than the first 2.304 seconds, the tracks synchronise exactly.
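As a sanity check on those figures, here is a short sketch that works in whole milliseconds using only the lengths quoted in this thread. The helper name is made up, and Reaper's displayed length may well be rounded, so the small leftover should not be read as meaningful.

```python
def to_ms(minutes: int, seconds: int, millis: int) -> int:
    """Convert a minutes/seconds/milliseconds reading to whole milliseconds."""
    return (minutes * 60 + seconds) * 1000 + millis


audacity_audio_ms = to_ms(5, 13, 472)  # full audio track as imported by Audacity
reaper_length_ms = to_ms(5, 11, 200)   # length reported by Reaper
missing_lead_ms = 2304                 # audio missing from the start of the VLC extract

# Audio left over once the leading 2.304 s is dropped.
remaining_ms = audacity_audio_ms - missing_lead_ms
leftover_ms = reaper_length_ms - remaining_ms

print(f"audio after dropping the lead: {remaining_ms / 1000:.3f} s")
print(f"difference from Reaper's length: {leftover_ms} ms")
```

Dropping the first 2.304 seconds leaves 311.168 s of audio, which is within about 32 ms of the 5m 11.2s that Reaper shows, so the extra length in Audacity is accounted for almost entirely by that leading audio.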

I see. In my presentation recordings it is the opposite: the video is longer than the audio, but I guess it is the same problem you describe.
Do you think this problem might be caused by OBS (the screen capturing software)?
Any idea how I can overcome the problem with Audacity when I have many small video clips that I need to combine (my real problem)?

No, I think it’s a “feature” of the Matroska Multimedia Container format.

Video works in “frames”, whereas audio works in “samples”. (To complicate the matter, some video formats have variable frame size). Video frames are much bigger than audio samples, but edits have to be on frame boundaries. If you make a cut at exactly x.xxx seconds, then the audio can be cut very accurately at that point (accurate to +/- 0.5 sample periods, which is typically about +/- 0.00001 s), but the video stream is cut on a nearby frame boundary. To compensate, the MKV format can specify an offset so that the audio and video remain synchronised.
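To put rough numbers on that, here is a minimal sketch that snaps the same cut time to the nearest video frame and to the nearest audio sample. The 30 fps frame rate, the 48000 Hz sample rate and the cut time are all assumptions picked for illustration, not values read from any file in this thread.

```python
def snap_to_grid(t_s: float, rate_hz: float) -> float:
    """Snap a time in seconds to the nearest boundary of a grid running at rate_hz."""
    return round(t_s * rate_hz) / rate_hz


FRAME_RATE = 30.0     # assumed video frame rate
SAMPLE_RATE = 48000.0  # assumed audio sample rate

cut_s = 12.3456  # arbitrary requested cut point

video_cut = snap_to_grid(cut_s, FRAME_RATE)
audio_cut = snap_to_grid(cut_s, SAMPLE_RATE)

print(f"video cut error: {abs(video_cut - cut_s) * 1000:.3f} ms "
      f"(worst case {0.5 / FRAME_RATE * 1000:.1f} ms)")
print(f"audio cut error: {abs(audio_cut - cut_s) * 1000:.5f} ms "
      f"(worst case {0.5 / SAMPLE_RATE * 1000:.4f} ms)")
```

With these assumed rates the video cut can be off by up to about 17 ms while the audio cut is within about 0.01 ms, which is the gap a per-stream offset in the container is there to cover.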



One way to work around it:

  1. Combine your slides into the presentation first (with the original video sound). If all of the software is well behaved, the audio and video should remain synchronised.
  2. Then, export the full audio track from your video software as a WAV file (48 kHz sample rate).
  3. Then import the WAV file into Audacity and process as required (taking care to not change the length)
  4. Then export from Audacity as a WAV file, and import back into your video app.

(This type of problem is one of the reasons I don’t enjoy working with video - it’s a pain in the neck!).

Thanks steve, this could be a solution. I tried to avoid it because I then lose the marks of the stitches between the videos (which I need), so it becomes harder to find the joins and replace them with silence (or ambient sound).

As for my next recordings, could this be solved by changing some settings in OBS?

I’ve never used OBS, but I believe it allows you to remux as MP4. Perhaps that would provide a solution?