Seamlessly joining pre-edited voice recordings / audio book type material

Hello fellow audiophiles,

First off: the issue I am going to describe here has certainly come up before but I could not for the life of me find the exact situation I am in. Plenty of people have a similar issue, but the fixes either don’t apply to my case or they would take way too much time if done manually. I have read the FAQ on the issue, with the same result.

“Why, pray tell, did you choose to post here anyway, if everything on the topic has already been said and you are just not happy with your options?”, you may ask. The reason is that the problem I experience does not occur during playback before editing. What this means will become clearer in a second.

The issue:
I like to listen to radio play / audio book type stuff on the go. The publisher (in this case a subsidiary of Universal Music) sees fit to release these 30 to 60 minute episodes cut into pieces of around 90 seconds. Each episode therefore comes in anywhere between 10 and 30 pieces. For convenience, I always use Audacity to join these pieces into a single file (reason: limited file tree structure on the mobile device, and excessive clutter because each part would get listed, etc.). With most publishers, this works perfectly.

The stuff from Universal always has cracks at the seams because the mp3 pieces have silence at the end and the beginning, which does not merge properly. I tried truncating silence and I tried normalizing the tracks. No automated way I can find gives satisfactory results. The only way I have found is to merge the pieces and then manually edit each transition, joining the pieces at zero crossings on both channels if possible. That is obviously way too much work, especially considering that the publisher could simply offer a single-track version in the first place.
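For illustration, the manual "find a zero crossing near the cut" step could be sketched in Python roughly like this. It assumes one channel of already decoded samples in a plain list; `nearest_zero_crossing` is a hypothetical helper name, not an Audacity function, and for stereo it would have to be checked on both channels.

```python
def nearest_zero_crossing(samples, index):
    """Return the index closest to `index` where the signal is zero or changes sign."""
    def is_crossing(i):
        # A crossing is an exact zero, or a sign change between sample i-1 and sample i.
        return samples[i] == 0 or (i > 0 and samples[i - 1] * samples[i] < 0)

    n = len(samples)
    for distance in range(n):
        # Look outwards from the requested position, left candidate first.
        for i in (index - distance, index + distance):
            if 0 <= i < n and is_crossing(i):
                return i
    return None  # no crossing anywhere in the data


if __name__ == "__main__":
    mono = [0.5, 0.4, 0.2, -0.1, -0.3, 0.2]
    print(nearest_zero_crossing(mono, 1))
```

Cutting both pieces at such positions avoids a sudden jump in the waveform at the seam, which is what tends to be heard as a click.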

The thing is though, when I play the pieces as a playlist in a software player, the transitions (while by no means perfect) sound a lot better than in Audacity, whether as pieces or merged. The player seems to automatically remove the silence and splice the pieces together with mostly satisfactory results.

The question I now have is: How does it do that, and can I emulate that behaviour in Audacity, preferably with as little manual work as possible?

Any assistance in this matter would be much appreciated. I’ve been battling with this for years and the publisher will probably not budge on the “single track” front, for whatever reason.

I’ve not come across that. Why do they do that? Is it to “encourage” listeners to purchase the full version?

My guess is that your player is emulating “gapless playback” by slightly overlapping the files on playback.
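To make that guess concrete, here is a minimal Python sketch of joining two pieces with a short overlap (a linear crossfade). This is only an assumption about what a player might do, not a description of any particular player; `overlap_join` and the `overlap` length are hypothetical, and the samples are plain lists of floats for one channel.

```python
def overlap_join(first, second, overlap):
    """Join two sample lists, blending the last `overlap` samples of `first`
    with the first `overlap` samples of `second`."""
    overlap = max(0, min(overlap, len(first), len(second)))
    if overlap == 0:
        return list(first) + list(second)

    head = list(first[:len(first) - overlap])
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)           # weight of the incoming piece
        a = first[len(first) - overlap + i]   # outgoing sample
        b = second[i]                         # incoming sample
        blended.append((1.0 - w) * a + w * b)
    return head + blended + list(second[overlap:])


if __name__ == "__main__":
    joined = overlap_join([1.0] * 4, [1.0] * 4, 2)
    print(len(joined))
```

The joined result is shorter than the two pieces combined by exactly the overlap length, which is why overlapping can hide a small gap.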

One option would be to play the playlist, and use Audacity to record the playback (see: Tutorial - Recording Computer Playback on Linux - Audacity Manual)

Thank you very much for your reply.

In my search for answers I’ve also been in contact with the producer of one of these series. They are not too happy about this practice either (I mean really, why would you not offer a single-track version?). He mentioned that the publisher probably cuts them up to make them more suitable for streaming. I don’t know why that would be necessary but I’m not too familiar with audio streaming technology and whether or not any benefits can be achieved by cutting the things.
Unfortunately there is no “full” version. The product you buy is the one I described, in pieces, like a regular music album would be. Other publishers are doing the same thing but apparently edit their files so you can just stitch them together.

I will look into the option you suggested, but it would still be quite time intensive to listen to all of them on the PC. On top of that, my personal preference is to listen to these stories on my mp3 player and immerse myself. The Sony A-45 limits how deep a directory tree can go (so you don’t want an additional layer for sorting your episodes if you can help it), and it relies heavily on the mp3 tag information when displaying content (so if I were to sort by “Artist”, with the series name in that field for all files, it would list every part of every episode). So this piecemeal approach seems even more outdated to me.


Addendum: I forgot some of the mandatory information in my first post that might be relevant. I didn’t know I would not be able to edit it afterwards.

Audacity version: 2.3.3, installed from my standard repository

There’s a problem that MP3 files always have a bit of “padding” at the beginning. This is unavoidable due to a limitation of the MP3 format.

The LAME encoder has a clever workaround for this issue, which is that it can add a metadata tag that records the length of the padding, so that apps that support the tag can automatically trim off the padding. IF the files were encoded with a sufficiently recent version of LAME, then they will probably include that tag, which we “may” be able to make use of for the job in hand.
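As a rough illustration of what "trimming off the padding" means once the lengths are known, here is a small Python sketch. It assumes the padding lengths have already been obtained as sample counts (reading them from the tag is not shown), and `trim_padding` / `join_gapless` are hypothetical helper names working on plain lists of decoded samples.

```python
def trim_padding(samples, start_pad, end_pad):
    """Drop `start_pad` samples from the front and `end_pad` from the back."""
    if start_pad < 0 or end_pad < 0:
        raise ValueError("padding counts must be non-negative")
    stop = len(samples) - end_pad
    # max() keeps the slice empty (not reversed) if the padding exceeds the data.
    return samples[start_pad:max(stop, start_pad)]


def join_gapless(pieces):
    """Concatenate (samples, start_pad, end_pad) tuples after trimming each one."""
    joined = []
    for samples, start_pad, end_pad in pieces:
        joined.extend(trim_padding(samples, start_pad, end_pad))
    return joined


if __name__ == "__main__":
    # Made-up sample values; zeros stand in for the padding.
    pieces = [([0, 1, 2, 0], 1, 1), ([0, 0, 3, 4], 2, 0)]
    print(join_gapless(pieces))
```

If the tag is missing, there is no reliable figure to feed into such a function, and the padding would have to be estimated from the audio itself.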

There’s an app available for Linux (available for Ubuntu / Mint from the main repository) called “mediainfo-gui”. That app can analyze media files and provide a lot of information about the format, often including which encoder was used. What does it say for the MP3 files in question?

Your help is most appreciated steve. I’ll get that information as soon as I can. The files are on my laptop, as are more tools for working on them and I’m sure I can extract the information from them. I should have an update by tomorrow evening or Monday, just so you are not waiting for a reply.
I have read that mp3 encoding always introduces silence, which made me wonder… I guess the other publishers, where I don’t have this issue, convert the raw material to mp3 first, then cut it to pieces, while Universal cuts first and then converts to mp3? In that case, it would be just a matter of altering their workflow slightly to accommodate apparent fringe cases like myself. Even if they did do so though, I don’t assume they would re-do their current library just to be nice…

The whole thing isn’t urgent in any sense. I’m just puzzled as to why they would do what they are doing. I contacted Universal twice about it via email so far… the closest I got to an answer was a message saying that my request had been forwarded to the correct department - which probably was under “B” for “Bin”.

If they are encoded with LAME and if they do have the “gapless” tag, then you should be able to import them into “Audacity 3.0.3” with the “padding” trimmed off.

Audacity 3.0.3 has not yet been released, but there’s a pre-release AppImage available here that you could try: https://www.fosshub.com/Audacity-devel.html

If your MP3s are all in the same folder and are named in alpha-numeric order, then you can simply:
“File menu > Import”, then select all of the files and import them.
Then “Tracks menu > Align > Align End to End”
Then “Export” in your preferred format.

If your files are not in the correct alpha-numeric order, then you could create a “LOF” file like this:

file "test1.mp3"
file "test2.mp3"
file "test3.mp3"

Place the LOF file in the same folder as the MP3 files, and import the LOF.
(See: https://manual.audacityteam.org/man/lof_files.html)
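If there are many files, the LOF could also be generated rather than typed. The following Python sketch assumes the only thing wrong with the order is that numbers sort alphabetically ("part10" before "part2"); the helper names and the default file name `episode.lof` are made up for the example, and only the `file "name"` line format shown above is used.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares embedded numbers numerically ('part2' before 'part10')."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group alternates text, digits, text, ...
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def build_lof(names):
    """Return LOF text with one 'file "<name>"' line per entry, in natural order."""
    ordered = sorted(names, key=natural_key)
    return "".join(f'file "{name}"\n' for name in ordered)


def write_lof(folder, lof_name="episode.lof"):
    """Write the LOF into the same folder as the MP3 files and return its path."""
    folder = Path(folder)
    names = [p.name for p in folder.glob("*.mp3")]
    lof_path = folder / lof_name
    lof_path.write_text(build_lof(names), encoding="utf-8")
    return lof_path


if __name__ == "__main__":
    print(build_lof(["part10.mp3", "part2.mp3", "part1.mp3"]), end="")
```

Calling `write_lof` on an episode folder would then produce a file that can be imported as described above.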

Again, thank you very much for your efforts. I have taken a look at the mp3s via “MP3 Diags”. It seems information about the encoder is not available and the “gapless” tag is not included. It is showing a “padding=100” tag though, but I’m not sure that has something to do with the silence in the file. I suppose if it were connected, it would refer to 100ms.
The files do have padding at the beginning and end, hence I’d expect 50 ms each if my theory was correct. Unfortunately, these figures don’t seem to relate to the file in any meaningful way, assuming they are measured in ms. The files are VBR which I assume could also be relevant here.

The more I analyse the cuts, the weirder they look to me. The spectrogram of one file in question shows about 25 ms of total silence in the front and about 15 ms in the back. However, they aren’t what I would call clean (0-width) cuts. You can see in the spectrogram that the audio actually seems to be “crushed” to 0 in the span of maybe 5 ms before the silence and subsequent cut, and inflated from 0 in about 15 ms after the silence and preceding cut (fade from red to blue and vice versa on all frequencies).
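For anyone wanting to check figures like these without reading them off the spectrogram, a minimal Python sketch for measuring the silent stretch at each end of a decoded piece might look as follows. It assumes one channel of samples as floats in the range -1 to 1; the function name and the `threshold` value are assumptions chosen for the example.

```python
def silence_edges_ms(samples, sample_rate, threshold=0.001):
    """Return (leading_ms, trailing_ms) of samples whose magnitude is <= threshold."""
    n = len(samples)

    lead = 0
    while lead < n and abs(samples[lead]) <= threshold:
        lead += 1

    trail = 0
    # Stop before running into the leading silence, so nothing is counted twice.
    while trail < n - lead and abs(samples[n - 1 - trail]) <= threshold:
        trail += 1

    return 1000.0 * lead / sample_rate, 1000.0 * trail / sample_rate


if __name__ == "__main__":
    # Synthetic piece at a 1 kHz sample rate: 25 ms silence, 60 ms audio, 15 ms silence.
    piece = [0.0] * 25 + [0.5] * 60 + [0.0] * 15
    print(silence_edges_ms(piece, 1000))
```

This only measures true digital silence; the short "crushed" ramps before and after it would not be counted with a threshold this low.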

I’ll provide some screenshots for the curious in the coming days, if I manage it. I don’t think this is regular mp3 encoder behaviour but it is in line with what I have previously seen from this publisher.

MP3 is a very old format. Technically the issue is that the overall encoder / decoder delay is “not defined”, which is why there is “some” padding at the beginning but not a defined amount. Similarly, unless the decoder cuts off the end (which many do), there’s also “some” trailing silence. More modern formats such as Ogg Vorbis don’t have this problem.

It’s unfortunate that the stuff I’m interested in is not available in newer formats. As promised, I attached a file for the curious. That’s what all the cuts look like in structure (above: spectrogram, below: waveform)

The worst of them show spikes across the whole spectrum, audible as clicks at every cut, of which there are 12 in this particular episode.

Other publishers manage 0-width cuts just fine. So is Universal simply cutting the original file first and then converting to mp3, you reckon?

I guess I’m going to have to live with it or abstain from products published by this particular subsidiary of Universal Music, which is a shame, because the production company working with this publisher is producing very high quality content… and then it literally gets cut to pieces.
Universal Music - cut quality.png

Have you tried any of these programs?: Top 10 Free MP3 Joiner: How to Join MP3 Files for Free!

You could apply short fades to the start and end of the tracks. There will still be the gaps, but the fades should prevent loud clicks and so be less annoying.
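As a sketch of what such fades do to the audio, here is a small Python example applying a linear fade-in and fade-out to one channel of decoded samples. The function name and the default fade length of 5 ms are assumptions for illustration; in Audacity itself the equivalent would be done with the fade effects rather than code.

```python
def apply_short_fades(samples, sample_rate, fade_ms=5.0):
    """Return a copy with a linear fade-in at the start and fade-out at the end."""
    out = [float(s) for s in samples]
    # Never let the two fades overlap on very short pieces.
    fade_len = min(int(sample_rate * fade_ms / 1000.0), len(out) // 2)
    for i in range(fade_len):
        gain = i / fade_len      # 0.0 at the very edge, rising towards 1.0
        out[i] *= gain           # fade-in
        out[-1 - i] *= gain      # fade-out, mirrored from the end
    return out


if __name__ == "__main__":
    print(apply_short_fades([1.0] * 10, 1000, fade_ms=2.0))
```

Because each piece then starts and ends at zero, joining them end to end cannot produce a step in the waveform, although the short gap itself remains.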

Thank you very much jademan and steve for your suggestions, I’ll look into them. At the same time I’ll probably continue pestering the publisher about a one-track option because, and correct me if I’m wrong, there is no good reason for the cuts to look like that. In my opinion it is a problem with their order of operations.

Stay safe and have a good day everyone o7