Automatic error (click/pop) removal using a Neural Network (AI)

Hello Audacity community,

I am working on a personal project where I am coding a Neural Network (NN) (https://en.wikipedia.org/wiki/Artificial_neural_network) to perform automatic “cleaning” of ripped vinyls. Briefly put, a NN is a program that automatically learns to map an input to an output. In this case the input would be audio recordings with errors (clicks, pops, noise, etc.), and the output would be the same recordings without errors. The NN would therefore have to learn how to turn a new, never-seen-before sample with an error into a cleaned version of it.

The project is more of an exercise than anything practically useful. The final outcome would be a program that will be able to automatically detect errors in a ripped vinyl and correct them, without any human input. I know that there are already a few good tools to perform vinyl cleaning (I’ve read good things about ClickRepair) but again, this would be more of an exercise than an attempt to replace any already existing solutions. Below is a simple scheme of how the program would work when correcting errors:

model_workflow.jpg
To start, I am concentrating on repairing “pops” and “clicks”, especially the very annoying ones that could damage speakers.

I have already coded most of the program; however, I lack two fundamental ingredients:

  1. some basic knowledge of what exactly is a pop or click
  2. a large quantity of “dirty” recordings paired with their “cleaned” counterpart, to train the model on

So far I have been using the NN on a single vinyl track, just to test out how everything works. I have done this by manually repairing the clicks and pops using Audacity’s Repair effect. Unfortunately, perhaps due to my inexperience in cleaning ripped vinyls, my “repaired” version has indeed removed most of the pops, but it has also replaced some of them with new errors, specifically with faint “beeps” where a pop used to be. This has the consequence of “teaching” the NN to (sometimes) replace pops with faint “beeps” rather than removing them altogether. Furthermore, the time required for this operation makes it feasible to gain some preliminary data in order to test the NN, but not remotely enough to teach it to generalize to completely new audio samples.

One last addition: if the beeps and pops were reproducible manually, one could also perform the inverse operation and introduce errors on a clean recording by synthetically adding them via a program. This solution would indeed increase the speed of obtaining new data, but it would have to be used in conjunction with the previously described “traditional” method, so as not to train the NN model too much on artificially produced errors (which, even if similar to the real errors, would probably still be slightly different).
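
As a rough illustration of the synthetic approach, here is a minimal Python sketch that adds short decaying spikes to a clean signal and returns the dirty/clean pair. The function name `add_synthetic_clicks`, its parameters, and the spike shape (random sign, exponential decay) are assumptions made for illustration only; real vinyl clicks may look quite different.

```python
import math
import random

def add_synthetic_clicks(clean, n_clicks=5, max_len=20, amplitude=0.8, seed=0):
    """Return (dirty, positions): a copy of `clean` with short spikes added.

    `clean` is a list of floats in [-1.0, 1.0]. The spike shape is an
    assumption, not a model of real vinyl damage.
    """
    rng = random.Random(seed)
    dirty = list(clean)
    # Pick distinct start positions, leaving room for the longest spike.
    positions = sorted(rng.sample(range(len(clean) - max_len), n_clicks))
    for start in positions:
        length = rng.randint(3, max_len)
        sign = rng.choice((-1.0, 1.0))
        for i in range(length):
            spike = sign * amplitude * math.exp(-5.0 * i / length)
            # Clip so the result stays a valid normalised sample.
            dirty[start + i] = max(-1.0, min(1.0, dirty[start + i] + spike))
    return dirty, positions

# Usage: a 0.1 s, 440 Hz test tone with three synthetic clicks.
sample_rate = 44100
clean = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4410)]
dirty, positions = add_synthetic_clicks(clean, n_clicks=3)
```

Each `(dirty, clean)` pair produced this way could serve as one training example, alongside manually repaired recordings.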

I therefore ask the following questions to the Audacity community:

  1. Would anybody be willing to share .wav files of before and after click/pop removal?
  2. Could anybody share insight on click/pop removal, therefore possibly explaining why my results so far replace the pops with faint “beeps”?
  3. Could anybody help me with some insight as to what a click/pop is, therefore making it possible to reproduce these artificially on clean audio recordings?

1) some basic knowledge of what exactly is a pop or click…

… 3) Could anybody help me with some insight as to what a click/pop is, therefore making it possible to reproduce these artificially on clean audio recordings?

Of course, this is the “big problem”. Clicks are largely “random” (they aren’t all the same) and there are often similar sounds in the recording (typically percussion). It’s really the context that makes a click stand out as a defect, i.e. does the click occur on the beat and re-occur on the next beat, or is it uncorrelated with the music?

A long time ago, I read about a method that looked for high-frequency signals that decay quickly. The theory was that most musical sounds have a strong attack and then the sound “rings” and fades out.
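
A minimal Python sketch of that kind of detector, assuming the first difference is an acceptable crude high-pass filter and that “decays quickly” means the burst lasts at most `max_len` samples. This only illustrates the idea and is not the method as originally described; `find_click_candidates`, `threshold`, and `max_len` are hypothetical names and values.

```python
import math
import statistics

def find_click_candidates(samples, threshold=8.0, max_len=40):
    """Return half-open (start, end) sample ranges of short high-frequency bursts."""
    if len(samples) < 2:
        return []
    # First difference: a crude high-pass filter that emphasises sharp edges.
    diff = [abs(samples[i] - samples[i - 1]) for i in range(1, len(samples))]
    # The median is a robust baseline: a few clicks barely move it.
    baseline = statistics.median(diff) or 1e-12
    hot = [d > threshold * baseline for d in diff]
    hot.append(False)  # sentinel so a run at the very end gets closed
    candidates = []
    start = None
    for i, flag in enumerate(hot):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # Keep only short runs; long loud passages are probably music.
            if i - start <= max_len:
                # diff[k] compares samples[k] and samples[k + 1], so the
                # affected samples are start + 1 .. i - 1.
                candidates.append((start + 1, i))
            start = None
    return candidates

# Usage: a 440 Hz tone with a single one-sample spike at index 1000.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2000)]
tone[1000] += 0.9
candidates = find_click_candidates(tone)
```

A detector this simple will also flag sharp percussion, which is exactly the context problem described above.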

Usually the worst pops and clicks are the easiest to fix because they are the easiest to find.

2) a large quantity of “dirty” recordings paired with their “cleaned” counterpart, to train the model on…

…1) Would anybody be willing to share .wav files of before and after click/pop removal?

I think I have a few “raw” digitized vinyl recordings. These are obscure recordings. There are no “clean” CD/MP3 versions available and I haven’t finished de-clicking them… In fact, I need to start over… They are in pretty bad shape and they were recorded on a low-quality turntable.

If there’s a way for me to easily upload/share the files I can do that, but I don’t have time to play-around with them.

You may be able to find a few used records that match CDs/MP3s that you already own. Then you can “teach” the neural network the differences. And, you can add some additional scratches to the records. :wink:

… I had an idea once (or maybe I stole the idea) - The idea was to get two copies of the record. If you compare the two (digital) recordings, the clicks & pops should show up as “louder” on the worse record, and you could choose the best copy moment-to-moment to make a better 3rd copy. That’s tricky because the analog-to-digital conversion isn’t “perfect”. There can be timing differences, and if you digitize the same record twice you won’t sample at exactly the same points and you’ll get different digital data, so you can’t simply do a sample-by-sample comparison. But having two records with different defects might help to train your neural network.
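
To give a feel for the timing problem, here is a minimal Python sketch that estimates a constant whole-sample offset between two recordings by brute-force cross-correlation. It assumes the offset is constant and an integer number of samples, which real turntable captures generally won’t satisfy (speed drift and sub-sample shifts are ignored); `estimate_lag` and `max_lag` are hypothetical names.

```python
import random

def estimate_lag(a, b, max_lag=50):
    """Return the lag (in samples) such that a[n] best matches b[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score, count = 0.0, 0
        for n in range(len(a)):
            m = n + lag
            if 0 <= m < len(b):
                score += a[n] * b[m]
                count += 1
        if count:
            score /= count  # normalise by overlap so short overlaps aren't penalised
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Usage: b is the same noise-like signal as a, delayed by 7 samples.
rng = random.Random(1)
a = [rng.uniform(-1.0, 1.0) for _ in range(500)]
b = [0.0] * 7 + a[:-7]
lag = estimate_lag(a, b, max_lag=20)
```

Even with the lag known, the two captures will not match sample for sample, so a comparison would probably have to tolerate small differences rather than test for equality.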

2) Could anybody share insight on click/pop removal, therefore possibly explaining why my results so far replace the pops with faint “beeps”?

I have no idea why you’re getting beeps. I’ve used Wave Repair. It’s not great at finding defects, but it has a handful of repair methods, and most clicks & pops can be “perfectly” repaired with one of the methods (if you can “find” the defects):

  • Copy the preceding audio (or other similar-clean audio)
  • Replace the defect with the surrounding spectrum
  • Copy left-to-right or right-to-left. (The loss of stereo for a few milliseconds is usually not noticeable.)
  • Interpolation
  • Smoothing (some kind of “adaptive filtering” I think)
  • Manual re-draw

…Usually the preceding-sample or spectral replacement works best, but if those don’t work I’ll try something else.
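
For reference, the interpolation option can be sketched in a few lines of Python. This assumes plain linear interpolation between the good samples on either side of the defect; whether Wave Repair does it this way is not stated here, and `repair_by_interpolation` is a hypothetical name.

```python
def repair_by_interpolation(samples, start, end):
    """Return a copy of `samples` with samples[start:end] replaced by a straight line.

    The line runs from the last good sample before the defect (start - 1)
    to the first good sample after it (end).
    """
    if start < 1 or end >= len(samples) or start >= end:
        raise ValueError("defect must lie strictly inside the signal")
    repaired = list(samples)
    left = samples[start - 1]
    right = samples[end]
    span = end - start + 1  # number of steps from left neighbour to right neighbour
    for i in range(start, end):
        t = (i - start + 1) / span
        repaired[i] = left + t * (right - left)
    return repaired

# Usage: indices 3 and 4 hold a "click" on an otherwise linear ramp.
ramp = [0.0, 1.0, 2.0, 9.0, 9.0, 5.0, 6.0]
fixed = repair_by_interpolation(ramp, 3, 5)
```

A straight line is only plausible for very short defects; over longer gaps it removes the high-frequency content, which is probably why the copy and spectral methods tend to sound better.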

The project is more of an exercise than anything practically useful. The final outcome would be a program that will be able to automatically detect errors in a ripped vinyl and correct them, without any human input.

If it works it could be very practical and very useful, but of course there is shrinking demand for applications that clean-up digitized vinyl.

Thanks for the complete answer.

You may be able to find a few used records that match CDs/MP3s that you already own. Then you can “teach” the neural network the differences. And, you can add some additional scratches to the records. :wink:

… I had an idea once (or maybe I stole the idea) - The idea was to get two copies of the record.

Unfortunately, both ideas (using a CD/MP3 version of the vinyl, and using two different records that have errors in different spots) have the problems you rightfully pointed out: the recordings will be too different to obtain a good match. This was the first thing I tried, but since the NN works on 512 samples at a time, it is close to impossible to sync two recordings that precisely, and even if synced they would still be a bit too different to compare easily. I would guess that a deeper (i.e. more complex) model might be able to work on this sort of data, but I am currently working on a simpler concept.

I have no idea why you’re getting beeps.

Just for clarity’s sake, here is an example of a click being replaced by a beep, with images of the respective spectrograms and waves:


click_vs_beep_spectrogram.jpg
click_vs_beep_wave.jpg

I’ve used Wave Repair.

I will try to play around with Wave Repair, thanks for the suggestion. In the meantime, if you’d be willing to share partially repaired recordings, that would be of great help. The only important thing to note is that I would also need the original “dirty” recording together with the “cleaned” one.

Of course, this is the “big problem”. Clicks are largely “random”

I understand, it is what I feared. I’ll try to see if there are any recurring “shapes” either in the wave or in the FFT. What I had tried previously was to simply “cut” a small segment of the wave (10-40 samples - numbers chosen arbitrarily) in order to create a defect. But these didn’t sound like clicks at all. I’ll see if I manage to figure something out.

Update:

The “beeps” I’m talking about are actually already present in the recording - they are not added by the repair effect. They were just difficult to hear because of the click in the recording, and became more evident when the click was removed (either by Audacity’s repair or by my model). These “beeps” only happen where there is a click, but not at every click. They appear in the spectrogram as a high amplitude at around 4 kHz that lasts for around 40-50 ms, see image below:
audacity_2018-09-13_12-44-40.jpg
I will therefore have to study what these beeps are exactly, and how to repair them to train the model to do it too. Any insight on this front would be greatly appreciated.
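
As a starting point for studying them, here is a minimal Python sketch that measures the energy near one frequency in short frames using the Goertzel algorithm, so that a 40-50 ms burst around 4 kHz shows up as a run of loud frames. The 10 ms frame length, the exact 4000 Hz target, and the function names are assumptions for illustration; the test signal is synthetic, not taken from the recording.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Squared magnitude of `frame` at `freq` Hz (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def frame_powers(samples, sample_rate, freq, frame_len):
    """Goertzel power for each consecutive, non-overlapping frame."""
    return [goertzel_power(samples[i:i + frame_len], sample_rate, freq)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# Usage: 0.2 s of a 440 Hz tone with a synthetic 50 ms, 4 kHz "beep"
# starting at 50 ms. Frames are 441 samples (10 ms at 44.1 kHz).
fs = 44100
signal = [0.3 * math.sin(2 * math.pi * 440 * n / fs)
          + (0.1 * math.sin(2 * math.pi * 4000 * n / fs) if 2205 <= n < 4410 else 0.0)
          for n in range(8820)]
powers = frame_powers(signal, fs, 4000.0, frame_len=441)
beep_frames = [i for i, p in enumerate(powers) if p > 100.0]
```

On a real recording the threshold would have to be set relative to the surrounding frames, since music also contains energy around 4 kHz.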

hi Alelef, did you manage to proceed with your idea?

Hi Mitseuler, I did not, in the end. But just today I was thinking I might pick it back up, so after two and a half years I opened up this forum and read your message from last week… interesting coincidence, eh?!
I did not progress because of the lack of data (the reason that led me to ask for help here on this forum). I have improved my skills a bit since 2018 and I do have some new ideas, but I still have to “get back at it”.

Why do you ask?

The data fed to the AI may have to be language specific … https://youtu.be/lrK-XVCwGnI

Hello Alelef, did you proceed with this idea? If yes, could you please provide any insight on what procedure you used to compare the 2 files?

Didn’t really go forward, sorry. I did make a simple neural network at the time that transformed the input so that it would look like the sample output. However, I didn’t have enough data to train a generalised model (as is often the case with these applications).

What are you working on?

My dad is currently ripping a lot of vinyls with his brand new Audio-Technica player, would you be interested in the raw wavs or nah?

I was also looking into whether something similar is practically possible. Well, I came across one good reference:

Here they have tried it for badminton shots, working from a commentary recording.