Remove a specific word in the entire file

Sun · August 28, 2018, 10:30pm

Hi,

This is my first day using Audacity and I’m already starting to love it. One technical question: I have a long audio file of a speech. The speaker likes to say “er” at the end of every sentence. I manually removed a lot of them, but there are many more. Is there a way to search the entire file for a sample of this sound and replace every match with silence? In other words, can we remove every single one?

Many thanks

steve · August 29, 2018, 7:57am

Not really.
You may be able to see some of them by looking at the waveform for gaps between sentences that have a little blip corresponding to the “er”. That’s probably the best / quickest clue, other than listening to the entire recording.
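
For anyone who wants to narrow down where to look, the "little blip in a gap" clue can be sketched in a few lines of Python. This is only a rough illustration, not an Audacity feature: the function name `find_short_blips`, the 20 ms frame size, the level threshold and the duration limits are all assumptions that would need tuning for a real recording. It also only lists candidate positions; a cough or a short word like "yes" would be flagged too, so each candidate still has to be checked by ear.

```python
def find_short_blips(samples, rate, frame_ms=20, threshold=0.05,
                     max_blip_s=0.4, min_gap_s=0.3):
    """Return (start_s, end_s) for short loud bursts with a pause on each side.

    samples: mono audio as floats in the range -1.0..1.0 (assumed format).
    All threshold and duration defaults are illustrative guesses.
    """
    frame = max(1, int(rate * frame_ms / 1000))
    # One flag per frame: does the peak level exceed the threshold?
    loud = [max(abs(x) for x in samples[i:i + frame]) > threshold
            for i in range(0, len(samples), frame)]

    # Collapse the flags into runs of [is_loud, start_frame, end_frame].
    runs = []
    for idx, flag in enumerate(loud):
        if runs and runs[-1][0] == flag:
            runs[-1][2] = idx + 1
        else:
            runs.append([flag, idx, idx + 1])

    sec = frame / rate  # duration of one frame in seconds
    blips = []
    # Only runs that have a neighbour on both sides can sit "inside a gap".
    for k in range(1, len(runs) - 1):
        is_loud, start, end = runs[k]
        quiet_before = (runs[k - 1][2] - runs[k - 1][1]) * sec
        quiet_after = (runs[k + 1][2] - runs[k + 1][1]) * sec
        if (is_loud and (end - start) * sec <= max_blip_s
                and quiet_before >= min_gap_s and quiet_after >= min_gap_s):
            blips.append((start * sec, end * sec))
    return blips
```

The returned times are places to inspect in the waveform, not a list of things that are safe to delete.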

Sun · August 29, 2018, 11:18am

I can probably see them, but there are hundreds of them. I already removed about half of them manually. Is there a way to automatically pick these up and replace them with silence?

steve · August 29, 2018, 12:49pm

That would be far beyond the capability of an audio editor (and no, Audacity cannot do that).

Consider these two phrases:
“It was a beautiful summer”
“Add the two numbers to get the sum er”

For a computer to be able to tell the difference between the “er” at the end of the first phrase, and the “er” at the end of the second phrase, it would not only need to be able to recognise the sound “er” (which in itself is extremely difficult), but would also need to understand the context in order to understand that in the first phrase it is the end of a word, and in the second phrase it’s just a filler sound. Computers are great at manipulating text and numbers, but rubbish at “understanding”.

DVDdoug · August 29, 2018, 3:16pm

This probably could be done by combining speech recognition and audio editing but I don’t know of any application that does that. But, you’d still need to check (and probably “tweak”) the results.
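
As a rough sketch of what that combination might look like, assume some speech recognizer has already produced a list of words with start and end times in seconds. That input format, the filler list and the names `filler_spans` and `cut_spans` are all hypothetical here; no particular recognizer or Audacity API is implied. The editing half then reduces to cutting those time spans out of the samples:

```python
FILLERS = {"er", "um", "uh"}  # illustrative list, adjust per speaker


def filler_spans(words, fillers=FILLERS):
    """Pick the (start, end) times of words the recognizer labelled as fillers.

    words: list of dicts like {"text": "er", "start": 12.3, "end": 12.6}
    (an assumed format; real recognizers each have their own output).
    """
    return [(w["start"], w["end"]) for w in words
            if w["text"].lower().strip(".,!?") in fillers]


def cut_spans(samples, rate, spans):
    """Return a copy of samples with the given time spans removed."""
    out = []
    pos = 0  # index of the next sample not yet copied or skipped
    for start, end in sorted(spans):
        a = max(pos, int(round(start * rate)))  # never step backwards
        b = max(a, int(round(end * rate)))
        out.extend(samples[pos:a])  # keep everything before the span
        pos = b                     # skip the span itself
    out.extend(samples[pos:])       # keep the tail
    return out
```

For example, `cut_spans(samples, rate, filler_spans(words))` would drop every word the recognizer transcribed as "er". Whether the recognizer hears "sum er" or "summer" is entirely up to the recognizer, which is why the result still needs checking by ear.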

Audio (and video) editing usually requires human interaction and human judgement, and it’s time consuming… I usually figure a minimum of 3x real-time. That’s listening-through once before editing, once after editing, plus actually doing the editing.

Sometimes it can go faster if you need to run some processing/effects on the whole file, and you might not need to listen to the whole thing before you start, but more often it takes longer if you want to do the best job possible. For this project, I’d estimate 4x. That is, if it’s a 1-hour speech, I’d expect to spend 4 hours. But as a general rule, everything takes longer than expected.