Removing words from speech, and maintaining the tone

MinHooi · October 15, 2011, 2:09am

Hi, could someone please give me some tips?

Firstly, a little background on what I did. I have some audio clips taken from a movie, each edited down to 10 seconds. I want my participants to judge the emotional tone of each clip without relying on the semantics. In past experiments, participants tended to categorise clips containing expletives or vulgar words as anger, and those without expletives as disgust. Is there a way to create a sound that keeps just the tone, without the words? I am concerned that exchanging the words for other sounds, e.g. an “avatar” voice or a chipmunk voice, would reduce the quality of the tone. The ideal way would be to muffle the words and leave the tone as it is.

I’ve attached a 5-second clip of the original. The numerous expletives are not helping, and I want them either removed or muffled.

Appreciate any tips on how to resolve this.

Thanks.

kozikowski · October 15, 2011, 4:39am

Nothing I can do with the Equalizer or Low Pass Filter helps. I can remove almost everything that contributes to intelligibility and you can still understand what he’s screaming. I guess that’s what makes low quality radio (and most cellphone) conversations work.

I think one problem is the cadence. I once saw a violin talking with a human and it was no great stretch to figure out what the violin was “saying.” He did it all with rhythm and pitch. No articulation at all.

I’m out.

Koz

kozikowski · October 15, 2011, 4:40am

You know what you might try? Low Pass Filter at about 300 Hz or so, and then export the voice as MP3 at a low bitrate – lower than 32 kbps. Twice.

Koz

Trebor · October 15, 2011, 4:47am

Maybe “sine-speech” … http://www.scholarpedia.org/article/Sine-wave_speech

I don’t have a clue how to produce “sine-speech” on Audacity, but there is other free software to produce it … http://www.fon.hum.uva.nl/praat/

MinHooi · October 15, 2011, 8:55am

Hello Koz and Trebor,

Thanks for the replies. I’m guessing Audacity might not do the job. I have Praat and will figure it out.

Koz - I tried doing the low pass filter twice at 300 Hz, and I can still hear the words clearly. You mentioned exporting it as MP3 at a low bitrate - how do I do this?

Trebor - Thanks for telling me about Praat. As to the sine-wave speech, I listened to the sample clip, and I can still hear the words, but the sound has been distorted to such an extent that I imagine the emotional tone would be lost when someone is uttering angry words or sobbing out words.

Min Hooi

kozikowski · October 15, 2011, 7:37pm

And in that one sentence we need to find out what Audacity you’re on. You should be on Audacity 1.3 for all the fancy-pants tools. You also need to download and install the “lame” MP3 software.

After you do that:

File > Export > MP3 > Options. You’re meant to apply the Low Pass Filter only once, with a rolloff of 12dB or 24dB per octave. Then export as a ratty MP3, import it, and export it again. MP3 damage is cumulative.
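If it helps to see why the filter step alone leaves the words intact, here is a rough Python sketch of a 300 Hz low pass at roughly 12dB per octave. It is only an illustration, not Audacity’s actual Low Pass Filter: the function names are made up, the 12dB slope is approximated by cascading two simple one-pole stages, and the MP3 round trip is not reproduced at all.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Single one-pole low pass (about 6 dB per octave above the cutoff)."""
    # Smoothing coefficient for a one-pole filter at the requested cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction of the way to the input
        out.append(y)
    return out

def lowpass_12db(samples, cutoff_hz=300.0, sample_rate=44100):
    """Two cascaded one-pole stages, roughly 12 dB per octave (an approximation)."""
    first = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return one_pole_lowpass(first, cutoff_hz, sample_rate)

def sine(freq_hz, seconds=0.5, sample_rate=44100):
    """Test tone with peak amplitude 1.0."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def settled_peak(samples):
    """Peak level of the second half, after the filter has settled."""
    half = len(samples) // 2
    return max(abs(v) for v in samples[half:])

low = settled_peak(lowpass_12db(sine(100.0)))    # stand-in for voice pitch
high = settled_peak(lowpass_12db(sine(3000.0)))  # stand-in for consonant detail
print(low, high)
```

The low tone comes through almost untouched while the high one is cut heavily. The filter has no effect on rhythm, though, and rhythm and pitch seem to be carrying most of the meaning here.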

You may be up against some interesting human characteristics. I can tell exactly what the cabbie outside is screaming to his traffic without knowing what the actual words are. Cadence and pitch will do it.

Audacity 1.2 is very old and no longer supported,
patched, corrected, or updated. Audacity 1.2 can
be unstable on newer computers.

Download and install the latest Audacity 1.3 from here…

http://audacityteam.org/download/

You can install both Audacity 1.2 and Audacity 1.3 on
the same computer, but only use one at a time.

Audacity 1.2 will not open projects made on Audacity 1.3.

If you use MP3 or some of the more modern audio
compression formats, get Lame and FFMpeg software
from the same web site. Do not use older software
or software from other web sites, even though they
may have the same names.

MinHooi · October 16, 2011, 2:24am

Hi Koz,

I do have Audacity 1.3 installed. Thanks for your instructions and I have attached the clip below, after doing it twice. I can still hear the words. I am afraid this method isn’t working out as I hoped.

Thanks anyway,
Min

Trebor · October 16, 2011, 2:58am

Rectifying the waveform mangles the sound quite a bit …

but it does alter the frequency content.

Put the code below into the “Nyquist Prompt” (in the Effect menu) to rectify:

(snd-abs s)
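For anyone curious what that one-liner does to the signal, here is a rough equivalent in Python (not Nyquist, and not what Audacity runs internally). Taking the absolute value of every sample is full-wave rectification. The helper names and the 100 Hz test tone are made up for illustration; the crossing counter is just a crude way to show that the pitch of a pure tone doubles.

```python
import math

def rectify(samples):
    """Full-wave rectification: replace every sample with its absolute value."""
    return [abs(v) for v in samples]

def upward_crossings(samples):
    """Count upward crossings of the mean level, a crude cycle counter."""
    mean = sum(samples) / len(samples)
    centred = [v - mean for v in samples]
    return sum(1 for a, b in zip(centred, centred[1:]) if a < 0.0 <= b)

def test_tone(freq_hz=100.0, sample_rate=8000, seconds=1.0, phase=0.34):
    """Sine tone; the arbitrary phase keeps samples off the exact zero crossings."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(n)]

tone = test_tone()
print(upward_crossings(tone))           # cycles per second in the original tone
print(upward_crossings(rectify(tone)))  # cycles per second after rectifying
```

The rectified tone repeats twice as often as the original, which is where the change in frequency content comes from. The timing of the syllables is untouched.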

kozikowski · October 16, 2011, 7:14am

You may not ever get there. By the time you eliminate all the intelligence in the voices, you may be down to earthquake rumble.

Koz

kozikowski · October 16, 2011, 7:19am

It’s amazing. Even with some weapons-grade damage to the monolog, you can still make out some of the words.

Koz

steve · October 16, 2011, 9:13pm

How about reversing the sound?

Trebor · October 16, 2011, 10:03pm

True of sine-speech, once you’ve heard it a few times …
http://www.mrc-cbu.cam.ac.uk/people/matt.davis/sine-wave-speech/

MinHooi · October 17, 2011, 9:12am

Hi Steve - Reversing the sound? How?

Trebor - the sine-wave speech link is very helpful. I can only hope that my participants are naive to everything. Unlikely. The change in waveform, which subsequently changes the frequency content, is not good either. It would remove the emotional tone while trying to remove the words.

Trebor · October 17, 2011, 9:56am

“Reverse” (i.e. backwards) is in Effect menu …
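In terms of the samples themselves, reversing just plays the selected stretch back to front. A minimal Python sketch, assuming the audio is a plain list of sample values (the function name and the selection arguments are invented for illustration, not anything from Audacity):

```python
def reverse_selection(samples, start=0, end=None):
    """Return a copy with samples[start:end] played backwards.

    With the defaults the whole clip is reversed, which mirrors applying
    the effect with everything selected.
    """
    if end is None:
        end = len(samples)
    return samples[:start] + samples[start:end][::-1] + samples[end:]

clip = [1, 2, 3, 4, 5]
print(reverse_selection(clip))        # whole clip backwards
print(reverse_selection(clip, 1, 4))  # only the middle three samples
```

Loudness and overall spectrum stay the same, but every attack becomes a fade-in, and that may change how the emotion comes across.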

steve · October 17, 2011, 5:33pm

You could also select one word at a time and use the Reverse effect, which will produce something like this:

steve · October 19, 2011, 7:36pm

You could also try this plug-in: https://forum.audacityteam.org/t/reverso/21401/1

MinHooi · October 19, 2011, 11:50pm

Hi, I tried reversing the speech. I gave it to a person to try, and he laughed out loud. He didn’t even notice that there was an angry tone. So, not good.
Thanks to Koz, Trebor and Steve for trying to help me. I will just have to figure something out.

steve · October 23, 2011, 11:45am

That’s a good idea.

A similar technique with audio would be to add a lot of “delay” or “reverb”. I think the problem with this approach will be that when the effect is strong enough to disguise what is being said, the sound will not be recognisable as speech (it’ll sound like an engine in a metal pipe) though the long term spectrum will be almost the same.
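To make the “delay” idea concrete, here is a bare-bones feedback delay in Python. It is a sketch under simple assumptions (a mono list of samples, a single echo tap), not Audacity’s Delay or Reverb effect, and the parameter names are invented.

```python
def feedback_delay(samples, delay_samples, feedback=0.6, tail_repeats=8):
    """Add decaying echoes spaced delay_samples apart.

    Each output sample gets a scaled copy of the output from delay_samples
    earlier, so echoes of echoes pile up and smear the speech in time.
    """
    # Pad with silence so the echoes can ring out after the clip ends.
    out = list(samples) + [0.0] * (delay_samples * tail_repeats)
    for i in range(delay_samples, len(out)):
        out[i] += feedback * out[i - delay_samples]
    return out

# A single click shows the echo pattern directly.
print(feedback_delay([1.0], delay_samples=3, feedback=0.5, tail_repeats=2))
```

Only the timing is smeared here, which fits the point about the long term spectrum staying almost the same. A higher feedback and a shorter delay push it towards the metal pipe sound.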

I think the interesting thing that has come out of looking at this question is the huge amount of damage that can be done to speech before it becomes incomprehensible.

How about obtaining some samples from a foreign language film?

Trebor · October 23, 2011, 1:34pm

LPC10 relies on that: it’s the bare bones of speech …

[LPC10 via SoX, very speak-n-spell]