Let’s say I walk into a large empty room with a recorder running at one end, stand where a vocal performer will later stand, and clap. Once. Now let’s say the vocal performer shows up, by herself, and starts singing. Using the clap, we can cancel out the room echoes, right?
No audience, no additional noises, and nothing else changes.
I have tried to reduce echo by spectrally subtracting a time-shifted version of the original echoey recording from itself (the time shift being equal to the echo time). This does help a bit, but it adds another very faint, distorted echo at twice the original echo time.
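To see where the faint echo at twice the delay comes from, here is a time-domain analogue of the subtraction (a hypothetical sketch, not the spectral code used above). It assumes a single echo whose delay and gain are known exactly. Subtracting a delayed, scaled copy of the recording also subtracts a delayed copy of the echo itself, leaving a residue of −gain² at 2 × delay. Feeding the output back instead (a recursive comb filter) cancels the echo and all of its repeats.

```python
import numpy as np

def add_echo(s, delay, gain):
    """Synthetic single echo: x[n] = s[n] + gain * s[n - delay]."""
    x = s.astype(float)
    x[delay:] += gain * s[:-delay]
    return x

def subtract_shifted(x, delay, gain):
    """One-shot (FIR) subtraction: y[n] = x[n] - gain * x[n - delay].
    The shifted copy still contains the echo, so -gain**2 remains at 2*delay."""
    y = x.copy()
    y[delay:] -= gain * x[:-delay]
    return y

def recursive_cancel(x, delay, gain):
    """Recursive (IIR) inverse of the single echo: y[n] = x[n] - gain * y[n - delay].
    Uses the already-cleaned output, so echoes of echoes are removed too."""
    y = x.astype(float)
    for n in range(delay, len(y)):
        y[n] -= gain * y[n - delay]
    return y
```

With an impulse as the dry signal, delay 10 and gain 0.5, `subtract_shifted` leaves −0.25 at sample 20, while `recursive_cancel` recovers the impulse exactly. Real rooms have many overlapping echoes of unknown delay and gain, which is why neither version is enough on its own.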
If there were multiple echoes with different delay times, plus echoes of echoes, this time-shifted spectral-subtraction technique would not be effective.
[I think so-called “echo reduction” is really an expander/gate that reduces the echo during the “silent” gaps between words, where it is most conspicuous (a.k.a. squelching). This would not reduce echo/reverberation while the performer was vocalizing, though.]
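A crude version of that squelching idea, as a sketch: a frame-based gate that attenuates blocks whose level falls below a threshold. The function name and the frame size, threshold and attenuation values are hypothetical choices, and a real expander would smooth its gain with attack and release times rather than switching per block.

```python
import numpy as np

def frame_gate(x, frame_len=256, threshold_db=-40.0, floor_db=-30.0):
    """Block-wise downward gate (hypothetical parameters).

    Frames whose RMS level (dB relative to full scale 1.0) is below
    threshold_db are scaled down by floor_db; louder frames pass unchanged.
    """
    y = np.asarray(x, dtype=float).copy()
    floor_gain = 10.0 ** (floor_db / 20.0)
    for start in range(0, len(y), frame_len):
        frame = y[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        level_db = 20.0 * np.log10(rms + 1e-12)   # avoid log(0) on digital silence
        if level_db < threshold_db:
            y[start:start + frame_len] = frame * floor_gain
    return y
```

As the bracketed note says, this only hides the tail in the gaps; any frame loud enough to contain singing passes with its reverberation intact.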
In theory it might help, but
will a hand clap excite all the room resonances and echoes at the right levels? Maybe a balloon pop would do it better.
If the singer is a different size, stands a little differently, or moves, that can affect the result too.
Wouldn’t you also have to have the singer’s voice projected along the same path as the hand clap? The hand clap would tend to push air pressure up and down from near waist level, while the singer’s voice would project forward and from a different height. The amount of echo would also vary with the singer’s angle, depending on how the voice bounced around the room.
Here’s a before-after example of echo reduction using the time-shift spectral-subtraction technique I described above.
The “before” has had plenty of synthetic echo added.
It does remove a lot of the echo, but you can hear spectral distortion where it bites the echoes out.
This “before” example used synthetic echo; the technique won’t work so well on real echoes, i.e. multiple echoes, each with a different echo time.
[here is the original Freesound sample which I applied the overdose of echo to]
It’s an engineering article of faith that if you had an impact noise before a live musical performance, it would be a relatively simple matter to make the room vanish, or at least suppress it to the point where the performance would be usable.
OK. Go for it. I’ll make one of these tracks for experimentation, but I suspect I can take my time. There are no common tools that can do this, engineering theory notwithstanding.
This effect/process is sometimes referred to as “deconvolution” or “dereverberation”. It is certainly not an easy thing to achieve, though the open source project “Postfish” has attempted an implementation.
* Deverb/Echo Cancellation: Echo Removal Tool/Deverb slider. This could improve back-of-the-room recordings. Sliders are “Liveness,” “Room Size,” “Room Oblongedness,” and “Dirt” (same as sliders in an Echo Generator). (6 votes)
Deverb is exceptionally difficult and virtually unknown in audio software except Postfish.
o Can’t both deverb and echo cancellation be implemented by convolving the signal with a modified version of the room’s impulse response?
o No, those operations can’t be performed by convolution; this is a case where deconvolution (a.k.a. inverse convolution) has to be used (well, not always).
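For illustration, a minimal regularized (Wiener-style) inverse filter, assuming the recording is an exact linear convolution of the dry voice with a measured room impulse response. This is a textbook technique, not Postfish’s implementation; `deconvolve` is a hypothetical name, and `eps` is an assumed regularization constant that keeps the division from blowing up where the response has spectral nulls.

```python
import numpy as np

def deconvolve(recording, impulse_response, eps=1e-3):
    """Frequency-domain deconvolution: X = Y * conj(H) / (|H|^2 + eps).

    Assumes recording = dry * impulse_response (linear convolution).
    The FFT is zero-padded past the full convolution length so the
    circular division does not wrap around.
    """
    n = len(recording) + len(impulse_response) - 1
    nfft = 1 << (n - 1).bit_length()            # next power of two >= n
    Y = np.fft.rfft(recording, nfft)
    H = np.fft.rfft(impulse_response, nfft)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(X, nfft)[:len(recording)]
```

With a synthetic three-tap response (direct sound plus two echoes) and a tiny `eps`, the dry signal comes back almost exactly. On a real recording, the mismatch between the clap and the singer’s position and directivity (raised earlier in the thread) means the measured response is only an approximation, and inverse filtering tends to amplify that error.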
I don’t know that I agree with the spectrum analysis part of the idea. If you have one very hard wall behind the performer, then you’ll get one really clear echo. The spectrum doesn’t change, just the amplitude and time. In an enclosed room, the clear echo bounces multiple times, but does not change spectrum. Just time, amplitude, and number of instances.
You can get actual spectrum disturbances by, for example, yelling really loud in front of a Helmholtz resonator, which would then ring at the frequency it likes, not always the frequency that arrived. The yelling-into-a-church-bell thing. Very few bathrooms and conference rooms feature Helmholtz resonators (or church bells), so it’s probably safe to assume a constant spectrum.
I can think of a really painful way to do this. Figure out what the original impact sound was and select a unit of time delay. Generate the original sound at many different amplitudes and delays (sweep both variables) and keep the ones that reduce the “echo.” Rinse and repeat. Assume quiet echoes below a certain loudness are insignificant. Apply the resulting laundry list of delays and amplitudes to a singer in the same room.
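The “painful method” is close to what signal processing calls matching pursuit. A hypothetical sketch, assuming the dry clap can be isolated from the first few milliseconds of the recording: repeatedly find the delay at which the clap best matches what is left, solve for its amplitude by least squares, subtract it, and stop once the remaining echoes fall below a threshold. The function name and parameters are illustrative.

```python
import numpy as np

def fit_echo_taps(clap_recording, clap, max_delay, n_taps=10, min_gain=0.01):
    """Greedy 'laundry list' of (delay, gain) echoes (matching-pursuit sketch).

    clap_recording: what the recorder captured (direct clap plus echoes).
    clap: the dry impact sound (assumed isolated from the start of the recording).
    """
    residual = np.asarray(clap_recording, dtype=float).copy()
    clap = np.asarray(clap, dtype=float)
    energy = np.dot(clap, clap)
    taps = []
    for _ in range(n_taps):
        best_delay, best_gain = 0, 0.0
        for d in range(max_delay + 1):
            seg = residual[d:d + len(clap)]
            if len(seg) < len(clap):
                break
            g = np.dot(seg, clap) / energy      # least-squares amplitude at this delay
            if abs(g) > abs(best_gain):
                best_delay, best_gain = d, g
        if abs(best_gain) < min_gain:           # remaining echoes deemed insignificant
            break
        residual[best_delay:best_delay + len(clap)] -= best_gain * clap
        taps.append((best_delay, best_gain))
    return taps
```

The resulting (delay, gain) list describes the room as a sparse impulse response. Applying it to the singer then means inverting that response (so that echoes of echoes are handled), not just subtracting each tap once.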
The spectrum will change because materials absorb different frequencies to different degrees. This will not affect the frequencies of the spikes in the spectrum, but their amplitudes will change as the reverberation decays.
Except at very low frequencies.
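One way to measure that frequency-dependent decay from a clap recording, as a sketch: take a short-time spectrum of the impulse response and fit a straight line to each frequency bin’s level in dB over time. The function name and frame/hop sizes are hypothetical, and a real measurement would also need to stop the fit where the decay reaches the noise floor.

```python
import numpy as np

def band_decay_rates(ir, frame_len=256, hop=128):
    """Decay slope (power dB per hop) for each rfft bin of an impulse response.

    A steeper negative slope means that frequency is absorbed faster.
    """
    window = np.hanning(frame_len)
    frames = np.array([ir[i:i + frame_len] * window
                       for i in range(0, len(ir) - frame_len + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # shape (frames, bins)
    level_db = 10.0 * np.log10(power + 1e-20)
    t = np.arange(len(frames))
    # polyfit fits every column (bin) at once; row 0 holds the slopes.
    return np.polyfit(t, level_db, 1)[0]
```

For a test response made of a slowly decaying 250 Hz component and a fast-decaying 2 kHz component (8 kHz sample rate), the low bin comes out with a much shallower slope than the high bin, as the post above describes.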
Ouch, that IS a painful method
I’ve used an “Echo Chamber” effect (in another audio program) that simulates room reverberation by a kind of audio equivalent to “ray tracing”. The results were very realistic, particularly when combined with a little “reverb”, but the processing was painfully slow because of all the iterations.
A less painful method would be to create a reverb profile (from the hand-clap/impulse) that maps frequency decay (amplitude change at various frequencies) against time. You could then assume that each sound in your vocal recording will be followed by reverberation that decays according to that profile. Looking at a short time period, you can then calculate from the profile how much of each frequency will still be present in later periods. So for each sound in the recording, a dynamic filter could be constructed to reduce the level of those frequencies when the appropriate time period is reached. Essentially it is dynamic expansion driven by the expected reverberation (as calculated from the impulse reverb profile).
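A rough sketch of that reverb-profile idea, in the spirit of spectral-subtraction dereverberation methods rather than anything in Audacity or Postfish. It assumes a per-bin decay slope (power dB per hop) has already been measured from the clap. For each frame, the reverberant power expected in each bin is predicted from an earlier frame, decayed by the profile, and subtracted, with a gain floor to limit “musical noise” artifacts. The function name and defaults are hypothetical.

```python
import numpy as np

def suppress_late_reverb(x, decay_db_per_hop, frame_len=512, delay_frames=2,
                         floor=0.1):
    """Per-bin dynamic attenuation driven by a reverb decay profile (sketch).

    decay_db_per_hop: one negative power-decay slope per rfft bin (dB per hop).
    """
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    window = np.hanning(frame_len + 1)[:-1]     # periodic Hann: sums to 1 at 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.array([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2
    # Linear power ratio expected after delay_frames hops, per bin.
    decay = 10.0 ** (np.asarray(decay_db_per_hop) * delay_frames / 10.0)
    gains = np.ones_like(power)
    for t in range(delay_frames, n_frames):
        predicted = decay * power[t - delay_frames]        # expected reverb power now
        ratio = predicted / (power[t] + 1e-12)
        gains[t] = np.sqrt(np.maximum(1.0 - ratio, floor ** 2))  # magnitude gain
    cleaned = np.fft.irfft(spec * gains, frame_len, axis=1)
    y = np.zeros(len(x))
    for i in range(n_frames):                    # overlap-add resynthesis
        y[i * hop:i * hop + frame_len] += cleaned[i]
    return y
```

Because the prediction uses the observed (reverberant) spectrum, a sustained note is also partly attenuated; that is the usual trade-off of this approach, and a tail that decays exactly as the profile predicts is pushed down to the floor.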
Yes, but. You can’t have any tool that needs a zero start point, because real life happens continuously. That’s why I picked a series of amplitude/delay profiles instead of trying to rip the sound apart into frequency bands or analysis characteristics. That might work with one hand clap, but might not if I clapped my hands multiple times really fast (after the single profile capture). You need to reconstruct the room in software, in manageable tool sets, and in effect send the sound through it backwards.
I wouldn’t cringe if we ignored or filtered out frequencies below 100 Hz or so and thresholded any sound below a certain level. I think that’s a small price to pay to rescue a voice track that would otherwise be garbage.
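The 100 Hz cut could be as simple as a second-order high-pass. A sketch using the biquad high-pass formulas from Robert Bristow-Johnson’s Audio EQ Cookbook, written as a plain loop for clarity; the function name and the Butterworth-style Q of about 0.707 are assumptions.

```python
import numpy as np

def highpass_biquad(x, fs, f0=100.0, q=0.7071):
    """Second-order high-pass (Audio EQ Cookbook coefficients), direct form I."""
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    cosw = np.cos(w0)
    b = np.array([(1 + cosw) / 2, -(1 + cosw), (1 + cosw) / 2])
    a = np.array([1 + alpha, -2 * cosw, 1 - alpha])
    b, a = b / a[0], a / a[0]                    # normalize so a[0] == 1
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0                      # previous inputs/outputs
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

A constant (DC) input settles to zero, while content well above the corner, such as 2 kHz at an 8 kHz sample rate, passes at essentially unity gain.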
If you have a stereo recording, you can remove the echo/reverb by centre pan isolation (Kn0ck0ut).
The voice is common to both channels but typically the echoes reach the different channels at different times so are removed by centre pan isolation, (a.k.a. centre pan extraction).
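A toy version of centre extraction, for illustration only: Kn0ck0ut’s actual algorithm isn’t described here, so this sketch uses a hypothetical per-bin similarity mask. In each short-time spectrum, bins where left and right agree (the centred direct voice) are kept, and bins where they differ (for example, echoes arriving at the two microphones at different times) are suppressed.

```python
import numpy as np

def centre_extract(left, right, frame_len=1024):
    """Keep what is common to both channels, bin by bin (heuristic sketch)."""
    hop = frame_len // 2
    window = np.hanning(frame_len + 1)[:-1]      # periodic Hann for 50% overlap-add
    out = np.zeros(len(left))
    for start in range(0, len(left) - frame_len + 1, hop):
        L = np.fft.rfft(left[start:start + frame_len] * window)
        R = np.fft.rfft(right[start:start + frame_len] * window)
        # 1 where L and R match, 0 where they are unrelated or opposed.
        mask = 1.0 - np.abs(L - R) / (np.abs(L) + np.abs(R) + 1e-12)
        mask = np.clip(mask, 0.0, 1.0)
        out[start:start + frame_len] += np.fft.irfft(mask * (L + R) / 2, frame_len)
    return out
```

This only helps when the echoes really differ between the channels; reverberation that looks similar in both channels will pass through as “centre”.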
That’s a tough one - too much background noise.
I’m still working out what some of the settings do (there are no instructions for Postfish), so Postfish is probably capable of better, even with as tough a sample as this.