Recording dialogue in the wilderness

I work for a wilderness school and we are making YouTube videos and video accompaniments to our books. Much of what we do is taping people sitting in a circle and talking, or standing around putting on a workshop and fielding questions.

We have a lot of footage already stored, and much of it is pretty rough to work with. There are mosquitoes (some flying right by the mic - LOUD!), birds, a crackling fire, and who knows what else mucking up the dialogue. I thought I could get away from learning how to manipulate frequencies and do noise reduction by synchronizing our external voice recorder, but - alas, in some instances it doesn't make a difference.

So far I've been looking for a program with a good live-action spectrum meter so I can learn what frequencies the different noises are at (e.g. insects versus birds versus plane overhead versus different human voices). WaveLab and Audacity both seem to have decent meters. I should add, I work at a small non-profit, and investing (time and/or money) in getting the big Adobe products (like Audition) seems like overkill.

My big question is: is it really feasible to reduce or eliminate nature sounds for an amateur such as myself? My perception so far is that most automated noise removers just pick up a frequency set (like the hum of electricity or the camera hardware) and ditch that. But what about things with varying frequencies like insects or bird songs? Will there be too many artifacts to keep track of? Should I just post-amplify the hell out of it and try to get rid of the resultant post-amplified hardware noise?


You can’t take out interference frequency by frequency like a shooting gallery. And no, Audacity doesn’t have any real-time tools.

I think your only choice is to get close to the microphone.

Mic each performer and connect to a field mixer and then on to a computer or recorder. The Big Kids would be using radio microphones, but I’m guessing you’re not willing to spend the bux.

Recording voices clearly in a hostile environment is not easy. You can’t just plunk a microphone down in the middle of a group and call it good. It will pick up everything 360 degrees and up in the sky. Possibly you already figured that out.

There’s a perception problem where people naturally think we should be able to design microphones that only respond to voices and ignore everything else. “My ears can do it, what’s the problem?”

Cellphones come remarkably close to doing that. That’s sometimes what they’re doing when a voice sounds bubbly and honky in the middle of a call. It’s not very theatrical.

I'm thinking about the problem as I write this (but don't get your hopes up).

Make everybody record their own voice on their cellphone Personal Recorder. You can make up a nice flannel pouch so the cellphone hangs on their chest or on their shoulder. Even better if the performer has a cellphone headset (real headset, not one of those things with the microphone half-way down the earphone cable. Those are abominable.)

Collect all the sound files at the end of the performance and mix them into one show. Audacity can do that before breakfast.

Import Alice.
Import John.
Import Jimmy.
Import Tony.
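Lining those imported tracks up and summing them is, in essence, all a mix-down is. A rough numpy sketch of that sum (the short arrays standing in for the individual recordings are made up for illustration):

```python
import numpy as np

def mix_tracks(tracks):
    """Sum several mono tracks (float arrays in -1..1) into one mix.
    Shorter tracks are zero-padded; the result is scaled back down
    if the sum would clip."""
    length = max(len(t) for t in tracks)
    mix = np.zeros(length)
    for t in tracks:
        mix[: len(t)] += t
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix /= peak  # normalize so nothing exceeds full scale
    return mix

# Made-up snippets standing in for the individual recordings
alice = np.array([0.5, 0.5, 0.5])
john = np.array([0.4, -0.4])
jimmy = np.array([0.2, 0.2, 0.2, 0.2])
show = mix_tracks([alice, john, jimmy])
```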

I haven’t tried it yet, but I’m supposed to be able to plug a microphone into my iPod. I have an early model and it’s not a gift to technology. I Googled it once and not only did they not laugh at me, they told me how to do it!! So that’s another option.

Nothing is going to save what you're doing in post production. You have to shoot it right.


Quick note if you go that way, provide a sync point. You know that clapboard thing the movie people use?

Its job is to make a noise so the sound people can hear it, and to provide a moving arm so the camera can see it, for sync later.

Make sure everybody is recording and say “Sound Mark” and bang your hands together. If you’re loud enough, each microphone should hear it a little. Later, move each sound clip until the claps line up. Sync may not last for hours, but it will work a lot better than what you’re doing now.
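If you'd rather not line the claps up by eye, the offset between two recordings of the same clap can also be estimated by cross-correlation. A sketch assuming numpy, with a made-up spike standing in for the clap:

```python
import numpy as np

def clap_offset(ref, other):
    """Estimate how many samples `other` lags `ref` by finding the
    peak of their cross-correlation.  Shift `other` left by the
    returned amount to line the clap (and the rest) up."""
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

# Made-up "clap" transient recorded on two devices; the second
# device caught it 5 samples later in its own file.
clap = np.array([0.0, 1.0, -1.0, 0.5, 0.0])
ref = np.concatenate([np.zeros(10), clap, np.zeros(20)])
other = np.concatenate([np.zeros(15), clap, np.zeros(15)])
print(clap_offset(ref, other))  # 5
```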


Audacity can display a recording as a multi-coloured spectrogram, rather than a waveform …

If you must have a real-time spectrogram: one which responds instantly to sounds picked up by a microphone (which I don't think will be of any use to you), you can get a free one for Windows computers here …

(e.g. insects versus birds versus plane overhead versus different human voices)

I can save you a lot of work. They’re going to turn out remarkably similar, or if not that, they’re going to overlap so much that it doesn’t make any difference.

This is a single piano note. G down there on the left somewhere.

Note that there is not only a single tone at G (48.9994 Hz), but also rich harmonics and overtones spaced periodically way up beyond that. This single note is still going strong at 3 kHz, which is in the middle of the voice range. So as low as it is, nobody would be able to separate this note from a voice track.
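That reach into the voice band is just the harmonic series at work: the overtones land at integer multiples of the fundamental. A quick sketch counting how many overtones of that low G fall inside a rough speech band (the 300 Hz-3 kHz limits here are purely illustrative):

```python
# Overtones of a struck note sit at integer multiples of the
# fundamental; count how many land in a rough speech band.
f0 = 48.9994  # the low G mentioned above, in Hz
harmonics = [round(n * f0, 1) for n in range(1, 62)]
in_voice_band = [f for f in harmonics if 300 <= f <= 3000]
print(len(in_voice_band))  # dozens of overtones overlap the voice range
```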

Contrast that with my voice. The illustration is my voice saying seven words into a high quality microphone.

It’s from this short track.

That’s why we say once you shoot it, anything that appears in your track is now a permanent performer whether you want them or not.

Screen shot 2014-01-12 at 9.44.51 PM.png

Thanks guys… hrm. So everything affects everything, basically? I can't remove one thing without removing another?

Why am I able to remove slight ambient hum (equipment hum) through noise cancellation, but not natural sounds? Because they are too random?

Two more ideas:

  1. Identify the frequency of the offending noise (like a cicada, which is mostly at 8,000-10,000 Hz) and at least lower that. Is this only going to be possible with stuff way off the normal voice frequencies, because otherwise it'll dull out the voice?

  2. Post-amplify the heck out of everything. Which will make bugs, planes, and fire crackles louder too, but at least I'll be able to hear the voice. And then see how good a job noise removal can do of just dulling the ambient camera/whatever hum (not the individual noises, though).

Thanks guys… hrm. So everything affects everything, basically? I can't remove one thing without removing another?

You can retire to Bermuda if you come up with a good way to split a mixed track accurately into individual performers and instruments. It’s a very popular request.

I point to two examples that even fool your ear. The Vox Humana (human voice) stop on the organ and the Charlie Brown TV shows where a trombone substitutes for the voice of the grownups. You can get a violin to do that, too. In those cases there is no tonal difference at all.

Noise Reduction works in two steps: the Profile step, which "tastes" a sample of the interference, and the application step, where it tries to remove that exact sound from the show. If the noise changes, you have to take another taste. That rules out removing a jet going over, the metrobus starting up outside the window, or recording inside your moving car. And the TV running in the next room.
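For the curious, the Profile/apply idea boils down to spectral subtraction. A toy single-frame sketch assuming numpy and perfectly stationary noise (real tools work on short overlapping windows; the signals below are made up):

```python
import numpy as np

def spectral_subtract(signal, noise_sample):
    """Toy single-frame spectral subtraction: the "Profile" is the
    magnitude spectrum of a noise-only sample; the "apply" step
    subtracts it from the recording's magnitudes, keeping the
    original phase.  Only works while the noise stays the same."""
    spec = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise_sample))
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(signal))

# Made-up demo: a 440 Hz "voice" buried under a steady 60 Hz hum
sr = 8000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)
hum = 0.5 * np.sin(2 * np.pi * 60 * t)
cleaned = spectral_subtract(voice + hum, hum)
```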

You are recording in what is, without question, a very hostile environment. You will never fix it in post production. The only way is to carefully and closely mic the performers.

I would kill to get a picture of this, but when radio shows like This American Life do one of their perfect interviews, they’re doing it by taking a special long-distance microphone and jamming it in the speaker’s face. There’s nothing subtle about it and sometimes it takes the speaker a second to recover from the personal space intrusion.

I’m not making this up. They wrote about how they do it.

I’ve never actually done it this way, but I think the Personal Recorder on each cellphone is worth a shot. Without some technique like this, you have no show.


You can certainly make everything louder: Effect > Amplify [OK]. But as you suggest, that just brings everything up, and if you have really wild sound, it may not go up very far.

All “Amplify” does is increase volume until something somewhere starts to overload. If you have a really wild track, something may already be overloading so the tool will do nothing.
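In other words, Amplify just measures the loudest sample and computes how much gain is left before it hits full scale. A minimal sketch of that headroom calculation, with made-up sample values:

```python
import numpy as np

def amplify_headroom_db(samples):
    """Gain (in dB) that Amplify could add before the loudest sample
    reaches full scale (1.0).  Zero means the track already touches
    0 dBFS, so straight amplification gets you nothing."""
    peak = np.max(np.abs(samples))
    return -20.0 * np.log10(peak)

quiet = np.array([0.1, -0.05, 0.08])  # made-up low-level recording
print(round(amplify_headroom_db(quiet), 1))  # 20.0
```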


I'm reading it again. As noted in one of the posts above, you have to come up with a Profile, or sample, of the noise by itself. If you're doing this with a wild, hostile track, there may not be anywhere in the show with the hum or buzz alone. Anything captured in the Profile will be active during the removal.

Another note. There is a provision in Noise Reduction where the reduction action goes around voices. This is set with Frequency Smoothing. It helps prevent Martian and Outer Space voices. However, the noise is still there behind the voices, so you could get froggy and birdie voices over a quiet background.


Noises which only occupy a very narrow, constant frequency range can be attenuated with a notch filter: hum, some insects, some frogs, and vuvuzelas.

Wind, fire and flowing water occupy a broad band of the sound spectrum, so they can't be filtered out the same way.
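If you want to experiment outside Audacity, a narrow notch like the ones described above can be sketched with scipy (the 150 Hz hum frequency and the Q value here are just examples, not measured from anyone's recording):

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

sr = 44100
# Narrow notch at a hypothetical 150 Hz hum; higher Q = narrower cut
b, a = iirnotch(w0=150, Q=30, fs=sr)

t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440 * t)  # stand-in for speech
hum = np.sin(2 * np.pi * 150 * t)    # the offending narrow-band noise
filtered = filtfilt(b, a, voice + hum)
```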

There is a test signal called “pink noise.” I call it “Rain in the Trees.” This test signal has all audio tones in it. That’s its job. If you try to reduce rain in the trees, microphone hiss or tape hiss from a show, Noise Reduction will try to remove the whole show.
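A common way to approximate pink noise, if you want to hear "rain in the trees" for yourself, is to shape white noise so each frequency bin's amplitude falls off as 1/sqrt(f), giving equal energy per octave. A numpy sketch (one of several approximation methods):

```python
import numpy as np

def pink_noise(n, seed=0):
    """Approximate pink (1/f) noise: take white noise and scale each
    frequency bin's amplitude by 1/sqrt(f), which gives equal energy
    per octave - the 'rain in the trees' sound."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]  # avoid dividing by zero at DC
    spec /= np.sqrt(f)
    out = np.fft.irfft(spec, n=n)
    return out / np.max(np.abs(out))

noise = pink_noise(16384)
```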

Noise Reduction is not the cure-all that it seems. That’s why we object to calling it “Noise Removal.” I don’t think it’s ever removed all noise.


We basically work in two domains: time and frequency.
The noise removal takes only the latter into account.
How do you make the recordings? Are the files in stereo or made from two recorders in mono? I ask because the actual sound source location can serve as a tracking key too.
Have you any links to such sample dialogs?

As far as source location goes… I know, and have, the camera it was taken on. I guess that would be in stereo?
And the other one is from a VHS conversion of a 10-year-old tape, made by someone else, although someone in my organization did the converting.
Why does this matter?

-The one shot with our camera. It has a fire in the middle of a circle of people talking, and some bird/nature noises. I'd like to tame the fire… I cut every frequency 10 kHz and above and that seemed to take the edge off without dulling the rest of the sound - it seemed. I wonder about amplifying the frequencies the voice is concentrated in, but I haven't done this successfully yet.
-The VHS conversion. This particular segment has a lot of bird and insect noise, and the woman is talking quietly and not into the central mic. This is the worst of the worst. I'm guessing I can cut the highs and boost the mids like above.
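The "cut everything 10 kHz and above" move is just a low-pass filter. A scipy sketch of the same idea, with made-up sines standing in for the voice and the crackle (the cutoff and filter order are illustrative, not a recommendation):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

sr = 44100
# 8th-order low-pass at 10 kHz - the "cut everything above 10k" move
sos = butter(8, 10000, btype="lowpass", fs=sr, output="sos")

t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 300 * t)            # stand-in for speech
crackle = 0.3 * np.sin(2 * np.pi * 15000 * t)  # stand-in for fire crackle
filtered = sosfiltfilt(sos, voice + crackle)
```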

I got a 404 error while opening the links.
The source location is helpful insofar as sounds outside the center can be attenuated separately.


How do you mean sounds outside the center? What is the center and how can it be located on the track?

On sample 2 you can notch out the mains hum for a start …
notch filter 150Hz.gif
then applying a wide notch at 5800Hz cuts back the bird noise …
notch filter 5800Hz.gif

The center is the part of the sound that’s directly in front of the stereo microphone.
The device is most often placed such that the speakers are in the center. A bird that sings from the left can therefore be discarded by isolating (or better, emphasizing) the center.
There’s a quick test to check if the sides (hard left or hard right) contain important speech:
Use the “Vocal Remover” from the effect menu. If the resulting sound has less or no speech, then the center isolation will increase the audibility.
For your first sample, the second speaker disappears after this effect. The first speaker is actually alone on the left side. Thus, instead of left + right, we can assemble the stereo track from left + center and delete the right side.
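Under the hood this is mid/side arithmetic: the center is what both channels share, the sides are what differs. A small numpy sketch with made-up "voice" and "bird" signals:

```python
import numpy as np

def split_mid_side(left, right):
    """Mid/side decomposition: `mid` is what both channels share
    (the centered voices); `side` is what differs (off-center birds,
    ambience).  A vocal-remover effect is essentially `side`."""
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

# Made-up signals: voice equal in both channels, bird hard left
voice = np.array([0.3, -0.3, 0.3, -0.3])
bird = np.array([0.2, 0.2, -0.2, -0.2])
mid, side = split_mid_side(voice + bird, voice)
```

The centered voice cancels completely out of `side` (that is the Vocal Remover test described above), while the hard-left bird survives in `mid` at half level - which is why emphasizing the center attenuates, rather than deletes, off-center sounds.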

Here’s the second sample as it has been modified by Trebor. The last two stages are the stereo center alone and all that has been deleted (i.e. the side channels).

That's a very clean centre isolation: I can't hear any digital artifacts. Is that rjh-stereo-tool.ny?
[I use kn0ck0ut for that, which usually adds some artifacts.]

Yes, it is indeed.
I've installed the kn0ck0ut plug-in too, just to compare with my own tool. Most often, it fails.
The side suppression is sometimes better, but the artefacts are disproportionately noticeable.
I think that the tool uses binary masking for the source separation (i.e. each bin is either kept or discarded), whereas my tool uses soft masking.
It is in principle possible to create a filter for musical noise. Those artefacts are often isolated bins with a short duration (a window length of 4096 samples, for instance).
It is a big field for experimentation. I currently try to attenuate percussive sounds. It works fine for bass and snare drum so far.
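For anyone curious about the binary-versus-soft-masking distinction, a toy sketch (the magnitudes are made up, and the soft mask here is a Wiener-style power ratio, which may differ from what the actual plug-ins implement):

```python
import numpy as np

def binary_mask(mag, noise_mag):
    """All-or-nothing: keep a bin only where the signal estimate
    outweighs the noise estimate."""
    return (mag > noise_mag).astype(float)

def soft_mask(mag, noise_mag):
    """Wiener-style soft mask: scale each bin by its estimated share
    of the power instead of a hard keep/discard decision - less
    suppression, but fewer isolated 'musical noise' bins."""
    power, noise_power = mag ** 2, noise_mag ** 2
    return power / (power + noise_power)

mag = np.array([4.0, 1.0, 0.5])    # made-up magnitudes for 3 bins
noise = np.array([1.0, 1.0, 1.0])  # flat noise estimate
print(binary_mask(mag, noise))  # [1. 0. 0.]
print(soft_mask(mag, noise))
```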