Implementing a VAD in Audacity

Re: Implementing a VAD in Audacity

Posted by Alonshow » Thu Aug 04, 2016 11:00 pm

I know this is a very old thread, but I'd like to know if you had any success with your project. It is just what I am looking for; it would save me an awful lot of time.

Posted by kozikowski » Fri Aug 05, 2016 12:14 am

The original poster's account shows no activity since 2009.

Did you try any of those programming links or Nyquist tools? We could use a few good Nyquist programmers.

I can guarantee your popularity simply by being able to code in Nyquist.

Let us know.

Koz

Posted by Alonshow » Fri Aug 05, 2016 3:17 am

kozikowski wrote: I can guarantee your popularity

Really, will I get laid and all? :D

Seriously, I'm not sure if what you propose is very realistic. Off the top of my head, it requires me to learn the Nyquist language, its development tools and environment, the basics of the Audacity code architecture, voice activity detection (VAD) theory, and a suitable algorithm. Only then would I be able to start programming a plugin, with all the coding, testing and management that involves. The whole thing sounds like it would take months, maybe years. It seems more sensible to look for a program that has already been created. I know that such programs exist; what I don't know is whether they are available to the public, since VAD is mainly useful for big companies.

Posted by Robert J. H. » Fri Aug 05, 2016 12:46 pm

Alonshow wrote: The whole thing sounds like it would take months, maybe years.

It's not as bad as that...
Nyquist is an interpreted language, i.e. the source code is also the executable code: a plain text file, or whatever you type into the Nyquist Prompt.

I'm pretty sure that I could write a VAD plug-in in a couple of hours.
However, it will be an offline algorithm and not real-time.
Do you have any special requirements?

Robert

Posted by Alonshow » Sat Aug 06, 2016 1:02 am

Robert J. H. wrote: I'm pretty sure that I could write a VAD plug-in in a couple of hours.
However, it will be an offline algorithm and not real-time.
Do you have any special requirements?

Wow, that's so generous! I don't want to abuse your generosity. As far as I know, only two Audacity users have ever expressed interest in this, and the other one hasn't been active in seven years. Still, I'll answer your question in case you want to do it anyway:

Robert J. H. wrote: Do you have any special requirements?

I don't think so. I don't need it to be real time, I just want to process my recordings, so offline is fine. Some of the recordings have only my own voice, some have the voices of several people. Some have a lot of noise in the background, some have a silent background. I've used different recorders, so I have several formats, including wma, mp3, amr, and aac. But any format would do, of course, because I can always convert between formats.

Needless to say, if you decide to do it I would be happy to assist you in any way I can. In any case, thank you for your interest! :)

Posted by androclus » Fri Aug 19, 2016 12:02 am

Okay, #3 here.

I record a brilliant lecturer and post the lectures/dialogues online for free.

Unfortunately, the surroundings are less than ideal (refrigerators, chimes, birds, airplanes, garbage trucks and street sweepers, coffee pot, etc.).

There are obvious recording strategies I have taken (better mics, especially dynamic ones, placed closer, etc.).

But then, when editing the recording in Audacity to clean it up and boost the signal-to-noise ratio for listeners (who will often be listening in their cars, without headphones, and in other less-than-ideal listening environments such as coffee shops), I also often use effects such as dynamic compression (the third-party one detailed at https://theaudacitytopodcast.com/chriss-dynamic-compressor-plugin-for-audacity/), noise reduction, low-cut/high-pass filters, and even simple amplification/de-amplification.

However, as far as I can tell, these tools are all based in various ways on amplitude and frequency. I would like something that would (again, not in real time) simply reduce to zero amplitude (silence) all sections in which no voice was detected. THEN, once that was done, any effects/filters I applied (like those listed above) would obviously work MUCH better, because all the intervening junk (between voice segments) would be gone. (Of course, the junk is still there DURING the speech segments too, but that is a different issue, and I can deal with it.)
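Once some detector has decided where the voice is, the "silence everything else" step itself is simple. A minimal Python sketch, assuming mono float samples in a list and a hypothetical list of (start, end) voice segments in seconds (for example, copied from a label track); `mute_outside` is an illustrative name, not an Audacity function:

```python
def mute_outside(samples, segments, sample_rate):
    """Return a copy of `samples` with everything outside `segments` set to 0.0.

    `segments` is a hypothetical list of (start_seconds, end_seconds) pairs
    marking where voice was detected.
    """
    keep = [False] * len(samples)
    for start_s, end_s in segments:
        first = max(0, int(round(start_s * sample_rate)))
        last = min(len(samples), int(round(end_s * sample_rate)))
        for i in range(first, last):
            keep[i] = True
    # Samples inside a segment pass through unchanged; all others become silence.
    return [x if k else 0.0 for x, k in zip(samples, keep)]


# Usage: one second at 10 Hz, voice from 0.2 s to 0.5 s.
print(mute_outside([0.5] * 10, [(0.2, 0.5)], 10))
# → [0.0, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Hard cuts like this can click at the boundaries; a few milliseconds of fade in and out would be gentler. The difficult part, as the replies in this thread point out, is producing the segments in the first place.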

I had thought of programming something in Nyquist myself (I do love the elegance of Lisps, and Emacs can color matching parens), but I am super busy with tons of other projects. Still, if I could find the time, it sounds like it would be a great learning experience and a wonderful way to learn about audio. Then again, if someone has already programmed a Nyquist VAD, I wouldn't complain. :D

Please let me know if anyone is still working on one.

Posted by steve » Fri Aug 19, 2016 12:48 pm

androclus wrote: all sections which did not have voice detected.

That's the hard part. Your computer has no idea whether the audio data is a voice, a TV, a car horn, or probably even a spreadsheet. All it sees is "data".
Assuming that the data is a valid audio signal, we can analyze certain properties quite easily. Peak amplitude is one of the easiest to detect. Approximation of the frequency spectrum is more difficult but possible. Automatically detecting whether a voice is a "live recording" or a TV show is virtually impossible.

androclus wrote: I would like something that would (again, not in real-time) simply reduce to 0 amplitude (silence) all sections which did not have voice detected.

A simple approach is to use a Noise Gate. This operates on peak level, so the assumption is that if the peak level is above a specified threshold, then the voice is present.
There is a Nyquist Noise Gate available here: http://wiki.audacityteam.org/wiki/Nyqui ... Noise_Gate
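The peak-level idea can also be sketched outside Nyquist. This is not the linked plug-in's code; it is a minimal Python illustration assuming mono float samples in [-1, 1], 10 ms frames, and a "hold" time so the gate doesn't chatter between syllables. `peak_gate`, the -30 dB threshold and the hold length are hypothetical choices:

```python
import math


def peak_gate(samples, sample_rate, threshold_db=-30.0, frame_ms=10.0, hold_frames=20):
    """Hypothetical frame-based noise gate keyed on peak level.

    Each frame of `frame_ms` milliseconds is kept if its peak absolute value
    exceeds `threshold_db` dBFS (full scale = 1.0), or if a loud frame occurred
    within the previous `hold_frames` frames. Returns (gated, open_flags).
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)  # dBFS -> linear amplitude
    n_frames = math.ceil(len(samples) / frame_len)

    open_flags = []
    hold = 0
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        if max(abs(x) for x in frame) > threshold:
            hold = hold_frames        # loud frame: open and restart the hold timer
            open_flags.append(True)
        elif hold > 0:
            hold -= 1                 # quiet frame, but still inside the hold time
            open_flags.append(True)
        else:
            open_flags.append(False)  # quiet and hold expired: gate closed

    gated = [x if open_flags[i // frame_len] else 0.0 for i, x in enumerate(samples)]
    return gated, open_flags
```

The hold time is the main design choice here: without it, the gate would close during every brief dip between words and reopen on the next syllable.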

This Noise Gate could be modified to better identify voices by pre-filtering the audio so as to reduce frequencies that are outside of the (main) range of voices, for example, with a 300 to 3000 Hz band-pass filter.
```lisp
(highpass2 (lowpass2 signal 3000) 300)  ; "signal" is the audio to be filtered.
```
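For readers not using Nyquist, here is a rough Python analogue of that one-liner: two second-order (biquad) filters in series, a lowpass at 3000 Hz followed by a highpass at 300 Hz. The coefficients follow the widely used "Audio EQ Cookbook" (RBJ) formulas with Q = 0.7071 (a Butterworth response); whether that exactly matches the defaults of Nyquist's `lowpass2`/`highpass2` is an assumption, and `voice_band` and the other names are hypothetical:

```python
import math


def biquad_coeffs(kind, freq_hz, sample_rate, q=0.7071):
    """RBJ cookbook coefficients for a 2nd-order low/high-pass, with a[0] == 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0 = 1 + alpha
    return [c / a0 for c in b], [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]


def biquad(samples, b, a):
    """Direct Form I biquad filter (expects a[0] == 1)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def voice_band(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Rough analogue of (highpass2 (lowpass2 signal 3000) 300)."""
    b, a = biquad_coeffs("lowpass", high_hz, sample_rate)
    lowpassed = biquad(samples, b, a)
    b, a = biquad_coeffs("highpass", low_hz, sample_rate)
    return biquad(lowpassed, b, a)
```

One reasonable way to use this with a gate is to make the open/closed decision on the filtered copy but apply it to the unfiltered audio, so the filter only affects detection and not how the voice sounds.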

Posted by Robert J. H. » Fri Aug 19, 2016 2:54 pm

androclus wrote: Please let me know if anyone is still working on one.

Sorry for not getting back to you sooner (Alonshow).
I haven't forgotten the project, but as you (androclus) say, we are all busy in one way or another.

I've accumulated some code snippets in order to extract some audio features, such as:
- zero crossing rate
- fundamental frequency
- RMS/Peak/Crest
- linear prediction error
- Spectral features
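To make the list concrete, here is a minimal Python sketch of several of those per-frame features: zero crossing rate, RMS, peak, crest factor, and a crude autocorrelation estimate of the fundamental frequency. It is not Robert's code; the 60 to 400 Hz pitch search range and the name `frame_features` are assumptions, and linear prediction error and spectral features are left out for brevity:

```python
import math


def frame_features(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Return a dict of simple per-frame features (hypothetical helper)."""
    n = len(frame)
    # Zero crossing rate: fraction of neighbouring sample pairs that change sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0

    energy = sum(x * x for x in frame)
    rms = math.sqrt(energy / n) if n else 0.0
    peak = max((abs(x) for x in frame), default=0.0)
    crest = peak / rms if rms > 0 else 0.0  # crest factor = peak / RMS

    # Crude pitch estimate: the lag in the f0_min..f0_max range with the largest
    # autocorrelation divided by frame energy. Longer lags sum fewer terms, so
    # this measure is biased toward shorter lags. Larger "voicing" values
    # suggest a more periodic frame; f0 stays 0.0 if no lag correlates positively.
    f0, voicing = 0.0, 0.0
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if energy > 0 and r / energy > voicing:
            voicing = r / energy
            f0 = sample_rate / lag
    return {"zcr": zcr, "rms": rms, "peak": peak, "crest": crest,
            "f0": f0, "voicing": voicing}
```

Roughly speaking, voiced speech tends to show low ZCR and a clear pitch peak, while hiss-like noise shows high ZCR and weak periodicity; which thresholds separate them is exactly the open question discussed below.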

It might be worthwhile to follow an established standard, such as the VAD in ITU-T G.729 Annex B (if I'm not mistaken), at least as one algorithm choice.

The spectrum of possible algorithms is very wide, from simple energy/ZCR processing to something that is almost speaker recognition.
More sophisticated algorithms do often require sample data (with all segments properly labelled as voiced/unvoiced/noise/silence) and training.
With or without that, finding the proper threshold for the feature vectors is the crucial part of any VAD.
Robert
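At the "simple energy" end of that range of algorithms, one common heuristic for the threshold problem is to estimate the noise floor from the quietest frames and require speech frames to sit some margin above it. A minimal Python sketch; the 10% quantile, the 6 dB margin and the name `adaptive_energy_vad` are all hypothetical choices, not part of any standard:

```python
import math


def frame_energy_db(frame):
    """Mean power of a frame in dB (a tiny floor avoids log(0))."""
    power = sum(x * x for x in frame) / max(1, len(frame))
    return 10.0 * math.log10(power + 1e-12)


def adaptive_energy_vad(samples, frame_len, margin_db=6.0, floor_quantile=0.1):
    """Label each frame as speech (True) or not (False).

    Assumes the quietest `floor_quantile` of frames is background noise,
    averages their energy to estimate the noise floor, and calls a frame
    speech when its energy exceeds that floor by more than `margin_db`.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return []
    energies = [frame_energy_db(f) for f in frames]
    quietest = sorted(energies)[:max(1, int(len(energies) * floor_quantile))]
    noise_floor = sum(quietest) / len(quietest)
    return [e > noise_floor + margin_db for e in energies]
```

This only works when the voice is clearly louder than the background; when noise bursts are as loud as the speech, an energy rule alone will let them through, and a ZCR or spectral test per frame would have to be added.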

Posted by Alonshow » Sat Aug 20, 2016 11:10 pm

I created another thread in the Adobe Audition forums in the hope that Audition could provide something similar to what we are looking for: https://forums.adobe.com/message/8951906. Unfortunately, it doesn't. Still, I got some interesting replies. I'll try to summarize them (and hope I got them right):

  • This is an easy operation when the signal to noise ratio is high. Both Audacity and Audition provide simple tools which will detect the signals with an amplitude above a certain level. One of those tools is called Noise Gate.
  • Unfortunately, that solution doesn't work when the levels of the noise are similar to the levels of the voice. In that case, the operation becomes much more complex. Still, plenty of software exists that performs this kind of operation. However, it's not clear whether that kind of software exists in a form that we can use for the purpose described here, i.e., processing an audio file and providing an output with the sections of that audio file that contain speech.
  • An example of that kind of software would be CMU Sphinx: http://cmusphinx.sourceforge.net/. This is an open source toolkit for speech recognition developed by Carnegie Mellon University. It seems to provide, among many other things, the kind of functionality we're looking for. I have asked in their forum about the possibility of using it, but the reply I received and the information I have found so far are way beyond my very limited skills: https://sourceforge.net/p/cmusphinx/dis ... /e31404b4/.
  • There are several public papers that discuss how this kind of software operates, for example: http://www.ece.umd.edu/merit/archives/m ... t_Kola.pdf. These papers only provide the theory, though, not a practical implementation.

It looks like the operation of VAD depends heavily on the kind of audio being processed. I have created a sample recording which might help in identifying what I'm looking for. It's a 14-minute recording of a person who talks in his sleep. In this sample he talks for the first 8-9 seconds of the recording. The rest of the recording is noise, which in some parts is louder than the voice segment. The spectrogram view of the first few seconds shows a clearly recognizable pattern of human voice:

[Image: Voice spectrogram.png]

On the other hand, the spectrogram of one of the loud noise segments shows a very different pattern (or lack thereof):

[Image: Noise spectrogram.png]
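The difference between the two spectrograms (evenly spaced harmonic lines for the voice, a smeared pattern for the noise) is roughly what a spectral flatness measure captures: the geometric mean of a frame's power spectrum divided by its arithmetic mean. A minimal Python sketch, using a naive DFT to stay dependency-free; the Hann window, the function names and any threshold one would apply to real recordings are assumptions:

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of a Hann-windowed frame via a naive O(N^2) DFT.

    Returns bins 1..N/2 (DC dropped). Fine for a demo; use an FFT for real audio.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(1, n // 2 + 1):
        s = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append(abs(s) ** 2)
    return spectrum


def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (between 0 and 1).

    Low when the energy sits in a few peaks (e.g. voiced speech harmonics),
    higher for broadband, noise-like frames.
    """
    p = [v + eps for v in power_spectrum(frame)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic
```

For a single frame of white noise the value typically lands around 0.5 rather than 1, while a strongly harmonic frame gives a value near 0. Tonal noise (a refrigerator hum, a chime) would also score low, so flatness is better used alongside energy than instead of it.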


Hope this helps.