Speech segmentation (not recognition!)

Post by Paul L » Tue Apr 23, 2013 3:45 am

Update 23 April 5 PM EDT: Here it is, with an explanation of the many controls.
Update 26 April 11 PM EDT: Fixed a performance problem. Now it should scale to selections of a few minutes in length.

I'm developing an experimental Nyquist plug-in to take recorded speech and put labels around consonants and vowels. Preliminary work shows promise. The tool will have a dialog with lots of sliders for tuning parameters. I haven't yet found the best settings.

The goal is only segmentation, not recognition: putting boundaries between speech sounds ("phones"), not identification of those sounds.

As a later goal I might implement CRUDE recognition distinguishing vowels, stops, sibilants, and pauses, and apply effects selectively to different segments. But first I need reliable segmentation.

Who's curious to play with it too?

Try the default settings first on some speech, then read the explanations, then play around.
  • Action: make labels, or draw curves that may help you choose better parameters. To draw curves, it is best to make a muted duplicate of the track and apply the effect to it. The effect then generates a "sound" that graphs the data, which you can view on a linear scale in Waveform view or on a logarithmic scale in Waveform (dB) view.

    The next six controls determine the function that is computed.
    • Discard: excludes high frequencies from consideration. This has a noticeable effect on the cost of the expensive inner loop of the computation.

      Percentile: for each FFT frame, find the frequency at which the running sum of the power spectrum reaches this fixed fraction of the total.

      FFT window length, skip length, window type: familiar to users of snd-fft. A longer window resolves frequency more finely, at the expense of less precise detection of changes in time.

      Smoothing window: if at least twice the skip length, applies a convolution to the logarithm of the curve (that is, the curve as it appears in Waveform (dB) view) after everything else is computed. Its width can vary independently of the FFT window. A longer FFT window would also help, because finer frequency resolution reduces the sudden steps at the low end of the log-frequency curve, but it loses time resolution and adds computational expense, so a simple post-processing convolution is tried instead. Smoothing may also remove extraneous boundaries that are detected with lower thresholds.
    The next controls choose a criterion for finding boundaries from the function.
    • Derivatives: when finding boundaries or labels, take either the rate of change (one derivative) or the rate of change of the rate of change (two derivatives) of the smoothed logarithm.

      Threshold: expressed in octaves per second (one derivative) or octaves per second squared (two derivatives). Boundaries are placed where the absolute value of the derivative rises across this threshold. Useful thresholds are much larger with two derivatives.

      Multiples: when drawing Boundaries, draws lines of different heights to show where multiples of the threshold are crossed. Best viewed in Waveform view. This indicates how strongly marked each boundary is, and may help in choosing a threshold: set a low value with many multiples, then see which multiple is just sensitive enough to catch the intended boundaries while avoiding extraneous ones.
    Minimum length of labels: boundaries that come "too close" together are discarded. The remaining label boundaries are then refined to zero crossings. (The Boundaries view does not apply zero-crossing refinement.)

    Vertical scale: the frequency that corresponds to 1.0 in Graph view.
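The plug-in itself is written in Nyquist, so the following is not its code. As a rough illustration of how Discard, Percentile, and the FFT window controls fit together, here is a minimal Python sketch. It assumes a periodic Hann window (one possible window type), uses a naive DFT instead of a fast FFT, and stands in for Discard with a hypothetical `max_freq` cutoff in Hz. The names `hann`, `power_spectrum`, and `percentile_frequencies` are illustrative only.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed here as the window type)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def power_spectrum(frame):
    """Naive DFT power for bins 0..n//2 (slow, but needs no libraries)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def percentile_frequencies(samples, rate, window_len, skip_len, fraction,
                           max_freq=None):
    """For each frame, return the frequency (Hz) at which the running sum of
    the power spectrum first reaches `fraction` of the total.

    `max_freq` is a hypothetical stand-in for Discard: bins above it are
    dropped before summing, which also shortens the inner loop."""
    window = hann(window_len)
    bin_hz = rate / window_len
    result = []
    for start in range(0, len(samples) - window_len + 1, skip_len):
        frame = [s * w for s, w in
                 zip(samples[start:start + window_len], window)]
        power = power_spectrum(frame)
        if max_freq is not None:
            power = power[:int(max_freq / bin_hz) + 1]  # keep k*bin_hz <= max_freq
        total = sum(power)
        if total == 0:
            result.append(0.0)  # silent frame: no meaningful percentile
            continue
        running = 0.0
        for k, p in enumerate(power):
            running += p
            if running >= fraction * total:
                result.append(k * bin_hz)
                break
    return result
```

For a 1000 Hz sine sampled at 8000 Hz, with a 64-sample window and a 32-sample skip, every frame's 50th-percentile frequency is 1000 Hz. With `max_freq=900` the 1000 Hz bin is discarded and the result drops to the 875 Hz bin, which holds leakage from the Hann window.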
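The Smoothing window step can be sketched as a convolution of the log-frequency curve. This Python sketch assumes a rectangular (moving-average) kernel and a base-2 logarithm, so the curve is in octaves to match the Threshold units. The plug-in's actual kernel shape is not stated, and `floor_hz` is a hypothetical guard against taking the log of a silent 0 Hz frame.

```python
import math


def smooth_log_curve(freqs_hz, skip_seconds, smoothing_seconds, floor_hz=1.0):
    """Convert per-frame frequencies to log2 (octaves), then apply a centered
    moving average if the smoothing window spans at least two hops."""
    # Hypothetical floor so that 0 Hz frames do not break log2.
    logs = [math.log2(max(f, floor_hz)) for f in freqs_hz]
    if smoothing_seconds < 2 * skip_seconds:
        return logs  # window too short: leave the curve unsmoothed
    # Assumption: a rectangular kernel about smoothing_seconds wide.
    half = int(round(smoothing_seconds / skip_seconds)) // 2
    smoothed = []
    for i in range(len(logs)):
        lo, hi = max(0, i - half), min(len(logs), i + half + 1)
        smoothed.append(sum(logs[lo:hi]) / (hi - lo))  # shorter at the edges
    return smoothed
```

A two-octave step from 100 Hz to 400 Hz, smoothed over three 10 ms hops, becomes a ramp of 2/3 octave per frame, which spreads one sharp jump over neighbouring frames and lowers the peak derivative.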
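The Derivatives, Threshold, and Multiples controls can be illustrated with the Python sketch below. It assumes plain forward differences for the derivatives and a positive threshold; the function names are hypothetical, and the plug-in may compute its derivatives differently.

```python
def derivative(values, dt):
    """Forward difference: change per second between consecutive frames."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def rising_crossings(values, threshold):
    """Indices where |value| rises from below `threshold` to at or above it."""
    hits = []
    previous = 0.0
    for i, v in enumerate(values):
        if abs(previous) < threshold <= abs(v):
            hits.append(i)
        previous = v
    return hits


def find_boundaries(log_curve, dt, threshold, derivatives=1):
    """Boundary indices from the first or second derivative of a curve in
    octaves. Index i of a first derivative lies between frames i and i+1."""
    d = derivative(log_curve, dt)
    if derivatives == 2:
        d = derivative(d, dt)  # octaves per second squared
    return rising_crossings(d, threshold)


def threshold_multiples(log_curve, dt, threshold, count, derivatives=1):
    """Crossings of threshold, 2*threshold, ..., count*threshold. A boundary
    that still triggers at higher multiples is more strongly marked."""
    return {m: find_boundaries(log_curve, dt, m * threshold, derivatives)
            for m in range(1, count + 1)}
```

With a curve that jumps 2 octaves between two frames 0.25 s apart, the first derivative peaks at 8 octaves per second, so thresholds 4 and 8 both trigger at that step while 12 does not. The second derivative at the same step reaches 32 octaves per second squared, which shows why useful thresholds are much larger with two derivatives.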
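Minimum length of labels and the zero-crossing refinement can be sketched in Python as follows. This assumes a greedy rule that keeps the earliest boundary and drops any that follow within the minimum length; the plug-in's exact rule is not described, and all names here are hypothetical.

```python
def enforce_min_length(times, min_len):
    """Assumed greedy rule: keep a boundary (in seconds) only if it is at
    least `min_len` after the last kept boundary."""
    kept = []
    for t in sorted(times):
        if not kept or t - kept[-1] >= min_len:
            kept.append(t)
    return kept


def nearest_zero_crossing(samples, index):
    """Nearest sample index i where samples[i-1] and samples[i] differ in
    sign (zero counts as non-negative); returns `index` if there is none."""
    n = len(samples)
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 < i < n and (samples[i - 1] < 0) != (samples[i] < 0):
                return i
    return index


def refine_boundaries(times, samples, rate, min_len):
    """Drop too-close boundaries, then snap the rest to zero crossings."""
    return [nearest_zero_crossing(samples, round(t * rate)) / rate
            for t in enforce_min_length(times, min_len)]
```

Snapping to a zero crossing means a label edge falls where the waveform passes through zero, which is also why the unrefined Boundaries view can differ slightly from the final labels.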
Attachments
Segmentation.ny: first version, 23 April 2013 (17.36 KiB)
Segmentation.ny: improved performance, 26 April 2013 (17.8 KiB)
Last edited by Paul L on Sat Apr 27, 2013 3:20 am, edited 4 times in total.

Post by steve » Tue Apr 23, 2013 1:20 pm

I'm interested in "beat detection" for music, so our interests may overlap.

Post by Paul L » Tue Apr 23, 2013 4:50 pm

I thought there was already a "Beat Finder" in the Analyze menu, though I haven't tried it or read its source.

I will add an attachment to the first post of this thread once I've cleaned the plug-in up enough to share, which should be soon. I will likely update the attachment often; older experimental versions may not be worth keeping up if the fixes are minor.
