My narrator's pause-trimmer

Steve, here is the tool I was talking about; critiques welcome. My preliminary tests are promising.

Preparation: speak some sentences, leaving an excessive pause between them.
(Perhaps apply a high-pass filter at 20 Hz to remove any subsonic sound with noticeable amplitude in dB view.)

Parameters (which I may hard-code in my own practice): a “Resolution” that more or less determines a frequency floor for on-glide sounds; a “Loud” threshold defining the onset of voice; and a lower “Quiet” threshold identifying on-glides (like the breathy sound of the letter h) before the “voice” that should be preserved.
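To make the three parameters concrete, here is a minimal Python sketch (not the plug-in's Nyquist code) of how a peak envelope with one value per 1/Resolution seconds might be labelled using the Loud and Quiet thresholds. The function names and the three labels are hypothetical, and the block-peak step only approximates what snd-avg with OP-PEAK does.

```python
def peak_envelope(samples, srate, resolution):
    """Block-wise peak of |x|, one value per 1/resolution seconds
    (block size truncated to whole samples, as in the plug-in)."""
    block = max(1, int(srate / resolution))
    return [max(abs(x) for x in samples[i:i + block])
            for i in range(0, len(samples), block)]


def classify(envelope, loud, quiet):
    """Label each envelope value: 'loud' = voice, 'glide' = on-glide to keep,
    'silence' = below both thresholds."""
    labels = []
    for value in envelope:
        if value >= loud:
            labels.append("loud")
        elif value >= quiet:
            labels.append("glide")
        else:
            labels.append("silence")
    return labels
```

For example, 0.1 s of silence, then 0.1 s of faint breath, then 0.1 s of voice at 100 Hz sample rate and Resolution 10 gives the labels silence, glide, loud.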

Usage:

  1. Select a range including the end of one sentence and the start of the next.
  2. Play, then “Stop and Set Cursor” just where you judge the next sentence should start. (Ctrl-A by default, but I bind it to G.) (Note that the right edge of the selection is unchanged and the left edge has moved rightward.) Mentally note what time you stopped at.
  3. Invoke the effect. (Nothing happens yet but the selection length is remembered in a global.)
  4. Move the left edge of the selection leftward “somewhat” (at least as far as the length of any on-glide to the voice).
  5. Invoke the effect again.
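Since Nyquist only sees one selection per call, the two invocations have to cooperate through remembered state. Here is a minimal Python model of that protocol; `_scratch` stands in for the scratch property and `invoke` is a hypothetical name. It assumes, as the steps require, that both selections share the same right edge, which the code itself cannot verify.

```python
# Hypothetical stand-in for the scratch property that survives between calls.
_scratch = {}


def invoke(sel_len):
    """Model one invocation on a selection sel_len seconds long.

    First call: remember the length and do nothing (returns None).
    Second call: return the offset in seconds, from the start of the new
    selection, at which loud voice should begin.
    """
    if "first_len" not in _scratch:
        _scratch["first_len"] = sel_len
        return None
    first_len = _scratch.pop("first_len")  # reset for the next pause
    if sel_len <= first_len:
        raise ValueError("second selection must extend further left than the first")
    return sel_len - first_len
```

So if the shrunken selection in step 3 is 2.0 s long and the widened one in step 5 is 3.0 s, loud voice should start 1.0 s into the second selection.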

Result: some sound is deleted from the selection, bringing the onset of “loud” voice forward to the place noted in step 2, but preserving any on-glide leading up to the point where the voice crosses the threshold and remains above it. The tool also allows some error in the length of the deletion (up to one period of the “Resolution” frequency, i.e. 1/Resolution seconds) at each end, so that the deletion can be made neatly at zero crossings.
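The zero-crossing part can be sketched like this in Python (a model of the idea, not the plug-in's code): each end of the deletion is moved to the nearest sign change within a tolerance of int(srate / Resolution) samples. The helper names are hypothetical; a "sign change" here means going from positive to non-positive or back, the same distinction (plusp x) makes.

```python
def snap_to_zero_crossing(samples, index, tolerance):
    """Nearest index within +/- tolerance samples where the sign changes
    between samples[i - 1] and samples[i]; `index` itself if none is found."""
    best = None
    lo = max(1, index - tolerance)
    hi = min(len(samples) - 1, index + tolerance)
    for i in range(lo, hi + 1):
        if (samples[i - 1] > 0) != (samples[i] > 0):
            if best is None or abs(i - index) < abs(best - index):
                best = i
    return index if best is None else best


def delete_span(samples, start, end, tolerance):
    """Remove samples[start:end], with both ends snapped to zero crossings."""
    a = snap_to_zero_crossing(samples, start, tolerance)
    b = max(a, snap_to_zero_crossing(samples, end, tolerance))
    return samples[:a] + samples[b:]
```

At 44100 Hz with a Resolution of 20 Hz (an example value), the tolerance would be 2205 samples, i.e. 50 ms at each end.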

Do you understand what I’m doing?

My Nyquist wish-list posting a while ago asked for a way to eliminate steps 4 and 5 by letting me examine the context of the selection. But even with steps 4 and 5 still necessary, I like that this tool eliminates other manual work.

Finally, a weird thing I didn’t expect: sometimes this effect splits the track at the left edge of the selection! Even in the do-nothing first invocation! I think it only happens if the right edge is past the end of the track or clip. But why?
Trim Pause.ny (6.85 KB)

Sorry about the delay - I’ve not had much time for Audacity things recently. I’ll post a proper reply shortly.

Not really.
Does the plug-in do what you expect it to do?
I’m unsure of what precisely I should be expecting it to do.

I don’t know if I can describe it more simply than I have. I can say that I have been finding it a useful simplification for a common task. It spares me some zooming in and out to find precise endpoints for deletion. I’d like to march through a narration, fixing pauses as quickly as I can, without losing too much of the sense of the narrative flow.

The main thing is that I want to just pick “by ear” where a long pause should be trimmed to, and have a program calculate what to delete so that the start of loud voice is moved up to just that point, while the word’s on-glides are preserved too and may even be moved left of that point. I don’t have the simplicity of a single pick and keystroke, but I have achieved an important simplification.

The whole thing is only three pages of printout, and I want to know that I didn’t do anything too weird, crazy, or inefficient.

It is indeed a little confusing with all the selecting steps.
Let me see if I got it right:

  • The purpose of the first selection is to preview the section you’re working with.
  • You then play this section and re-set the left margin where (according to your “feel”) the sentence should start.
  • The difference between the desired and momentary start (plus the beginning of the sentence) is stored.
  • This selection could be 0.5 s long. However, the louder part at the end is preserved because it lies above the first threshold. Thus, the actual difference could be 0.3 s.
  • You now expand the selection again to the left into the region between sentence end and breath taking.
  • The plug-in then also excludes the quieter sound (maybe 0.1 s) from the stored selection, and we end up with 0.2 s that have to be deleted.
  • After the zero crossings are detected, the sound is returned without the removed silence of about 0.2 s.

It may be advantageous for people who do not use a mouse to save the first selection (Edit menu) and restore it before the actual silence removal.

I guess I would prefer a one-click effect that removes the quietest RMS sections on each call.
Something like this:

  • You select the pause including ending and beginning of the concerned sentences.
  • You call the plug-in.
  • It takes the RMS measurement (let’s say at 20 Hz).
  • The curve is then multiplied by a raised half-period sine curve (bowl shape).
  • This should ensure that the start and end of the selection aren’t affected.
  • You make a list of the time indexes and the weighted RMS values.
  • After sorting out the quietest ones, you can search for the zero crossings and remove these parts.
  • The number of RMS values to be removed is, of course, hard-coded.
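The weighting and selection steps of that list might look like this in Python. This is a minimal sketch: the bowl weight 2 - sin(pi * t) is my reading of "raised half-period sine (bowl shape)", and `quietest_blocks` is a hypothetical name. It takes a list of RMS values, one per block, and returns the blocks to remove in time order.

```python
import math


def quietest_blocks(rms, count):
    """Indices of the `count` quietest RMS blocks after bowl weighting, in time order.

    The weight is 2 - sin(pi * t), with t running from about 0 to 1 across
    the selection: close to 2 at the edges and 1 in the middle, so blocks
    near the ends look louder and are less likely to be picked.
    """
    n = len(rms)
    weighted = []
    for i, value in enumerate(rms):
        weight = 2.0 - math.sin(math.pi * (i + 0.5) / n)
        weighted.append((value * weight, i))
    weighted.sort()
    return sorted(i for _, i in weighted[:count])
```

With equal RMS everywhere, the middle blocks are chosen first. Each returned index i covers one block of samples; the cut points would still need snapping to nearby zero crossings before removal (not shown).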

20 Hz would mean 50 ms per value.
You could now tell the program to remove 10 of these.
If you additionally multiply this by a random factor of 1, 2 or 3, it becomes easier to remove longer silences.
By pressing Play, you can check the result and undo the last step (and try again with a hopefully smaller random value) if necessary.
You could of course also start with relatively high values for longer selections.
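The arithmetic above as a tiny Python sketch (function and parameter names are hypothetical):

```python
import random


def removal_seconds(rms_rate_hz=20.0, blocks=10, rng=random):
    """Seconds removed in one call: `blocks` RMS values of 1/rms_rate_hz s each,
    scaled by a random factor of 1, 2 or 3."""
    factor = rng.choice((1, 2, 3))
    return blocks * factor / rms_rate_hz
```

With these defaults each value covers 1/20 s = 50 ms, so one call removes 0.5 s, 1.0 s or 1.5 s.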
Incidentally, I’ve posted a snippet that returns a list of zero crossings without the use of snd-fetch.
https://forum.audacityteam.org/t/zero-crossing-detection/26237/1

That, and to set the common right boundary of the selections passed to the two invocations of the tool. I would error-check that the right boundary is the same both times, but I don’t know how. I choose something to play, listening to the end of one sentence and some of the pause that follows. The right boundary should be in the next sentence, but I stop before reaching it.

  • You then play this section and re-set the left margin where (according to your “feel”) the sentence should start.
  • The difference between the desired and momentary start (plus the beginning of the sentence) is stored.

I don’t know what “momentary start” means. The snd-length of the selection (as shrunk after stopping play) is remembered in a scratch property. That’s all. I would remember the track time of the end of selection if I knew how. The place where play started has no importance in the calculations. It only matters to my intuition of where to stop afterward.

The selection contains the beginning of the next sentence, and I could locate it in this pass but I don’t. I do that in the second. I suppose there would be savings if I did, with less to scan. It is the start of the second sentence that matters.

  • This selection could be 0.5 s long. However, the louder part at the end is preserved because it lies above the first threshold. Thus, the actual difference could be 0.3 s.
  • You now expand the selection again to the left into the region between sentence end and breath taking.

The difference between the start of this selection and the start of sound above the loud threshold is important but, as mentioned, is not calculated yet. That is the length the deletion should have. But the right end of the deletion may need to be before the start of sound (to keep the on-glide), which means the left end of the deletion may need to be before the start of the selection. In that case we can’t operate on this selection and need to stretch it left and call the tool again.
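A minimal Python sketch of that calculation, assuming an envelope with one value per `hop` seconds measured from the start of the stretched selection, and a `target` (where loud voice should begin) already known from the first invocation. The name `deletion_window` and the choice to raise an error when the selection is too short are my own; the real plug-in may handle that case differently.

```python
def deletion_window(env, hop, target, loud, quiet):
    """Return (start, end) in seconds, relative to the selection start,
    of the span to delete. start == end means nothing needs deleting.
    Raises ValueError when the selection must be stretched further left."""
    onset_block = next((i for i, v in enumerate(env) if v >= loud), None)
    if onset_block is None:
        raise ValueError("no sound above the Loud threshold in the selection")
    # Walk left over the on-glide: contiguous blocks at or above Quiet.
    glide_block = onset_block
    while glide_block > 0 and env[glide_block - 1] >= quiet:
        glide_block -= 1
    onset = onset_block * hop
    glide_start = glide_block * hop
    length = onset - target  # how far the loud onset must move left
    if length <= 0:
        return (glide_start, glide_start)
    start = glide_start - length  # deletion ends where the on-glide begins
    if start < 0:
        raise ValueError("selection too short: move its left edge further left")
    return (start, glide_start)
```

For example, with `hop` = 0.25 s, loud voice at 1.5 s, an on-glide starting at 1.0 s and a target of 0.5 s, the deletion is 0.0 to 1.0 s: the onset lands on the target and the on-glide is kept in front of it.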

The left edge of the selection is somewhere in the pause between sentences. I move it left “some”: at least as far as the breath before the sound is long. (Sometimes it’s not a breath; sometimes it’s the faint “m” before an initial “b”… I use “on-glide” as the general term.)

  • The plug-in then also excludes the quieter sound (maybe 0.1 s) from the stored selection, and we end up with 0.2 s that have to be deleted.
  • After the zero crossings are detected, the sound is returned without the removed silence of about 0.2 s.

I’m not sure what these numbers mean; I don’t think they describe what actually happens.

It may be advantageous for people who do not use a mouse to save the first selection (Edit menu) and restore it before the actual silence removal.

I guess I would prefer a one-click effect that removes the quietest RMS sections on each call.

I’d like a one-click effect too, but do you understand now why that can’t work with Nyquist’s limitations? I don’t know how to communicate a selection, plus a certain point in the middle of it, into Nyquist in one call. The workaround is to make “middle” the left on the first call.

Obviously, I didn’t get it right. lol.
I somehow had the impression that the whole pause, including short sections of the sentences, was selected at the beginning.
I wonder, wouldn’t it be much more comfortable to select an arbitrary chunk within the pause, preview the remainder with the C key, and delete it when the right length is selected?
Perhaps I am hopelessly off the trail and fail to see the concrete advantage of your procedure.
On the other hand, with your procedure I don’t have to worry about proper zooming and scrolling.

Of course that’s a simple way to do it, but that might take some fiddling each time. This operation is something I will do so repetitively that investment in simplification of the procedure seems really worth it to me. I want to listen and just have a “right there!” reaction with one finger. I want to take some of the tedium out of the work and keep up my “flow” as best I can.

There are still the problems of my reaction time and some playback latency, but I suppose I could adjust and train myself to get my pauses just right on the first try.

You made suggestions about using RMS that I don’t fully understand. I wonder if I should use RMS instead of peak as my criterion for sounds loud enough to define the onset, soft enough to be noise, and on-glides in between. What advantage might there be either way?
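To see where the two measures differ, here is a small Python comparison on two hypothetical 100-sample blocks: a single click and a steady low-level breath. Both have the same RMS, but the click’s peak is ten times the breath’s.

```python
import math


def block_peak(block):
    """Largest absolute sample value in the block."""
    return max(abs(x) for x in block)


def block_rms(block):
    """Root mean square of the block."""
    return math.sqrt(sum(x * x for x in block) / len(block))


click = [0.0] * 99 + [0.5]      # one isolated spike
breath = [0.05, -0.05] * 50     # sustained faint sound
```

So a peak criterion may let a stray click or plosive cross the Loud threshold early, while RMS follows sustained energy and treats both blocks alike; on the other hand, RMS smooths over the very start of a sharp onset. Which suits on-glides better is probably something to find by trying both.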

OK, I see what it is doing.

Is there a reason why the following approach could not be used?

  1. Select a range including the end of one sentence and the start of the next.
  2. Play, then “Stop and Set Cursor” (shift+A) just where you judge the next sentence should start.
  3. Apply an effect that deletes “silence” (below the threshold) up to the start of the sound.


Unfortunately Nyquist does not have access to that information.


I’m still working through your code, but a couple of general points:

In LISP / Nyquist programming it is strongly preferred to use spaces rather than tabs. Indentation makes a huge difference to the readability of LISP, but indentation is rather hit-or-miss when using tabs, as it depends on the tab settings in the editor. Unfortunately many of the older Nyquist plug-ins have very poor indentation, but this is not surprising as many of them were written by David Sky (now sadly departed), who was blind. Indentation is largely irrelevant for blind coders, but for sighted users it can be a great help in seeing where commands start and end and in seeing the structure of a program. Code written by Roger Dannenberg and Edgar-rft provides good examples of code indentation. There is also a good guide here: http://dept-info.labri.u-bordeaux.fr/~idurand/enseignement/PFS/Common/Strandh-Tutorial/indentation.html

;; following two functions cribbed from SilenceMarker.ny, with constant
;; .........
(defun mono-s (s-in)
  (if (arrayp s-in)
      (snd-add (aref s-in 0) (aref s-in 1))
      s-in))

(setq my-srate-ratio 1.0)

(defun my-s (s-in)
  (setq my-srate-ratio (truncate (/ (snd-srate (mono-s s-in)) resolution)))
  (snd-avg (mono-s s-in) my-srate-ratio my-srate-ratio OP-PEAK))

Unfortunately some of the older plug-ins do not provide great models for Nyquist programming. In this case the (mono-s sound) function is probably not the best approach as it simply adds together the left and right channels, which will of course make the “silences” have higher amplitude, so that the threshold settings will be relatively too low. For your plug-in it may be better to either:

;; Take an average of the left and right channels

(defun mono-s (s-in)
  (if (arrayp s-in)
      (mult 0.5 (sum (aref s-in 0) (aref s-in 1)))
      s-in))

or:

;; Take the absolute maximum of the left and right channels

(defun mono-s (s-in)
  (if (arrayp s-in)
      (s-max (snd-abs (aref s-in 0)) (snd-abs (aref s-in 1)))
      s-in))
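Here is a small Python illustration (plain lists standing in for Nyquist sounds) of how the three ways of combining channels treat the same hypothetical low-level content. The function names are made up for the example.

```python
def mix_sum(left, right):
    """What the original mono-s does: L + R."""
    return [a + b for a, b in zip(left, right)]


def mix_mean(left, right):
    """Average of the channels, like (mult 0.5 (sum L R))."""
    return [0.5 * (a + b) for a, b in zip(left, right)]


def mix_absmax(left, right):
    """Larger absolute value per sample, like (s-max (snd-abs L) (snd-abs R))."""
    return [max(abs(a), abs(b)) for a, b in zip(left, right)]


left = [0.01, -0.02]   # sample 1: identical in both channels
right = [0.01, 0.02]   # sample 2: opposite phase
```

With identical content in both channels (first sample), the plain sum doubles the level (about +6 dB), so fixed thresholds sit relatively too low. With opposite-phase content (second sample), the sum and the average cancel to zero, while the absolute maximum still reports it.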

A handy function to simplify your zero crossing detection: (plusp expr) http://www.cs.cmu.edu/~rbd/doc/nyquist/part19.html#index1447

More to follow :slight_smile:

What effect deletes silence?

I believe the more complicated thing I am doing may match my intent better: define “sound” as crossing some threshold, move that crossing forward in time to the desired point, BUT do not simply delete from the point to the threshold: preserve on-glides too for more natural transitions. I believe I’m better at picking where that threshold crossing should be, not where the start of the transition is. I will have some dials to fiddle to figure that out.

As for indentation… I just re-briefed myself in long-forgotten Emacs and am simply doing whatever lisp-mode does by default for indentation. Do you know how to improve that?

(extract …) or (extract-abs …)
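As a rough illustration of the idea only: deleting a span amounts to keeping the part before it and the part after it and joining them. The Python below mimics that with list slicing; `extract` here is a hypothetical helper, not Nyquist’s extract or extract-abs, and the Nyquist-side joining is not shown.

```python
def extract(samples, srate, start, stop):
    """Samples from `start` to `stop` seconds (hypothetical helper)."""
    return samples[round(start * srate):round(stop * srate)]


def delete_between(samples, srate, start, stop):
    """Keep [0, start) and [stop, end), dropping the span in between."""
    duration = len(samples) / srate
    return extract(samples, srate, 0, start) + extract(samples, srate, stop, duration)
```

For a 1-second, 10-sample sound, deleting 0.2 s to 0.5 s keeps samples 0-1 and 5-9.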

First thing, set it to use spaces rather than tabs.
There are also some Emacs specific tips in the “good guide” link that I posted.

The related thread I started is here, though I think all interested parties have already seen and commented on it: https://forum.audacityteam.org/t/installing-audacity-on-a-usb-drive/6346/1

I haven’t digested everyone’s comments, but I find that with the correction for latency I made as discussed in that thread, I have a very satisfactory tool. Here was a good test: sing Yankee Doodle at a moderate tempo with exaggerated pauses between lines. Use my tool to correct the pauses. See if the playback sounds proper. I’m fixing pauses acceptably on the first try, which wasn’t so with my latency problem.

I figured fixing musical rhythm is even more exacting than what I would usually do.

That suggests a latency control (in ms) should be added, at least if a wider public is to profit from the correction.
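A latency control could be folded into the calculation roughly as below. This Python sketch assumes the cursor lands `latency_ms` later than the sound you actually reacted to, so the remembered first selection is lengthened by that amount. The direction and size of the correction are assumptions to check against your own setup, and `target_offset` is a hypothetical name.

```python
def target_offset(first_len, second_len, latency_ms=0.0):
    """Offset in seconds, from the start of the second selection, at which
    loud voice should begin, after compensating for playback latency."""
    corrected_first = first_len + latency_ms / 1000.0
    return second_len - corrected_first
```

With selections of 2.0 s and 3.0 s, the target moves from 1.0 s to 0.75 s into the second selection when 250 ms of latency is assumed.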

Aha! I just found out how some of the LibriVox contributors use AutoHotkey to make “scripts” for Audacity. This may almost give me the single-key convenience I want for this tool.

However there seem to be some strange timing issues involved. I have to insert sleeps into the scripts to make things work.

Any of you have experience with that?

And can anyone explain why my effect sometimes causes clip boundary splits to appear?

It’s just a bit of weirdness :smiley:
It shouldn’t do any harm, but probably it is a minor bug.

A known problem with Nyquist or am I just special?

It seems the splits happen when the selected range includes some of the blank space after the end of the track, but I think that’s not a necessary condition.

It’s a known problem.

I hope there are no known problems with data loss in the output of user-written effects. If this is all, then Ctrl-J fixes it.