Sound Finder / Silence Finder improvements

That’s the one about splitting audio books so that you get reasonably sized “chapters”. I think this is fairly straightforward, but for clarity we can look at it later as a separate issue. The “contentious” issue is about handling “allowed silence” and “ignored sounds”.

Thanks for the input, Peter. A fresh pair of eyes is often useful.
In this particular case it’s not really possible to separate out “allow silence” and “ignore sounds”. Short periods of silence always occur in audio. As said earlier in this topic, taken to the extreme there is a short “silence” each time the waveform crosses the zero line (typically hundreds of times per second), so this must be “allowed for”.
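To get a feel for how many of these zero-crossing “silences” there are, here is a minimal sketch in Python (the plug-ins themselves are Nyquist; `silent_runs` is a hypothetical helper, not part of any plug-in). It assumes a 44.1 kHz sample rate and a threshold of 0.01, and thresholds one second of a steady 100 Hz tone at half amplitude. The tone never pauses, yet it dips below the threshold around each of its roughly 200 zero crossings, giving about 200 “silences” that are each only a few samples long.

```python
import math

def silent_runs(samples, threshold):
    """Return (start, end) sample-index pairs where |sample| stays below threshold."""
    runs, start = [], None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if start is None:
                start = i              # a quiet run begins
        elif start is not None:
            runs.append((start, i))    # the run ends at the first loud sample
            start = None
    if start is not None:              # close a run that reaches the end
        runs.append((start, len(samples)))
    return runs

rate = 44100
# One second of a 100 Hz tone at half amplitude: no real pauses at all.
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
runs = silent_runs(tone, threshold=0.01)
print(len(runs), max(end - start for start, end in runs))
```

Any scheme that treats every sub-threshold run as a gap would split this tone hundreds of times, which is why some minimum silence always has to be allowed.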

Not really “proposing” it. I just suggested that such an algorithm may produce the output that you were asking for.

“Precedence” is the key word, in the sense of “the order to be observed”.
My previous post illustrates what happens if “allow silence” takes precedence.
I think that I wrote earlier that if “ignore sounds” takes precedence, then “bad things” happen. Perhaps it would be useful for us to look at that in more detail.

Here is a voice recording made with the built-in microphone in my laptop. In this description the term “silence” refers to a sound level below the “threshold”, where the threshold has been set just above the noise floor.

There are 2 pauses longer than 1 second where we want to “split” the track. We do not want to split at the shorter silences. The longest “other” silence is just under 1 second. The obvious setting for our “allow silence less than” control would be 1 second, but we are not going to implement this just yet: we will give “ignore sounds” priority and implement the “1 second allowed silence” afterwards.

On the first label track I’ve marked where we want to split, and the longest “other” silence where we do not wish to split.
On the second label track I’ve marked a few of the “shorter sounds”. For the sake of clarity I am putting to one side the issue of zero crossing points - these “shorter sounds” all have silence of at least 20 ms on each side.
voice-recording-1.png
Clearly we do not want to ignore these marked “shorter sounds”, so the obvious setting for “ignore sounds shorter than” will be less than the shortest of these (32 ms). Let’s say we set it to 30 ms.

Going back to your post of Tue Feb 07, 2012 2:35 am.

In order to “not split at zero crossing points”, we are initially allowing silences up to 20 milliseconds.
We then “give priority to” ignoring sounds that are less than 30 milliseconds.
These settings produce the label points shown in the second label track in the image below:
ignore-short-sounds.png
Now we implement the “allowed 1 second silence”.
The questions here are, do we:

  (a) Extend each selection to include the adjoining silence up to a maximum of 1 second but not merge the labels (as per your suggestion)
  (b) Extend each selection to include the adjoining silence up to a maximum of 1 second and “merge” overlapping labels
  (c) Something else

If we do (a) then we end up with 95 labels, which is not what we want.
If we do (b) then we end up with 1 label encompassing the entire track, which is not what we want.
What is (c)?
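Options (a) and (b) can be tried on a toy example. This Python sketch uses made-up sound positions (not the measurements from the recording above) and hypothetical helper names. Each 1.2 s pause is shorter than twice the 1 second allowance, so the padding from the sounds on either side meets in the middle: (a) keeps one label per sound, and (b) merges everything into a single label.

```python
def extend_into_silence(sounds, allow, track_end):
    """Pad each (start, end) sound with up to `allow` s of adjoining silence,
    never extending past the neighbouring sounds or the track boundaries."""
    padded = []
    for i, (start, end) in enumerate(sounds):
        prev_end = sounds[i - 1][1] if i > 0 else 0.0
        next_start = sounds[i + 1][0] if i + 1 < len(sounds) else track_end
        padded.append((max(prev_end, start - allow), min(next_start, end + allow)))
    return padded

def merge_overlapping(labels):
    """Merge labels that overlap or touch into single labels."""
    merged = []
    for start, end in sorted(labels):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Made-up sounds: short gaps inside phrases, plus two 1.2 s pauses (the wanted splits).
sounds = [(0.0, 0.4), (0.6, 1.5), (2.7, 3.0), (3.3, 4.0), (5.2, 6.0)]
option_a = extend_into_silence(sounds, allow=1.0, track_end=6.0)  # one label per sound
option_b = merge_overlapping(option_a)                            # everything joins up
print(len(option_a), option_b)
```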

In contrast, let’s see what happens if we give priority to “allow silence between sounds” (as per my previous post) with “Allowed Silence” of 1 second:
allow-silence-priority.png
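The two precedence orders can be compared in a few lines. This Python sketch (hypothetical helper names, made-up timings) applies “allow silence” (1 s) and “ignore short sounds” (30 ms) in both orders to two phrases separated by a 1.5 s pause that contains a 20 ms click. When allowed silence comes first, the click bridges the pause and the split is lost; when ignore comes first, the click is discarded and the pause survives.

```python
def allow_gaps(sounds, allow):
    """Treat silences shorter than `allow` as part of the sound (merge neighbours)."""
    merged = []
    for start, end in sorted(sounds):
        if merged and start - merged[-1][1] < allow:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def ignore_short(sounds, shortest):
    """Discard sounds shorter than `shortest` seconds."""
    return [(start, end) for start, end in sounds if end - start >= shortest]

# Hypothetical detection: two phrases 1.5 s apart with a 20 ms click in the pause.
raw = [(0.0, 2.0), (2.74, 2.76), (3.5, 5.0)]

allow_first = ignore_short(allow_gaps(raw, allow=1.0), shortest=0.03)
ignore_first = allow_gaps(ignore_short(raw, shortest=0.03), allow=1.0)
print(allow_first)
print(ignore_first)
```

Neither order is “right” in general; the example only shows that the result depends on which rule is applied first.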

I am a little surprised it makes as many labels as that - is it making a “sound” out of every vowel?

Apart from that, I wouldn’t (with that algorithm) use “ignore”, any more than I would with “crackly” audio. And I would have to live with hundreds of labels in the pre-concert announcements, it seems. “Giving ignore precedence” could only be an option.

As to what (c) might be, in that case merging labels until it encountered a silence longer than 1 second might be useful. But that wouldn’t be what I was after in track.png:

as you point out. It would have to merge until it encountered a silence longer than the minimum, unless that silence was split by an ignored sound.

What does the algorithm that gives precedence to “ignore” do with:

With “Ignore less than” set at 0.3, does it exclude the sound from 1.2 to 1.4 seconds?

Another thing struck me about your microphone recording. There could be longish bleep noises between sentences due to a trialware restriction or timing noise, or music you don’t want between the speech - “ignore longer than” would then be useful. And given that audio content, it wouldn’t conflict, would it?


Gale

It’s making a label every time the waveform rises above the “silence threshold” and goes down below the threshold again, which can be more than once per word. For example, if a word has a “P” sound in it, there is likely to be a short “silence” immediately before the “P”, so there will be a sound before the “P” and another sound after the “P”.

Such behaviour would not be easy to explain in the “Help”.
What happens if, when extending the label to include the “allowed silence”, an “ignored sound” is encountered? Does the label stop when it reaches that sound, or continue beyond the ignored sound? Does the duration of the ignored sound count as part of the allowed silence or is it totally ignored? I think that “totally ignoring it” would be more logical, but may cause some surprising results (as may occur in this next example)…


With “Ignore less than” set at 0.3, the sound from 1.2 to 1.4 and the sound from 5.4 to 5.6 will be “ignored”. That is, they will not be labelled.

Initially (before considering how to extend the recognised sounds to include the “allowed silence”) the sounds from 0.2 to 1.0, 2.4 to 4.4 and 6.6 to 7.6 will be recognised.

If “allow silence” is set to 0.5 seconds:

If we “totally ignore” the ignored sounds,

  • The first sound, 0.2 to 1.0, should extend from 0.0 (assuming that we do not extend before 0.0) to 1.7. Because we are ignoring 1.2 to 1.4, the half second of silence is made up of 0.2 seconds from 1.0 to 1.2 and a further 0.3 seconds from 1.4 to 1.7. This could look very strange if there are multiple short (ignored) sounds following a recognised sound.

If we “ignore the short sound, but don’t ignore the time that it occupies”:

  • The first sound, 0.2 to 1.0, should extend from 0.0 to 1.5. This will look strange as it is only 0.1 seconds after the clearly visible (but “ignored”) sound that ends at 1.4 seconds.

The above two algorithms may be interpreted as incorrect because we have said that we want to ignore short sounds, yet they are included within the span of the labels. So another possibility:

“Allowed silence” must be “true” silence and not “ignored sound”,

  • The label will end when it runs out of the true silence - that is, the first sound will extend from 0.0 to 1.2. This resolves the interpretation of “excluding short sounds”, but now we don’t have 0.5 seconds of silence after the sound, we only have 0.2 seconds.
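The three interpretations can be checked against the numbers above. This minimal Python sketch (hypothetical function and rule names, handling only the silence after a sound) reproduces the end points 1.7, 1.5 and 1.2 for the first sound. Note that the “skip” rule has to know exactly where the ignored sounds are, even though they are never labelled.

```python
def padded_end(end, allow, ignored, next_start, rule):
    """Where a label for a sound ending at `end` stops, given `allow` seconds of
    allowed silence and a list of ignored (start, end) sounds.

    "skip":  ignored sounds take no time; only real silence is counted
    "count": ignored sounds are not labelled, but their duration still counts
    "true":  only true silence may be added, so stop at the first ignored sound
    """
    later = sorted(i for i in ignored if i[0] >= end)
    if rule == "count":
        stop = end + allow
    elif rule == "true":
        stop = min([end + allow] + [start for start, _ in later[:1]])
    elif rule == "skip":
        pos, remaining = end, allow
        for start, ignored_end in later:
            gap = start - pos               # real silence before this ignored sound
            if gap >= remaining:
                break
            remaining -= gap
            pos = ignored_end               # jump over the ignored sound
        stop = pos + remaining
    else:
        raise ValueError(f"unknown rule: {rule}")
    return min(stop, next_start)            # never run into the next recognised sound

# The example above: sounds at 0.2-1.0, 2.4-4.4 and 6.6-7.6; ignored 1.2-1.4 and 5.4-5.6.
ignored = [(1.2, 1.4), (5.4, 5.6)]
label_start = max(0.0, 0.2 - 0.5)           # nothing precedes the first sound
for rule in ("skip", "count", "true"):
    print(rule, label_start, round(padded_end(1.0, 0.5, ignored, 2.4, rule), 3))
```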

There is also the question of whether the “sound” from 1.2 to 1.4 is only 0.2 seconds or 1.2 seconds duration!
If we allow 0.5 seconds of silence either side of a sound, then some may think that sound has a duration of 1.2 seconds (0.5 allowed silence + 0.2 actual sound + 0.5 allowed silence), and so should only be ignored if we ignore sounds less than 1.2+x seconds.

There are probably other “rules and exceptions” that we would need to make if we give priority to “ignored sounds”, but even with those considered here, describing it clearly in a help file will be a nightmare.

As a way of moving this plug-in forward, how about if I continue with the “allow silence → ignore short sounds → don’t split less than” scheme (as per my last test plug-in) and we can then assess how well it performs with “real world” audio? I don’t expect that the plug-in can cover every situation and eventuality - sometimes there will be no alternative to manual labelling, but if people find the new features useful we can keep them, if not, then we can rip them out again.

I don’t think your proposed behaviour is that easy to explain either when the options conflict.

Yes. That’s the primary point.

Yes.

You mean the 0.2 s of silence between 1.0 and 1.2s? That seems OK. We want to allow silences shorter than 0.5 s in the sound, don’t we?

Anybody wanting what I propose would see the sound as from 1.2 to 1.4, I’m sure. They would expect the labels as the image above, with the settings stated, except the first label is from 0.2 to 1.2.

Steve, we must keep the thing going forward, but the question I see is this. Assuming that giving ignore precedence can be done, and it looks to me as if it could, then could there be a control:

Behaviour: Ignore has precedence [No/Yes]

or however worded? Or has it got to be a whole different plug-in with otherwise the same controls?

I was actually going to suggest that you go ahead with a basic plug-in where you could set both allow silence and ignore short sounds (with ignore having “precedence”) and have it behave according to the most reasonable rules we can come up with. Then I could see whether it was likely to be viable in the real-world cases where it’s likely to be suitable. I think the scheme in your test plug-in (with allow silence having precedence) is uncontroversial.


Thanks,



Gale

A further observation from one who is following this out of curiosity…

Surely “something is better than nothing”? By which I mean, if you think you have a solution that will address, let’s say, 80% of the items correctly, that will be worth having for someone who really needs this automation. If you’re getting 80 out of 100 passages correctly labelled then, providing the number of “false labels”, “missing labels” and “incorrectly placed labels” is not excessive, you’ve made using the tool preferable to doing the job manually.

As I read your discussion I get the impression you’re trying to achieve perfection (a laudable aim!) when what the user wants is “something that works well enough to be of value to me”.

Peter

I see your point, PGA. My “Swiss Army Knife” does not have one of those things for de-scaling fish and it doesn’t have one of those things for getting stones out of horses’ hooves, but it fits neatly in my pocket and it’s still a useful tool that does many useful tasks. On the other hand, it does not have a small (jeweller’s) cross-head screwdriver, and that would have been useful on several occasions and made the tool perfect for me.


Yes we do, but we also want to ignore short sounds. There is a bit of a contradiction: on the one hand we are ignoring the sound (for the purpose of marking the sound), but we are recognising the sound for the purpose of calculating how much silence to add to the recognised sound. I think that in practice, this will make the length of silence included at the ends of the sounds unpredictable as a single sample value that is even slightly above the threshold will cause the silence after the recognised sound to stop at that point. However, if that’s what you want I think I can implement it.


I think it can be done. It will certainly add quite a lot of code.

There is also another difference between the two schemes.

  1. In “my” scheme, “allow silences between sounds”. Silences at either end are not added on.
  2. In “your” scheme “allow silences”. Silence up to the “allowed” amount counts as sound whether between, or at the end of sounds.

In (1) this behaviour is, I think, logical in this context.
In (2) that is the behaviour that you have consistently said that you want for that scheme. It will be tricky to implement, but I think possible.

The thing that makes (2) difficult is that the code needs to refer to the positions of the “ignored” sounds, which means that the code cannot really “ignore” them; it needs to know where they occur so that it can add on the right amount of silence.

To summarise the “ignore has priority” scheme:

  • Prevent splitting at zero crossing points (no user interaction)
  • Ignore sounds shorter than:… seconds
  • Allow silences that are less than set amount, whether they occur between sounds or at the ends of recognised sounds, but don’t include any “ignored sounds”.
  • “merge” short labels to implement minimum label duration (this was the “second” new feature - wording to be decided, but not problematic).
  • Adjust label start/end positions (as per current version)

I’ll give that a go - it may take a while.

Thanks, Steve, for your patience. Yes, I think your scheme is correct. One person following this off-forum said to me “the brain doesn’t have the worry that the coding has because it isn’t adding any silence; it’s deciding whether the silence already there is long enough to exclude.”

I hope this will turn out useful for the brainpower needed.


Gale

Here’s the plug-in.
For now I have called it “Advanced Sound Finder…” so as to distinguish it from the current Audacity Sound Finder tool.
It is an Analyze type plug-in.

As it is significantly different from the current Audacity version I’ve written a first draft of a manual page: http://manual.audacityteam.org/man/User:Stevethefiddle

OBSOLETE VERSION:
AdvancedSoundFinder.ny (9.15 KB)

Some “Release Notes” since my last post:

RMS Level detection.
The RMS window size is 2.5 ms, which allows excellent accuracy but only serves to reject the lightest crackles.
For RMS level detection to be a useful option I think that the window size needs to be increased to about 20 ms. This reduces the accuracy to about +/- 10 ms but for labelling vinyl recordings (the main purpose of the RMS detection method) I think that will be fine.
A side effect of the reduced accuracy is a speed increase of about 17% so I think overall this is a worthwhile improvement.
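The trade-off can be illustrated with an idealised crackle. This Python sketch is not the plug-in's Nyquist code: it assumes a 44.1 kHz sample rate, non-overlapping RMS windows and a single full-scale click sample, and `block_rms_db` is a hypothetical helper. A 2.5 ms window (110 samples) reports the click at about -20 dB, well above a -26 dB threshold, while a 20 ms window (882 samples) dilutes it to about -29.5 dB, below that threshold. The price is that each level value now covers 20 ms, so sound boundaries can only be placed to within roughly +/- 10 ms.

```python
import math

def block_rms_db(samples, block):
    """RMS level (dB re full scale) of consecutive, non-overlapping blocks of
    `block` samples; a partial last block is dropped."""
    levels = []
    for i in range(0, len(samples) - block + 1, block):
        chunk = samples[i:i + block]
        rms = math.sqrt(sum(x * x for x in chunk) / block)
        levels.append(20 * math.log10(rms) if rms > 0 else float("-inf"))
    return levels

rate = 44100
click = [0.0] * rate
click[rate // 2] = 1.0          # one full-scale sample: an idealised crackle

for window_ms in (2.5, 20):
    block = round(rate * window_ms / 1000)          # 110 or 882 samples
    loudest = max(block_rms_db(click, block))
    print(window_ms, block, round(loudest, 1), loudest > -26)
```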
Status: Done.


Error Checking
Although there is fairly good error checking against user errors, in some cases the plug-in will return no labels and no message to indicate why.
I think that for most such cases I can return a message to give a good indication of why no labels have been returned.
Status: In progress.

Q. What’s a good default threshold level?
I think that -24 dB is too high.
On my test recordings even -26 dB (the default for the current Audacity version) seems rather high.


Additional tip for the documentation: when splitting “typical” songs, it is unlikely that any song will be shorter than a couple of minutes.
Pushing “Disregard isolated sounds less than: XX seconds” up to the slider maximum (10 seconds) can help to avoid “false” labels.

I’m not sure that I’m quite happy with the wording of:
“Minimum label length (optional)”

I am happy with the behaviour for this setting, it does what it is intended to do, but the description does not always match the behaviour.

The purpose of this setting is to allow detected sounds to be “grouped” so that rather than separate labels for each sound their labels are merged into a longer section.

For example, consider a long lecture.
The user wants to split the lecture into 10 minute sections before using Export Multiple to save them as MP3s for their iPod.
They could use “regular interval labels”, but that is not a very intelligent effect and will split words if they fall on a time boundary.
Better would be to set the “Silence threshold” and “Allow gaps” so that labelled sections start and end at natural pauses such as between sentences. However, this is likely to produce far too many labels with durations that are too short. This “Minimum label length” setting will group together labels, starting from the first label until it reaches an end label position that is more than 10 minutes later than the first detected sound. Then at the next detected sound it will start the next group. Thus the result is a series of labels that are 10 and a bit minutes long with breaks occurring during natural pauses.

So this works very well, but what if the lecture is 34 minutes duration?
We then have 3 labels of about 10 minutes each, plus a bit less than 4 minutes left over.
It is to be expected that the user will want to export that last 4 minutes, so it should be labelled, and it is, but it is less than 10 minutes, so “Minimum label length” does not accurately describe the behaviour. The “10 minutes” is really just a “target” minimum duration, not a strict minimum, so is there a better name for this control?
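The grouping described above can be sketched in a few lines of Python. `group_sounds` and the lecture timings are hypothetical (28 s of speech followed by a 2 s pause, repeated for 34 minutes), and the plug-in's own rules may differ in detail. With a 10 minute target this gives three groups of about 10.5 minutes each and a shorter final group of about 2.5 minutes.

```python
def group_sounds(sounds, target):
    """Merge consecutive (start, end) sounds into groups lasting at least `target` s.

    A group closes at the first sound whose end is `target` or more after the
    group's start; the next group starts at the following sound. Whatever is
    left at the end becomes a final, shorter group rather than being dropped.
    """
    groups, start = [], None
    for s, e in sounds:
        if start is None:
            start = s
        if e - start >= target:
            groups.append((start, e))
            start = None
    if start is not None:
        groups.append((start, sounds[-1][1]))   # the short remainder
    return groups

# Hypothetical 34-minute lecture: 28 s of speech, then a 2 s pause, repeated.
lecture = [(k * 30, k * 30 + 28) for k in range(68)]
for start, end in group_sounds(lecture, target=600):
    print(f"{start / 60:5.1f} min -> {end / 60:5.1f} min")
```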

Some minor updates to this one.

  • Error message added if no labels are returned due to the “Disregard isolated sounds” setting being too high.
  • RMS window size increased to 20 ms.
  • Silence threshold lowered to -30 dB (peak)
  • “Disregard isolated sounds” slider range increased to 60 seconds and default set to 10 seconds.

This is a replacement (same name) for the previous version.
AdvancedSoundFinder.ny (9.7 KB)

If it is decided that these enhancements are worth having (they have all been requested by users) then I think that it would be worth expanding the “Silence Finder” just a little so that there is a choice to either mark silences (point label) or mark sounds (region label). This would provide a simple alternative to the “Advanced” sound finder that may be too complicated for some casual users.

Perhaps such a plug-in could be called something like “Label Sound or Silence…”, and the plug-in posted here could remain as “Advanced Sound Finder…”?

OK, I’m butting in to discussion about a feature that I will never use, so: reader beware!
@Steve,
Why not simplify still further and have three plug-ins: Label Sound, Label Silence and Advanced Sound Finder? Sometimes the mere fact of having multiple similar choices can be a source of confusion for an inexperienced user. I can visualize a situation where the user is uncertain which of your two choices to use; but where it would be clear which of the two basic “Label…” choices was going to be the right one for what they are wanting to attempt. As always with these “interruptions” of mine: just a thought!

That would be one possibility, and I think quite a good option.
We need to keep the number of plug-ins that are bundled with Audacity down to a reasonable size so that the menus are not unreasonably long, however the Analyze menu is certainly not overcrowded at present.

I don’t often use these effects myself either, so hopefully someone who does will come along and have their say.

Done

Done

All three done for Sound Finder

Done


Done


Done


  • Have I missed anything?


  • Is the “RMS level detection method” useful? (It’s easy to strip out if not.)


  • Would it be useful to have a choice between “region labels” (as now) and “point labels before the sound” (similar to the Silence Finder)? This may obviate the need for an “Advanced Silence Finder”.


  • Would an option to mark the silences with region labels rather than the sounds be useful? “Labeled Regions > Delete” (and related) could then be used to automatically trim out long pauses in recordings. The main advantage that this has over “Truncate Silence” is that the regions to be deleted can be reviewed before deletion (also the detection method is more intelligent).

Thanks for all the work, Steve. I am still digesting it all and playing with my artificial tone scenarios at the moment.

A four-second tone generated to fill the screen, then time-shifted from -3 s to +1 s (allow gaps 0.1, disregard 0.0), appears to produce no label and places the cursor in the label track at 1.1 s. When you drag the audio back so that most of it is in front of zero, the single label marking the entire audio appears. Just before the drag makes the label appear (when the right edge of the label is at +3.5 s), a black dot is painted at the top of the audio track at +1.95 s (I suppose the start of label drawing). Anyway, I assume this is an Audacity problem, not yours.

I doubt it for basic users but I have not tested yet with crackly vinyl. On a consistency note, when Analyze > Contrast was brought in, we had decided that “rms” was more correct than “RMS”.

I think that’s a low priority addition to Sound Finder because (to me) region labels are better suited to marking sounds than point labels. A common use of Sound Finder is (I think) to accurately label the sounds, so eliminating the silence. This makes me wonder how useful the label start and end point controls are (unless the user must use them because the detection controls won’t produce what is wanted).

If the intention is to provide a pleasant silence padding, the default 0.1s at start and end barely seems enough and might be better set at 0.0 s (which seems to produce labels 2 or 3 ms before or after the sound, to prevent clicks, I guess)?

I have a feeling that a “region labels for silences” option would be more useful in Silence Finder than Sound Finder (even at the risk of slightly complicating Silence Finder). I think it would be easier to understand than having a “Simple Silence and Sound Finder” that had point labels for silence and either point or region labels for sound.

I think/hope that users of Sound Finder can accept a slightly more complex effect. I think the wording change away from “minimum duration of silence…” in the Advanced Silence Finder might be a minor problem for people adopting the new advanced effect but at the moment I won’t suggest any change. I think the inter-relationship between “Allow” and “Disregard” in affecting the result is what will bamboozle people the most, rather than the number of controls.

As an aside, if Nyquist supported a separate panel for the detection and labels controls I think the advanced effect would “look” less intimidating.




Gale

I’ve got a minor improvement: currently, with the normal peak detection method, the labels are placed a little late (on average about 1 or 2 ms). This is not really noticeable unless the labels are set as close as possible to the sounds using test signals. This version should fix that.
AdvancedSoundFinder.ny (9.87 KB)

Yes, I noticed that too. It also happens with manually placed labels (dragged to the left), so it’s not related to the plug-in. Possibly a P5 bug (I’ve added it to Bugzilla).


I’ve not got much crackly vinyl to test on, but with the few bits that I have, the rms detection method did appear to help. I don’t have strong feelings either way about this feature. We really need some of the vinyl transfer people to test it.


I agree that “rms” is technically correct, but “RMS” is far more common. When used in a sentence I agree that “rms” is better.
“Contrast Analyzer, for measuring rms volume differences between two selections of audio.”

For use as a title I think that lower case looks wrong, probably because it is hardly ever (never?) used lower case in a title.
It should never be “Rms”.

I’d prefer to go against the grain and allow “RMS” when used in title case, a heading or as a label, but I’m happy to stick with “rms” when used in a sentence.
Wikipedia’s “Root mean square” article (amongst many others) consistently uses “RMS”, even in sentences.

RMS is an abbreviation for Root Mean Square (title case).
rms as an abbreviation at the start of a sentence looks wrong.
Rms as an abbreviation at the start of a sentence is wrong.


There is a trade-off between accuracy and speed. This version is a lot more accurate than the current Audacity version. The error will usually be less than 2 ms in this version whereas the Audacity version may be up to 20 ms out. This version runs at about the same speed as the current Audacity version, but aiming for even greater accuracy will make it slower.

I agree that 0.1 seconds is a bit short for providing “a pleasant silence padding” (I took this value from the Audacity version default). I’ll be happy to change that to 0.0 seconds (which in practice will average about 0.001 seconds) if there is consensus for that.


How would you feel about having a slightly more complex “Silence Finder” and a simpler “Sound Finder” and an “Advanced Sound Finder”? I think this may alleviate some fears about this plug-in being too complicated. Also a simplified Sound Finder may be better for novice users.

I don’t see the need for an “Advanced Silence Finder” as I can’t think of any user cases for such, however I can see some fringe user cases for alternative types of labels in an “Advanced Sound Finder”




  • The end point control is very useful for preserving the tail end of a fade out. It could be useful to allow a greater range for this control, perhaps up to 10 seconds?
  • It is often useful to leave a short gap before the start of a track as some players will fail to play the first fraction of a second.
  • For users that wish to just “split” a recording into multiple files, but retain the existing gaps between songs/tracks it would be better to just have point labels a little before the start of each song/track.


I can see what you mean, but after much consideration I think those terms fit most accurately with what the plug-in actually does.
Short gaps that are within a sound can be “allowed” (not recognized as a valid gap).
Short sounds that are isolated within a period of silence can be “disregarded” (not recognized as a valid sound).


Yes, that would be nice. In fact it could easily be split into 3 “tabs”. Tab 1: Sound detection. Tab 2: Label placement. Tab 3: Label text options.

Thanks for the feedback so far.
I still have a couple of other ideas for refining the GUI and will post them later.

Following the suggestion of leaving Silence Finder and Sound Finder more or less as they are (simple and unsophisticated) and adding “Advanced Sound Finder” as a new plug-in, here’s an updated version of Silence Finder and Sound Finder.

I’ve kept the changes to a minimum.

Changes:

Stereo to mono function changed from:

(defun mono-s (s-in)
  (if (arrayp s-in) (snd-add (aref s-in 0) (aref s-in 1))
  s-in))

To:

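(defun mono-s (s-in)
  ; stereo track: sample-by-sample maximum of the left and right channels
  (if (arrayp s-in) (s-max (aref s-in 0) (aref s-in 1))
  ; mono track: return the sound unchanged
  s-in))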

The original code is incorrect (a bug): the silence threshold is the same regardless of the number of channels, but stereo tracks are summed, which raises the background noise level (roughly x2). The update fixes this by using the maximum of the two channels.
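The effect of summing can be checked numerically. In this Python sketch (not Nyquist; the noise is made up) both channels carry identical background noise, as they would for a mono source recorded to stereo. Summing doubles the peak level (+6 dB), while taking the louder channel sample by sample leaves it unchanged. With unrelated noise in each channel the rise from summing is smaller (nearer 3 dB in RMS terms), so x2 is the worst case. The sketch compares absolute values, which is the quantity a peak threshold is tested against.

```python
import random

def peak(signal):
    """Largest absolute sample value."""
    return max(abs(x) for x in signal)

random.seed(1)
noise = [random.uniform(-0.01, 0.01) for _ in range(10_000)]
left, right = noise, noise                     # identical noise in both channels

summed = [a + b for a, b in zip(left, right)]              # like the old snd-add version
louder = [max(abs(a), abs(b)) for a, b in zip(left, right)]  # larger channel per sample

print(peak(noise), peak(summed), peak(louder))
```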

Superfluous text removed from the ;info (in line with other bundled effects).

Square brackets in GUI replaced with round brackets (in line with other bundled effects).

License information added to code (in line with Nyquist plug-in recommendations)

Minus sign for silence threshold level moved to the value (in line with other bundled effects).

Add label at end option removed from Sound Finder (as previously discussed).

I’m still not convinced by the extremely long control names. They are out of keeping with other Audacity effects and look ugly, but they are perhaps easier for novice users.
silencefinder.png
soundfinder.png
SilenceMarker.ny (4.39 KB)
SoundFinder.ny (5.97 KB)

New version of “Advanced Sound Finder”.

Following on from the idea of retaining the simple versions I’ve expanded the functionality of the “Advanced” Sound Finder.
Here’s what it looks like with the current default settings. (I’m happy to tweak the defaults if there is a case for alternative default settings).
advancedsoundfinder.png
New features:

  1. New silence detection methods.
    • Peak Level: Same as Silence Finder and Sound Finder. Most accurate label positions but most prone to false labels due to clicks.
    • RMS Level: Detects sounds based on the rms level of the sound. Less accurate label positions but improved rejection of clicks.
    • Filtered Peak: DC offset removal and high-cut filter to improve click rejection. Very slightly less accurate than the Peak method.
    • Filtered RMS: Best click rejection but least accurate label positions (roughly +/- 20 ms).
  2. “Group Sounds”.
    This is a renaming of “Minimum Label Length”. It describes what this plug-in feature actually does rather than just the effect on region labels. I think the new name provides a better idea of how this feature might be used, and can be more easily explained in the manual.
  3. “Number of Digits” and “Digit Position” combined into one control. Only one combination will ever be used at a time, so it saves space to combine them. The options are:
    “None-text only” / “1 before label” / “2 before label” / “3 before label” / “1 after label” / “2 after label” / “3 after label”
    “Number Only” is achieved by leaving the “Label Text” field empty.
  4. “What To Label”.
    • Start of Sound: Useful for splitting long recordings without removing intentional pauses/silences.
    • End of Sound: Counterpart to the option above.
    • Sound Region (default): Same as Sound Finder.
    • Silent Region: Marks the spaces between sounds. Useful for cutting out silences when a single output file is required.
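One way the combined digits/position control could map to label names is sketched below in Python. `label_name`, the single-space separator and the zero padding are assumptions made for illustration, not a description of the plug-in's actual output.

```python
def label_name(text, n, option):
    """Build a label name from the Label Text field, the label's number `n`
    and the combined choice, e.g. "2 before label" (hypothetical format)."""
    if option == "None-text only":
        return text
    digits, position = option.split(" ", 1)   # e.g. "2", "before label"
    number = str(n).zfill(int(digits))        # zero-pad to the chosen width
    if not text:                              # empty text field: number only
        return number
    if position == "before label":
        return f"{number} {text}"
    return f"{text} {number}"

for choice in ("None-text only", "2 before label", "3 after label"):
    print(repr(label_name("Track", 3, choice)))
print(repr(label_name("", 7, "2 after label")))
```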

There is fairly comprehensive error checking and hopefully meaningful error messages if wrong data is entered.
In cases where user settings will produce weird labels (such as before zero or labels with start points after end points) the plug-in attempts to make an “intelligent” decision about where to place the label.
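As an illustration of the kind of decision meant here, this Python sketch applies label start/end adjustments to one detected sound and then repairs the result. `place_label`, its parameters and the fallback policy (clamp to the track, and fall back to the detected sound if the edges cross) are assumptions: one possible policy, not necessarily what the plug-in does.

```python
def place_label(sound, before, after, track_end):
    """Move a detected (start, end) sound's label edges outward by `before` and
    `after` seconds; negative values pull the edges inward."""
    start, end = sound[0] - before, sound[1] + after
    start, end = max(0.0, start), min(track_end, end)   # stay inside the track
    if start >= end:                                     # edges crossed over
        return sound                                     # fall back to the sound itself
    return (start, end)

print(place_label((0.05, 2.0), 0.1, 0.1, 10.0))     # start clamped to 0.0
print(place_label((9.5, 9.9), 0.1, 0.5, 10.0))      # end clamped to the track end
print(place_label((1.0, 1.2), -0.2, -0.2, 10.0))    # edges would cross
```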

There’s still a little tidying up to do to the code, but functionally this does everything that I want it to do.
Update to the (proposed) manual page to follow.

Note:
Marking Silences does not mean that the plug-in is “detecting silences”. The sound detection algorithm is identical to the other labelling methods. The only difference is that the gaps (silences) between detected sounds (or groups of sounds) are labelled rather than labelling the sounds themselves.
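The note above can be made concrete: all four “What To Label” choices start from the same list of detected sounds and differ only in what is labelled. In this Python sketch (hypothetical `make_labels`), point labels are shown as (t, t) pairs, and “Silent Region” covers only the gaps between sounds; whether the plug-in also labels silence before the first or after the last sound is not stated, so that case is left out.

```python
def make_labels(sounds, what):
    """Turn detected (start, end) sounds into labels for one "What To Label" choice.

    Point labels are returned as (t, t); region labels as (start, end).
    """
    if what == "Start of Sound":
        return [(s, s) for s, _ in sounds]
    if what == "End of Sound":
        return [(e, e) for _, e in sounds]
    if what == "Sound Region":
        return list(sounds)
    if what == "Silent Region":
        # The gaps between consecutive sounds; the detection itself is unchanged.
        return [(e, s) for (_, e), (s, _) in zip(sounds, sounds[1:])]
    raise ValueError(f"unknown choice: {what}")

sounds = [(1.0, 4.0), (5.5, 9.0), (10.0, 12.0)]
for choice in ("Start of Sound", "End of Sound", "Sound Region", "Silent Region"):
    print(choice, make_labels(sounds, choice))
```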
AdvancedSoundFinder.ny (12.9 KB)