Sound Finder / Silence Finder improvements

I just wanted to start a topic to keep ideas alive for these two.

Sound Finder could do with a new control “Ignore sounds shorter than [seconds]”. This would treat sounds that are shorter than the minimum duration as part of the “silence”. The use case is that it would enable removal of a lot of loud but short-duration sounds.

Sound Finder probably does not need a label at the end of the track (Silence Finder does not have this). There are probably so few cases where it would be useful that it is easier for the user to add the final label.

Silence Finder produces an unchangeable “S” label and Sound Finder produces an unchangeable numbered label. Since Regular Interval Labels has extra fields to allow customised labels:

  • Label text
  • Minimum number of digits in label
  • Begin numbering from

there may be a case for adding these to Silence Finder and Sound Finder. This is requested from time to time.
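To make the three fields concrete, here is a rough sketch of how a customised label could be built. Python is used only for illustration (the plug-ins themselves are Nyquist scripts), and the function name, parameter names, and the `number_first` flag are all assumptions, not anything taken from the existing plug-ins.

```python
def make_label(index, text="", min_digits=1, begin_from=1, number_first=False):
    """Build the text for the label at position `index` (0-based).

    All names here are hypothetical; they mirror the three fields
    listed above, plus a flag for putting the number before or
    after the text.
    """
    # Pad the running number with leading zeros up to min_digits.
    number = str(begin_from + index).zfill(min_digits)
    if not text:
        return number
    return f"{number} {text}" if number_first else f"{text} {number}"


labels = [make_label(i, text="Track", min_digits=2) for i in range(3)]
print(labels)  # ['Track 01', 'Track 02', 'Track 03']
```

With an empty label text this falls back to a plain number, which is roughly what Sound Finder produces now.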

Now that we have removed the negative sign from the Normalize GUI, should we review the corresponding negative sign in Sound Finder and Silence Finder? I appreciate this may not be user-friendly. Any thoughts?



Gale

Not sure I understand the use case. Could you give an example?

Also, not sure how “Ignore sounds shorter than [seconds]” would interact with “Minimum duration of silence between sounds [seconds]”. For example, if “Minimum duration of silence…” = 2 seconds and “Ignore sounds shorter than…” = 0.5 seconds, what happens if there is 1 second of silence, then 0.2 seconds of sound, then 1 second of silence?

Wasn’t there an updated version of Sound Finder that prevented labels occurring before time=0 ?

The example is the one I e-mailed you about in the Summer where someone was recording ultrasonic vocalizations produced by animals. The vocalisations would never be shorter than a given length but could be fairly loud. He has lots of short (often loud) noises in the recording that cannot be animal noises and wanted a way to remove them from the analysis. This can’t be done currently unless the noises are below the silence threshold.

Someone else suggested such a feature afterwards but did not give a use case.

Rather than allow only one control or the other to work, I assume the “ignore” takes precedence and the “sound” starts at 1.2s.

There was an updated version http://forum.audacityteam.org/download/file.php?id=2122 discussed in this thread that puts the negative sign in the text box. I had forgotten that, but that’s fine.

Can’t find any evidence of a “behind zero” fix - if a sound is behind zero its label is behind zero in the above id2122 version.



Gale

Ah yes, I remember.

Another possible use case could be a live (Classical) concert recording - there are often coughs and clatters during the “silence” between pieces which should not be marked.

Another possible use case could be recording vinyl - clicks between tracks should not be marked as “sound”.

I’m not sure that I understand.
track.png
in this example there is “silence” from 1.2 seconds to 2.2 seconds. Do you mean that the period from 1.0 seconds to 3.2 seconds (2.2 seconds total) should count as silence, resulting in two labels - the first marking the sound from 0.0 to 1.0 and the second marking 3.2 to 4.2?

That’s the one - and it does have a “behind zero” fix.
If the sound starts before zero, then I’m not sure if it is “correct” or not for the first label to start before zero. I would think that in this case the label should start at the beginning of the sound.

What this version fixes is that if the sound starts at zero, the current version with default settings will place the first label to start before zero.

In this version the first label will only be behind zero if the sound starts before zero, and in such a case the label will start at the beginning of the sound, not before.
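As I read it, the “behind zero” rule amounts to a clamp on the label start. A minimal sketch in Python, assuming the label is normally placed some `lead_in` seconds before the detected sound (the name `lead_in` and the function are hypothetical, standing in for the “Label starting point” setting):

```python
def label_start(sound_start, lead_in):
    """Return where the label should begin for a sound at sound_start.

    lead_in is a hypothetical stand-in for the "Label starting point"
    offset: how far before the sound the label is normally placed.
    """
    proposed = sound_start - lead_in
    if proposed >= 0.0:
        return proposed
    # The offset would push the label behind zero. Only allow a
    # negative start if the sound itself begins before zero, and
    # then start exactly at the sound, not before it.
    return min(sound_start, 0.0)


print(label_start(0.0, 0.5))   # 0.0  (sound at zero: label stays at zero)
print(label_start(-0.5, 0.5))  # -0.5 (sound before zero: label at the sound)
print(label_start(2.0, 0.5))   # 1.5  (normal case)
```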

Here’s an actual use case: https://forum.audacityteam.org/t/problem-with-sound-finder/14274/1

I presume that there are some use cases, but I can’t think of one.
Anyone got any ideas why this feature was put in?

This is proving to be quite tough.

So if “Ignore sounds shorter than…” is non-zero, any sounds shorter than specified should be treated as silence?
This “filtering” should occur before computing which silences to allow?
Have I got this right?

Here’s Sound Finder with all of the requested features except for “Ignore sounds shorter than…”
(I have also not added “Begin numbering from”, but have included customizable text and minimum number of digits before or after - I think that is probably flexible enough).
SoundFinder.ny (5.49 KB)

I’ve just been reading this post: Batch Editor for Silence Labels - #4 by audiobookfan

This issue I am dealing with really is most pronounced in editing audiobooks rather than music. In music, each song has its determined ending point which is the logical placement for a track. However on audiobooks it would be very helpful to place the tracks at even intervals of say 60 or 120 seconds, especially when using an iPod or mp3 player.

How about something along this line: Add the ability to place the regular interval labels at the nearest silent point. The parameters for the regular intervals could be selected in the same way as they are now.

A possible approach to this would be to have a “Minimum duration of sound” control.
In effect, periods of sound less than the specified amount would not be split.
Setting the “Minimum duration of sound” to, say, 100 seconds for an audio book, with the threshold and minimum silence set so as to detect gaps between words/sentences, would then split at the first gap after 100 seconds.
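A rough sketch of that “split at the first gap after N seconds” idea, in Python for illustration only. It assumes the gap positions have already been detected and are given as a sorted list of times; the function name and the example gap times are made up.

```python
def split_points(gap_times, min_sound, start=0.0):
    """Pick split points so that no section is shorter than min_sound.

    gap_times: sorted times (seconds) of detected silences.
    Each split is placed at the first gap that is at least
    min_sound seconds after the previous split (or after start).
    """
    points = []
    last = start
    for t in gap_times:
        if t - last >= min_sound:
            points.append(t)
            last = t
    return points


# Hypothetical gaps between sentences in an audio book.
gaps = [30.0, 95.0, 104.0, 150.0, 210.0, 260.0, 305.0, 315.0]
points = split_points(gaps, min_sound=100.0)
print(points)  # [104.0, 210.0, 315.0]
```

Note that the sections come out a little longer than 100 seconds rather than exactly even, because each split waits for the next available gap.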

Unfortunately this is the opposite of what I think we were looking at previously:

Comments?
Thoughts?
Solutions?

In that example, I was envisaging treating the “too short” sound not as silence but as a separate “thing” to be ignored. That wasn’t my original idea but you correctly pointed out that there were potential conflicts between the choices. If the “too short” is a separate thing to be ignored, that would make the first sound from 0.0 to 2.0 and the second sound from 2.2 to 4.2.

If the first sound is marked from 0.0 to 1.0 and the second sound from 3.2 to 4.2 as you suggest, so treating the “too short” as identical to silence, that is the same result as if the noise at 2.0 to 2.2 s had not occurred.

I’m not sure which would be more useful. I was tending to think treating the too short sound as a separate “thing” would be more representative of what happened because there was not actually silence in the recording for 2.2 seconds.

From a “classical concert” perspective I think I would rather have the little bit of silence which I had agreed was allowed as part of the sound, but perhaps this is not so logical.

I’ll ask the original enquirer who was recording animal noises meantime.



Gale

The problem is, what should happen if there is a series of short noises (shorter than the “minimum”) with short spaces between (shorter than the “allowed”)?
The “minimum sound” setting will ignore the sounds so that the silences will add up to be longer than the minimum allowed silence, but the “allowed silence” will make the sounds add up to greater than the minimum sound. One setting wants to call it sound while the other setting wants to call it silence.

Is the “minimum” the “ignore sounds shorter than…”? Are you happy with one or other solution in the image above if we don’t have the proposed “minimum duration of sound” for audio books?

If we also have “minimum duration of sound”, then we have two competing controls “ignore sounds shorter than” and “allow sounds shorter than” don’t we? So don’t we make that a choice with a slider underneath, or a slider centered on zero with “Ignore” at one end and “allow” at the other?


Gale

The “minimum duration of sound” (for audio books) isn’t really a problem.
I’ve restructured the code in my “development version” so that the sound/silence detection runs separately from, and before, the selection of which sounds/silences to label. This makes it a lot easier to modify the “selection process” without needing to rewrite the entire plug-in code. Once the selection of “valid” sounds/silences has been determined, it is pretty easy to specify a minimum duration of sounds.

Sorry, I wasn’t very clear.

The problem is, what should happen if:

  • There is a series of sounds, each shorter than “Ignore sounds shorter than [seconds]”
  • The spaces between each of these sounds are less than “Allow silence between sounds less than (seconds)”

  1. The “Ignore sounds shorter than [seconds]” setting will ignore the sounds so that the silences between will add up to be longer than the “Allow silence between sounds less than (seconds)”.

  2. The “Allow silence between sounds less than (seconds)” will make the sounds add up to greater than the “Ignore sounds shorter than [seconds]”.

The “Allow silence between sounds less than (seconds)” setting is essential. If all silences are used, irrespective of silence duration, then there will be silence detected at every zero crossing point (every cycle of a periodic waveform). Obviously this is undesirable so very short silences must be ignored (counted as part of the sound).
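The zero-crossing point is easy to demonstrate. The sketch below (Python, for illustration only; the 8000 Hz sample rate, 100 Hz tone and 0.1 threshold are arbitrary assumptions) thresholds a continuous sine tone sample by sample and collects every run that falls below the threshold.

```python
import math


def silent_runs(samples, threshold):
    """Return (start_index, end_index) for each run with |x| < threshold."""
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if start is None:
                start = i          # a new "silence" begins here
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:          # silence running to the end
        runs.append((start, len(samples)))
    return runs


rate = 8000   # samples per second (assumed)
freq = 100    # tone frequency in Hz (assumed)
samples = [math.sin(2 * math.pi * freq * n / rate) for n in range(rate)]

runs = silent_runs(samples, threshold=0.1)
print(len(runs))                        # roughly two "silences" per cycle
print(max(e - s for s, e in runs))      # each only a few samples long
```

One second of unbroken tone yields a couple of hundred tiny “silences”, which is why anything shorter than the allowed duration has to be counted as part of the sound.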

Consider this example:
crackle.png
Let’s assume that “Allow silence between sounds less than (seconds)” is set to 0.5 seconds.

The current behaviour is unambiguous. The silences in the region 1 to 3 seconds are less than 0.5 seconds, so they are “allowed” and there will be just one marked sound from 0.0 to 4.0 seconds.

If we have an additional feature; “Ignore sounds shorter than [seconds]”, the correct behaviour is less clear.
If “Ignore sounds shorter than [seconds]” is set to 0.5 seconds, then one might assume that the “clicks” between 1.0 and 3.0 seconds will be ignored (because they are shorter than 0.5 seconds). If those sounds are ignored then the sum total of the silences will be greater than 0.5 seconds and two sounds will be marked, one from 0.0 to 1.0 and one from 3.0 to 4.0.

This seems very reasonable to me, except that it may not happen.
Because the short silences are ignored (allowed), the entire track from 0.0 to 4.0 is one sound and will be marked from 0.0 to 4.0.
This behaviour seems counterintuitive to the idea of “Ignore sounds shorter than [seconds]”.

The conflict occurs if there are short sounds separated by short silences - do we ignore the short sounds, or ignore the short silences?
If we ignore the short sounds, then 1.0 to 3.0 will count as silence.
If we ignore the short silences then 1.0 to 3.0 will count as sound.
The algorithm cannot be ambiguous or it will be caught in a never ending loop.
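The conflict can be shown as an order-of-operations problem. In this Python sketch (illustration only, not the plug-in code) sounds are (start, end) pairs in seconds, and the layout assumed for crackle.png is a tone from 0.0 to 1.0, nine 0.1 second clicks between 1.1 and 2.8, and a tone from 3.0 to 4.0. The exact click positions and both function names are my assumptions.

```python
def allow_short_silences(sounds, max_gap):
    """Merge neighbouring sounds separated by less than max_gap seconds."""
    merged = []
    for start, end in sounds:
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], end)   # bridge the short silence
        else:
            merged.append((start, end))
    return merged


def ignore_short_sounds(sounds, min_length):
    """Drop sounds shorter than min_length seconds."""
    return [(s, e) for s, e in sounds if e - s >= min_length]


# Assumed layout for crackle.png (see lead-in).
clicks = [(round(1.1 + 0.2 * k, 1), round(1.2 + 0.2 * k, 1)) for k in range(9)]
crackle = [(0.0, 1.0)] + clicks + [(3.0, 4.0)]

# Both settings at 0.5 seconds, applied in the two possible orders.
silences_first = ignore_short_sounds(allow_short_silences(crackle, 0.5), 0.5)
sounds_first = allow_short_silences(ignore_short_sounds(crackle, 0.5), 0.5)

print(silences_first)  # [(0.0, 4.0)]
print(sounds_first)    # [(0.0, 1.0), (3.0, 4.0)]
```

Same audio, same settings, two different answers, so whichever order is chosen has to be fixed and documented.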

One solution would be “Ignore sounds shorter than [seconds] IF AND ONLY IF separated from other sounds by more than the “Allow silence between sounds less than (seconds)” duration”.

The other solution would be “Allow silence between sounds less than (seconds) IF AND ONLY IF both bounding sounds are greater than the “Ignore sounds shorter than [seconds]” duration”.

Both solutions are messy.
Any better ideas?

Ignoring what algorithms could logically do, both track.png and crackle.png would have (I suppose) a “reasonable” outcome for someone wanting “ignore sounds shorter than” if the starting tone and ending tone were both marked as separate “sounds” (with no other sounds marked).

1. Ignore sounds shorter than [seconds] IF AND ONLY IF separated from other sounds by more than the “Allow silence between sounds less than (seconds)” duration seems to mark the entire audio as one sound in both track.png and crackle.png. If I have that right, that’s clearly not OK for someone wanting to “ignore sounds shorter than”.

2. Allow silence between sounds less than (seconds) IF AND ONLY IF both bounding sounds are greater than the “Ignore sounds shorter than [seconds]” duration seems to mark the starting and ending tone as separate “sounds” (with no margin extending into “silence” and with no other sounds marked) in both track.png and crackle.png. If I have that right, that’s clearly “reasonable” for someone wanting to “ignore sounds shorter than”.

However in track.png I or someone else (and maybe even common sense??) might prefer to include the silences (either side of the short noise) in the starting and ending “sounds”. Why? Because that includes a silence “shorter than” that I said I wanted to allow, but doesn’t include a sound shorter than that I said I wanted to ignore. All my settings are satisfied.

In crackle.png I or someone might prefer the first sound to finish where the first crackle peaks, and the second sound to start after the final crackle peaks. I would not want to label each crackle or a group of crackles because the crackles are shorter than “ignore sounds shorter than” EVEN IF the silences between the crackles are shorter than “allow silences shorter than”. Same reasoning with the single “noise” in track.png. But if I have a continuous silence shorter than “allow silences shorter than” that is NOT broken by sounds shorter than “ignore sounds shorter than”, I DO want to allow that silence in the sound. Maybe this is too sophisticated/conflicting for an algorithm?

The only other option I can see is to make “ignore sounds shorter than” incompatible with “allow silence between sounds less than” (choose one or the other). Do we need to be that inflexible?

Thanks for your time on this, Steve.




Gale

I think I may have it now:

That’s correct.

You “have that right”, but while not OK for someone wanting to “ignore sounds shorter than”, it is the logical result of the settings described.
Why? Because “allow silences” (in the current plug-in) tells the plug-in to treat audio with short gaps as if it was continuous audio. I’m not aware of anyone asking for the current behaviour to be changed, only to add new additional features.

In order for someone wanting to “ignore sounds shorter than” to get the required result, they must ensure that “Allow silence between sounds less than (seconds)” is set smaller than the shortest silence that they want detected.

In crackle.png the silences are about 0.1 seconds duration.

  • If “Allow silence between sounds less than (seconds)” is set greater than 0.1 seconds then those silences are explicitly allowed and (as now) 0.0 to 4.0 seconds is treated as one sound.
  • If “Allow silence between sounds less than (seconds)” is set to less than 0.1 seconds (say 0.05 seconds) then those silences are counted as silences. This is where the new control comes in - If “ignore sounds shorter than” is set greater than the duration of the clicks (say 0.2 seconds), then those clicks will be ignored and from 1.0 to 3.0 will count as silence.

All the settings are satisfied and the job can be accomplished.

To put this another way, there is an order to the processing:

  1. Allow short silences
  2. Ignore short sounds
  3. Don’t split shorter than
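The three steps above can be sketched as a fixed pipeline. This is Python for illustration only, not the plug-in code: sounds are (start, end) pairs in seconds, all function and parameter names are hypothetical, the crackle.png layout (nine 0.1 second clicks between the two tones) is assumed, and step 3 is my reading of “Don’t split shorter than” (keep extending a section until it reaches the minimum length).

```python
def find_sounds(sounds, allow_silence, ignore_sound, min_split=0.0):
    """Apply the three selection steps in a fixed order."""
    # 1. Allow short silences: bridge gaps shorter than allow_silence.
    merged = []
    for start, end in sounds:
        if merged and start - merged[-1][1] < allow_silence:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    # 2. Ignore short sounds: drop anything shorter than ignore_sound.
    valid = [(s, e) for s, e in merged if e - s >= ignore_sound]

    # 3. Don't split shorter than: extend a section over the next
    #    sound until the section is at least min_split long.
    result = []
    for start, end in valid:
        if result and result[-1][1] - result[-1][0] < min_split:
            result[-1] = (result[-1][0], end)
        else:
            result.append((start, end))
    return result


# Assumed layout for crackle.png.
clicks = [(round(1.1 + 0.2 * k, 1), round(1.2 + 0.2 * k, 1)) for k in range(9)]
crackle = [(0.0, 1.0)] + clicks + [(3.0, 4.0)]

print(find_sounds(crackle, allow_silence=0.5, ignore_sound=0.2))
# [(0.0, 4.0)]  short silences allowed, so it is all one sound
print(find_sounds(crackle, allow_silence=0.05, ignore_sound=0.2))
# [(0.0, 1.0), (3.0, 4.0)]  silences kept, clicks ignored
```

Because step 1 always runs first, the user gets the “ignore” behaviour only by setting the allowed silence below the shortest silence they want detected.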

I’ll code this up so that we can try it and see how well it performs in practice.

There would still be a “problem” if someone had content that was typically like track.png but then they also had intermittent crackle as in crackle.png. They would never be able to set allowed silence high enough for what they really want if the crackle was as loud as that. Maybe not our “problem”, though.

But even with no crackle (track.png), there is no setting that will mark the first sound from 0.0 to 2.0 and the second sound from 2.2 to 4.2, is there?


Thanks


Gale

If they have crackle in silences as bad as that, the failure of this plug-in is the least of their worries. It also will not work very well if the noise floor in the track varies widely. In some extreme cases it may just be better to label manually.

No there isn’t.

  • If “allowed silence” is less than 1 second and “ignore sounds” is less than 0.2 seconds, there will be three labels (0.0 to 1.0, 2.0 to 2.2 and 3.2 to 4.2)
  • If “allowed silence” is less than 1 second and “ignore sounds” is more than 0.2 seconds, there will be two labels (0.0 to 1.0 and 3.2 to 4.2)
  • If “allowed silence” is more than 1 second, there will be one label (0.0 to 4.2)
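These three cases can be checked with a few lines of Python (illustration only). The track.png layout is taken from the times quoted in this thread; the function name and the particular setting values (0.5 and 1.5 for the allowed silence, 0.1 and 0.5 for the ignored sound) are just picked to fall either side of the 1 second and 0.2 second boundaries.

```python
def label_sounds(sounds, allowed_silence, ignore_sounds):
    """Bridge short silences first, then drop short sounds."""
    merged = []
    for start, end in sounds:
        if merged and start - merged[-1][1] < allowed_silence:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= ignore_sounds]


# track.png: sound, 1 s silence, 0.2 s noise, 1 s silence, sound.
track = [(0.0, 1.0), (2.0, 2.2), (3.2, 4.2)]

for allowed, ignore in [(0.5, 0.1), (0.5, 0.5), (1.5, 0.1)]:
    print(allowed, ignore, label_sounds(track, allowed, ignore))
# 0.5 0.1 [(0.0, 1.0), (2.0, 2.2), (3.2, 4.2)]
# 0.5 0.5 [(0.0, 1.0), (3.2, 4.2)]
# 1.5 0.1 [(0.0, 4.2)]
```

None of the combinations gives 0.0 to 2.0 and 2.2 to 4.2.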

I presume that the case that you are putting forward here is:
“allowed silence” is more than 1 second and “ignore sounds” is less than 0.2 seconds
Are you suggesting that the silence from 1.0 to 2.0 should be counted as part of the first sound (because it is “allowed”) and the silence from 2.2 to 3.2 should be counted as part of the second sound for the same reason, but 2.0 to 2.2 should not be included as part of either because it is “ignored”?
If so, I don’t think that strategy is workable because within normal audio there will commonly be very short silences that we wish to allow and these may occur at frequent intervals, so the duration of sounds between will also be very small (consider zero crossing points). We don’t want to split a sound just because there are a couple of brief silences in close succession.

Also, a first label from 0.0 to 2.0 and a second from 2.2 to 4.2 would imply that sounds are found (this is the “Sound Finder” effect) from 0.0 to 2.0 and from 2.2 to 4.2. I don’t think that this is really desirable on two counts -
(1) the first sound actually ends at 1.0, not 2.0
(2) the “gap” between the labels (2.0 to 2.2) would imply “no sound” in that region. Not only is that not the case, but if it were the case then it should be allowed because it is only 0.2 seconds duration.

I’m making good progress with the algorithm proposed in my last post, so let’s see how this works out in practice.

Of course. With that simplistic “track.png” example it might be “common sense” as I suggested to put that case, but that is without considering how an algorithm would handle real world audio.

Yes, but user asked for a second of silence (in fact, up to two seconds in the original example) to be included as “sound”.

It is the case, user asked for a sound as short as that to be excluded.

It could only be allowed if the algorithm was treating the ignored sound as silence.

I think an algorithm “ought” to be able to produce what I suggest in the artificial “track.png” example, and you do have to explain there why the allowed silence setting appears to be “ignored” because I am sure it will be asked.

There again, some may prefer just the sound to be marked without the silences they asked for where a “too short” sound intervenes.

Given the algorithm may have to face short sounds to be ignored with short silences in-between them, that probably makes marking only the sounds that were long enough (without silence) the only practical solution. User can of course adjust the label positioning to include something other than the marked sounds.




Gale

I agree that it is absolutely worthwhile to consider the “common sense” view, no matter how unrealistic the test case. Although I may sound adamant in some of my arguments, I’m really just trying to think through how different algorithms will behave and assess if the behaviour is useful and intuitive.

Rightly or wrongly I’ve always assumed that the purpose of “Minimum duration of silence between sounds (seconds)” is to prevent sounds from being split by (relatively) short periods of silence. I have not considered it to mean that short periods of silence should be added to the detected sounds, after all it does say “between sounds”.

The current plug-in is inconsistent regarding this.

  1. If there is a period of silence at the end of the selection:
     a. If it is of less duration than “Minimum duration of silence between sounds”, it is added to the last sound.
     b. If it is of greater duration than “Minimum duration of silence between sounds”, it is not added.
  2. If there is a period of silence at the beginning of the selection:
     a. If it is of less duration than “Minimum duration of silence between sounds”, it is NOT added to the first sound.
     b. If it is of greater duration than “Minimum duration of silence between sounds”, it is not added.

We have an inconsistency between 1a and 2a.
In my opinion, 1a is incorrect because the silence is not “between sounds”.

With the addition of “Ignore sounds shorter than” we have another case to contend with: a period of silence shorter than “Minimum duration of silence between sounds” which is then followed by a sound that is shorter than “Ignore sounds shorter than”.

In this case, if we follow the rule of 1a, then as you suggest the sounds should be labelled from 0.0 to 2.0 and from 2.2 to 4.2.
I think that the case for this behaviour is stronger than 1a because here the silence IS between sounds.
The weakness of the case is that we have said that the sound from 2.0 to 2.2 should be ignored.

My inclination at the moment is to be consistent throughout and not add silence to the end of a detected sound in any circumstance.
Following this rule, 1a would become:
If there is a period of silence at the end of the selection, If it is of less duration than “Minimum duration of silence between sounds”, it is NOT added to the last sound.
(because the silence is not “between sounds”).
The case for Track.png would also be that the silences 1.0 to 2.0 and 2.2 to 3.2 would not be added to either of the detected sounds (because the sound from 2.0 to 2.2 is being ignored so the silence is not “between valid sounds”).
So in all cases short periods of silence can only be allowed if they are between two marked sounds.

I would suggest adding the word “valid” to “Minimum duration of silence between sounds (seconds)” except that it is already rather long.

It’s possibly worth remembering that there will still be options for “Label starting point” and “label ending point” so some amount of silence can be added each side of the detected sounds.