The “minimum duration of sound” (for audio books) isn’t really a problem.
I’ve restructured the code in my “development version” so that the sound/silence detection runs separately from, and before, the selection of which sounds/silences to label. This makes it a lot easier to modify the “selection process” without needing to rewrite the entire plug-in code. Once the “valid” sounds/silences have been selected, it is pretty easy to specify a minimum duration of sounds.
Sorry, I wasn’t very clear.
The problem is, what should happen if:
- There is a series of sounds, each shorter than “Ignore sounds shorter than [seconds]”
- The silences between these sounds are each shorter than “Allow silence between sounds less than (seconds)”
- The “Ignore sounds shorter than [seconds]” setting will ignore the sounds, so that the silences between them add up to longer than “Allow silence between sounds less than (seconds)”.
- The “Allow silence between sounds less than (seconds)” setting will make the sounds add up to longer than “Ignore sounds shorter than [seconds]”.
The “Allow silence between sounds less than (seconds)” setting is essential. If all silences are used, irrespective of silence duration, then there will be silence detected at every zero crossing point (every cycle of a periodic waveform). Obviously this is undesirable so very short silences must be ignored (counted as part of the sound).
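To illustrate the zero-crossing point, here is a minimal Python sketch. It is not the plug-in's code: the simple threshold detector, the function name `silent_runs`, the 100 Hz test tone and the threshold of 0.1 are all assumptions made for the demonstration.

```python
import math

def silent_runs(samples, rate, threshold):
    """Return (start, end) times in seconds of runs where |sample| < threshold."""
    runs, run_start = [], None
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            if run_start is None:
                run_start = i          # a new below-threshold run begins
        elif run_start is not None:
            runs.append((run_start / rate, i / rate))
            run_start = None
    if run_start is not None:          # run still open at the end of the data
        runs.append((run_start / rate, len(samples) / rate))
    return runs

RATE = 8000  # samples per second (assumed)
# 0.1 seconds of a continuous 100 Hz sine: there is no real silence in it.
tone = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(800)]

raw = silent_runs(tone, RATE, threshold=0.1)
# With no minimum silence length, a "silence" appears around every zero
# crossing, i.e. roughly two per cycle, each well under a millisecond long.
kept = [(s, e) for s, e in raw if e - s >= 0.5]
# Requiring silences of at least 0.5 seconds leaves none, as it should.
```

Without the minimum, `raw` contains a tiny silence for every half cycle of the tone; with it, `kept` is empty and the tone counts as one continuous sound.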
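Consider this example: a track with sound from 0.0 to 1.0 seconds, a series of short “clicks” separated by short silences between 1.0 and 3.0 seconds, and sound again from 3.0 to 4.0 seconds.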

Let’s assume that “Allow silence between sounds less than (seconds)” is set to 0.5 seconds.
The current behaviour is unambiguous. The silences in the region 1 to 3 seconds are less than 0.5 seconds, so they are “allowed” and there will be just one marked sound from 0.0 to 4.0 seconds.
If we have an additional feature, “Ignore sounds shorter than [seconds]”, the correct behaviour is less clear.
If “Ignore sounds shorter than [seconds]” is set to 0.5 seconds, then one might assume that the “clicks” between 1.0 and 3.0 seconds will be ignored (because they are shorter than 0.5 seconds). If those sounds are ignored, then the total silence between 1.0 and 3.0 seconds will be longer than 0.5 seconds and two sounds will be marked, one from 0.0 to 1.0 and one from 3.0 to 4.0.
This seems very reasonable to me, except that it may not happen.
Because the short silences are ignored (allowed), the entire track from 0.0 to 4.0 is one sound and will be marked from 0.0 to 4.0.
This behaviour seems counterintuitive given the idea of “Ignore sounds shorter than [seconds]”.
The conflict occurs if there are short sounds separated by short silences: do we ignore the short sounds, or ignore the short silences?
If we ignore the short sounds, then 1.0 to 3.0 will count as silence.
If we ignore the short silences then 1.0 to 3.0 will count as sound.
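The two outcomes can be reproduced with a small Python sketch. This is only an illustration, not the plug-in's code: the function names are made up, and the click positions between 1.0 and 3.0 seconds are assumed values chosen so that every click is shorter than 0.5 seconds and every gap is shorter than 0.5 seconds.

```python
def merge_short_silences(sounds, max_gap):
    """Join neighbouring sounds whose separating silence is shorter than max_gap."""
    merged = []
    for start, end in sorted(sounds):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def drop_short_sounds(sounds, min_len):
    """Discard sounds shorter than min_len."""
    return [(s, e) for s, e in sounds if e - s >= min_len]

# Detected sounds as (start, end) in seconds; click positions are assumed.
detected = [(0.0, 1.0), (1.4, 1.5), (1.8, 1.9), (2.2, 2.3), (2.6, 2.7), (3.0, 4.0)]

# Ignore the short silences first, then the short sounds.
silences_first = drop_short_sounds(merge_short_silences(detected, 0.5), 0.5)
# -> [(0.0, 4.0)]

# Ignore the short sounds first, then the short silences.
sounds_first = merge_short_silences(drop_short_sounds(detected, 0.5), 0.5)
# -> [(0.0, 1.0), (3.0, 4.0)]
```

Same input, same two settings, and the result depends entirely on which rule is applied first.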
The algorithm cannot be ambiguous or it will be caught in a never-ending loop.
One solution would be “Ignore sounds shorter than [seconds] IF AND ONLY IF separated from other sounds by more than the ‘Allow silence between sounds less than (seconds)’ duration”.
The other solution would be “Allow silence between sounds less than (seconds) IF AND ONLY IF both bounding sounds are longer than the ‘Ignore sounds shorter than [seconds]’ duration”.
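For comparison, here is one possible reading of each rule as a Python sketch. The names `rule_one` and `rule_two` are hypothetical, and several details are assumptions rather than anything settled above: the start and end of the track count as infinitely long silence, a gap exactly equal to the limit is treated as "long", each rule is applied in a single pass over the detected sounds, and the click positions are made up.

```python
INF = float("inf")

def merge_short_silences(sounds, max_gap):
    """Join neighbouring sounds separated by less than max_gap."""
    merged = []
    for s, e in sorted(sounds):
        if merged and s - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

def rule_one(sounds, min_len, max_gap):
    """Ignore a short sound only if it is isolated by long silence on both sides."""
    sounds = sorted(sounds)
    kept = []
    for i, (s, e) in enumerate(sounds):
        gap_before = s - sounds[i - 1][1] if i > 0 else INF
        gap_after = sounds[i + 1][0] - e if i + 1 < len(sounds) else INF
        isolated = gap_before >= max_gap and gap_after >= max_gap
        if e - s < min_len and isolated:
            continue  # short and isolated: ignored
        kept.append((s, e))
    return merge_short_silences(kept, max_gap)

def rule_two(sounds, min_len, max_gap):
    """Allow a short silence only if both sounds bounding it are long, then drop short sounds."""
    merged, prev_long = [], False
    for s, e in sorted(sounds):
        is_long = e - s >= min_len
        if merged and prev_long and is_long and s - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
        prev_long = is_long
    return [(s, e) for s, e in merged if e - s >= min_len]

# Clicks close together between two long sounds (assumed positions).
detected = [(0.0, 1.0), (1.4, 1.5), (1.8, 1.9), (2.2, 2.3), (2.6, 2.7), (3.0, 4.0)]
# A single click with long silence on both sides (assumed positions).
isolated = [(0.0, 1.0), (2.0, 2.1), (3.0, 4.0)]

print(rule_one(detected, 0.5, 0.5))  # [(0.0, 4.0)]
print(rule_two(detected, 0.5, 0.5))  # [(0.0, 1.0), (3.0, 4.0)]
print(rule_one(isolated, 0.5, 0.5))  # [(0.0, 1.0), (3.0, 4.0)]
print(rule_two(isolated, 0.5, 0.5))  # [(0.0, 1.0), (3.0, 4.0)]
```

Each rule is deterministic, so neither can loop, but they still disagree on the clustered clicks: the first keeps them as part of one long sound, the second treats the whole region as silence. They only agree when there is no conflict, as with the isolated click.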
Both solutions are messy.
Any better ideas?