finer resolution in Silence Finder

Edgar · August 26, 2010, 8:37pm

The original code sets the minimum detectable duration value at 0.1 seconds. For these controls, I think the number value parameters are:
defaultValue minimum maximum

original code:

;control sil-dur "Minimum duration of silence [seconds]" real "" 1.0 0.1 5.0
;control labelbeforedur "Label placement [seconds before silence ends]" real "" 0.3 0.0 1.0

proposed change:

;control sil-dur "Minimum duration of silence [seconds]" real "" 1.0 0.01 5.0
;control labelbeforedur "Label placement [seconds before silence ends]" real "" 0.3 0.0 1.0

This allows searching for much shorter durations.
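To make the parameter order concrete, here is a minimal Python sketch (not part of the plug-in) that splits a `;control` line for a `real` slider into its fields. It assumes the last three numbers are default, minimum and maximum, as guessed above; the function name `parse_real_control` and the dictionary keys are made up for illustration.

```python
import shlex

def parse_real_control(line):
    """Split a ';control ... real ...' header line into named fields.

    Assumed layout: ;control name "label" real "units" default min max
    """
    tokens = shlex.split(line)  # honours the double-quoted label and units
    if len(tokens) != 8 or tokens[0] != ";control" or tokens[3] != "real":
        raise ValueError("not a ';control ... real ...' line")
    return {
        "variable": tokens[1],
        "label": tokens[2],
        "units": tokens[4],
        "default": float(tokens[5]),
        "minimum": float(tokens[6]),
        "maximum": float(tokens[7]),
    }

original = parse_real_control(
    ';control sil-dur "Minimum duration of silence [seconds]" real "" 1.0 0.1 5.0')
proposed = parse_real_control(
    ';control sil-dur "Minimum duration of silence [seconds]" real "" 1.0 0.01 5.0')

# Only the slider minimum differs between the two lines.
print(original["minimum"], proposed["minimum"])
```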

I have used these values now for many projects and they work just fine:

;control sil-dur "Minimum duration of silence [seconds]" real "" .04 0.01 5.0
;control labelbeforedur "Label placement [seconds before silence ends]" real "" 0.01 0.0 1.0

Gale_Andrews · August 26, 2010, 10:48pm

I don’t object if it’s useful - you can certainly hear silences shorter than 0.1 seconds. It just means another 9 steps on the slider. If so we should make the same change to Sound Finder. Anyone have objections?

Gale

steve · August 27, 2010, 10:51am

Silence finder cannot detect silence if the silence is less than about 0.02 to 0.03 seconds unless further modifications are also made.

Considering usability, I would imagine that in most cases 0.1 seconds is a useful minimum silence length. For (rare?) cases where a smaller duration is required, text entry may be used. If the minimum slider value were set to 0.01 seconds, then to select 0.1 seconds a user would need either to use text entry or to move the slider to the extreme left and then carefully nudge it back to 0.1, either of which is far less convenient than simply dragging the slider to the extreme left.

The attached modified SilenceMarker plug-in allows the minimum duration to be set to 0.01 seconds (10 milliseconds). I have also increased the accuracy so that, when set to 0.01, it will reliably detect a 12 ms silence but will not detect a 9 ms silence. The effect on processing time is minimal when tested on my machine, though I think it would be worth testing on an old or low-power machine if anyone has one available.

When dealing with very short time durations it becomes apparent that the “Minimum duration of silence [seconds]” is not the minimum length that will be detected, but is the duration that must be exceeded by the length of the silence. So if set to 10 ms, that does not mean that a silence of 10 ms duration will be detected. It means that if the length of the silence is greater than 10 ms it will be detected. The amount by which it needs to exceed the value set is determined by the accuracy of the plug-in, which I have improved by changing the sample rate of the processed sound from 100 Hz to 1/100 of the original sample rate, so for a 44.1 kHz track that will be 441 Hz.
Perhaps the Minimum Duration would be better described as “Detect silence longer than [seconds]” ?
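As a rough illustration of the accuracy point, the Python sketch below (not taken from the plug-in) shows how coarse the timing grid is at each analysis rate. It assumes only that silence is measured on a regular grid of analysis samples; the function names are hypothetical, and the real detection logic may differ.

```python
import math

def analysis_step_ms(analysis_rate_hz):
    """Spacing between analysis samples, in milliseconds."""
    return 1000.0 / analysis_rate_hz

def grid_points_in_silence(duration_s, analysis_rate_hz):
    """Fewest and most analysis samples that can fall inside a silence.

    The count depends on where the silence starts relative to the grid,
    so a duration is only known to within about one step.
    """
    exact = duration_s * analysis_rate_hz
    return math.floor(exact), math.ceil(exact)

old_rate = 100             # unmodified plug-in
new_rate = 44100 / 100     # modified plug-in on a 44.1 kHz track: 441 Hz

for rate in (old_rate, new_rate):
    print(rate, round(analysis_step_ms(rate), 2), grid_points_in_silence(0.012, rate))
```

At 100 Hz a 12 ms silence lands on only one or two grid points, which is too coarse to tell it apart from a 9 ms silence; at 441 Hz the step is roughly 2.3 ms.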

At the moment I think my preference would be not to change the slider (leave it with a range of 0.1 to 5 seconds), but to make the other modifications, allowing users to detect silences of around 10 ms by text entry. This is subject to the performance hit of the increased sample rate not being unreasonable on slower machines.

I have changed the name of the file to “SilenceMarker-1-1.ny” and the name of the plug-in to “Silence Finder v.1.1” to make testing and comparisons easier. If these changes are accepted it may be preferred to use the original names?
SilenceMarker-1-1.ny (4.66 KB)

Edgar · August 27, 2010, 3:08pm

Will give SM1-1 a good exercising once last night’s batch is finished…

steve · August 27, 2010, 3:54pm

Thanks Edgar, that will be really helpful.
I’ve attached a patch file which may help you see which bits I’ve changed. (had to add a .txt extension to upload it)
silencemarker11.patch.txt (1.89 KB)

Edgar · August 27, 2010, 5:34pm

I just used it to process a 2hr 45min mp3. It worked just fine but there was a long delay after the progress dialog disappeared before the markers & label track appeared. I am not sure if the old version had as long a delay. It also seemed to be noticeably slower–not something I would complain about for greater accuracy though. Do you wish me to time the old and new versions on an identical very long (~3hr) mp3 or M4A file? I could also compare exported label tracks.

If you make similar changes to Sound Finder I will be happy to try that as well.

steve · August 27, 2010, 6:03pm

I find that I get a huge amount of variation in processing time with Nyquist plug-ins for no apparent reason, but I generally work on small files (anything from a few seconds up to a few minutes). Sometimes a 30 second file will take 10 seconds to process, and other times with similar audio and plug-in settings it will process virtually instantly.
If there are a great number of labels, then I would expect it would be quite slow to display them - while I was making adjustments I accidentally set it so that it detected every zero crossing point on a 1kHz sine wave - I only had a short section selected, but Audacity froze for several minutes then generated a couple of thousand labels.

Yes that would be useful. I don’t think it matters what the format is as all compressed files are copied into the project so there should be no difference from that.

Let’s see how it goes with Silence Finder - if all goes well I’ll have a look at Sound Finder.

Edgar · August 27, 2010, 6:20pm

With the new code, processing a file* that is six hours 15 minutes long, the progress dialog remained on the screen for four minutes 34 seconds**; after the progress dialog closed an additional five minutes and 51 seconds elapsed before the label track with the markers appeared.

Processing the same 6hr+ file with the old code but with only MY changes resulted in the progress dialog being on-screen four mins 19 secs; the label track appeared 5 mins 23 secs later.

A barely noticeable difference, certainly not significant.

*A raw recording of a radio broadcast, converted from MP3 to M4A; I then exited Audacity and loaded the file via drag ‘n’ drop.
**all times are ±3secs

steve · August 27, 2010, 6:27pm

Sounds good to me.
How about the functionality? is it picking up the short silences OK?

Edgar · August 27, 2010, 6:35pm

Doing extremely well on that. In a 3hr recording of a radio broadcast it only misidentified 2 “grand pauses” as track separations–well within tolerance.

The new code is picking up some track splits the old code (with my changes) missed:

	OLD CODE		|||	NEW CODE
1	0.1	0.1	S	|||	0.004167	0.004167	S
2	3.69	3.69	S	|||	3.6	3.6	S
3	19.78	19.78	S	|||	19.689583	19.689583	S
4	203.33	203.33	S	|||	203.2375	203.2375	S
5	341.5	341.5	S	|||	341.402083	341.402083	S
6	506.12	506.12	S	|||	506.03125	506.03125	S
7	550.76	550.76	S	|||	550.670833	550.670833	S
8	725.88	725.88	S	|||	725.785417	725.785417	S
9	892.24	892.24	S	|||	892.14375	892.14375	S
10	1075.49	1075.49	S	|||	1075.4	1075.4	S
11	1288.95	1288.95	S	|||	1288.858333	1288.858333	S
12	1294.07	1294.07	S	|||	1293.977083	1293.977083	S
13	1297.03	1297.03	S	|||	1296.939583	1296.939583	S
14	1503.97	1503.97	S	|||	1503.877083	1503.877083	S
15	1645.25	1645.25	S	|||	1645.160417	1645.160417	S
16	1790.74	1790.74	S	|||	1790.645833	1790.645833	S
17	1794.31	1794.31	S	|||	1794.2125	1794.2125	S
18	1977.01	1977.01	S	|||	1976.920833	1976.920833	S
19	2020.4	2020.4	S	|||	2016.966667	2016.966667	S
20	2163.2	2163.2	S	|||	2020.30625	2020.30625	S
21	2306.92	2306.92	S	|||	2163.108333	2163.108333	S
[...]
217	21991.65	21991.65	S	|||	20142.525	20142.525	S
218	22163.31	22163.31	S	|||	20144.49792	20144.49792	S
219				|||	20147.6625	20147.6625	S
220				|||	20150.77708	20150.77708	S
221				|||	20151.5375	20151.5375	S
222				|||	20152.32917	20152.32917	S
223				|||	20155.22083	20155.22083	S
224				|||	20157.07292	20157.07292	S
225				|||	20158.40833	20158.40833	S
226				|||	20159.90417	20159.90417	S
227				|||	20318.025	20318.025	S
228				|||	20508.1625	20508.1625	S
229				|||	20677.39375	20677.39375	S
230				|||	20680.96667	20680.96667	S
231				|||	20685.37292	20685.37292	S
232				|||	20887.90208	20887.90208	S
233				|||	21127.63125	21127.63125	S
234				|||	21287.0625	21287.0625	S
235				|||	21490.54583	21490.54583	S
236				|||	21505.3875	21505.3875	S
237				|||	21652.21458	21652.21458	S
238				|||	21820.97083	21820.97083	S
239				|||	21991.55417	21991.55417	S
240				|||	22163.21875	22163.21875	S

Note that by marker 20 it had picked up one extra marker, and by the end it had 22 additional markers. Pardon the formatting!
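A comparison like this can also be automated. The Python sketch below is only an illustration: it takes two lists of label times and reports the new labels that have no old label nearby. The function name `unmatched_labels` and the 0.2 s tolerance are assumptions, chosen because the new markers in the table sit roughly 0.09 s earlier than the old ones.

```python
def unmatched_labels(old_times, new_times, tolerance_s=0.2):
    """Return the new label times with no old label within tolerance_s."""
    return [t for t in new_times
            if not any(abs(t - o) <= tolerance_s for o in old_times)]

# Old markers 17-20 and new markers 17-21 from the table above.
old = [1794.31, 1977.01, 2020.4, 2163.2]
new = [1794.2125, 1976.920833, 2016.966667, 2020.30625, 2163.108333]

extra = unmatched_labels(old, new)
print(extra)  # [2016.966667], the extra split found by the new code
```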

steve · August 27, 2010, 6:49pm

Excellent, thanks very much Edgar.

Gale_Andrews · August 27, 2010, 8:48pm

You can now upload a plain text file with .patch extension.

Gale

Edgar · August 27, 2010, 9:25pm

Thanks!

Gale_Andrews · August 28, 2010, 12:20am

If I generate 2 seconds of silence inside one of the tones of a 44100 Hz mono DTMF and set Steve’s 1.1 version to detect silence longer than 2 seconds, the silence isn’t detected, but it is detected if “longer than” is 1.99. So I think we must change the wording of that field, or explain in the ;info line that “minimum duration of silence” means “any duration longer than the minimum”. I’d rather reword the field, but to be consistent with the ;info line have the field say “Detect silence durations in excess of [seconds]”.

Having said that, if I generate silence of 0.02 s and detect with “longer than” at 0.02, the silence is detected. Similarly with silences of 0.02 s and 0.01s, both are detected with “longer than” at 0.01. But this is less bad than having a silence several seconds long not detected when a “minimum duration” field is set to that duration.

As for the slider minimum being 0.1s or 0.01s, I think it depends how common the use case is for detecting below 0.1s, and how likely there will really be silences shorter than 0.1s in common scenarios if the user hit HOME on a slider with a 0.01s minimum. There must be for example scientific scenarios with some kinds of audio where you would want to label the silences rather than simply remove the audible ones with Truncate Silence. I don’t object to the slider accommodating “advanced” use cases if “common” ones don’t throw up stacks of unwanted labels.

What is Edgar’s use case?

Obviously if the situation in the second paragraph is thought to be really bad when the wording says “longer than”, you could view that as a case for leaving the slider minimum at 0.1s. It seems that silences of 0.1s are not detected with “longer than” set to 0.1s.

Gale

Edgar · August 28, 2010, 2:22am

First, don’t make changes to accommodate me–I can make the changes on my personal Audacity if the changes would impact others’ ease-of-use. My case is probably unique and borders on violating “fair use” rules. I am a music addict; I turn the stereo on as I get out of bed and set it to fade out over 45 minutes when I go to sleep. I listen while doing almost everything–reading, programming, driving etc. I like many styles but my favorite is Big Band/Swing; I often listen to a 24/7 commercial-free internet feed (KCEA.org)–all Big Band.

I record it while sleeping and away from home then break the resulting (large–2-10hr) files into 15± minute segments to listen to while driving, plowing or otherwise away from my home stereo. The format is non-stop but there is a definite, though small gap (~.06 secs) between many songs and always between the “station breaks” and the music. When I stop listening the CD player resets to the beginning of the track; if the resulting tracks are too long I find myself hearing the beginning of the same track over and over during short trips or such.

steve · August 28, 2010, 5:03pm

The plug-in resamples the audio to a lower sample rate so that there are fewer samples to analyse (sample-by-sample processing in Audacity is slow).
The unmodified plug-in resampled to 100 Hz, which produces rather large rounding errors - too great for detecting silences of the order of 10 ms.
The modified plug-in uses a higher sample rate: the original sample rate divided by 100. This allows greater precision without increasing the processing time too much.
The only way to achieve exact duration detection is to keep the sample rate at the original rate, but this increases processing time dramatically - for example, a 5 minute track with 20 silences takes about 10 seconds with either the original or modified plug-in, but about 3 minutes 20 seconds if the original sample rate is kept (which I think is unacceptably slow).
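For a sense of scale, this small Python sketch counts how many samples would have to be examined for the 5 minute example at each of the three rates. It assumes a 44.1 kHz track, and the helper name `analysis_samples` is made up; it only does the arithmetic and says nothing about how the plug-in actually spends its time.

```python
def analysis_samples(track_seconds, analysis_rate_hz):
    """Number of samples to examine at a given analysis rate."""
    return round(track_seconds * analysis_rate_hz)

track_seconds = 5 * 60     # the 5 minute example
original_rate = 44100      # assumed track sample rate in Hz

counts = {
    "unmodified (100 Hz)": analysis_samples(track_seconds, 100),
    "modified (rate / 100)": analysis_samples(track_seconds, original_rate / 100),
    "original rate": analysis_samples(track_seconds, original_rate),
}

for name, n in counts.items():
    print(f"{name}: {n} samples")
```

Keeping the original rate means examining 100 times as many samples as the modified plug-in, which fits with the much longer processing time reported above.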

I’ve applied the same modification to the Sound Finder plug-in. I don’t think that the GUI wording needs to be changed for Sound Finder (“Minimum duration of silence between sounds [seconds]”).
SoundFinder-1-1.ny (6.41 KB)

Gale_Andrews · August 28, 2010, 9:55pm

Isn’t it the same issue, though? “Minimum duration of silence between sounds” conveys to me that if you set this to 2 seconds it allows you to have a silence of 2 seconds between sounds. Clearly it doesn’t, you need to set it to 1.99. But this is awkward to word…

“Allow silence durations between sounds longer than [seconds]”

or from the other viewpoint:

“Allow silence durations in sounds shorter than [seconds]”

aren’t clear as to whether seconds refers to silence or sound.

“Silence duration between sounds can exceed [seconds]” ?

Other ideas?

I’m now inclined to think the duration slider minimum should stop at 0.1 s because the text alongside will be incorrect at lower values. People who want to detect shorter values will have to modify the plug-in but can now actually detect down to 0.01 s in most cases with Steve’s changes. I agree we can’t dramatically increase processing times.

Gale

steve · August 29, 2010, 2:15pm

In the context of finding continuous sections of audio, “Minimum duration of silence between sounds [seconds]” says to me that a period of silence must be at least …seconds to count as a break. Or to put it another way, any silence that is less than …seconds will be ignored.

If you set this to 2 seconds it will sometimes detect a gap that is exactly 2 seconds, but due to rounding errors it will sometimes miss them. It should ignore a silence that is less than 2 seconds and it should detect a silence that is greater than 2 seconds, so I think it is fair to say that (in this example) the minimum duration of silence between sounds is 2 seconds. (2 seconds is, given the available accuracy, the minimum that may be detected if set to 2 seconds).

I think that when detecting silences the implication is slightly different.
“Minimum duration of silence [seconds]” says to me that if I wish to detect a silence of 0.02 seconds, then I need to set the Minimum duration of silence to 0.02 seconds, when in fact I need to set it to less than 0.02 seconds.

It’s a fine point, and the issue is confused when the two plug-ins are examined side by side. I’m not overly concerned which wording is used because in real world examples, if silences are being missed due to being too short, the user will simply reduce the silence duration value.

or simply use text entry.

Gale_Andrews · September 3, 2010, 1:20am

I have tried SoundFinder with lots of tracks of various lengths at 44100 and 48000 Hz, with silences a few seconds long (integer and fractions). If the minimum is set to the length of the silence, the silence isn’t treated as a break. So I think on balance because there is no indication that there is an accuracy limitation, and because people may switch between Sound Finder and Silence Finder to see which suits them best, it would be better for Sound Finder as well not to use the term “minimum” for the duration field.

I agree in the real world it often won’t matter but I’m sure there are utilities around that place known periods of silence into audio for one reason and another and in those cases a “minimum” wording for duration could be frustrating.

So given you don’t feel strongly either way, I’ll commit Silence Finder and Sound Finder with your accuracy improvements (but leave the duration slider minimum unchanged) and use the best wording I can figure for the duration in Sound Finder.

Gale

steve · September 3, 2010, 5:11am

“Detect Silence Longer Than” ?