Discrepancy in zoomed display of the region and of cuts

Attached, see an audio track and a label track. I generated the labels with a Nyquist plug-in and was careful to define label boundaries midway between samples. The result of picking the right label is shown. Picking the left label highlights the complementary areas. The selected part of the timeline above the track display agrees with the label track. The boundary highlighted in the audio track appears one sample too far to the left.

If I then hit ctrl-i, the new cut appears to go THROUGH the extra highlighted sample of the audio track, not between samples. I am not sure which side the sample actually goes to. I hope it is the left side.
[Attached image: tracks001.png]

Is that a question or a comment?
If it’s a question, could you be a bit more explicit in what the question is?

Well the question is, which region of time is really selected? And where is the cut boundary after ctrl-i, really? Because I don’t know which of two disagreeing displays to believe. This mismatch can’t be correct behavior!

The label selection is a selection of “time”. The audio track selection is a selection of “samples”.
The way that digital audio works is to “quantize” time into sample periods. You can’t have “half a sample”, so durations of sound are always quantized to a whole sample period.

The quantizing is NOT rounding to the nearest, because that produces (sometimes unexpected) “off by one” errors, but rather quantizing uses a “cut-off point” that is the mid point between samples.

  • If a sample is before the “cut-off point” at the start of a time selection, it is not included in the selection.
  • If a sample is exactly on, or after the “cut-off point” at the start of a time selection, it is within the selection.
  • If a sample is before the “cut-off point” at the end of a time selection, it is within the selection.
  • If a sample is exactly on, or after the “cut-off point” at the end of a time selection, it is not included in the selection.
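
The four rules above can be sketched as a small model. This is Python rather than Nyquist, and it is an assumption inferred from this thread (not taken from Audacity's source) that a boundary time t maps to sample index floor(t * rate + 0.5), with the start index included and the end index excluded. The function names are made up for illustration.

```python
import math

def boundary_to_index(t, rate):
    """Map a time boundary to a sample index; a tie at the midpoint goes up.

    Assumed model, not Audacity's actual code.
    """
    return math.floor(t * rate + 0.5)

def selected_indices(t0, t1, rate):
    """Zero-based indices of the samples inside the time selection t0..t1."""
    return list(range(boundary_to_index(t0, rate), boundary_to_index(t1, rate)))

# 1 Hz track: sample k sits at time k seconds, midpoints fall at k + 0.5.
print(selected_indices(0.5, 3.5, 1))        # boundaries exactly on the midpoints
print(selected_indices(0.4999, 3.4999, 1))  # boundaries a hair before the midpoints
```

Under this model a boundary that lands exactly on a midpoint belongs to the later sample, which is why a tiny roundoff to the left of the midpoint pulls in one extra sample at the start and drops one at the end.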

So, odd as it may look on occasions, both are correct, but they are showing different things. The Label track (and the Timeline) show (not quantized) time, whereas the audio track shows sample periods (quantized).

To see an example of how this works:

  1. Create an audio track with a sample rate of 1 Hz.
  2. Select about 30 seconds and generate a sine tone, frequency 0.1 Hz, phase 90 degrees:
(osc (hz-to-step 0.1) 1 *table* 90)
  3. Create the following labels:
(list '(2.5 12.5 "2.5 - 12.5") '(2.49999 12.49999 "2.49 - 12.49"))

When you select the “2.49 - 12.49” label, the output “looks” a little odd because the start and end of the selection are a tiny fraction before the quantize “cut-off point”.


Try it and see.
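
For anyone without Audacity to hand, here is a rough Python sketch of the same 1 Hz experiment. It assumes the boundary rule floor(t * rate + 0.5) described in this thread, and it models the tone as a sine starting at phase 90 degrees; none of this is Audacity or Nyquist code.

```python
import math

RATE = 1.0   # Hz, as in step 1
FREQ = 0.1   # Hz, as in step 2

def sample_value(k):
    """Value of sample k of a 0.1 Hz sine that starts at phase 90 degrees."""
    return math.sin(2 * math.pi * FREQ * k / RATE + math.pi / 2)

def covered(t0, t1):
    """Sample indices covered by a label (assumed boundary rule)."""
    first = math.floor(t0 * RATE + 0.5)
    past_end = math.floor(t1 * RATE + 0.5)
    return range(first, past_end)

for name, t0, t1 in [("2.5 - 12.5", 2.5, 12.5),
                     ("2.49 - 12.49", 2.49999, 12.49999)]:
    ks = covered(t0, t1)
    print(name, "covers the samples at", ks[0], "to", ks[-1], "seconds,",
          len(ks), "samples, first value", round(sample_value(ks[0]), 3))
```

Both labels cover ten samples, but the second one is shifted one sample to the left even though its times differ by only 0.00001 seconds.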

Please explain with reference to the picture above. My label boundary looks to be midway between samples. How is a label boundary time quantized into a cutoff point, truncation or rounding? How is it that the “cutoff point,” not directly shown, looks yet another sample time left of the boundary? How should I change my own arithmetic in calculating my label endpoint, so that the right label covers one sample fewer and the left one more? Should I put it at a calculated sample time and NOT worry about roundoff in that calculation?

It seems I tried to label at midpoints between samples to avoid off by one errors… and in my misunderstanding of things, that only created such errors. Some of the boundaries I get do not look like this.

I wrote my Nyquist to put my label boundaries midway between sample times. Was that a bad idea? It looks like roundoff error put my label boundary slightly left of a cutoff point, therefore truncation of that time to a cutoff includes one more sample. Is that right?

Then the zero-crossing finder I’ve worked on needs another revision.

[quote=“Paul L”]
And where is the cut boundary after ctrl-i, really?
[/quote]

Try it and see.

I did and I get these pictures and they confuse me. It SEEMS that clips are drawn so that their edges pass directly through samples. Therefore a gap of one sampling interval appears between them. But this gap is either highlighted or not, depending on which of the clips is selected. THAT behavior, I find confusing, and I don’t know whether to call that a display bug. Is the leftmost sample of the rightmost clip a part of the second selection or not?
[Attached image: tracks007.png]
[Attached image: tracks006.png]

In Nyquist, assuming that you are using floats and not integers, label positions are calculated to 32 bit float accuracy (single precision).
In Audacity label positions are calculated to at least double precision (possibly greater).

The “cut-off” point (inter-sample division) is mid way between the time values of each sample. Sample positions are calculated to either double precision or a native 64 bit integer format (depending on context, but either way it is better than 32 bit float accuracy).

Yes they appear to be “on” the cut-off boundary, but there is no way that I can see the difference between 0.999999999999 and 1.000000000000 or 1.000000000001.
To guarantee sample accurate positioning of labels, you need to ensure that the label positions are within the sample period. If you use the (exact to 32 bit float) sample position, then that will be within a sample period.
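
The effect of single precision can be illustrated outside Nyquist. The Python sketch below rounds label times to 32 bit floats with the standard struct module and checks whether the sample index changes. The boundary rule floor(t * rate + 0.5), the 44100 Hz rate and the helper names are assumptions for illustration only.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def boundary_to_index(t, rate):
    """Assumed model: a tie at the midpoint between samples goes up."""
    return math.floor(t * rate + 0.5)

RATE = 44100

def count_shifted(offset, n=10000):
    """Count labels at (k + offset) / RATE whose index changes after float32 rounding."""
    shifted = 0
    for k in range(n):
        t = (k + offset) / RATE
        if boundary_to_index(to_float32(t), RATE) != boundary_to_index(t, RATE):
            shifted += 1
    return shifted

print("labels on sample times that move:", count_shifted(0.0))
print("labels on midpoints that move:   ", count_shifted(0.5))
```

Labels placed on sample times have almost half a sample of slack on either side, so the float32 error never moves them here. Labels placed on midpoints have no slack at all, so the direction of the roundoff decides which sample they fall on.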

Example:

;; The first sample is at time = 0
;; so the nth sample will be at sample time
;; (/ (1- n) *sound-srate*)
(setq start-sample 10) 
(setq end-sample 20)

; To include the 10th sample, we need to select it
; at time (/ (1- n) *sound-srate*)
(setq start-time (/ (1- start-sample) *sound-srate*))

; to include the 20th sample we need to select it
; so set the cut-off at the 21st sample
; [so that the 20th sample period is entirely within the selection]
; at time (/ n *sound-srate*)
(setq end-time (/ end-sample *sound-srate*))

; create label
(list (list start-time end-time ""))

Note that in the above example, the Selection Toolbar will show the Selection Start as “9 samples” (because there are 9 samples before the start of the selection) and the end of the selection as “20 samples” (because there are 20 samples up to the end of the selection) and the selection length as “11 samples” (because there are 11 samples within the selection).
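
The same bookkeeping in Python, as a cross-check of the numbers above. The function names are hypothetical, and the readout simply restates what this post says the Selection Toolbar shows.

```python
def label_for_samples(first, last, rate):
    """Label times covering 1-based samples first..last inclusive.

    Mirrors the Nyquist example: start on the first sample's own time,
    end on the time of the sample just after the last one.
    """
    start_time = (first - 1) / rate
    end_time = last / rate
    return start_time, end_time

def toolbar_readout(first, last):
    """Sample counts as described for the Selection Toolbar in this post."""
    return {"start": first - 1, "end": last, "length": last - (first - 1)}

start, end = label_for_samples(10, 20, 44100)
print(start, end)
print(toolbar_readout(10, 20))
```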

If a “cutoff point” is defined as the midpoint between samples, then how can a sample ever be “exactly on” a cutoff point?

If “cutoff point” means the rounding of a label boundary to the nearest sample time, I can make sense of this list, but you have told me that is not what it means.

Count me still confused.

As for the display and how to interpret it: these pictures make it seem that midpoints between samples define the boundaries of colored regions of audio tracks, when there is no cut boundary. But when there is a cut boundary, the colored region extends to a sample, not a midpoint. Why that difference? How should I interpret what I see with a cut?

Change the term “cut-off point” for “time boundary” - does that make sense now?
I’m not offering “official documentation” here so please don’t expect highly refined and worked out wording. Try the examples - it should be self explanatory.

Okay, thanks, putting those things in the Nyquist prompt, and trying variations, clarified things. I should just calculate the times of the m-th and n-th samples, to define a label that includes the m-th through the (n-1) th and excludes the n-th. I can err up to half a sample time either way and not worry about it.

But now, help me with another confusion: the first argument of osc is invariantly in steps, but the second argument, the duration, has local time units (i.e. multiple of the selection length), and is not invariantly in seconds, right? That is what I conclude from varying it.

If that is right, then why is this the right expression to define a hann window (adapted from what you said here https://forum.audacityteam.org/t/help-me-understand-spectrograms/29230/4):

  (let ((window-freq-hz (/ window-length-seconds)))
    (mult 0.5
          (sum 1 (osc (hz-to-step window-freq-hz)
                      window-length-seconds *sine-table* -90))))

Is this actually making a sound of more or less than a period, depending on the duration of s?

If more, and you pass it to snd-fft, no problem, because only as many samples of the sound are used, as are in the window width: but if less … ? Would code relying on this misbehave if the selection were less than a second?

Is this more correct:

  (let ((window-freq-hz (/ window-length-seconds)))
    (mult 0.5
          (sum 1 (osc (hz-to-step window-freq-hz)
                      (/ window-length-seconds (get-duration 1)) *sine-table* -90))))
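
Whichever duration argument turns out to be right, the target shape is the same: the oscillator has to run for exactly one period over the window length. A minimal Python sketch of that shape, independent of Nyquist's time units (the function name and the sample count of 8 are arbitrary):

```python
import math

def hann_from_osc(num_samples):
    """Samples of 0.5 * (1 + sin(2*pi*t/L - 90 degrees)) over one period L."""
    window = []
    for i in range(num_samples):
        phase = 2 * math.pi * i / num_samples - math.pi / 2
        window.append(0.5 * (1 + math.sin(phase)))
    return window

w = hann_from_osc(8)
print([round(v, 3) for v in w])
```

If the oscillator ran for less than one period, the window would stop before returning to zero, which is the failure the question above is worried about.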

I’m sure there will be some exception to this :wink: but yes, that’s all you need to do.

Yes.

For “process” and “analyze” type plug-ins, yes, but not for “generate” type plug-ins.
For generate type plug-ins “local” time is in seconds, the same as global time, unless you modify the environment. Thus, in a generate plug-in, (osc 60 3.0) will generate a sine tone of 3 seconds duration, irrespective of the selection length.

As I wrote in the original code example “; wlen is the windowsize in seconds (local time)”
So if used in a process or analyze type plug-in, you will need to either:

Calculate the frequency and duration relative to the selection length

;; calculate the duration relative to the selection duration
(osc 60 (/ 3.0 (get-duration 1)))

or evaluate the window behaviour in the default environment.

;; evaluate the behaviour in the default environment
(abs-env (osc 60 3.0))

In a generate type plug-in you can simply use:

(osc 60 3.0)

(assuming that you have not modified the environment, for example by stretching)
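
The unit conversion above is only a division, but it is easy to get backwards, so here it is as a Python sketch. It assumes an unmodified environment, and the function and its plugin_type strings are hypothetical, not part of any Nyquist or Audacity API.

```python
def osc_duration_argument(seconds, selection_seconds, plugin_type):
    """Duration to pass to osc so that the tone lasts `seconds` of real time.

    Assumes an unmodified environment: in "process" and "analyze" plug-ins
    one unit of local time is the selection length; in "generate" plug-ins
    one unit is one second.
    """
    if plugin_type in ("process", "analyze"):
        return seconds / selection_seconds
    if plugin_type == "generate":
        return seconds
    raise ValueError("unknown plug-in type: " + plugin_type)

# A 3 second tone requested while a 12 second selection is active.
print(osc_duration_argument(3.0, 12.0, "process"))   # fraction of the selection
print(osc_duration_argument(3.0, 12.0, "generate"))  # plain seconds
```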

Some audio software places the “cut-off point” or “time boundary” effectively on the sample point, not between points. I still hold to the heresy this is less confusing than Audacity’s “Sphere of influence” behaviour.

I think that Audacity putting clip boundaries on sample points (and snapping the cursor to sample points when “Snap To” snaps to samples) is confusing in the context of otherwise having the time boundary between samples.

Even more heretical, and possibly not thought through, why not snap to samples as default if you are not snapped to any other format, and avoid the confusing and ugly display problems noted here?


Gale

Ooh that’s a cute workaround :ugeek:

In audio tracks we already do that, and I think that is the only sensible thing to do.

What we don’t currently do when “Snap To” is not enabled, is to snap the selection (hence labels) to sample periods. Quantizing the selection to “sample time” is more problematic because there may be more than one sample rate at play. Consider if we were selecting in two tracks. One has a sample rate of 44100 Hz, one has a sample rate of 48000 Hz and the Project rate is 100000 Hz. Which sample rate “should” we snap to, or should we snap to 1/44100th second in one track, 1/48000th second in the other track, and 1/100000th second in the Timeline and label tracks? Currently, when “Snap To” is enabled we snap to the Project Rate.
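
To put numbers on that, the Python sketch below snaps one arbitrary time to each of the three rates mentioned. The rule "nearest multiple of 1/rate, ties up" is an assumed model, and the time value is made up.

```python
import math

def snap_to_rate(t, rate):
    """Nearest multiple of 1/rate to time t (ties go up); an assumed model."""
    return math.floor(t * rate + 0.5) / rate

t = 1.23456789  # an arbitrary unsnapped time in seconds
for name, rate in [("44100 Hz track", 44100), ("48000 Hz track", 48000),
                   ("Project rate", 100000)]:
    snapped = snap_to_rate(t, rate)
    print(f"{name}: {snapped:.9f} s (moved {abs(snapped - t) * 1e6:.3f} microseconds)")
```

The three grids give three different snapped times for the same click, so a single selection edge cannot sit on a sample boundary in every track at once.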

The problem that we have when considering very short time periods, is that there is a discrepancy between “analogue time” and “digital time”.
Consider a digital watch that displays seconds - there is nothing to distinguish the difference between the start of a second, the middle of a second and the end of a second. Unless we bias measurements and calculations based on our own (analogue) sense of time, the maximum precision of calculations will be 1 second.

In digital audio we usually represent samples as “dots”, which in one sense is misleading because a digital “sample” has exactly the same value in both “magnitude” and “sample number” from start to end. In fact it makes little sense to think of “start and end” of a sample because “1 sample period” is the smallest quantum of time, so the properties of a sample are not “start time, start value, end time, end value”, but simply “sample value and sample number”. The value and number define what a sample “is”.

When considering what the samples “represent”, the picture is slightly different. Assuming “perfect” conversion between digital and analogue, the sample values represent points on a curve and the curve will pass through each sample as plotted on a graph as amplitude / time. This is how samples are represented in audio tracks in Audacity (and many other audio editors).


It is arguable whether the “dots” should be at the start of the sample period, or the middle of the sample period (currently we put them at the start), and it is arguable whether to snap the selection to the dots, or mid way between the dots (we snap between dots). The important thing is that it must be entirely consistent so as to avoid “off by one” errors. We can see the drawback of selecting “between dots” in examples posted previously in this topic. Selecting “on the dot” has its own drawback in that it is ambiguous whether a sample that is “on the line” is “in” the selection or “outside” the selection (which is not ambiguous if we select “between” dots).

Blind users will probably wonder what all this fuss is about, because they are not thrown off by the visual appearance. If we disregard the graphical representation, what we are left with is a list of numbers representing a sequence of sample values. When we “make a selection”, we are selecting a sub-sequence from that list. An item from the list is either within the bounds of our selection, or not within the selection and there is no grey area.
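
In list terms, a selection is a half-open slice. A short Python illustration with made-up sample values:

```python
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

def select(values, first_index, past_end_index):
    """Return the sub-sequence values[first_index:past_end_index]."""
    return values[first_index:past_end_index]

def is_selected(index, first_index, past_end_index):
    """True when the sample at `index` is inside the selection."""
    return first_index <= index < past_end_index

print(select(samples, 2, 5))
print([is_selected(i, 2, 5) for i in range(len(samples))])
```

Every index gets a plain True or False, with the start index in and the end index out, so there is nothing left to interpret.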

You guys are debating display and selection behavior now, not suggesting anything with implications for how I generate labels in plug ins… right?

I’m glad I’m not alone in finding the zoomed display confusing.

We’ve possibly drifted a little way off your original point, but it is still definitely related very closely.

Thinking of “samples” as an indexed list (actually an array) of values (rather than of dots on a graph) may help to avoid much of the confusion when writing scripts.
The “nth sample” is the “nth item” in the list (since elements of lists and arrays are numbered from zero, this will be the “n-1” element).
The first sample in the list is at time = zero, the second at time = 1/sample_rate, and so on.
To convert from “analogue” time to sample number, round to the nearest.

So, for example, to “read” the nth sample (the n-1 element in the array), the sample time is (n-1)/sample_rate (array elements are numbered from zero)

;; generate 11 samples from 0 to 1
(setf s-array #(0 1 2 3 4 5 6 7 8 9 10))
(mult 0.1 (snd-from-array 0 *sound-srate* s-array))



;; read the sample values
(abs-env
  (dotimes (i 11)
    (print (sref s (/ i *sound-srate*)))))

Or with a prettier output:

;; read the sample values
(setq output "")
(abs-env
  (dotimes (i 11)
    (setq output
      (format nil
        "~asample # ~at element # ~at value ~a~%"
        output
        (1+ i)
        i
        (sref s (/ i *sound-srate*))))))
output

or if we want to “include” the 3rd, 4th and 5th samples with a labelled region, then we need to select from 2/sample_rate to 5/sample_rate

("sr" = sample rate)

Sample#  Array#   time
1         0       0
2         1       1/sr
---- from here ----
3*        2       2/sr
4*        3       3/sr
5*        4       4/sr
--- to here -----
6         5       5/sr

So it doesn’t much matter whether we select from 1.5/sr to 4.5/sr, or from 2.4999/sr to 5.4999/sr, as long as the quantized (rounded) values select from: 2/sr to: 5/sr

(let ((sr *sound-srate*))
  (list 
    (list (/ 1.5 sr) (/ 4.5 sr) "")
    (list (/ 2 sr) (/ 5 sr) "")
    (list (/ 2.4999 sr) (/ 5.4999 sr) "")))
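
A Python cross-check of those three labels, assuming the rounding rule floor(t * rate + 0.5) and using a 1 Hz rate so that the arithmetic is exact. The helper name is hypothetical.

```python
import math

def sample_numbers(t0, t1, rate):
    """1-based sample numbers covered by a label from t0 to t1 (assumed model)."""
    first_index = math.floor(t0 * rate + 0.5)     # zero-based, included
    past_end_index = math.floor(t1 * rate + 0.5)  # zero-based, excluded
    return [i + 1 for i in range(first_index, past_end_index)]

sr = 1.0  # 1 Hz keeps the arithmetic exact; substitute the track rate as needed
labels = [(1.5 / sr, 4.5 / sr), (2 / sr, 5 / sr), (2.4999 / sr, 5.4999 / sr)]
for t0, t1 in labels:
    print(t0, t1, "->", sample_numbers(t0, t1, sr))
```

All three cover samples 3, 4 and 5 at 1 Hz. The first label sits exactly on the boundary, so at a real sample rate roundoff in the division can push it either way; the middle label, on the sample times themselves, has the most margin.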

Yes I was suggesting to snap selections to sample periods even if Snap To is off.

But visually it’s unconvincing to say that we snap selection to sample periods if Snap To samples is on. It looks like we snap to samples. If Snap To samples is off, you can drag the selection as finely as you can move it. If Snap To samples is on, you can only drag the selection so that it aligns with a sample point. This produces the ugly “gatelegged” appearance and the confusion Paul has described.

As another confusion, whether Snap To is on or off, you don’t get the pointing hand when hovering over the selection edge in the waveform - the selection edge has effectively moved into the Timeline.

It’s just as well the Manual doesn’t try to document this.

Consider this image:
[Attached image: gatelegged audio and label selections.png]
I see that the selection in the 9000 Hz track is snapping between sample dots (fair enough on Audacity’s own terms) and because the audio track above is at a higher rate and also obeys snapping between sample dots, that selection is smaller.

In that image, I don’t see why having to choose which sample rate the selection snaps to would prevent the selection in the label track also snapping in-between sample dots (if that is the choice we make).

Also I’ve known several people claim strongly that a scenario as above where selections are of different lengths in different tracks causes clicks when you cut, though I have never proved it. Looking at that image makes me wonder if the selection should be quantised to the lowest track rate so as to prevent potentially large time selection discrepancies.
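
How large can that discrepancy get? A rough Python estimate, assuming each track rounds the selection edges to its own sample grid with floor(t * rate + 0.5). The times below are made up; 9000 Hz is the lower rate from the image.

```python
import math

def selected_samples(t0, t1, rate):
    """Number of samples a time selection covers in a track (assumed model)."""
    return math.floor(t1 * rate + 0.5) - math.floor(t0 * rate + 0.5)

t0, t1 = 0.10003, 0.10052  # an arbitrary 490 microsecond time selection
for rate in (44100, 9000):
    n = selected_samples(t0, t1, rate)
    print(f"{rate} Hz: {n} samples = {n / rate * 1e6:.1f} microseconds")
```

In this model each track's selected duration can differ from the requested one by up to one sample period of that track, so the lowest rate sets the worst case.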

I don’t think it’s any more confusing than the question of where a “full” beat ends that we two amused ourselves with recently :slight_smile: Certainly no more confusing than that clicking to Snap To a sample aligns the cursor with the right edge of the dot.

If all selections were to snap to dots instead of between them, then I suggest colouring the samples that are included in the selection differently.


Gale

I think that at this stage of the game, the chances of changing the relationship between selection time and samples are zero. Conversion between time and samples goes deep into the code. As mentioned previously, it must be consistent throughout so as to avoid “off by one” errors, and that has taken a huge amount of time and effort.

The one thing that could be done, is to just make a change in how we represent samples graphically. Instead of putting the dots at sample times, we could put dots in the middle of sample periods (which is the “cross-over” point that the selection snaps to). As an example, if we have a sample rate of 1 Hz: Currently the dots are shown at 0.0, 1.0, 2.0, 3.0, 4.0 … and selection “snap to” positions are at 0.0, 0.5, 1.5, 2.5, 3.5… If the dots were moved to the middle of the sample periods, then the first sample would be at 0.5 rather than at 0.0. The dots would be at 0.5, 1.5, 2.5, 3.5… (the same as the “snap to” positions). This would require no code changes other than the code that displays the waveform, but I still expect there will be no developer interest (and quite probably there will be some opposition). There is still the conceptual problem of “does cutting a selection from 0 to 0.5 delete the first sample” (which is more or less where this topic started).

I’m sure this view is correct, but may not necessarily be correct if there is a complete rewrite for any future mobile version of Audacity. A lot would depend how efficient the current scheme is at waveform display (the current slowness of waveform display is a major obstacle to a mobile version).

The terminology is getting confusing here. Selection in the audio track snaps to this cross-over point (whether Snap To is off or on), but (if Snap To samples is on) selection in the label track snaps to the sample points.

The audio selection snap to positions, that are snapped to whether Snap To is on or off, yes?

What would happen to Snap To clicks which currently snap to the sample times (0.0, 1.0, 2.0 in this example)? Would they snap to the sample times as now, hence in-between dots if we moved the dots to the middle of the sample period?

I would probably wager a few pennies on that too.

And given that, and we would still have the “gatelegged” audio and label track problem, I am not sure if this helps overall or not.

Although I suggested that the selection in the label track should snap to sample periods, so removing the “gateleg”, I would not make a complete fresh start from there. I would say that audio (and label track) selections (and clicks) snap to the sample times if not snapped otherwise. Dots are at the sample times.

Wavosaur has dots at the sample times and clicks and selections snap to the sample times. Goldwave does not seem to have sample dots but seems to snap clicks and selections to sample times.

CoolEdit is almost like Audacity. Dots are at the sample times but selections snap to the cross-over points between sample times. Clicks snap to the crossover points, which seems to make more sense in that context than snapping to the sample times. A half-sample period is shown behind zero which makes it clearer what is going on. Cue Ranges snap to the cross-over points. I think even this is clearer than Audacity.


Gale

Yes, that is the current behaviour.

There are no restrictions about where the Timeline selection snaps to - this snap position can be at any interval that we care to define.
The snapping position in audio tracks must be between sample times because a sample must be inside or outside of a selection and there is no room for ambiguity.

If the dots were drawn in the middle of the sample period and “Snap To” is on, then if the selection in the Timeline/Label track snapped to the dots it would be snapping to the same position as the selection in the Audio track (assuming that the audio track has the same sample rate as the Project Rate).

Current behaviour with “Snap To” enabled:

  • Dots mark sample times.
  • Selection in audio track snaps to the middle of the sample period.
  • Selection in the Timeline/Label tracks snaps to the dots.

This combination produces a dogleg in the selection when the track sample rate is the same as the Project Rate - the audio track selection is offset from the Timeline selection by half a sample period.
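
The size of that dogleg follows directly from the three points above. The Python sketch below assumes the highlighted edge in the audio track is drawn midway between the first selected dot and the dot before it, as in the screenshots described earlier in this topic. The rate and sample number are arbitrary and the function names are hypothetical.

```python
def timeline_snap(k, project_rate):
    """Timeline/label edge when snapped to sample k: the dot's own time."""
    return k / project_rate

def audio_edge(k, track_rate):
    """Highlighted edge in the audio track for a selection starting at sample k:
    midway between sample k - 1 and sample k."""
    return (k - 0.5) / track_rate

rate = 44100
k = 1000
dogleg = timeline_snap(k, rate) - audio_edge(k, rate)
print(f"offset = {dogleg * 1e6:.3f} microseconds = {dogleg * rate:.3f} sample periods")
```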


Possible behaviour if dots marked the middle of the sample period:

  • Dots mark the middle of the sample period.
  • Selection in audio track snaps to the middle of the sample period.
  • Selection in the Timeline/Label tracks snaps to the dots.

The only thing that we have really changed is “what the dots signify”. They now signify “sample periods” rather than “the time of a sample”.
The selection in the audio track will now snap to the same position as the dots.
The selection in the Timeline still snaps to the same position as the dots.
When the audio track has the same sample rate as the Project Rate, there is no dogleg - the selection in the Timeline is in line with the selection in the audio track.

I’m not convinced that this would be better than what we have now, but it would “look” neater, provided that the audio track sample rate is the same as the Project Rate (which it usually is).

OK so you now think that by moving the sample dots to the middle of the sample period, you could do what I asked for (have selections in the Timeline/Label tracks snap to the sample period crossover points, avoiding the dogleg) and without risking off-by-ones?

And do I assume by the same logic that mouse clicks in the audio track would “snap to the dots” as now, so would now snap in-between sample periods rather than at sample times?

If so, then we would have a scheme that is more consistent, intuitive and hence more documentable. So I would see it as a potential advance. It is not the Wavosaur model which I think is the easiest to grasp (providing you could visually demonstrate when a sample was in the Timeline selection and when not).

Subject to my questions above, it seems to be a considerable improvement. Clicks and both types of selections would snap to the middle of the sample period, and that position is where the sample dots are.

The big drawback is moving the sample dots away from the sample times, which is about the only thing that is easily grasped with the current scheme :sunglasses: . There would probably be a mistaken assumption that we were snapping to sample times.

It could possibly be ameliorated by using a different representation than a dot e.g. ][ (essentially an I-Beam with some kind of impression that it was two boundaries attached to each other). If you imagine those in a waveform

 ][   ][    ][                             ][  ][ 
                  ][    ][          ][    
                               ][

it’s not totally unconvincing (to me). But before voting for that (as opposed to “mouse clicks and timeline/audio selections snap to sample times, dots at sample times” which needs a complete makeover) I’d want to be sure there was no risk.


Gale