Time scaling curiosity

I found out that there is room for a millionth time scaling algorithm, after the ones in Audacity, Paulstretch, SBSMS, Photosounder and many other programs. It’s based on splitting the sound into a lot of overlapping chunks of x samples (for example 4096), windowing them, and spacing them x÷2 samples apart in the output. The picture demonstrates it:

https://i.imgur.com/NM3VuPc.png

So, to stretch a sound 2 times, I would split it into chunks of 4096 samples that occur every 1024 samples, then space them so that they are every 2048 samples.

And for timesquishing, the opposite would be done; the sound is split into chunks of x samples that occur every x÷2 samples, then they are spaced so that they are every x÷2÷n samples, where n is the squishing factor.
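For anyone who wants to try this outside Audacity, here is a minimal Python sketch of the chunk-and-respace idea described above. The names time_scale, hann, factor and size are made up for this illustration, and a Hann window is assumed; it is only a sketch of the description in this post, not code from any existing program.

```python
import math

def hann(size):
    """Periodic Hann window: 0.5 - 0.5*cos(2*pi*i/size)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]

def time_scale(samples, factor, size=4096):
    """Overlap-add time scaling: factor > 1 stretches, factor < 1 squishes."""
    if factor >= 1:
        out_hop = size / 2           # chunks land every x/2 samples...
        in_hop = out_hop / factor    # ...but are read every x/2/factor samples
    else:
        in_hop = size / 2            # chunks are read every x/2 samples...
        out_hop = in_hop * factor    # ...and land every x/2/n samples (n = 1/factor)
    if len(samples) < size:
        return []
    window = hann(size)
    count = int((len(samples) - size) / in_hop) + 1
    out = [0.0] * (round((count - 1) * out_hop) + size)
    for k in range(count):
        src = round(k * in_hop)      # where the chunk starts in the input
        dst = round(k * out_hop)     # where the chunk starts in the output
        for i in range(size):
            out[dst + i] += samples[src + i] * window[i]
    return out
```

Calling time_scale(sound, 2.0) reads 4096-sample chunks every 1024 samples and writes them every 2048 samples, as in the stretching example above. No loudness correction is applied here.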

Can you invent a 1000001st algorithm?

Try looking up “SoundTouch Audio Processing Library”. I think you’ll find it interesting.

However, my algorithm has a flaw: when timesquishing, the windows overlap more densely, so the output gets louder. The solution would be to divide the volume by the timesquishing factor when timesquishing.
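The loudness increase can be checked with a few lines of Python. This is only a sketch, assuming a Hann window and a whole-number squishing factor n; overlap_gain is a made-up helper that adds up every window covering one output sample.

```python
import math

def overlap_gain(size, hop):
    """Sum of all Hann windows that cover one output sample in steady state."""
    total = 0.0
    offset = 0
    while offset < size:
        total += 0.5 - 0.5 * math.cos(2 * math.pi * offset / size)
        offset += hop
    return total

size = 4096
for n in (1, 2, 4):
    hop = size // (2 * n)            # windows spaced every x/2/n samples
    print(n, round(overlap_gain(size, hop), 6))
```

Under these assumptions the gain comes out equal to n, which is why dividing by the squishing factor brings the level back.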

When I want to change the pitch up by an octave, I would timestretch 2× with a window size of 8192, then double the sample rate. Changing the pitch by two octaves? I would timestretch 4× with a window size of 16384, then quadruple the sample rate.

And, when pitch downing by octave, conversely I would pick a window size of 2048…

Why these adjustments above? They keep the time and frequency resolution of the final result the same. With too low a time resolution, the sound will be low quality; with too high a time resolution, the frequency resolution will suffer.
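The window-size arithmetic above can be written down as a small Python helper. It is a sketch only: pitch_shift_plan is a made-up name, the base window of 4096 is taken from the example at the start of the topic, and only whole octaves are considered.

```python
def pitch_shift_plan(octaves, base_window=4096):
    """Return (window_size, time_factor, rate_factor) for a shift in whole octaves."""
    rate_factor = 2.0 ** octaves      # sample-rate change that moves the pitch
    time_factor = rate_factor         # stretch (or squish) that cancels the length change
    window_size = int(base_window * rate_factor)
    return window_size, time_factor, rate_factor

for octaves in (1, 2, -1):
    print(octaves, pitch_shift_plan(octaves))
```

After the sample rate change, each window covers window_size ÷ rate_factor = 4096 samples of the final sound, so the resolution stays the same whichever way the pitch is shifted.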

What window size would YOU pick for my algorithm for time stretching, time squishing, pitch upping and pitch downing? This algorithm does not use CPU–intensive operations like the Fast Fourier Transform or the Short Time Fourier Transform.

If the window size is 360, the windowing algorithm is cos(x)×-0.5+0.5, with x in degrees. What is this window type named?

Obviously one could use a sinc window, but a sinc never stays at zero, so loop–friendly time stretching or time squishing would take infinite time, unless the sinc computation stops once the x86 CPU (or whatever CPU your non–Windows system uses) rounds the nearest sinc peak to 0, and even then the loop–friendly versions would take a lot of time.

It’s called a Hann (or “Hanning”, or simply “raised cosine”) window.
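As a quick illustration, the formula from the question can be evaluated in Python. This is a sketch; hann_degrees is a made-up name, and the window is assumed to cover one full turn of 360 degrees regardless of its size.

```python
import math

def hann_degrees(size=360):
    """cos(x) * -0.5 + 0.5 with x swept over one full turn, in degrees."""
    return [math.cos(math.radians(360.0 * i / size)) * -0.5 + 0.5
            for i in range(size)]

window = hann_degrees(360)
print(window[0], window[90], window[180])
```

The window starts at 0, passes through about 0.5 a quarter of the way in, and peaks at 1 in the middle.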

Does the “S–Curve” fade multiply the selected sound by half of a Hanning window?

Assuming that you are referring to the “S-Curve” preset in the Adjustable Fade effect, then yes.

You may also be interested in this simple pitch shift algorithm. It is written in C++ and uses delay lines with overlapping triangular windows. The sound quality is not fantastic, but considering the simplicity of the code it is not bad: https://github.com/audacity/audacity/blob/master/lib-src/libnyquist/nyquist/nyqstk/src/PitShift.cpp

This algorithm is available in Nyquist as the PITSHIFT function: http://www.cs.cmu.edu/~rbd/doc/nyquist/part8.html#index498
When lowering the pitch, there tend to be high frequency artefacts, so applying a low pass filter to the result will often produce subjectively better sounding audio.
Example code that can be run in the Nyquist Prompt:

(setf shift-ratio 1.2)
(setf mix 1)  ;range is 0 to 1
(setf hf-cutoff 7000)
(lowpass8 (pitshift *track* shift-ratio mix) hf-cutoff)

Could you visualize it, just like how https://i.imgur.com/NM3VuPc.png visualizes the algorithm in the beginning of this topic?

It’s similar to your suggestion, but using triangle windows.

That makes sense. I believe the reason there might be high frequency noise with triangle windows is that the peaks of the triangle window no longer align with the beginning of the next window and the end of the previous window, so the summed windows form some sort of triangle wave noise. I think even a constant stream of samples at 0.5 would have this high frequency noise. This wouldn’t occur with pitch–upping or time–stretching because the opposite is done in this case; the destination has the windows in alignment. Hanning is a smoother window, so the problem doesn’t occur that much with Hanning. And I believe the problem would be completely eliminated with a sinc window.
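The alignment idea can be tested numerically. The Python sketch below is only an illustration: it models the overlap-add scheme from the start of this topic, not the delay-line code in PitShift.cpp, and hann, triangle and ripple are made-up helper names. It adds up windows placed every hop samples and measures how far the sum is from a flat line.

```python
import math

def hann(size):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]

def triangle(size):
    return [1.0 - abs(2.0 * i / size - 1.0) for i in range(size)]

def ripple(window, hop):
    """Peak-to-peak variation of the steady-state sum of windows placed every hop samples."""
    size = len(window)
    sums = [sum(window[p + k] for k in range(0, size - p, hop))
            for p in range(hop)]
    return max(sums) - min(sums)

size = 1200
for hop in (600, 400):               # aligned spacing, then squishing by 1.5
    print(hop, ripple(triangle(size), hop), ripple(hann(size), hop))
```

With the windows spaced every half window (hop 600) both sums are flat. At a third of a window (hop 400) the triangle sum wobbles while the Hann sum stays flat. At other spacings both windows can ripple, so this supports the alignment explanation only for the cases tried here.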

There is a page about a problem similar to this one: Resampling Filters and Fairness which seems to have the same underlying cause (infinitely many windows denser than 1 unit do not necessarily add up to a straight line).

Does Nyquist allow writing such a timescale/pitchshift filter with various windows, like box, triangle, Hanning, Lanczos–2, Lanczos–3, the cubic family (see What is bicubic resampling?), etc.? I would like such a plug–in so that I can test the quality of such a timescale/pitchshift with various windows (and window sizes, not the width, but something like area), to see which one will give the best quality. I think Lanczos–3 will give the least high frequency noise because it’s the closest to Sinc.

That would be possible, but not an easy project.

Making the windows is easy enough, for example:

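(defun hann-window ()
  ;; One period of a 1 Hz sine starting at -90 degrees is -cos,
  ;; so this gives 0.5 - 0.5 * cos: a Hann window 1 second long.
  (let ((step (hz-to-step 1)))
    (abs-env
      (sum 0.5 (mult 0.5 (osc step 1 *sine-table* -90))))))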

(hann-window)

or

(defun norm-sinc (x)
  (if (= x 0)
      1.0
      (/ (sin (* pi x)) (* pi x))))

(defun lanczos-window (a)
  ;; Lanczos-a kernel: sinc(x) * sinc(x / a), sampled over -a <= x < a
  (let* ((points 44100)
         (ar (make-array points)))
    (dotimes (i points)
      (let ((x (* a (1- (/ (* i 2.0) points)))))
        (setf (aref ar i) (* (norm-sinc x) (norm-sinc (/ x a))))))
    (snd-from-array 0 44100 ar)))

(lanczos-window 3)

The tricky bit is fetching overlapping blocks of the audio, processing them, and sticking them back together.
Probably the easiest way to do that (though not very fast), would be to grab arrays of sample values with SND-FETCH-ARRAY
… something like (not complete code and may have errors, but just to give the general idea):

(setf step (truncate (/ *sound-srate* 2)))
(setf half (truncate (/ step 2.0)))
(setf ln (truncate len))
(setf ratio 0.6)

(defun tri-window ()
  (pwlv 0 0.25 1 0.5 0))

(setf out (s-rest 0))
(do ((ar (snd-fetch-array *track* step half)
         (snd-fetch-array *track* step half))
     (t0 0 (+ t0 0.25)))
    ((not ar) (force-srate (* ratio *sound-srate*)(extract 0 1 out)))
  (setf window (snd-from-array t0 (* ratio *sound-srate*) ar))
  (setf window (mult window (abs-env (at t0 (cue (tri-window))))))
  (setf out (sum out window)))

Does Nyquist allow windows with user parameters, such as cubic? (http://entropymine.com/imageworsener/bicubic/g/formula.png, where B and C are adjustable parameters)

Sure. As you noticed, the “Adjustable Fade” is in effect creating half a window, and has three adjustable parameters; “Start”, “End” and “Mid-fade Adjust”.
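For the cubic family asked about above, a window with adjustable B and C could look like the Python sketch below. It assumes the linked formula is the usual two-parameter (Mitchell-Netravali) cubic. The names cubic_bc and cubic_window and the default B = C = 1/3 are choices made for this illustration, and it is not a Nyquist plug-in.

```python
def cubic_bc(x, b=1/3, c=1/3):
    """Two-parameter (B, C) cubic kernel, nonzero only for |x| < 2."""
    x = abs(x)
    if x < 1:
        return ((12 - 9*b - 6*c) * x**3
                + (-18 + 12*b + 6*c) * x**2
                + (6 - 2*b)) / 6
    if x < 2:
        return ((-b - 6*c) * x**3
                + (6*b + 30*c) * x**2
                + (-12*b - 48*c) * x
                + (8*b + 24*c)) / 6
    return 0.0

def cubic_window(size, b=1/3, c=1/3):
    """Sample the kernel over -2 <= x < 2 to get a window of size points."""
    return [cubic_bc(4.0 * i / size - 2.0, b, c) for i in range(size)]

window = cubic_window(4096, b=0.0, c=0.5)   # B=0, C=0.5 is the Catmull-Rom cubic
```

B = 1, C = 0 gives the cubic B-spline, which never goes negative, while settings such as B = 0, C = 0.5 have negative lobes like Lanczos does.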

What windowing does Adjustable Fade use if the Mid–fade Adjust setting is anything other than 0.5?

It’s a hybrid that interpolates between “y = x^n1 ↔ linear ↔ Hann ↔ y = Hann^n2”
where n1 < 1 and n2 > 1
The full description is the code https://github.com/audacity/audacity/blob/master/plug-ins/adjustable-fade.ny

I can’t quote your last message because of:

"Sorry, you have been blocked
You are unable to access audacityteam.org"

Type

[quote]

and

[/quote]

then copy and paste, like this:

[quote]I can't quote your last message because of:[/quote]

which will appear as:

I can’t quote your last message because of:

or simply write:
“I can’t quote your last message because of:”

(not a test; below is the actual quote reply I wanted to do)

What do these badly drawn arrows mean?

Interpolates between the definitions either side.

So there are 3 interpolations. But where are they used?