SBSMS algorithm inquiry

Hi Dev-Team and other people!

I want to CUDAfy Audacity’s time-stretching algorithm, so it can run parallelized on NVIDIA GPUs.

For that, I need some insight into the current implementation of SBSMS time-stretching (the variant without pitch shifting).

On the technical side, the surrounding plumbing is as good as complete; what I still need to write or find is a kernel that does the actual stretching.

What works so far:

:heavy_check_mark: Read audio data as float-array

:heavy_check_mark: Move data between host ↔ GPU device

:heavy_check_mark: Transform data with the Fast Fourier Transform (FFT), forward and inverse (IFFT)

:heavy_check_mark: Normalize after FFT → IFFT, which gives back the ‘same’ data as the input
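
For context, the round trip described in the last two points can be sketched roughly as below. This is only a minimal sketch: it assumes cuFFT for the transforms (the library is not named above), and `NormalizeKernel` and `RoundTrip` are placeholder names for illustration. cuFFT transforms are unnormalized, so FFT followed by IFFT returns the input scaled by the number of samples, hence the division.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Scale every complex sample by 1/n to undo the unnormalized FFT -> IFFT round trip.
__global__ void NormalizeKernel(int n, float2* data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        float scale = 1.0f / (float)n;
        data[i].x *= scale;
        data[i].y *= scale;
    }
}

// Placeholder helper (assumption): FFT -> IFFT -> normalize, result written back to `samples`.
// Returns true only if every CUDA / cuFFT call succeeded.
bool RoundTrip(std::vector<float2>& samples)
{
    const int n = static_cast<int>(samples.size());
    const size_t bytes = samples.size() * sizeof(float2);

    float2* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS)
    {
        cudaFree(dev);
        return false;
    }

    // cufftComplex is a typedef of float2, so the buffer can be passed directly.
    bool ok = cudaMemcpy(dev, samples.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cufftExecC2C(plan, dev, dev, CUFFT_FORWARD) == CUFFT_SUCCESS;
    ok = ok && cufftExecC2C(plan, dev, dev, CUFFT_INVERSE) == CUFFT_SUCCESS;
    if (ok)
    {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        NormalizeKernel<<<blocks, threads>>>(n, dev);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // The blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(samples.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cufftDestroy(plan);
    cudaFree(dev);
    return ok;
}
```

After `RoundTrip`, the samples should match the input up to float rounding error, which is what ‘same’ means above.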

Issues:

:x: Once the data is in FFT form, I cannot come up with an algorithm that stretches the float2 array (now on the device) via a kernel.

I need an algorithm like:
“When in FFT form, multiply each float2 value by a float factor”

but I know it’s not that easy.
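
To make the naive idea concrete, here is a minimal sketch of that literal kernel. `ScaleSpectrum` and `ScaleOnDevice` are made-up names for illustration only. Because the FFT is linear, the IFFT of the scaled spectrum is just the original signal multiplied by `factor`: the array still holds the same number of samples, so only the volume changes, not the duration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Literal version of "multiply each float2 value by a float factor".
// This changes the gain of the signal, not its length.
__global__ void ScaleSpectrum(int n, float2* spectrum, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        spectrum[i].x *= factor; // real part
        spectrum[i].y *= factor; // imaginary part
    }
}

// Hypothetical host helper: copies the bins to the GPU, scales them, copies them back.
// Returns true only if every CUDA call succeeded.
bool ScaleOnDevice(std::vector<float2>& bins, float factor)
{
    const int n = static_cast<int>(bins.size());
    const size_t bytes = bins.size() * sizeof(float2);

    float2* dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, bins.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok)
    {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        ScaleSpectrum<<<blocks, threads>>>(n, dev, factor);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // The blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(bins.data(), dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(dev);
    return ok;
}
```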

I would really appreciate any help!

I’ve worked with Gemini and ChatGPT to create a kernel that at least compiles and does not fail with factor=1.0 (no stretch):

    #ifndef M_PI
    #define M_PI 3.14159265358979323846f
    #endif

    extern "C"
    {
    // Reads from inputArray and writes to outputArray (two separate device buffers),
    // so no thread overwrites a bin that another thread still has to read.
    __global__ void StretchWSOLA(int N, const float2* inputArray, float2* outputArray, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int halfN = N / 2; // 50% overlap size

        if (i < N)
        {
            // Computed source index for the frequency shift, clamped to the valid range
            float newIdx = fminf(i / factor, (float)(N - 1));
            int lowIdx = (int)floorf(newIdx);
            int highIdx = min(lowIdx + 1, N - 1);
            float weight = newIdx - lowIdx;

            // Frequency interpolation (amplitude)
            float2 interpolated;
            interpolated.x = (1.0f - weight) * inputArray[lowIdx].x + weight * inputArray[highIdx].x;
            interpolated.y = (1.0f - weight) * inputArray[lowIdx].y + weight * inputArray[highIdx].y;

            // Phase correction for overlap-add
            float phaseLow = atan2f(inputArray[lowIdx].y, inputArray[lowIdx].x);
            float phaseHigh = atan2f(inputArray[highIdx].y, inputArray[highIdx].x);
            float phaseDiff = phaseHigh - phaseLow;

            // Correct phase jumps (wrap the difference into [-pi, pi])
            if (phaseDiff > M_PI) phaseDiff -= 2.0f * M_PI;
            if (phaseDiff < -M_PI) phaseDiff += 2.0f * M_PI;

            // Compute the target phase with phase-locked adjustment
            float targetPhase = phaseLow + weight * phaseDiff;
            float magnitude = hypotf(interpolated.x, interpolated.y);
            interpolated.x = magnitude * cosf(targetPhase);
            interpolated.y = magnitude * sinf(targetPhase);

            // Store with 50% overlap-add
            if (i < halfN)
            {
                outputArray[i] = interpolated;
            }
            else
            {
                // Smooth crossfade for the overlap region
                float fadeFactor = (i - halfN) / (float)halfN; // values between 0 and 1
                outputArray[i].x = (1.0f - fadeFactor) * inputArray[i].x + fadeFactor * interpolated.x;
                outputArray[i].y = (1.0f - fadeFactor) * inputArray[i].y + fadeFactor * interpolated.y;
            }
        }
    }

    }

^^^ This one creates fragments and stutter, sadly.

The upstream source code for the SBSMS library is here: GitHub - claytonotey/libsbsms