I surmised that the number of returned frames would also depend on the magic number 1020 from the internal buffer (this is not the case, see the code).
An example where this number does have an influence is ‘(snd-avg s 1020 1020 op-average)’:
If your sound has
1020 samples -> 1 returned value
1020 + 509 samples -> 1 returned value
1020 + 510 samples -> 2 returned values
This means that the last frame is only returned if the remaining samples amount to at least half the length of the (internal fetch) buffer, i.e. 510 samples.
However, this does not seem to be the case with fft.
There, a single remaining sample is enough for the last frame to be calculated.
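
To make the difference concrete, here is a small Python sketch that only models the frame counts described above. It is not the Nyquist implementation: the function names are made up for illustration, and the fft case assumes that the step size equals the frame length.

```python
def avg_frame_count(n_samples, block=1020, step=1020):
    """Model of the observed snd-avg behaviour (hypothetical helper).

    A trailing partial block only yields a value if it holds
    at least half a block of samples.
    """
    full, rest = divmod(n_samples, step)
    # Integer comparison avoids rounding issues: rest >= block / 2
    return full + (1 if rest > 0 and 2 * rest >= block else 0)


def fft_frame_count(n_samples, step=1020):
    """Model of the observed fft behaviour (hypothetical helper).

    Assumption: any remainder, even one sample, yields a last frame,
    so the count is the ceiling of n_samples / step.
    """
    return -(-n_samples // step)


if __name__ == "__main__":
    for n in (1020, 1020 + 509, 1020 + 510, 1020 + 1):
        print(n, avg_frame_count(n), fft_frame_count(n))
```

With these assumptions, 1020 + 509 samples give one averaged value but two fft frames, which matches the behaviour described above.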