Enhanced autocorrelation and wrong peak

Hi!

I need to detect pitch in a speech signal.

I found this autocorrelation-based pitch tracking technique through Audacity (Plot Spectrum, algorithm: Enhanced Autocorrelation) and implemented it in an application I’m developing: http://legacy.spa.aalto.fi/u/mak/PUB/SAP2000Tolonen.pdf

Unfortunately, it occasionally picks a multiple of the correct fundamental frequency, as in: http://i.imgur.com/W6DkWE9.png
The correct peak is at 178 Hz (F3), but it finds 690 Hz (F5), both in Audacity and in my implementation.

How could I work around this problem? Would you suggest another method for tracking the frequency? It doesn’t have to be real-time, but it should be very accurate.

The right algorithm depends on the type of music and the actual harmonic structure.

It sometimes takes more than one step to find the right fundamental, depending on where it lies.

I would suggest that you add another resampling step in the autocorrelation function.
Do you use 3 passes currently?
The result can be compared against, e.g., the harmonic product spectrum or harmonic sum spectrum (HPS), or against the cepstrum.
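To make the HPS comparison concrete, here is a minimal stdlib-only Python sketch. It assumes a mono frame `x` at sample rate `fs`; `hps_pitch`, `num_harmonics` and `fmin` are hypothetical names, and the naive DFT only stands in for a real FFT to keep the example self-contained. The magnitude spectrum is multiplied by copies of itself downsampled by 2, 3, ..., so only a bin whose integer multiples all carry energy scores high.

```python
import math


def magnitude_spectrum(x):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); use an FFT in practice)."""
    n = len(x)
    # Precompute twiddle factors once so the inner loop is just lookups.
    cos_t = [math.cos(2 * math.pi * i / n) for i in range(n)]
    sin_t = [math.sin(2 * math.pi * i / n) for i in range(n)]
    mags = []
    for k in range(n // 2 + 1):
        re = im = 0.0
        for t, sample in enumerate(x):
            idx = (k * t) % n
            re += sample * cos_t[idx]
            im -= sample * sin_t[idx]
        mags.append(math.hypot(re, im))
    return mags


def hps_pitch(x, fs, num_harmonics=4, fmin=50.0):
    """Harmonic product spectrum: for each candidate bin k, multiply the
    magnitudes at k, 2k, ..., num_harmonics*k and return the best k in Hz."""
    mags = magnitude_spectrum(x)
    n = len(x)
    max_bin = (len(mags) - 1) // num_harmonics  # keep k * h inside the spectrum
    min_bin = max(1, int(fmin * n / fs))        # ignore DC and very low bins
    best_bin, best_val = min_bin, -1.0
    for k in range(min_bin, max_bin + 1):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= mags[k * h]
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * fs / n
```

With a weak fundamental and a strong 2nd harmonic, plain peak picking on `magnitude_spectrum` lands on the harmonic, while the product still peaks at the fundamental. That is exactly the kind of disagreement a cross-check can flag.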

The Yin algorithm yields good results as well.
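For comparison, the core YIN steps (difference function, cumulative mean normalized difference, absolute threshold) fit in a short Python sketch. The threshold of 0.1, the lag range from `fmin`/`fmax`, the "walk down to the local minimum" step and the name `yin_pitch` are assumptions for illustration; the published algorithm also adds parabolic interpolation and a best-local-estimate step, both omitted here.

```python
def yin_pitch(x, fs, fmin=60.0, fmax=1000.0, threshold=0.1):
    """Return an f0 estimate in Hz, or None if no lag passes the threshold."""
    tau_min = max(1, int(fs / fmax))
    tau_max = min(len(x) // 2, int(fs / fmin))
    window = len(x) - tau_max  # same number of terms for every lag
    # Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(window))
    # Step 2: cumulative mean normalized difference, d'(0) = 1 and
    # d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag below the threshold, then follow the dip to its minimum.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
    return None
```

Because the normalized difference is divided by its running mean, short lags (which correspond to harmonics of f0) are penalized, which is one reason YIN tends to make fewer "too high" errors than plain autocorrelation peak picking.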

I think the enhanced autocorrelation is also used in the Change Pitch effect (frequency box).
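It’s also worthwhile to implement quadratic interpolation (or at least oversampling), since the result will be more accurate at higher frequencies. Without it, the detectable frequencies are spaced far apart at the top and get closer together toward the bottom, e.g. 24000, 16000, 12000, 9600, 8000 Hz and so on (a 48 kHz sample rate divided by integer lags of 2, 3, 4, 5, 6 samples).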
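The resolution problem and the quadratic (parabolic) interpolation fix can be sketched as follows. Without interpolation only `fs / lag` for integer lags is reachable; fitting a parabola through the autocorrelation peak and its two neighbours gives a fractional lag instead. The function names are hypothetical, and the sketch assumes the peak index is not the first or last element.

```python
def lag_frequencies(fs, lags):
    """Frequencies reachable without interpolation: one per integer lag."""
    return [fs / lag for lag in lags]


def parabolic_peak(values, i):
    """Fit a parabola through values[i-1], values[i], values[i+1] and return
    (fractional_index, interpolated_height) of its vertex."""
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2 * b + c
    if denom == 0:  # no curvature: keep the integer position
        return float(i), b
    offset = 0.5 * (a - c) / denom  # within [-0.5, 0.5] when i is a local max
    return i + offset, b - 0.25 * (a - c) * offset
```

For example, an interpolated peak at lag 4.3 samples at 48 kHz gives 48000 / 4.3 ≈ 11163 Hz, a value that falls between the 12000 Hz and 9600 Hz grid points and would otherwise be unreachable.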

First of all, thank you for the detailed response 🙂

Speech, more specifically singing.

Do you mean the pruning passes described in the paper I linked? If I’m not mistaken, those remove subharmonics, but here a harmonic gets detected.
Anyway, although I don’t understand why, using 4 passes instead of one seems to give much better results:
1 pass: http://i.imgur.com/R06VEq6.png
4 passes: http://i.imgur.com/4fKvSIa.png

(I really need to add a scale and axis to the graph, sorry)
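For reference, one pruning pass in the Tolonen paper clips the autocorrelation to positive values, time-stretches the clipped curve by an integer factor, subtracts it, and clips again. Repeating this for factors 2, 3, 4, ... cancels peaks at integer multiples of the true period (subharmonics in frequency), which fits the observation above that pruning targets subharmonics rather than harmonics. This minimal sketch is not Audacity’s code: the stretch uses linear interpolation, each pass stretches the already-pruned curve, and the factors `(2, 3, 4)` are an arbitrary assumption.

```python
def clip(values):
    """Half-wave rectify: replace negative values by zero."""
    return [max(v, 0.0) for v in values]


def stretch(values, factor):
    """Time-stretch by an integer factor: out[i] = values[i / factor],
    linearly interpolated between neighbouring lags."""
    out = []
    for i in range(len(values)):
        pos = i / factor
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, len(values) - 1)
        out.append((1 - frac) * values[lo] + frac * values[hi])
    return out


def enhanced_autocorrelation(acf, factors=(2, 3, 4)):
    """Prune an autocorrelation: for each factor, subtract the stretched curve
    and clip again, removing peaks at multiples of the period."""
    pruned = clip(acf)
    for factor in factors:
        pruned = clip([a - s for a, s in zip(pruned, stretch(pruned, factor))])
    return pruned
```

On an idealized periodic autocorrelation with peaks at lags 20, 40, 60, 80, only the peak at lag 20 survives. A peak at a *shorter* lag than the true period (a harmonic) is not cancelled by this procedure, which is why a separate check is useful for the 178 Hz vs 690 Hz case.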

> I would suggest that you add another resampling step in the autocorrelation function.

What do you mean by resampling step? Right now I calculate the autocorrelation with the FFT, and then I find the “precise” frequency by fitting a cubic interpolating polynomial around the peak and finding the local maximum of that polynomial (I don’t fully understand how it works, but it seems to work very well).
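For readers following along, computing the autocorrelation with the FFT (zero-pad to at least twice the frame length, take the power spectrum, inverse-transform) can be sketched in pure Python as below. The recursive radix-2 FFT only keeps the example self-contained; in practice a library FFT would be used. `fft` and `autocorrelation_fft` are hypothetical names.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def autocorrelation_fft(x):
    """Linear autocorrelation r[k] = sum_t x[t] * x[t + k], k = 0..len(x)-1."""
    n = len(x)
    size = 1
    while size < 2 * n:  # zero-pad so the circular result does not wrap around
        size *= 2
    spectrum = fft([complex(v) for v in x] + [0j] * (size - n))
    power = [abs(c) ** 2 for c in spectrum]
    # Inverse FFT via ifft(X) = conj(fft(conj(X))) / size; the power spectrum
    # is real, so conj(X) == X and taking the real part is enough.
    r = fft([complex(p) for p in power])
    return [v.real / size for v in r[:n]]
```

The zero-padding to at least 2N is what turns the FFT’s circular correlation into the ordinary (linear) autocorrelation; without it, large lags would be contaminated by wrapped-around samples.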

> Do you use 3 passes currently?

If by passes you mean the pruning steps, I do 4 now because that seems to give the best results.

> The result can be compared against, e.g., the harmonic product spectrum or harmonic sum spectrum (HPS), or against the cepstrum.

This seems really promising. I guess it wouldn’t work, though, if the correct peak in the enhanced autocorrelation is 0 (completely cancelled out). Maybe, if no peaks are found, I could fall back to using a single pruning step and then compare the results again.
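One possible way to run such a comparison, sketched under the assumption that each method returns a single f0 estimate in Hz (or `None` when it finds nothing): if the two estimates differ by a near-integer ratio, treat it as a harmonic error and keep the lower one. The name `reconcile`, the 5 % tolerance and the "prefer the lower estimate" rule are hypothetical choices, not something from the paper, and would need tuning for singing.

```python
def reconcile(f_a, f_b, max_ratio=8, tol=0.05):
    """Compare two f0 estimates in Hz; return (f0 or None, verdict)."""
    if f_a is None and f_b is None:
        return None, "none"
    if f_a is None or f_b is None:  # only one method found a pitch
        return (f_b if f_a is None else f_a), "single"
    lo, hi = sorted((f_a, f_b))
    ratio = hi / lo
    k = round(ratio)
    if k == 1 and abs(ratio - 1) <= tol:
        return (f_a + f_b) / 2, "agree"
    if 2 <= k <= max_ratio and abs(ratio - k) / k <= tol:
        return lo, f"harmonic x{k}"  # higher estimate is likely a harmonic
    return None, "conflict"
```

With the numbers from the first post, `reconcile(690.0, 178.0)` returns `(178.0, "harmonic x4")`, since 690 / 178 ≈ 3.88 is within 5 % of 4.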

> The Yin algorithm yields good results as well.

I will read about it and try to implement this as well.

> I think the enhanced autocorrelation is also used in the Change Pitch effect (frequency box).

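> It’s also worthwhile to implement quadratic interpolation (or at least oversampling), since the result will be more accurate at higher frequencies. Without it, the detectable frequencies are spaced far apart at the top and get closer together toward the bottom, e.g. 24000, 16000, 12000, 9600, 8000 Hz and so on (a 48 kHz sample rate divided by integer lags of 2, 3, 4, 5, 6 samples).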

I think I’m implementing this correctly.

By “passes” I actually mean the pruning steps, i.e. resampling and removing negative values.
The original Tolonen paper proposes more than 3 passes.
However, this may not be valid for all cases, hence my proposal of double-checking against the HPS.
It might also be helpful to integrate spectral features such as flatness, centroid, entropy and other moments.
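The spectral features mentioned above are cheap to compute from a magnitude spectrum. This sketch assumes a list of non-negative magnitudes and the matching bin frequencies in Hz; it weights by power (magnitude squared), which is one common convention. How to use the features in the pitch decision (for example, distrusting frames with high flatness) is left open and is an assumption, not a recipe from this thread.

```python
import math


def spectral_features(mags, freqs, eps=1e-12):
    """Return (flatness, centroid_hz, entropy_bits) of a magnitude spectrum.

    flatness: geometric mean / arithmetic mean of the power spectrum (0..1);
              close to 1 for noise-like frames, close to 0 for tonal ones.
    centroid: power-weighted mean frequency in Hz.
    entropy:  Shannon entropy (bits) of the normalized power distribution.
    """
    power = [m * m + eps for m in mags]  # eps avoids log(0)
    total = sum(power)
    geo_mean = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geo_mean / (total / len(power))
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    entropy = -sum((p / total) * math.log2(p / total) for p in power)
    return flatness, centroid, entropy
```

A perfectly flat 8-bin spectrum gives flatness 1 and entropy 3 bits, while a single-bin spectrum gives flatness and entropy near 0 with the centroid at that bin’s frequency.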

If you’re working with e.g. a short-time Fourier transform, you can also apply median filtering or another statistical method to remove outliers.
A Gaussian window would be very effective when the interpolation is log-based (i.e. quadratic interpolation on log magnitudes).
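Both post-processing ideas can be sketched briefly. `median_filter` smooths a pitch track (one f0 value per STFT frame), so short runs of octave jumps are suppressed. `log_parabolic_peak` applies quadratic interpolation to log magnitudes; with a Gaussian window this is very accurate, because the transform of a Gaussian is a Gaussian, whose logarithm is an exact parabola (truncating the window introduces only a small error). The function names, filter width and `sigma_ratio` are assumptions for illustration.

```python
import math
import statistics


def median_filter(track, width=5):
    """Replace each value by the median of a centered window (the window
    shrinks at the edges). Away from the edges, runs of up to width // 2
    outlier frames are suppressed."""
    half = width // 2
    return [statistics.median(track[max(0, i - half):i + half + 1])
            for i in range(len(track))]


def gaussian_window(n, sigma_ratio=0.125):
    """Gaussian window whose standard deviation is sigma_ratio * n samples."""
    sigma = sigma_ratio * n
    centre = (n - 1) / 2
    return [math.exp(-0.5 * ((i - centre) / sigma) ** 2) for i in range(n)]


def log_parabolic_peak(mags, i):
    """Quadratic interpolation on log magnitudes around local maximum i;
    returns the fractional bin index of the vertex."""
    a, b, c = (math.log(mags[j]) for j in (i - 1, i, i + 1))
    denom = a - 2 * b + c
    if denom == 0:  # no curvature: keep the integer bin
        return float(i)
    return i + 0.5 * (a - c) / denom
```

For instance, `median_filter([200, 201, 800, 202, 203])` removes the single 800 Hz spike, and a Gaussian-windowed 1010 Hz tone (256 samples at 8 kHz, bin spacing 31.25 Hz) is located well within 1 Hz even though it falls between bins.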

Nyquist has the Yin algorithm implemented; you could review its source.