Hi!
I need to detect pitch in a speech signal.
I found this autocorrelation-based pitch tracking technique through Audacity (Plot Spectrum, algorithm "Enhanced Autocorrelation") and implemented it in an application I'm developing: http://legacy.spa.aalto.fi/u/mak/PUB/SAP2000Tolonen.pdf
Unfortunately, it occasionally finds multiples of the correct fundamental frequency, as in: http://i.imgur.com/W6DkWE9.png
The correct peak is 178 Hz/F3, but it finds 690 Hz/F5, both in Audacity and in my implementation.
How could I work around this problem? Would you suggest using another method for tracking the frequency? It doesn't have to be real-time, but it should be very accurate.
Enhanced autocorrelation and wrong peak
-
concerned__
- Posts: 4
- Joined: Sat Jan 03, 2015 10:45 am
-
Robert J. H.
- Posts: 3633
- Joined: Thu May 31, 2012 8:33 am
- Operating System: Windows 10
Re: Enhanced autocorrelation and wrong peak
The right algorithm depends on the type of music and the actual harmonic structure.
Sometimes it takes more than one step to find the right fundamental, depending on where it lies.
I would suggest that you add another resampling step in the autocorrelation function.
Do you use 3 passes currently?
The result can be compared to e.g. the harmonic product or sum spectrum (HPS) or against the cepstrum.
The YIN algorithm yields good results as well.
I think the enhanced autocorrelation is also used in the Change Pitch effect (frequency box).
It's also worthwhile to implement quadratic interpolation (or at least oversampling), since the result will be more accurate at higher frequencies.
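(The spacing between detectable frequencies gets narrower from top to bottom, since each candidate is the sample rate divided by an integer lag; at 48000 Hz that gives 24000, 16000, 12000, 9600, 8000 Hz and so on.)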
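As a rough illustration of the interpolation point above, here is a minimal Python sketch (not Audacity's code). It computes the autocorrelation with an FFT, picks the strongest integer lag in a plausible range, and refines it with a three-point parabolic fit. The function names, the 60 to 1000 Hz search range and the use of NumPy are assumptions made for this example.

```python
import numpy as np

def lag_grid(sample_rate, lags):
    """Frequencies reachable with integer lags only: sample_rate / lag."""
    return sample_rate / np.asarray(lags, dtype=float)

def autocorrelation(frame):
    """Autocorrelation via FFT, zero-padded so it does not wrap around."""
    n = len(frame)
    spectrum = np.fft.rfft(frame, 2 * n)
    acf = np.fft.irfft(np.abs(spectrum) ** 2)
    return acf[:n]

def parabolic_peak(values, i):
    """Refine integer peak index i with a parabola through i-1, i, i+1."""
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)  # flat neighbourhood, nothing to refine
    return i + 0.5 * (a - c) / denom

def estimate_f0(frame, sample_rate, fmin=60.0, fmax=1000.0):
    """Strongest autocorrelation lag in [fmin, fmax], refined to sub-sample."""
    acf = autocorrelation(frame)
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(int(sample_rate / fmin), len(acf) - 2)
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / parabolic_peak(acf, lag)
```

Calling `lag_grid(48000, [2, 3, 4, 5, 6])` reproduces the coarse grid listed above; the parabolic fit is what lets the estimate fall between those grid points.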
-
concerned__
- Posts: 4
- Joined: Sat Jan 03, 2015 10:45 am
Re: Enhanced autocorrelation and wrong peak
First of all, thank you for the detailed response.
Robert J. H. wrote: The right algorithm depends on the type of music and the actual harmonic structure.
Speech, more specifically singing.
Robert J. H. wrote: It requires sometimes more than one step to find the right fundamental depending on where it lies.
You mean the pruning passes described in the paper I linked? If I'm not mistaken, those remove subharmonics, but here a harmonic gets detected.
Anyway, while I don't understand why, using 4 passes instead of one seems to give much better results:
1 pass: http://i.imgur.com/R06VEq6.png
4 passes: http://i.imgur.com/4fKvSIa.png
(I really need to add a scale and axes to the graph, sorry)
Robert J. H. wrote: I would suggest that you add another resampling step in the autocorrelation function.
What do you mean by a resampling step? Right now I calculate the autocorrelation with the FFT, and then I find the "precise" frequency by fitting a cubic interpolator around the peak and finding the local maximum of that polynomial (I don't fully understand how it works, but it seems to work very well).
Robert J. H. wrote: Do you use 3 passes currently?
If by passes you mean the pruning steps, I do 4 now because that seems to give the best results.
Robert J. H. wrote: The result can be compared to e.g. the harmonic product or sum spectrum (HPS) or against the cepstrum.
This seems really promising. I guess it wouldn't work if the right peak in the enhanced autocorrelation is 0 (gets completely cancelled out), though. Maybe, if no peaks are found, I could fall back to a single pruning step and then compare the results again.
Robert J. H. wrote: The YIN algorithm yields good results as well. I think the enhanced autocorrelation is also used in the Change Pitch effect (frequency box).
I will read about it and try to implement this as well.
Robert J. H. wrote: It's also worthwhile to implement quadratic interpolation (or at least oversampling) since the result will be more accurate at higher frequencies. (The detectable frequencies get narrower from top to bottom, i.e. 24000, 16000, 12000, 9600, 8000 Hz and so on.)
I think I'm implementing this correctly.
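For readers following along, the pruning passes discussed in this post can be sketched roughly as below. This is one Python reading of the enhancement step in the linked paper (clip to positive values, stretch the lag axis by an integer factor, subtract, clip again), not the poster's or Audacity's actual code. The names `stretch` and `enhance`, the linear interpolation, and applying each pass to the already pruned curve are all assumptions.

```python
import numpy as np

def stretch(values, factor):
    """Stretch a lag-domain curve by an integer factor (linear interpolation).

    A peak at lag L in `values` ends up at lag L * factor in the result.
    """
    n = len(values)
    positions = np.arange(n) / factor
    return np.interp(positions, np.arange(n), values)

def enhance(acf, passes=4):
    """Apply `passes` pruning steps with stretch factors 2, 3, ..., passes + 1."""
    enhanced = np.clip(acf, 0.0, None)  # keep positive values only
    for factor in range(2, passes + 2):
        stretched = stretch(enhanced, factor)
        # Subtract the stretched copy, then clip negative values again.
        enhanced = np.clip(enhanced - stretched, 0.0, None)
    return enhanced
```

Each pass suppresses peaks sitting at an integer multiple of a stronger peak's lag, that is, subharmonics in frequency terms. That matches the remark above that pruning by itself is not obviously aimed at a wrongly detected harmonic.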
-
Robert J. H.
- Posts: 3633
- Joined: Thu May 31, 2012 8:33 am
- Operating System: Windows 10
Re: Enhanced autocorrelation and wrong peak
When I speak of "passes", I mean the pruning steps, i.e. resampling and removing negative values.
The original Tolonen paper proposes more than 3 passes.
However, this may not be valid for all cases, hence my proposal of double-checking against HPS.
It might also be helpful to integrate spectral features like flatness, centroid, entropy and other moments.
If you're working with e.g. the short-time Fourier transform, you can also apply median filtering or another statistical method to remove outliers.
A Gaussian window would be very effective when the interpolation is log-based.
Nyquist has the YIN algorithm implemented; you could review the source for that.
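The HPS double-check suggested above might look something like the following Python sketch. It is only an illustration under assumptions: the function names `hps_f0` and `agrees`, the Hann window, the 4x zero-padding, the number of harmonics and the half-semitone tolerance are all choices made for the example, not anything taken from Audacity or Nyquist.

```python
import numpy as np

def hps_f0(frame, sample_rate, harmonics=4, fmin=60.0, fmax=1000.0):
    """Harmonic product spectrum estimate of the fundamental, in Hz."""
    windowed = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)  # zero-pad for finer bin spacing
    magnitude = np.abs(np.fft.rfft(windowed, n_fft))
    limit = len(magnitude) // harmonics
    product = magnitude[:limit].copy()
    for h in range(2, harmonics + 1):
        # Taking every h-th bin lines harmonic h up with the fundamental.
        product *= magnitude[::h][:limit]
    freqs = np.arange(limit) * sample_rate / n_fft
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(product[band])])

def agrees(f_a, f_b, tolerance_semitones=0.5):
    """True if two estimates lie within the given distance in semitones."""
    return abs(12.0 * np.log2(f_a / f_b)) <= tolerance_semitones
```

One possible use: accept the autocorrelation estimate only when `agrees(acf_estimate, hps_f0(frame, sample_rate))` holds, and otherwise flag the frame for a closer look or fall back to another method.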