First of all, thank you for the detailed response.
Speech, more specifically singing.
You mean pruning passes as described in the paper I linked? If I’m not mistaken, those remove subharmonics, but here a harmonic gets detected.
Anyway, while I don’t understand why, using 4 passes instead of one seems to give much better results:
1 pass: http://i.imgur.com/R06VEq6.png
4 passes: http://i.imgur.com/4fKvSIa.png
(I really need to add a scale and axes to the graph, sorry)
I would suggest that you add another resampling step in the auto correlation function.
What do you mean by resampling step? Right now I calculate the autocorrelation with the FFT, and then I find the “precise” frequency by fitting a cubic interpolator around the peak and finding the local maximum of that polynomial (I don’t fully understand how it works, but it seems to work very well).
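For reference, a minimal sketch of autocorrelation via the FFT (power spectrum, then inverse transform). It is not my actual code: the function names `fft` and `autocorrelation` are made up for illustration, and the tiny radix-2 FFT is only there to keep the example self-contained. A real implementation would use an FFT library instead.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def autocorrelation(signal):
    """Linear autocorrelation for lags 0..len(signal)-1, computed via FFT."""
    n = len(signal)
    # Zero-pad to a power of two >= 2n so the circular correlation
    # does not wrap around into the lags we care about.
    size = 1
    while size < 2 * n:
        size *= 2
    padded = list(signal) + [0.0] * (size - n)
    spectrum = fft(padded)
    power = [abs(c) ** 2 for c in spectrum]
    acf = fft(power, inverse=True)
    # The inverse transform above is unnormalised, so divide by size here.
    return [c.real / size for c in acf[:n]]


if __name__ == "__main__":
    # Impulse train with a period of 10 samples: strongest non-zero lag is 10.
    train = [1.0 if t % 10 == 0 else 0.0 for t in range(100)]
    acf = autocorrelation(train)
    print(max(range(5, 16), key=lambda lag: acf[lag]))
```

The peak found this way only has integer-lag precision, which is why an interpolation step around the peak is needed afterwards.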
Do you use 3 passes currently?
If by passes you mean the pruning steps, I do 4 now because that seems to give the best results.
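To make sure we are talking about the same thing, here is a rough sketch of what I understand a pruning pass to be: clip the autocorrelation to positive values, stretch it in lag by an integer factor, subtract the stretched copy, and clip again. The name `prune`, the linear interpolation used for stretching, and the assumption that pass number k stretches by a factor of k + 1 are all mine, not taken from any particular implementation.

```python
def prune(acf, passes=1):
    """Enhanced-autocorrelation style pruning (sketch, assumptions as above).

    Each pass subtracts a copy of the current curve stretched in lag by
    factor 2, 3, ... and clips the result to non-negative values.
    """
    enhanced = [max(v, 0.0) for v in acf]
    for factor in range(2, 2 + passes):
        stretched = []
        for lag in range(len(enhanced)):
            # Value of the stretched curve at `lag` is the original curve
            # at lag / factor, linearly interpolated between samples.
            pos = lag / factor
            i = int(pos)
            frac = pos - i
            nxt = enhanced[i + 1] if i + 1 < len(enhanced) else enhanced[i]
            stretched.append((1.0 - frac) * enhanced[i] + frac * nxt)
        enhanced = [max(e - s, 0.0) for e, s in zip(enhanced, stretched)]
    return enhanced


if __name__ == "__main__":
    # Toy curve: true period at lag 10, repeated peak at lag 20.
    acf = [0.0] * 32
    acf[0], acf[10], acf[20] = 10.0, 9.0, 8.0
    pruned = prune(acf, passes=1)
    print(pruned[10], pruned[20])
```

In this toy case the peak at lag 10 survives while the repeated peak at lag 20 (and the zero-lag peak) is cancelled, which matches my understanding that the passes remove subharmonics.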
The result can be compared to e.g. the harmonic product or sum spectrum (HPS) or against the cepstrum.
This seems really promising. I guess it wouldn’t work if the right peak in the enhanced autocorrelation is 0 (gets completely cancelled out), though. Maybe, if no peaks are found, I could fall back to a single pruning step and then compare the results again.
The YIN algorithm yields good results as well.
I will read about it and try to implement this as well.
I think the enhanced autocorrelation is also used in the Change Pitch effect (frequency box).
It’s also worthwhile to implement quadratic interpolation (or at least oversampling) since the result will be more accurate at higher frequencies.
(the spacing between detectable frequencies gets narrower from top to bottom, i.e.
24000
16000
12000
9600
8000 Hz and so on.)
I think I’m implementing this correctly.
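To double-check the numbers in that list, here is a small sketch. It assumes a 48000 Hz sample rate (inferred from the list, which matches 48000 divided by integer lags 2 to 6), and it shows quadratic (parabolic) interpolation rather than the cubic interpolator I currently use. The names `lag_to_frequency` and `parabolic_peak` are just for illustration.

```python
def lag_to_frequency(sample_rate, lag):
    """Frequency corresponding to an autocorrelation peak at `lag` samples."""
    return sample_rate / lag


def parabolic_peak(values, i):
    """Refine a peak at integer index i by fitting a parabola through
    the samples at i - 1, i and i + 1; returns a fractional index."""
    left, mid, right = values[i - 1], values[i], values[i + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return float(i)  # three collinear points, nothing to refine
    return i + 0.5 * (left - right) / denom


if __name__ == "__main__":
    # Without interpolation only these frequencies can be detected.
    print([lag_to_frequency(48000, lag) for lag in range(2, 7)])

    # Samples of a parabola whose true maximum lies at x = 4.3.
    samples = [-(x - 4.3) ** 2 for x in range(8)]
    print(parabolic_peak(samples, 4))
```

Since the gap between neighbouring integer lags is largest at short lags, the fractional lag from the interpolation matters most at high frequencies.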