Newbie audio programmer: smoothing and windowing functions

I’m tinkering with writing a piano tuning program for my own personal use. Part of that is real-time pitch detection, whose aim is to detect which note on the piano is being played and to compare its frequency to a previously computed target frequency. Here is my current approach and current results. I’ve read a little bit about smoothing functions and windowing functions, which may help to improve the accuracy of this process, but I’m pretty clueless about which smoother and window would be best for this application.

  1. I’m currently using the Windows waveform API (e.g., waveInOpen()) to collect buffers of size 2^14 (1 channel, 44.1kHz, 16 bits, Samson CO1U microphone).
  2. I collect 4 such 2^14 buffers in a 2^16 buffer. As I get more samples, I discard the oldest 2^14 buffer and shift the others to the left. This gives 44100 / 2^14 (about 2.7) pitch estimations per second.
  3. I then run an FFT on the current 2^16 buffer.
  4. I convert the output of the FFT into power spectra at two granularities. The finer granularity uses 12000 bins, which gives 1 cent resolution from 10Hz to 10kHz. I didn’t think such a fine resolution was useful for step 5, so I also create a power spectrum with 200-bin granularity. The idea with this number was that it was about 2 bins per piano key. Maybe this should be exactly 88 bins centered on each equal-temperament frequency, or maybe it should be an exact integer multiple of that. Any thoughts?
  5. I’m currently using the HPS (harmonic product spectrum) approach to pitch estimation. For this, you build 4 more power spectra at the 200-bin granularity, dividing the frequency by 2, 3, 4, and 5 respectively before adding to each spectrum. Then you multiply the original 200-bin power spectrum by the 4 others and scan the result for the biggest peak. The idea is that dividing by 2, 3, 4, and 5 makes the harmonic peaks in the signal line up with the fundamental, so the product has a large spike at the real frequency.
  6. The biggest peak in the 200 bin case gives me a small frequency range to search in the 12000 bin case for 1 cent pitch resolution.
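For anyone following along, step 5 can be sketched roughly as below. This is only an illustration, not my actual code: it assumes a power spectrum on linearly spaced bins (so harmonic h of bin k is simply bin k * h) rather than my 200 coarse bins, and the function name and the test spectrum are made up.

```python
def harmonic_product_spectrum(power, num_harmonics=5):
    """Return the bin whose first `num_harmonics` harmonics have the largest product.

    `power` is a power spectrum on linearly spaced bins, so harmonic h of
    bin k sits at bin k * h (an assumption; the post describes 200 coarse
    bins instead).
    """
    # Only score bins whose highest harmonic still falls inside the spectrum.
    limit = len(power) // num_harmonics
    best_bin, best_score = 0, 0.0
    for k in range(1, limit):
        score = 1.0
        for h in range(1, num_harmonics + 1):
            score *= power[k * h]
        if score > best_score:
            best_bin, best_score = k, score
    return best_bin


# Made-up spectrum: fundamental at bin 10, with the third harmonic strongest.
spectrum = [1e-3] * 300
for bin_index, level in [(10, 1.0), (20, 2.0), (30, 5.0), (40, 1.0), (50, 0.5)]:
    spectrum[bin_index] = level

strongest_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
hps_bin = harmonic_product_spectrum(spectrum)
print(strongest_bin, hps_bin)  # prints "30 10"
```

Picking the single strongest peak lands on the third harmonic (bin 30), while the product picks the fundamental (bin 10), which is the whole point of the method.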

This seems to work okay for notes in the middle of the keyboard. However, for the lowest notes on the keyboard it has the following behavior. Let’s say I play C1. It may first identify the note as G2 (the third harmonic of C1), then a couple of estimates later it identifies the result as C2, and then a couple of estimates later it gets the correct key of C1. I have some spectra I can post of this process if that would help. The upper notes have some problems as well. I wouldn’t expect much width to the peaks for the 200-bin case, but there still seems to be some width to them. Is this the so-called variance problem that might improve if a windowing function is applied to the original signal to prevent a sudden truncation?

Also, I originally tried to use the WASAPI but didn’t have any luck. pAudioClient->Initialize always failed in exclusive mode (I made multiple attempts to get 1 channel, 44.1kHz…). In shared mode, none of the functions failed but the data returned by pCaptureClient->GetBuffer() didn’t look like a PCM signal. What format is this data supposed to be in?

All ideas for improvement appreciated!

Todd

The far upper and lower strings on a piano are a little tricky. The low ones are springs with multiple harmonics and the upper ones are trios, but they’re eight inches long and the hammer hit is a significant part of the sound. Hammer hits don’t have pitch.

You know Audacity doesn’t do anything in real time, right? You’re just posting cold with audio programming questions?

Koz

Yes. I know Audacity doesn’t do real time and this question has nothing to do with Audacity. It is sort of hard to find experts in this area though, and “General Audio Programming” sounded pretty generic and not Audacity related. So, I thought I’d give it a shot. If you know a better forum to ask this question I’d appreciate a pointer in the right direction. I figured some of the Audacity developers might hang out here and I didn’t want to start off posting on the audacity-devel mailing list.

“some of the Audacity developers might hang out here”

We’re the moat. The developers are not interested in why your turntable doesn’t work right.
But should you decide to help in the developer effort, that could be interesting.
Koz

The forum is essentially a community help forum for Audacity users, so it is usually Audacity users helping Audacity users.

Edge effects are likely to damage the results. HPS is usually applied to overlapping frames, each multiplied by a Hann window function.
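As a rough illustration of the windowing idea, here is a minimal sketch in plain Python. The function names are made up, and the sizes are kept tiny so it runs quickly; in the setup described above the frame would be 2^16 samples with a hop of 2^14, which is already a 75% overlap.

```python
import math


def hann_window(size):
    """Periodic Hann window: zero at the first sample, one at the centre."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / size) for i in range(size)]


def windowed_frames(samples, frame_size, hop_size):
    """Yield overlapping frames of `samples`, each multiplied by a Hann window."""
    window = hann_window(frame_size)
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = samples[start:start + frame_size]
        yield [s * w for s, w in zip(frame, window)]


# Small sizes for illustration; the thread's values would be 2**16 and 2**14.
signal = [1.0] * 64
frames = list(windowed_frames(signal, frame_size=16, hop_size=4))
print(len(frames), frames[0][0], frames[0][8])
```

Each windowed frame would then go to the FFT in place of the raw buffer. The taper to zero at the frame edges is what removes the sudden truncation.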

Problems with very low piano notes are likely due to the very strong harmonics caused by the thick strings. Increasing the FFT size may help.
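To put numbers on the FFT size, here is a small sketch of the arithmetic. It assumes equal temperament with A4 = 440 Hz and the MIDI convention in which C1 is note 24; both are assumptions, and a real piano is stretch-tuned, so treat the figures as ballpark only. With a 2^16-point FFT at 44.1 kHz the raw bins are about 0.67 Hz apart, which near C1 (about 32.7 Hz) is roughly 35 cents.

```python
import math

SAMPLE_RATE = 44100.0  # Hz, from the original post
A4_HZ = 440.0          # assumed reference pitch


def bin_spacing_hz(fft_size, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent FFT bins in Hz."""
    return sample_rate / fft_size


def note_frequency(midi_note, a4_hz=A4_HZ):
    """Equal-tempered frequency; MIDI note 69 is A4, so C1 is taken as note 24."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def bin_width_cents(frequency, fft_size, sample_rate=SAMPLE_RATE):
    """Distance in cents from `frequency` to the next FFT bin above it."""
    spacing = bin_spacing_hz(fft_size, sample_rate)
    return 1200.0 * math.log2((frequency + spacing) / frequency)


c1 = note_frequency(24)
for exponent in (16, 17, 18):
    size = 2 ** exponent
    print(size, round(bin_spacing_hz(size), 3), round(bin_width_cents(c1, size), 1))
```

Doubling the FFT size halves the bin spacing, but it also doubles the amount of audio needed per estimate (2^16 samples is already about 1.5 seconds at 44.1 kHz).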

Problems with very high frequencies are likely due to the few available harmonics. The top note of a piano has a fundamental at around 4 kHz, which does not give many harmonics before you run into garbage hiss. The notes are also relatively short, with a fairly substantial “knock” of the hammer on the strings. Pre-filtering to remove low frequencies may help.
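One simple way to try the pre-filtering idea is a first-order high-pass filter applied to the samples before the FFT. The sketch below is only one possible choice, not a recommendation of a specific design: the filter type, the function name and the 1 kHz cutoff are all assumptions made for illustration.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter over a list of samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        # Pass changes in the input, let any steady level decay away.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


SAMPLE_RATE = 44100
CUTOFF_HZ = 1000.0  # hypothetical cutoff, chosen only for this demo

# A constant (0 Hz) input should be removed almost completely.
dc = high_pass([1.0] * 1000, CUTOFF_HZ, SAMPLE_RATE)

# A 4 kHz tone, near the top note's fundamental, should mostly survive.
tone = [math.sin(2.0 * math.pi * 4000.0 * n / SAMPLE_RATE) for n in range(1000)]
filtered = high_pass(tone, CUTOFF_HZ, SAMPLE_RATE)

print(abs(dc[-1]), max(abs(v) for v in filtered[-200:]))
```

A first-order filter has a gentle slope, so a steeper design may be needed if the low-frequency thump is strong. The cutoff would also have to follow the note being tuned rather than stay fixed.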

These may be helpful:
http://dsp.stackexchange.com/
http://www.hydrogenaudio.org/forums/index.php?showforum=30