Voice to midi

joan21749 · March 2, 2009, 9:22am

Hi!
I am working on a project to convert “voice” samples into MIDI piano notes.
I record 1 second of voice into a Windows WAV file, using a poor microphone,
and then run an FFT on 4096-sample “chunks”.

I am facing (at least) 2 problems:

1. My microphone does not pick up low frequencies properly. That is,
sometimes it “shifts” the frequency by a whole octave!
2. Piano notes are not linearly spaced in frequency (they are exponential).

Also, the FFT amplitudes look very “sharp”…

Any ideas?

Thanks a lot in advance!
j.

kozikowski · March 4, 2009, 5:55pm

That’s a good one. I can’t even project how I would do that if somebody paid me a lot of money.

I know exactly how to turn a voice into piano notes, any vocoder plugin can do that, but going from an electrical signal vibrating at 440 cycles per second with overtones every octave or so, into a series of MIDI instructions is rough.

Instrument=Grand Piano <003>
Press “A” below middle C with great force
Hold it for three seconds
Release with sustain for ten seconds.
Stop.

That’s a simplified MIDI block. MIDI isn’t sound.

Let’s see what everybody else thinks (I’m guessing not much or there would be a comment already).

Koz
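
A rough Python sketch of how the simplified block above could look as raw MIDI channel messages. The status bytes (0x90 Note On, 0x80 Note Off, 0xC0 Program Change) are standard MIDI; the helper names, the choice of channel 0, and the (time, message) event list are assumptions made for illustration. The sustain step is left out.

```python
# Standard MIDI status bytes; the low nibble carries the channel (0-15).
NOTE_ON = 0x90
NOTE_OFF = 0x80
PROGRAM_CHANGE = 0xC0

def program_change(channel, program):
    """Select an instrument (program number 0-127) on a channel."""
    return bytes([PROGRAM_CHANGE | channel, program])

def note_on(channel, note, velocity):
    """Start a note; velocity 1-127 is how hard the key is struck."""
    return bytes([NOTE_ON | channel, note, velocity])

def note_off(channel, note):
    """Stop a note (release velocity set to 0 here)."""
    return bytes([NOTE_OFF | channel, note, 0])

A_BELOW_MIDDLE_C = 57  # middle C is MIDI note 60

# Hypothetical event list: (time in seconds, message bytes).
# The messages carry no duration; timing comes from when they are sent.
events = [
    (0.0, program_change(0, 0)),               # program 0: Acoustic Grand Piano in General MIDI
    (0.0, note_on(0, A_BELOW_MIDDLE_C, 127)),  # press with great force
    (3.0, note_off(0, A_BELOW_MIDDLE_C)),      # release after three seconds
]

for when, message in events:
    print(f"{when:4.1f}s  {message.hex(' ')}")
```

The point is the same one made above: these are instructions, and nothing in them is sound until a synthesizer plays them.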

joan21749 · March 5, 2009, 11:24am

Hi Koz,

thanks a lot for answering…
I ran some tests and I understand that the “microphone response curve”
is important when pre-processing the signal.

Also, converting to decibels (just take the log of the FFT amplitude) helps!

My approach was very stupid: I just tried to find the MAXIMUM of the amplitudes
(or log-amplitudes) returned by the FFT, and then to take that point
as the base frequency = the required piano note.

Are You suggesting to look at overtones (harmonics)?
How to do that?

Best,
j.

kozikowski · March 5, 2009, 6:31pm

<<<Are You suggesting to look at overtones (harmonics)?>>>

I’m suggesting you should tell me how you’re getting from digitized audio signals to MIDI instruction sets. That’s the hard part. Everything else you’re doing is pretty straightforward audio processing.

Once we know the end, we can suggest ways of getting there.

Koz

joan21749 · March 6, 2009, 1:35pm

First, thanks again for answering.
I have some other (possibly interesting) questions for this forum…

Returning to “voice to midi”:
Using a simple microphone, I’m recording (say) 2 seconds of voice
at an 8000 Hz sample rate and 16-bit resolution. Next, I take a Fourier transform
with (say) 4096 points. Because of the Nyquist limit and for other reasons (I am not a good singer),
I consider only amplitudes from 60 Hz to 1500 Hz, converted to dB (i.e. logs).
I’m trying to find the “base” frequency. Once I have it, conversion to a MIDI note is simple:
just find the piano note whose frequency is closest to the base.
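
As a rough check of those numbers, here is a small Python sketch of how FFT bins map to frequencies for an 8000 Hz rate and a 4096-point transform. The function names are made up for illustration. It also compares the bin width with the gap between neighbouring piano notes, which grows with frequency.

```python
import math

SAMPLE_RATE = 8000  # Hz, as in the post
FFT_SIZE = 4096     # points, as in the post

def bin_to_freq(k):
    """Centre frequency (Hz) of FFT bin k."""
    return k * SAMPLE_RATE / FFT_SIZE

def freq_to_bin(freq):
    """Fractional bin index for a frequency in Hz."""
    return freq * FFT_SIZE / SAMPLE_RATE

def semitone_gap(freq):
    """Distance in Hz from freq to the note one semitone above it."""
    return freq * (2 ** (1 / 12) - 1)

bin_width = SAMPLE_RATE / FFT_SIZE          # Hz per bin
first_bin = math.ceil(freq_to_bin(60.0))    # lowest bin inside 60-1500 Hz
last_bin = math.floor(freq_to_bin(1500.0))  # highest bin inside 60-1500 Hz

print(f"bin width: {bin_width:.3f} Hz, bins {first_bin}..{last_bin}")
for freq in (60.0, 220.0, 1500.0):
    bins = semitone_gap(freq) / bin_width
    print(f"{freq:7.1f} Hz: semitone gap spans {bins:.1f} bins")
```

Each bin is just under 2 Hz wide, so near 60 Hz a semitone covers fewer than two bins, while near 1500 Hz it covers dozens. The linear FFT grid is coarsest, relative to the notes, exactly at the low end.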

My (stupid) approach was to look for the frequency index
with the MAXIMAL amplitude. It doesn’t work correctly!
Displaying such a spectrum shows that the overtones (in a voice) are also strong!
So sometimes an overtone frequency is chosen instead of the base frequency!

If I understand you correctly, I must search for the entire “sequence” of maxima
in the amplitude array…

Best,
j.
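
One common alternative to picking the largest FFT peak is to work in the time domain with autocorrelation: find the lag at which the signal best matches a shifted copy of itself, and take that lag as the period. This is a different technique from the one described in the post, shown here only as a minimal Python sketch. The function name, the 2048-sample frame, and the synthetic test tone are assumptions; the 60-1500 Hz search range comes from the post.

```python
import math

def estimate_fundamental(samples, sample_rate, min_hz=60.0, max_hz=1500.0):
    """Return the pitch (Hz) whose period gives the strongest autocorrelation."""
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]       # remove any DC offset
    min_lag = int(sample_rate / max_hz)   # shortest period to try, in samples
    max_lag = int(sample_rate / min_hz)   # longest period to try, in samples
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic "voice": a 200 Hz fundamental with a 400 Hz overtone twice as strong.
RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / RATE)
         + 1.0 * math.sin(2 * math.pi * 400 * n / RATE)
         for n in range(2048)]

pitch = estimate_fundamental(frame, RATE)
print(f"estimated fundamental: {pitch:.1f} Hz")
```

On this test tone the largest spectral peak would be at 400 Hz, but the estimate comes out at 200 Hz, because only the 200 Hz period lines up both components. Real voices are messier, and this plain version can still land an octave low, so treat it as a starting point.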

Transcriber · March 6, 2009, 4:59pm

I’m not sure, but I think you misunderstood Koz.

I don’t think he was trying to tell you that you had to consider or focus on overtones, but rather that the “conversion” to midi was not at all a “direct” sound-in-one-format to sound-in-some-other-format, but rather requires the creation of midi STATEMENTS which correspond correctly to the desired voice pitch and duration.

He said:

I know exactly how to turn a voice into piano notes, any vocoder plugin can do that, but going from an electrical signal vibrating at 440 cycles per second with overtones every octave or so, into a series of MIDI instructions is rough.

Instrument=Grand Piano <003>
Press “A” below middle C with great force
Hold it for three seconds
Release with sustain for ten seconds.
Stop.

That’s a simplified MIDI block. MIDI isn’t sound.

So, if you have access to a “vocoder plugin”, that should, if I understand him, give you the piano NOTES, then it’s up to you to write a program that runs thru those notes and spits out midi “programming” statements which correspond to creating those notes for their durations.

Unless I’ve misunderstood, in which case…nevermind.

Transcriber

joan21749 · March 6, 2009, 6:09pm

Dear Transcriber,

I don’t understand exactly what you mean by “MIDI statement”.
MIDI uses (basically) only two commands: NoteOn and NoteOff.
Notes are coded as numbers (0 to 127).
To get the frequency of a given note, use:

const
  BaseFreq = 440.0; // 440 Hz = A4 (MIDI note 69)

function n2f(n: integer): real; // MIDI note → frequency (Hz)
begin
  n := n - 6*12 + 3; // semitones relative to A4, i.e. n - 69
  Result := BaseFreq * exp(n * ln(2) / 12)
end;

I want to “extract” the dominant frequency from a voice sample.
The “MAX amplitude” algorithm doesn’t work…

I am trying to think “positively”…
Koz’s suggestion seems to be OK…
Look at overtones! Great!

Best,
j.
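
The note-to-frequency formula can be inverted to go from a detected frequency to the nearest note: n = 69 + 12·log2(f / 440). A small Python sketch follows; the function names are made up here, and it assumes the usual convention that A4 = 440 Hz is MIDI note 69.

```python
import math

A4_HZ = 440.0  # reference pitch
A4_NOTE = 69   # MIDI note number of A4

def midi_to_freq(note):
    """Frequency in Hz of a MIDI note number (equal temperament)."""
    return A4_HZ * 2.0 ** ((note - A4_NOTE) / 12)

def freq_to_midi(freq):
    """Nearest MIDI note number for a frequency in Hz."""
    return round(A4_NOTE + 12 * math.log2(freq / A4_HZ))

for freq in (200.0, 261.63, 440.0):
    note = freq_to_midi(freq)
    print(f"{freq:7.2f} Hz -> note {note} ({midi_to_freq(note):.2f} Hz)")
```

Rounding on the logarithmic scale picks the closest note in musical terms (cents), which can differ slightly from the closest note in Hz when a frequency falls almost exactly between two notes.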

Balaratnam · July 5, 2014, 1:21am

Hi Koz,
Could you kindly tell me how to turn a voice into piano notes using a vocoder plugin? I am trying to convert the vocal in a song to piano notes, without much success.
Ben

kozikowski · July 6, 2014, 6:23am

I think maybe this isn’t what you actually want. The “conversion” is the result of combining a voice with a musical instrument to produce a talking guitar, for one popular example.

http://www.youtube.com/watch?v=YR_Bjf-a5rs&feature=kp

So rather than producing a tone or combination of tones, it just combines the character of the two sounds — the tones of the guitar with the enunciation of the voice. So while you’re expecting to get the vocal tones out, the Vocoder actually throws them away.

I wonder if you can take on faith that the voice is trying to sing a certain note and enter that note in the midi instructions and apply those instructions to a “vocal instrument” sample instead of a sampled piano or sampled oboe. There is the Vox Humana stop on the organ (human voice) and my keyboard has a “human chorus” effect. I used to enjoy forcing midi to play a tune on the wrong instrument.

I don’t know of any good way to tell which note a voice is projecting when the presenter is just speaking. If they speak in a monotone like I do, then there is no expression at all (I can put you to sleep faster than warm milk). You can drag-select any portion of my voice and Analyze > Plot Spectrum with an 8192 size and Log Frequency. Note the largest peak. That’s usually the fundamental.

Hope that helps.

Koz

Trebor · July 11, 2014, 9:44pm

Maybe try a [frequency] tracker plug-in like Alien Solo (it does voice to synth-organ, rather than voice to piano).

steve · July 14, 2014, 11:37am

Unfortunately I missed this topic first time around, but if “joan21749” is still watching this topic, please post to say that you are still here. This is an area that I’m interested in and I may have some useful information / research for you concerning it.