I’m new to this forum; I registered particularly to share a thought about a new feature that came to my mind.
I’m not a pro in music editing, but I’m a vocalist and I do a lot of homemade recordings. An easy way to see how much “in tune” my sample is would be extremely useful. I’m talking about an analysis plug-in for Audacity which would display something like this:
This is of course a vocal tuner; it can be used to tune vocal tracks. Plug-ins of this kind already exist, like autotalent by Tom Baran.
What I’m talking about is only a way to view how in tune our sample is: a chart of frequency over time. I know there are features like the spectrum analyser et cetera, but those are much too complex for a simple musician. I do not need to view the exact frequency; that is too much for a vocalist who only wants to see how in tune his vocal sample is.
Another example of this kind of display feature:
Here is Canta:
Canta is free software, but it doesn’t work well on Linux, and it cannot analyse already recorded samples; it works only in real time with a microphone input.
I’m a computer science student just starting my master’s degree, so I’m not very experienced, but I’m not a newbie either. I’m interested in developing this kind of feature for Audacity, with additional help of course.
The plug-in is a Nyquist plug-in, which unfortunately has quite limited GUI options, so nothing as fancy as the pictures in your post. However, the algorithm could be implemented in C++ if a skilled C++ programmer were interested in doing so (my experience of C++ is very limited), or perhaps some clever way could be devised to provide a useful output even with the very limited GUI options in Nyquist. For example, a series of labels showing the pitch, perhaps something like:
Thanks for replying so quickly. This plug-in looks very interesting, but yes, the GUI options are very limited. It is not as pretty, but the real problem is different: it does not display how much in tune the sample is.
In the examples I posted we can see how the pitch changes over time: when it is exact, when it is a little too high or too low. In “Pitch Detect” we can only see, for example, that a note is A#5. So if the vocalist sings with vibrato and the pitch of his voice varies a little, we cannot see this in “Pitch Detect”.
I hope this is understandable (sorry, as you can see I am not a native English speaker).
There is a physical problem with accurately plotting pitch over time. Some algorithms are “faster” than others, but all pitch detection algorithms that I’m aware of trade off pitch accuracy against time accuracy. Human hearing is quite remarkable in this respect, in that it is possible to hear pitches very accurately over a very small time frame. When I see graphs that plot a changing frequency against time and show tiny variations over very short periods of time, I am highly suspicious of how accurate the graph plot really is. For example, for FFT analysis, an accuracy of +/- 1.5 Hz at a sample rate of 44100 Hz requires a window size of over 16000 samples (about 0.4 seconds). For more accurate frequency detection, the time domain resolution becomes worse; for increased resolution in the time domain, the frequency detection accuracy becomes worse.
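To put rough numbers on that tradeoff, here is a quick back-of-the-envelope sketch (my own illustration, not code from any plug-in): the bin spacing of an N-point FFT is the sample rate divided by N, and without interpolation a peak can only be located to within about half a bin.

```python
# Illustration of the FFT time/frequency tradeoff described above.
# Bin spacing of an N-point FFT at sample rate sr is sr / N;
# a peak can only be located to about +/- half a bin without interpolation.

def fft_resolution(sample_rate, window_size):
    bin_hz = sample_rate / window_size    # frequency resolution (Hz)
    window_s = window_size / sample_rate  # time resolution (seconds)
    return bin_hz, window_s

# A 16384-sample window at 44100 Hz:
bin_hz, window_s = fft_resolution(44100, 16384)
print(round(bin_hz / 2, 2))  # 1.35  -> about +/- 1.35 Hz accuracy
print(round(window_s, 2))    # 0.37  -> about 0.37 s of audio per window
```

Halving the window to 8192 samples doubles the bin spacing, which is exactly the accuracy-versus-time-resolution tradeoff described above.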
As you do not require real time analysis, the pitch detection plug-in could quite easily be modified to output numerical frequency and amplitude data over time, which could then be analysed, and possibly made into a graph, in some other application (such as gnuplot or Microsoft Excel). The default time resolution for the Pitch Detection plug-in is 0.2 seconds, but for a smoother graph, overlapping time windows could be used so as to output measurements every 0.1 seconds. Pitch detection is likely to be accurate to within about +/- 1 cent.
The FFT magnitudes can seldom be taken just as they appear.
A pitch tracking algorithm searches for the highest peaks and obtains the (nearly) exact frequency and magnitude by some sort of interpolation.
Let’s say we have for the first 3 bins (at 0, 100 and 200 Hz) the following values:
0 > 0.01
100 > 0.5
200 > 0.45
The highest peak is at 100 Hz, but we’ll use quadratic interpolation to get the actual position and maximum of the peak.
(all three points are considered and a polynomial fitted to match these)
The position is nearly half way to the 200 Hz bin.
The tracker would now store: 140.7 Hz, Mag 0.544815.
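As a sketch of that interpolation step (my own illustration of the standard three-point parabolic fit, using the example bins above):

```python
# Quadratic (parabolic) interpolation of an FFT magnitude peak.
# alpha, beta, gamma are the magnitudes of the bins to the left of,
# at, and to the right of the highest bin; a parabola is fitted
# through all three points, as described above.

def parabolic_peak(alpha, beta, gamma, peak_hz, bin_hz):
    # Fractional offset of the true peak from the centre bin, in bins.
    p = 0.5 * (alpha - gamma) / (alpha - 2 * beta + gamma)
    freq = peak_hz + p * bin_hz              # interpolated frequency
    mag = beta - 0.25 * (alpha - gamma) * p  # interpolated magnitude
    return freq, mag

# Bins at 0, 100, 200 Hz with magnitudes 0.01, 0.5, 0.45:
freq, mag = parabolic_peak(0.01, 0.5, 0.45, 100.0, 100.0)
print(round(freq, 1), round(mag, 6))  # 140.7 0.544815
```

The offset comes out at about 0.41 of a bin, which matches “nearly half way to the 200 Hz bin” and reproduces the stored values of 140.7 Hz and 0.544815.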
That’s why we use windowing and zero padding to get peaks that are more easily discernible from one another. But a window size of 4096 samples (less than 0.1 s, depending on the overlap value) should be perfect for this application.
Across frames, linear interpolation is used for the magnitude. The phase value, on the other hand, is much trickier to handle; cubic interpolation along with unwrapping is common.
Can’t I just detect all the peaks like Mr Robert J. H. said, and by calculating the time interval between the peaks compute the frequency with the simple frequency-from-period formula (f = 1/T)?
The point which would represent this frequency on the time axis would be right in the middle between those two peaks.
Let’s say I already have all the points of this graph and have generated it using Gnuplot, for example. Would it be hard to integrate it into the Audacity window so that it sits right below the sample, and when we select a time interval on my graph it is also selected in the track?
You could do that, but it will probably be very inaccurate because complex waveforms frequently have multiple peaks within a single cycle of the fundamental frequency.
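To see why, here is a small synthetic demonstration (my own toy example, not from any real recording): a 100 Hz tone with a strong second harmonic has two local maxima per fundamental cycle, so naive peak counting suggests roughly double the true frequency.

```python
import math

# One second of a 100 Hz tone plus a strong second harmonic at 200 Hz.
# Counting waveform peaks on this signal overestimates the pitch,
# because each fundamental cycle contains two local maxima.

sr = 44100
n = sr  # one second of audio
x = [math.sin(2 * math.pi * 100 * t / sr)
     + 0.8 * math.sin(2 * math.pi * 200 * t / sr) for t in range(n)]

# Naive "peak" detector: count strict local maxima.
peaks = sum(1 for i in range(1, n - 1) if x[i - 1] < x[i] > x[i + 1])
print(peaks)  # roughly 200 peaks/second, suggesting 200 Hz, not 100 Hz
```

Real vocal waveforms have far richer harmonic structure than this two-component example, so the error would be even less predictable.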
If we ignore for a moment the pitch detection algorithm (there are several algorithms that could be used, so for now let’s just think of it as a black box), then I think the method would be to step through the audio, taking a “frame” (window) of “n” samples, send those samples to the black box, and record the frequency “f” as detected by the black box. Then move the window along the timeline by “t” seconds to get the next “n” samples. “n” would be chosen to suit the black box algorithm, and “t” would be selected to achieve the required time resolution. The frequency “f” values would then need to be plotted somehow.
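That frame-stepping loop can be sketched in a few lines (my own sketch; the detector is left as a placeholder that any real algorithm, such as autocorrelation or FFT peak tracking, could replace):

```python
# Sketch of the frame-stepping loop described above, with the pitch
# detector left as a black box passed in as a function.

def track_pitch(samples, sample_rate, n, t, detect):
    """Step a window of n samples through the audio every t seconds
    and record (time, frequency) pairs from the black-box detector."""
    hop = int(t * sample_rate)
    points = []
    start = 0
    while start + n <= len(samples):
        frame = samples[start:start + n]
        f = detect(frame)  # black-box pitch detector
        points.append((start / sample_rate, f))
        start += hop
    return points

# With a dummy detector on one second of silence, n = 4096, t = 0.1 s:
pts = track_pitch([0.0] * 44100, 44100, 4096, 0.1, lambda frame: 440.0)
print(len(pts))  # 10 frames fit: starts at 0, 4410, ..., 39690 samples
```

The (time, frequency) pairs it returns are exactly the sort of data that could be dumped to a text file for gnuplot, as suggested earlier in the thread.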
“Would it be hard” depends on who is coding it. It would be extremely hard for me, as I’m a rank amateur with C++.
The “easiest” approach that comes to mind for full integration in Audacity would be to write a module that either hijacks a Pitch (EAC) track, or creates a new track type based on the Pitch (EAC) track code and then writes the “f: t” points into that track.
A less pretty way that uses Audacity’s existing GUI elements could be much easier.
I suppose the decision at this point depends on how good your C++ programming is, how much time you want to spend on this, and which is more important to you: (a) producing a slick-looking effect (along the lines of your initial examples) or (b) achieving something usable without spending a huge amount of time and effort on it.
What I’m thinking of is perhaps something like this (which could be written as a Nyquist plug-in and run in the current version of Audacity).
The upper track is the original vocal recording.
The label track in the middle is produced by the first run of the effect over the recording. The labels indicate the average pitch of each note to the closest semitone.
The bottom track is the result of applying the effect a second time, but this time it is applied to a copy of the original recording, and the waveform shows the amount of deviation from the “in tune” pitch that is shown by the label. A deviation of +/- 1 on the vertical scale could represent +/- 1 semitone. In this (fake) example the first note (Bb2) can be seen to be a bit flat and the second note (C3) is a little sharp. The third note (C#3) warbles a bit but is more or less in tune.
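For what it’s worth, the mapping from a measured frequency to that deviation scale is just the standard equal-temperament relation; a minimal sketch (the example frequencies are mine, assuming A4 = 440 Hz tuning):

```python
import math

# Deviation of a measured frequency from a target note, in semitones,
# using the equal-tempered relation 12 * log2(f / f_target).
# +/- 1.0 on this scale is exactly +/- 1 semitone, matching the
# vertical scale suggested above.

def semitone_deviation(f, f_target):
    return 12 * math.log2(f / f_target)

BB2 = 116.54  # Bb2 in the A4 = 440 Hz equal-tempered scale
C3 = 130.81   # C3 likewise

print(round(semitone_deviation(113.0, BB2), 2))  # negative -> a bit flat
print(round(semitone_deviation(132.5, C3), 2))   # positive -> a little sharp
```

Multiplying the result by 100 gives the deviation in cents, which is the unit the earlier +/- 1 cent accuracy estimate was quoted in.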