Is there an objective way of measuring the quality of a voice recording? Some operation that can take a voice recording and give me a figure? I am doing automated transcription and am trying to see if there is a correlation between input file quality and final transcription results.
I have been able to find SNR and ASNR. Is there anything else?
There are lots of other measurements, such as distortion. But these do not tell you much about the quality of the recording; they let you compare the quality of your gear. And most likely, your gear isn’t the limiting factor.
Distance to the mic is more important than a very good mic. You can do fine with a mediocre mic if you know the trade.
What you need are reference tracks to compare with, plus a pair of decent headphones and speakers. And, of course, lots of time to learn how to listen.
You need to get to know your speakers and you can only do that by listening to lots of examples. Good ones and bad ones. After a while, you’ll start recognizing what went wrong.
And then there’s human talent. A question like "How can I sound like … " comes up quite often on this forum, and the answer is usually “acting talent”.
There’s lots of things that can be measured, but probably the most important aspect of “quality” for a voice recording is “intelligibility”, which is “subjective”.
One thing that can be measured is “contrast”. That’s the difference in level between the voice and the background noise: http://manual.audacityteam.org/man/contrast.html
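This is one of the measurements for which there is an actual “standard” (see: WCAG 2.0). Success Criterion 1.4.7 there asks for background sounds to be at least 20 dB lower than the foreground speech.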
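For batch work outside Audacity, the same idea can be sketched in a few lines of Python. This is only a rough approximation, not Audacity's actual code: it assumes you have already picked a stretch of speech and a stretch of background noise from the file yourself, and that the samples are floats in the range -1.0 to 1.0. The names `rms_db` and `contrast_db` are made up for this example.

```python
import math

def rms_db(samples):
    """RMS level of float samples (-1.0..1.0) in dB relative to full scale."""
    if not samples:
        raise ValueError("empty selection")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)

def contrast_db(foreground, background):
    """Difference in RMS level between a speech and a noise selection."""
    return rms_db(foreground) - rms_db(background)

# Synthetic stand-ins: a "voice" tone and a "noise" tone ten times quieter.
rate = 8000
voice = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
noise = [0.05 * math.sin(2 * math.pi * 50 * n / rate) for n in range(rate)]
print(round(contrast_db(voice, noise), 1))
```

An amplitude ratio of 10 corresponds to 20 dB, which is what the synthetic example gives. On real recordings the hard part is choosing the speech and noise selections, which this sketch leaves to you.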
Another important aspect of voice intelligibility is the frequency bandwidth (range of frequencies). “Telephone quality” is often used to describe low bandwidth audio where the upper frequency limit is around 3000 to 4000 Hz. 3000 Hz (3 kHz) is about the minimum required for speech, though with a 3 kHz limit it can be difficult to distinguish between an “S” sound and an “F” sound. For clear speech, the upper frequency limit should be 7000 Hz (7 kHz) or more.
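If you want a figure for the bandwidth rather than reading it off a spectrogram, one possible approach is a “spectral rolloff”: the frequency below which nearly all of the signal energy lies. The sketch below is an assumption about how you might do that, not something Audacity provides under this name. It uses a slow plain-Python DFT so it runs without extra libraries; the function names and the 99% threshold are arbitrary choices for illustration. For real files you would normally use an FFT library and average over many short, windowed frames.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive DFT power for bins 0..N/2 (slow; fine for short test signals)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def rolloff_hz(samples, sample_rate, fraction=0.99):
    """Lowest bin frequency at which the cumulative energy reaches `fraction`."""
    power = power_spectrum(samples)
    total = sum(power)
    if total == 0:
        return 0.0  # silence: no meaningful bandwidth
    running = 0.0
    for k, p in enumerate(power):
        running += p
        if running >= fraction * total:
            return k * sample_rate / len(samples)
    return sample_rate / 2

# Synthetic "telephone-like" signal: energy at 500 Hz and 3000 Hz only.
rate, n = 16000, 320
signal = [math.sin(2 * math.pi * 500 * i / rate) +
          math.sin(2 * math.pi * 3000 * i / rate) for i in range(n)]
print(rolloff_hz(signal, rate))
```

For the synthetic signal the result lands on the 3000 Hz component, since nothing above it carries energy. A recording that rolls off well below 7 kHz would, by the rule of thumb above, be a candidate for poorer intelligibility.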
Here’s the spectrogram view of a low quality voice recording. You can see that the upper frequency limit is around 4000 Hz (4 kHz):