Finding the Number of Syllables per Minute (Speech Rate)

Can Audacity tell me the number of syllables per minute in a speech recording without my having to count the syllables manually? My data set is about 9 hours long (90 WAV files of about 3-7 minutes each), and counting by hand for each file is very time-consuming and taxing. Is there an alternative? I need this to analyse the data for my research.

Thank you,

The short answer is no. This is a job for a software package that can understand what’s being said. Audacity is far too simple for that. A designer elf may be able to program something that can count volume changes, but it still would not be able to correct its own mistakes and would be a slave to noise and volume changes. Many people want tools like this and also want the software to “clean up” the vocal because it’s buried in noise. That’s hard to do as well.


The best solution that I can think of would be to use a speech to text program such as “Dragon Naturally Speaking” (commercial non-free software) to convert the recording to text, then open the text in a word processor to get a word and character count. From the word count and character count you should be able to make a reasonable approximation of the number of syllables.

Speech to text programs require “training” to get the most accurate results, which may not be possible if the recordings are from different people, but if you use one of the better speech to text programs the default settings will probably be good enough for an approximation.
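If you do end up with a transcript, the syllable estimate could be automated rather than derived by hand from word and character counts. The following is only a rough sketch in Python: it swaps the word/character-count idea for a simple vowel-group heuristic for English, and the function names `count_syllables_in_word` and `syllables_per_minute` are made up for illustration. It will miscount some words, so treat the result as an approximation.

```python
import re

def count_syllables_in_word(word):
    """Rough English heuristic: one syllable per group of consecutive vowels."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    # A trailing "e" is usually silent ("rate"), except in "-le" ("table").
    if count > 1 and word.endswith("e") and not word.endswith("le"):
        count -= 1
    # Every word is assumed to have at least one syllable.
    return max(count, 1)

def syllables_per_minute(transcript, duration_seconds):
    """Estimated syllables in the transcript divided by the duration in minutes."""
    words = re.findall(r"[A-Za-z']+", transcript)
    total = sum(count_syllables_in_word(w) for w in words)
    return total / (duration_seconds / 60.0)

if __name__ == "__main__":
    # Hypothetical example: a short transcript from a 30 second clip.
    print(syllables_per_minute("the speech rate is nice", 30))
```

The duration of each recording has to be supplied separately, for example from the track length that Audacity shows.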

“Paul L” has done some work in Nyquist for recognising phonemes automatically, but his work is probably not yet at a stage where it is really useful for giving you an answer.

I can try something in Nyquist.
I’ve had an idea how it could be done.
The code does the following:

  • Pauses that lie under an arbitrary threshold are removed.
  • The resulting gapless sound is examined for its cyclic character, i.e. whether the sound is voiced or not.
  • The derivative shows every change from one state to the other.
  • Each point where the voice reaches its highest periodicity is counted, under the assumption that it is a vowel within a syllable.

The result depends greatly on the threshold used and the minimum pause length, but those can be calibrated once you’ve processed some files.
It may be necessary to compress the audio if the volume goes up and down.
Are the files of good quality?
I presume those 90 files are all from different people?
A sample could be helpful, if it doesn’t violate any personal rights.
Maybe you’ve already recorded yourself with the same equipment/settings.
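For readers who want to see the idea outside Nyquist, here is a minimal Python sketch of the steps listed above. It is not the plug-in's code, and it simplifies the last step: instead of locating the periodicity peaks it counts each change from unvoiced to voiced as one syllable. The function names, the 40 ms frame length, the 75-400 Hz pitch range and both thresholds are assumptions chosen for illustration, and the input is assumed to be a list of samples between -1 and 1.

```python
import math
import random

def frame_rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def periodicity(frame, min_lag, max_lag):
    """Largest normalised autocorrelation over the pitch lag range."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0.0:
            best = max(best, sum(x * y for x, y in zip(a, b)) / den)
    return best

def count_syllable_nuclei(samples, rate, silence_rms=0.02,
                          voiced_min=0.6, frame_sec=0.04):
    n = int(rate * frame_sec)
    frames = [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]
    # Step 1: remove pauses, i.e. frames whose level is under the threshold.
    loud = [f for f in frames if frame_rms(f) >= silence_rms]
    # Step 2: decide per frame whether the sound is voiced (strongly periodic).
    min_lag, max_lag = rate // 400, rate // 75  # assumed pitch range 75-400 Hz
    voiced = [periodicity(f, min_lag, max_lag) >= voiced_min for f in loud]
    # Steps 3 and 4 (simplified): count each change from unvoiced to voiced.
    count, previous = 0, False
    for is_voiced in voiced:
        if is_voiced and not previous:
            count += 1
        previous = is_voiced
    return count

def demo_signal(rate=8000):
    """Synthetic test: three 'vowels' (tones), each followed by noise and a pause."""
    rng = random.Random(0)
    tone = [0.5 * math.sin(2 * math.pi * 160 * i / rate) for i in range(1600)]
    signal = []
    for _ in range(3):
        signal += tone
        signal += [rng.uniform(-0.3, 0.3) for _ in range(960)]  # unvoiced part
        signal += [0.0] * 1600                                   # pause
    return signal

if __name__ == "__main__":
    print(count_syllable_nuclei(demo_signal(), 8000))
```

As with the plug-in, the count depends heavily on the two thresholds, so they would need to be calibrated against a few seconds of hand-counted speech. Real recordings would also need to be read from the WAV files first, which this sketch leaves out.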

You’ve not said what sort of research you are doing, but if it’s language research you may already have the necessary tools to count the number of syllables from a transcript. Getting 9 hours of recording professionally transcribed will probably cost something like $600, but if you are at a university you may be able to get some support from its office services.

Here’s a first attempt which may need some proper adaptation to meet your needs.
I’ve tested it with a 1:30 h long audio-book chapter (mediocre mp3 quality).
I’ve only amplified it to 0 dB.
As I’ve mentioned, you probably want to set the threshold by first testing on a few seconds of audio and counting the syllables by hand for comparison.
It has a preview function so you can check that no syllables are treated as silence (at least not a whole syllable).
I hope it is useful (if you’ve not already changed to another help desk…)
Download the tool and save it to the Plug-in folder of Audacity. On the next start, it will appear in the Analyze menu.
rjh-syllable-count.ny (1.43 KB)

Nice one Robert. That works better than I expected.
As you say, it’s highly dependent on setting the dB level correctly, but after that it’s giving me results with better than 90% accuracy.

There was an illegal (non-printable) character at the beginning of line 30, so I’ve removed it and updated your posted file attachment.

Thank you Steve, I was just going to upload a corrected version.
Strange how such things always happen at the last moment.
Rather hard to find such intruders if the screen reader doesn’t speak the character.
The tool is really very simple, with plenty of room for improvement.

Nowhere does the poster say the speech is clear and intelligible. I’m betting on short wave broadcasts recorded on cassette tape. These messages always come with a gotcha somewhere. Koz