Audacity for speech recognition and related tasks?

I’m working on semantic tagging of non-words in speech. I built a laugh detector and am now moving on to semantic tagging of hums. I do not mean artificial humming noises; I mean people saying “hum” or “hum hum” either as a way of saying “Wait, I’m searching my mind for the next word” (fillers) or to express agreement, disagreement, or the neutral “I’m listening, go on”.

I already have a tool to record, visualise and analyse sound files (coded for the laugh detector), but I am using Audacity to pre-process the audio files, so I was wondering about using it for the analysis and for displaying the results as well. This would mean:

  • changing the display mode so that the view doesn’t jump when the cursor reaches the end of the currently displayed audio section; instead, the cursor would stop in the middle of the view and the waveform would scroll underneath it.
  • adding a new window, scrolling in sync with the waveform, where tags could be displayed on demand in text mode. These tags would include the output of speech recognition plus additional tags such as “h+” to indicate an agreeing hum. Tongue clicking, lip smacking, laughter, change of speaker, stutter, etc. could also have their own tags.
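A minimal sketch of how such time-aligned tags could be represented and looked up for display in the synced text window. Only “h+” comes from the description above; the rest of the tag inventory and the “asr:” prefix for recognition output are hypothetical placeholders:

```python
from dataclasses import dataclass

# Hypothetical tag inventory: "h+" (agreeing hum) is from the post above;
# the other labels are illustrative stand-ins for the events it mentions.
TAGS = {"h+", "h-", "h0", "laugh", "click", "smack", "speaker-change", "stutter"}

@dataclass
class Tag:
    start: float   # seconds from start of the sound file
    end: float     # seconds from start of the sound file
    label: str     # one of TAGS, or "asr:<word>" for recognition output

    def __post_init__(self):
        assert self.end >= self.start
        assert self.label in TAGS or self.label.startswith("asr:")

def tags_at(tags, t):
    """Return the labels active at time t, in file order,
    e.g. for rendering in a window that scrolls with the waveform."""
    return [g.label for g in tags if g.start <= t <= g.end]
```

With this, the display window only needs to call `tags_at(tags, cursor_time)` on each redraw as the waveform scrolls past the fixed cursor.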

This would make it easy for me to check the sound file and see where the tags are correct and where they are not.

It could then lead to a tool for cleaning speech audio files, for example before releasing an interview on the radio. From an AI perspective, these tags give indications of the speakers’ mood. I work at CADIA (cadia.ru.is); this is part of my work. I’m a researcher there (who else would write a work-related post on a Sunday night?).

I’m interested in everyone’s feedback, but in particular I’d like to hear from Audacity developers. When you start looking into somebody else’s code there is always an acclimatisation period before you can start implementing a modification. How steep is it with Audacity? Would you say it is worth my time and will eventually make my progress faster, or would you advise me to forget it and stick to my own code instead? Open-sourcing is not an issue: my supervisor is in favour of open source, and once we have published one or two scientific papers about our results we will make the code available anyway.

Mariane

> I’m interested in everyone’s feedback, but in particular I’m interested in hearing from Audacity developers. When you start looking into somebody else’s code there is always an acclimation period before you can start implementing a modification. How heavy is it with Audacity? Would you advise me it is worth my time and it will eventually make my progress faster or would you advise me to forget it and stick to my own code instead?

Hi. I would expect using Audacity as the basis to be worthwhile for what you are trying to do. Getting Audacity to compile at all (see the Developer Guide: http://wiki.audacityteam.org/index.php?title=Developer_Guide) is the main step. You should be able to get a long way towards that kind of mark-up using a label track. You might want to create a Vamp plug-in for generating the labels, because that plug-in would then be usable in both Audacity and Sonic Visualiser. There are example plug-ins which you can use as templates.
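One practical detail worth knowing: Audacity label tracks can be imported from and exported to plain text files (one label per line: start time, tab, end time, tab, label text), so a detector running outside Audacity could hand its results over without any C++ work at all. A minimal sketch, assuming the detector produces (start, end, label) tuples in seconds:

```python
def write_label_track(tags, path):
    """Write (start, end, label) tuples as an Audacity label file.

    Audacity reads and writes label tracks as plain text: one label per
    line, start and end times in seconds separated by tabs, then the
    label text. The resulting file can be loaded via
    File > Import > Labels... and shown under the waveform.
    """
    with open(path, "w", encoding="utf-8") as f:
        for start, end, label in tags:
            f.write(f"{start:.6f}\t{end:.6f}\t{label}\n")
```

That gives a quick way to eyeball where the tagger is right and wrong inside Audacity itself, before committing to any plug-in or source-level work.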

Sorry for the long delay before a reply. Most talk about Audacity development happens on the audacity-devel mailing list (http://lists.sourceforge.net/lists/listinfo/audacity-devel). If you want more specific, concrete information on extending Audacity, asking there is a good idea. Possibly even better is to start a User: page on the wiki (http://wiki.audacityteam.org/index.php?title=Audacity_Wiki_Home_Page), say more about the project, and put more specific questions there. Other people will then answer, point you to resources, and comment on the talk page.