I’m working on semantic tagging of non-words in speech. I built a laugh detector and have now moved on to semantic tagging of hums. I do not mean artificial humming noises; I mean people saying “hum” or “hum hum”, either as a way of saying “Wait, I’m searching in my mind for the next word” (fillers) or to express agreement, disagreement, or the neutral “I’m listening, go on”.
I already have a tool to record, visualise and analyse sound files (coded for the laugh detector), but I’m using Audacity to pre-process audio files, so I was wondering about also using it for the analysis and for displaying the results. This would mean:
- changing the display mode so that it doesn’t jump when the cursor reaches the end of the currently displayed audio section. Instead, the cursor would stop in the middle and the waveform would scroll underneath it.
- having a new window, scrolling in sync with the waveform, where tags could be displayed on demand in text mode. These tags would include the output of speech recognition and additional tags such as “h+” to indicate an agreeing hum. Tongue clicking, lip smacking, laughter, change of speaker, stutter, etc. could also have their own tags.
This would make it easy for me to check the sound file and see where the tags are correct and where they are not.
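To make the two points above a bit more concrete, here is a rough Python sketch of what I have in mind. It is not Audacity code and not my actual tool: the names `Tag`, `visible_window`, `tags_in_window` and `to_label_track`, and the “h-” and “laugh” labels, are made up for illustration. The only thing taken from Audacity is the layout of the text output, which as far as I know is how exported label tracks look (one label per line: start time, end time and text, separated by tabs).

```python
from dataclasses import dataclass


@dataclass
class Tag:
    start: float  # seconds from the beginning of the file
    end: float    # seconds
    label: str    # e.g. "h+" for an agreeing hum (other labels are hypothetical)


def visible_window(cursor, width, total):
    """Return (start, end) of the displayed section, in seconds.

    The cursor stays in the middle of the window whenever possible;
    the window only stops scrolling at the very start and end of the file.
    """
    if total <= width:
        return 0.0, total
    start = min(max(cursor - width / 2, 0.0), total - width)
    return start, start + width


def tags_in_window(tags, start, end):
    """Tags that overlap the displayed section, for the synchronised tag view."""
    return [t for t in tags if t.end > start and t.start < end]


def to_label_track(tags):
    """Tab-separated start, end, label lines (assumed Audacity label layout)."""
    return "".join(f"{t.start:.6f}\t{t.end:.6f}\t{t.label}\n" for t in tags)


tags = [Tag(1.0, 1.4, "h+"), Tag(6.2, 6.5, "laugh"), Tag(12.0, 12.3, "h-")]
start, end = visible_window(cursor=7.0, width=4.0, total=20.0)
for t in tags_in_window(tags, start, end):
    print(f"{t.start:6.2f}  {t.label}")
```

If the label layout is what I think it is, the text produced by `to_label_track` could be loaded into a label track under the waveform, which would already give a crude version of the tag display without touching Audacity’s code.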
It could then lead to a tool for cleaning speech audio files, for example before broadcasting an interview on the radio. From an AI perspective, these tags give indications about the mood of the speakers. I work at CADIA (cadia.ru.is) and this is part of my work. I’m a researcher there (who else would write a work-related post on a Sunday night?).
I’m interested in everyone’s feedback, but in particular I’d like to hear from Audacity developers. When you start looking into somebody else’s code, there is always an acclimatisation period before you can start implementing a modification. How long is that period with Audacity? Would you say it is worth my time and will eventually speed up my progress, or would you advise me to forget it and stick to my own code instead? Open-sourcing the result is not an issue: my supervisor is in favour of open source, and once we have published one or two scientific papers about our results we will make the code available anyway.
Mariane