extraction of clips and comparison to text

I am looking for a way for my program to jump into the middle of an audio recording and play a short clip, maybe 5 seconds long. I have a large audio speech library with the audio files and the transcribed texts. My purpose is to search an audio file for a particular text clip. I will have an approximate starting point, in seconds, at which to look for the first audio clip. I then use speech-to-text software to convert the audio clip to text and compare the result with the transcript. After several iterations, I should be able to select the exact audio clip that matches the transcription text clip.

Can I do this without programming in Audacity? I can get into the Audacity code if necessary, but it would be great if that were not required. Thanks in advance for any help!

I don’t see how an audio editor (Audacity) is going to help you with that. And, this sounds like a programming task.

There is an LRC file format that “tags” each line of text with a time code for synchronizing lyrics when you play a song. There are similar formats for creating video subtitles. There is also a format for karaoke with a time-stamp for every word, but that’s probably out of the question.

You’d have to tag every line of text, but then you wouldn’t need voice recognition. It would be a lot of work, but programming is a lot of work too! (You’d still have to do some programming, but it should be a lot simpler if you can directly search for the text.)

Here is part of an LRC file from my computer:

[00:58.93]It’s close to midnight
[01:01.36]and something evil’s lurking in the dark
[01:07.08]Under the moonlight
[01:09.51]you see a sight that almost stops your heart
[01:12.93]You try to scream
[01:15.54]but terror takes the sound before you make it
[01:20.96]You start to freeze
[01:23.63]as horror looks you right between the eyes,
[01:26.62]You’re paralyzed
[01:29.24][02:25.40]'Cause this is thriller, thriller night
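
If the transcripts were tagged this way, looking up a phrase could be fairly simple. Here is a rough Python sketch, assuming tags of the form [mm:ss.xx] as in the sample above; the names parse_lrc and find_start_times are made up for illustration and are not part of any LRC library.

```python
import re

# Matches one LRC time tag such as [01:29.24] (minutes:seconds.fraction).
TIME_TAG = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")

def parse_lrc(text):
    """Return a list of (start_seconds, line_text) pairs sorted by time."""
    entries = []
    for line in text.splitlines():
        tags = TIME_TAG.findall(line)
        if not tags:
            continue  # skip blank lines and non-time tags
        lyric = TIME_TAG.sub("", line).strip()
        # A line may carry several tags when the same text repeats.
        for minutes, seconds in tags:
            entries.append((int(minutes) * 60 + float(seconds), lyric))
    return sorted(entries)

def find_start_times(entries, phrase):
    """Return every start time whose line contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [start for start, lyric in entries if needle in lyric.lower()]

sample = """[01:26.62]You're paralyzed
[01:29.24][02:25.40]'Cause this is thriller, thriller night"""

entries = parse_lrc(sample)
print(find_start_times(entries, "thriller night"))
```

The times come back in seconds, so they could be used directly as the start position of a clip.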

SoX is a command line audio editing program. http://sox.sourceforge.net/
To play a specified time range in SoX you would use a command something like:

sox myfile.wav -d trim 30 =35

which will play the file “myfile.wav” starting from 30 seconds and stopping at 35 seconds.
-d means the default output device.
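
To call that from your own program, one option is to build the same command and hand it to the operating system. A minimal Python sketch, assuming sox is installed and on the PATH; sox_play_command and play_clip are hypothetical helper names.

```python
import subprocess

def sox_play_command(path, start, end):
    """Build the SoX argument list that plays path from start to end (seconds)."""
    # "-d" is the default output device; "=END" marks an absolute end position.
    return ["sox", path, "-d", "trim", str(start), f"={end}"]

def play_clip(path, start, end):
    """Play the clip; raises CalledProcessError if SoX exits with an error."""
    subprocess.run(sox_play_command(path, start, end), check=True)

print(sox_play_command("myfile.wav", 30, 35))
# prints ['sox', 'myfile.wav', '-d', 'trim', '30', '=35']
```

Calling play_clip("myfile.wav", 30, 35) would then run the same command shown above.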

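If your speech-to-text software supports piped input then SoX will be able to pipe the chosen audio directly into your other application (see “--sox-pipe” in the SoX manual: http://sox.sourceforge.net/sox.html). Note that --sox-pipe writes SoX’s own native format, which is meant for piping into another SoX command; for a non-SoX program, writing a standard format to standard output (for example “-t wav -” as the output file) is probably what you want.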
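
For the comparison step described in the original question, once the recognizer has returned text for a clip, one possible approach is to slide a window of the same number of words over the transcript and keep the most similar one. This is only a sketch using Python's standard difflib module; best_match and words are made-up names, and a real recognizer's output may need more normalization than this.

```python
import difflib
import re

def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9'’]+", text.lower())

def best_match(recognized, transcript):
    """Return (score, word_index) of the transcript window most like the recognized text."""
    target = words(recognized)
    source = words(transcript)
    if not target:
        return (0.0, 0)
    size = len(target)
    best = (0.0, 0)
    # Slide a window of the same length over the transcript words.
    for i in range(max(1, len(source) - size + 1)):
        window = source[i:i + size]
        score = difflib.SequenceMatcher(None, target, window).ratio()
        if score > best[0]:
            best = (score, i)
    return best

transcript = "You try to scream but terror takes the sound before you make it"
print(best_match("terror takes the sound", transcript))
```

A score near 1.0 means the clip text matches that part of the transcript closely; a low score suggests moving the clip start and trying again.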