wild blue-sky idea: spoken word transcript ingestion

I routinely use a service from Wreally (no commercial interest, I’m just a customer of theirs) that produces pretty accurate plaintext transcripts of spoken-word audio. You upload an mp3, you get a text file. Moreover, you can get the text file with timestamps embedded at any interval of your choosing. So I’m often working with the transcript on my iPad next to the desktop monitor, or in another window on the desktop monitor, finding selected bits of a 2-hour interview in the text editor, getting the reference times, and then instantly locating them in the audio using Audacity. So the transcript says that at 1:25:05 the interviewee started talking about what to feed your pet gerbil, and bingo, I can scroll to 1:25:05 to find “the gerbil feeding advice bit” in the audio.

So of course, after doing this a few hundred times, it occurs to me: how cool would it be if Audacity could read a plaintext file with embedded timestamps and actually display that text below the audio tracks (or above them) in a special transcript track, which would only become legible when you were zoomed in far enough that the text would fit in the available space? So there would be a kind of tickertape of transcript running along below or above your audio tracks? Not markers, but a continuous banner? And maybe a popup text window of the transcript, all of it clickable, so you just click on the words you want and the audio automagically scrolls to that location.

I know, dreadful things happen as soon as you think about editing such a hybrid mess. Maybe using this feature automatically makes that project a reference file, not editable, only used as the source from which to copy clips to paste elsewhere? It would just be so nifty to be able to see the transcript in the same app with the audio signal. In my fantasy world, the AI that ingests the text is so darned smart that it’s able to correlate the syllables of the words with the audio envelope… [cue insane laughter]

It’s one of those 3AM “bright ideas” that is probably completely impossible to implement, yet it would be so very cool… OK that was fun, now back to reality.

I suspect that the proposal, as you describe it, would be such a huge amount of development work for a niche application that you would be unlikely to attract any developers to implement it (unless you could persuade a “Wreally” developer to do so :wink:).

I think the closest that we get to that is “labels” (see: https://manual.audacityteam.org/man/label_tracks.html)
Label tracks can be imported from text files (times are in seconds): https://manual.audacityteam.org/man/label_tracks.html#export
There is also the “Label Editor” that allows you to jump to any label in the list: https://manual.audacityteam.org/man/labels_editor.html
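
As a rough illustration of the label route, here is a minimal Python sketch that turns a timestamped transcript into a text file that Audacity can import as a label track (one label per line: start time, end time and label text, tab-separated, times in seconds). The transcript layout is an assumption: each timestamp is taken to sit at the start of a line in the form [H:MM:SS] or [MM:SS], which may not match what the transcription service actually produces, so the pattern would likely need adjusting. The names `transcript_to_labels` and `to_seconds` are made up for this example.

```python
import re

# Assumed timestamp shape: [H:MM:SS] or [MM:SS] at the start of a line.
# Adjust this pattern to whatever the real transcript uses.
STAMP = re.compile(r"^\[(?:(\d+):)?(\d{1,2}):(\d{2})\]\s*(.*)$")


def to_seconds(hours, minutes, seconds):
    """Convert H, MM, SS strings (hours may be None) to whole seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def transcript_to_labels(text):
    """Return label-file text: 'start<TAB>end<TAB>words' per timestamp."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            hours, minutes, seconds, words = match.groups()
            entries.append([to_seconds(hours, minutes, seconds), words.strip()])
        elif entries and line:
            # A line without a timestamp continues the previous passage.
            entries[-1][1] += " " + line

    lines = []
    for i, (start, words) in enumerate(entries):
        # Each label spans up to the next timestamp; the last one has
        # start == end, which makes it a point label.
        end = entries[i + 1][0] if i + 1 < len(entries) else start
        lines.append(f"{start:.6f}\t{end:.6f}\t{words}\n")
    return "".join(lines)


if __name__ == "__main__":
    sample = "[1:25:05] gerbil feeding advice\n[1:26:10] next topic\n"
    print(transcript_to_labels(sample), end="")
```

Saving the output to a text file and importing it as labels gives a label track with one region per transcript passage, which the Label Editor can then jump between. It is nowhere near the continuous clickable banner described above, but it does put the text and the audio in the same window.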