Audio/Video Transcription Software

This read-only archive contains discussions from the Adding Feature forum.
New feature requests may be posted to the Adding Feature forum.
Technical support is available via the Help forum.
Robert J. H.
Posts: 3633
Joined: Thu May 31, 2012 8:33 am
Operating System: Windows 10

Post by Robert J. H. » Sat Dec 29, 2012 8:01 am

I personally prefer to improve the transcription abilities of Audacity itself. As Gayle wrote, most of the features are already there. And a lot more can be done, even with a Nyquist plug-in.
When I am transcribing and translating conversations, I need the freedom to type in the results. This asks for further features, such as:
  • expanding pauses
  • Slow down of speech
  • Automatic repetition of certain chunks (paragraph>Sentences>phrases)
Unfortunately, no interaction with plug-ins is possible once they have been started.
The standalone version of Nyquist provides these features but has poor file conversion abilities, especially when dealing with compressed data types.
Some foot pedals can be configured so that pedal presses are translated into a sequence of normal keyboard shortcuts. This means that the usual playback buttons within Audacity could be controlled by the pedal.
Incidentally, a lot of transcription software is freeware; it's the proprietary pedal that costs a lot.

Gale Andrews
Quality Assurance
Posts: 41761
Joined: Fri Jul 27, 2007 12:02 am
Operating System: Windows 10

Re: Audio/Video Transcription Software

Post by Gale Andrews » Sat Dec 29, 2012 11:32 pm

I split this from http://forum.audacityteam.org/viewtopic ... 81#p201581 .
Robert J. H. wrote:When I am transcribing and translating conversations, I need the freedom to type in the results. This asks for further features, such as:
  • expanding pauses
  • Slow down of speech
  • Automatic repetition of certain chunks (paragraph>Sentences>phrases)
Is expanding pauses the addition of silence? If so, where: centred on the existing selection, or added before or after it?

As you know, we have the Transcription Toolbar, but it has several limitations. What features do you have in mind by "Slow down"?

Have you read this Proposal http://wiki.audacityteam.org/wiki/Propo ... ion_Editor ?


Gale

Post by Robert J. H. » Sun Dec 30, 2012 2:11 am

I didn't want to start a feature request. As you know, I look at things from a VI-person perspective. I am always concerned with the accessibility of Audacity's upcoming features. The handling of labels is not yet fully satisfactory, and there is a long way to go until transcription in the proposed manner can be realized.
My plug-in solution is very simple in contrast.
Let's say you load a recording of a meeting into Audacity.
You then start the plug-in and do some settings (more on that later).
The code starts to analyze the audio. It removes unwanted clicks and noise and boosts the speech-relevant frequencies.
In the next step, the distribution of the pauses is examined. Very long silent passages are reduced to a maximum length. The speech is now present as a single chain of words, phrases and sentences. Since we now know the length of the unabridged "soliloquy", we can estimate how many words the cleaned audio contains.
From the length of the pauses we can make a guess where phrases and sentences start and end.
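As a rough illustration of the pause analysis described above, here is a minimal sketch in Python (not Nyquist, and not the actual plug-in). It assumes the audio has already been reduced to one loudness value per short frame; the names `find_pauses`, `cap_pauses` and `classify_pause`, the `threshold` and the frame counts are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple


def find_pauses(levels: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices (end exclusive) of runs quieter than threshold."""
    pauses: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, level in enumerate(levels):
        if level < threshold:
            if start is None:
                start = i  # a new silent run begins here
        elif start is not None:
            pauses.append((start, i))  # the silent run ended at frame i
            start = None
    if start is not None:
        pauses.append((start, len(levels)))  # silence runs to the end
    return pauses


def cap_pauses(levels: List[float], threshold: float, max_pause: int) -> List[float]:
    """Copy the frames, dropping silent frames beyond max_pause in any one run."""
    out: List[float] = []
    run = 0
    for level in levels:
        if level < threshold:
            run += 1
            if run > max_pause:
                continue  # this frame would make the pause too long
        else:
            run = 0
        out.append(level)
    return out


def classify_pause(length: int, phrase_min: int, sentence_min: int) -> str:
    """Guess what kind of boundary a pause of the given length (in frames) marks."""
    if length >= sentence_min:
        return "sentence"
    if length >= phrase_min:
        return "phrase"
    return "word"
```

The two minimum lengths would have to be tuned by ear for each recording; the sketch only shows where such a guess would plug in.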
In the controls, the user can enter the repetitions for each speech element (phrases/sentences/paragraphs). Additionally, the pauses can be stretched. There may also be cases where the speaker is unusually fast, hence the slow-down routine (with same pitch).
You can now listen to the result or to the sound that was removed.
Since the transcription is done in an external program, a start offset is added to facilitate the change to the other program.
If the user writes all that is contained within a repeated element on a single line, the plug-in can do even more:
In a second call, the text file can be imported and the plug-in can return the text in a label track. That can be done because the program already knows where the starting points of the phrases in the original audio were placed.
Note that the playback normally comes from within the plug-in itself but the corrected audio can of course directly be returned to Audacity (it replaces a copy of the original). The labels refer to the original and the import must be done immediately (in the same session) or the times are lost.
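The second call could look roughly like the following Python sketch, which pairs each transcript line with the stored start time of its phrase and writes one tab-separated line per label (start, end, text), the layout Audacity uses when exporting labels as far as I know. The function name `build_label_lines` and the use of point labels (start equal to end) are assumptions for illustration only.

```python
from typing import List


def build_label_lines(phrase_starts: List[float], transcript_lines: List[str]) -> List[str]:
    """Pair each transcript line with its phrase start time (seconds in the original audio)."""
    if len(phrase_starts) != len(transcript_lines):
        # One line per repeated element is the precondition stated above.
        raise ValueError("need exactly one transcript line per phrase")
    lines = []
    for start, text in zip(phrase_starts, transcript_lines):
        # Point label: start and end are the same time.
        lines.append(f"{start:.6f}\t{start:.6f}\t{text.strip()}")
    return lines
```

For example, `build_label_lines([0.0, 3.25], ["Hello everyone.", "Let us begin."])` gives two lines that could be saved to a text file and imported as a label track.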
It is clear that such a plug-in only works with the English language because the average lengths of the different speech elements vary from language to language.
It's a pity that no interaction is possible during the plug-in execution (with the exception of stopping the playback); a lot of testing will therefore be needed to get everything in place. A lot of functions that would now be very useful were omitted when Nyquist was implemented in Audacity. In the original, you can for example:
- set audio-markers during playback.
- get keyboard inputs and keystrokes
- read sliders
- etc.
But why do I cry over spilled milk?

Post by Gale Andrews » Sun Dec 30, 2012 5:31 am

Robert J. H. wrote:I didn't want to start a feature request.
You said
This asks for further features
Certainly no account would have been taken of your views if you had left your post where it was, because there is no mechanism to do so. Feature Requests are for VI users too.

So you want a transcription "plug-in" or interface. But you want to do it in Nyquist not C++? And you don't think the Wiki Proposal is relevant?



Gale

Post by Robert J. H. » Sun Dec 30, 2012 1:29 pm

I do not ask for a plug-in, since I would write it myself.
The original post referred to a search for developers of transcription software, and as I said, I would prefer that the transcription abilities of Audacity itself be improved.
A feature request that is not mature is most likely to fail, and good ideas may therefore be lost in a vortex; that's why I didn't want to start a separate thread.
The "features" I've been talking about should be included in the simple plug-in to compensate for the lack of playback controls during the execution.

It's pretty obvious that Audacity has all the facilities needed for a simple manual transcription. However, the main problem is that Audacity can't be controlled from within an external program. The enabling of global keystrokes would solve a lot of issues connected to transcription and other tasks that involve remote playback and recording.
I vaguely seem to remember that global shortcuts were already proposed elsewhere but cast to the wind due to platform-dependent issues; I don't know if this subject is being pursued any longer.

But there are other possibilities that could solve the problem.
Imagine that we had a "Writing mode". It's essentially a playback mode where the cursor is placed in a label track. The writing position would follow the audio timeline. As soon as you have entered one or more words, the text is put in where the typing of the phrase began. It is clear that the normal shortcuts are disabled in this mode and replaced by special ones that let you jump back and forward, stop the playback and so on. This functionality goes hand in hand with an improved Label editor since the phrases have to be moved easily. For example, if you open the label editor, you are placed in the label list. With up and down you could select the label, and with left and right (shift for longer moves) you'd move it in the timeline while hearing the audio that lies at the current position. That seems to me more intuitive than only changing the position with a direct entry in the time box.
There are for sure some good ideas in the above-mentioned proposal in the Wiki and I don't say that it is bad at all. All I ask for is that we always have an eye on possible accessibility issues. It is the ongoing anticipation and respect for the special requirements of handicapped people that make Audacity an outstanding program in comparison to so-called "professional" ones.

Post by Gale Andrews » Mon Dec 31, 2012 9:45 am

Robert J. H. wrote:[...]the main problem is that Audacity can't be controlled from within an external program. The enabling of global keystrokes would solve a lot of issues connected to transcription and other tasks that involve remote playback and recording.
I vaguely seem to remember that global shortcuts were already proposed elsewhere but cast to the wind due to platform-dependent issues; I don't know if this subject is being pursued any longer.
Have you had your vote counted yet for "global shortcuts"?

The platform problem as I understand it is that wxWidgets only supports global shortcuts on Windows.
Robert J. H. wrote:Imagine that we had a "Writing mode". It's essentially a playback mode where the cursor is placed in a label track. The writing position would follow the audio timeline. As soon as you have entered one or more words, the text is put in where the typing of the phrase began. It is clear that the normal shortcuts are disabled in this mode and replaced by special ones that let you jump back and forward, stop the playback and so on.
Jump back and forwards where/by how much?

How does this differ from arrowing down to the label track so you can then type in the label track without using CTRL + B first?
Robert J. H. wrote: This functionality goes hand in hand with an improved Label editor since the phrases have to be moved easily. For example, if you open the label editor, you are placed in the label list. With up and down you could select the label, and with left and right (shift for longer moves) you'd move it in the timeline while hearing the audio that lies at the current position. That seems to me more intuitive than only changing the position with a direct entry in the time box.
OK that sounds like another feature request. However if you are moving labels don't you need to move them very accurately, not just in gradations of a few seconds?


Gale

Post by Robert J. H. » Mon Dec 31, 2012 12:07 pm

Gale Andrews wrote:
Robert J. H. wrote:[...]the main problem is that Audacity can't be controlled from within an external program. The enabling of global keystrokes would solve a lot of issues connected to transcription and other tasks that involve remote playback and recording.
I vaguely seem to remember that global shortcuts were already proposed elsewhere but cast to the wind due to platform-dependent issues; I don't know if this subject is being pursued any longer.
Have you had your vote counted yet for "global shortcuts"?
No, I've not voted yet.

Gale Andrews wrote:
Robert J. H. wrote:Imagine that we had a "Writing mode". It's essentially a playback mode where the cursor is placed in a label track. The writing position would follow the audio timeline. As soon as you have entered one or more words, the text is put in where the typing of the phrase began. It is clear that the normal shortcuts are disabled in this mode and replaced by special ones that let you jump back and forward, stop the playback and so on.
Jump back and forwards where/by how much?

Since one wants to hear certain parts again, the jump should be a few seconds. I have 10 seconds for long jumps in my preferences. However, since new hotkeys are needed anyway, one could assign backward jumps of 1 min to F4, 25 s to F5 and 10 s to F6, whereas F7 to F9 would have the mirrored values in the forward direction. The left and right arrows are of course also still available. Furthermore, the number keys (with a modifier) could be used to jump to 10% ... 90% of the audio.
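To make the key assignment concrete, here is a small Python sketch of such a jump table. It is only an illustration: `JUMPS`, `jump` and `jump_to_percent` are made-up names, and clamping the target to the start and end of the audio is my assumption about what should happen at the edges.

```python
# Jump distances in seconds: negative values go back, positive values go forward.
JUMPS = {
    "F4": -60.0, "F5": -25.0, "F6": -10.0,
    "F7": 10.0, "F8": 25.0, "F9": 60.0,
}


def jump(position: float, key: str, duration: float) -> float:
    """Return the new playback position after pressing one of the jump keys."""
    target = position + JUMPS[key]
    # Assumption: never move before the start or past the end of the audio.
    return min(max(target, 0.0), duration)


def jump_to_percent(digit: int, duration: float) -> float:
    """Number key 1..9 (with a modifier) jumps to 10% ... 90% of the audio."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    return duration * digit / 10
```

With a two-minute recording, pressing F6 at 30 s would land at 20 s, while F4 from the same position would stop at the very beginning.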
Gale Andrews wrote:How does this differ from arrowing down to the label track so you can then type in the label track without using CTRL + B first?

I am not working very much with labels, so it may well be that I am mistaken.
Maybe the procedure you are describing above can be used similarly. However, my entries always seem to start at 00:00:00 during playback. I should really take a look at the manual and study the label features... ;)
The actual difference is that you are continuously entering text which is automatically inserted at the current playback position. If one has entered a couple of words, these would afterwards snap to a regular (distance 2 s or so) or accurate label position. The program must provide a semi-intelligent mechanism to re-arrange the words if necessary and move them from one label container to the next. Since one can jump back, there should also be some kind of "punching in" or overwrite mode or just a second line/entry at the position that is already occupied - maybe the best solution.
Gale Andrews wrote:
Robert J. H. wrote: This functionality goes hand in hand with an improved Label editor since the phrases have to be moved easily. For example, if you open the label editor, you are placed in the label list. With up and down you could select the label, and with left and right (shift for longer moves) you'd move it in the timeline while hearing the audio that lies at the current position. That seems to me more intuitive than only changing the position with a direct entry in the time box.
OK that sounds like another feature request. However if you are moving labels don't you need to move them very accurately, not just in gradations of a few seconds?

Well, the start and end time boxes are still there. For the purpose of a transcription, it should be enough to move the labels in steps of 0.1 s and 1.0 s (left/right arrow keys without and with shift). If you're handicapped, it is in any case a hard job to place the labels accurately with the editor as it currently is. At the moment, the label list can be left with the arrow keys; this would of course no longer apply (the Tab key would be used instead, as is the case in almost all dialog boxes).
Labels seem to be a permanently and broadly discussed subject, as some recent feature requests show.
It wasn't my intention to bring them into this discussion too, but they seem to loom up everywhere.
Wish you a happy New Year!

Post by Gale Andrews » Thu Jan 03, 2013 11:42 am

Robert J. H. wrote:
Gale Andrews wrote: Have you had your vote counted yet for "global shortcuts"?
No, I've not voted yet.
OK your vote will be counted.
Robert J. H. wrote:
Gale Andrews wrote:
Robert J. H. wrote:Imagine that we had a "Writing mode". It's essentially a playback mode where the cursor is placed in a label track. The writing position would follow the audio timeline. As soon as you have entered one or more words, the text is put in where the typing of the phrase began. It is clear that the normal shortcuts are disabled in this mode and replaced by special ones that let you jump back and forward, stop the playback and so on.
Jump back and forwards where/by how much?

Since one wants to hear certain parts again, the jump should be a few seconds. I have 10 seconds for long jumps in my preferences. However, since new hotkeys are needed anyway, one could assign backward jumps of 1 min to F4, 25 s to F5 and 10 s to F6, whereas F7 to F9 would have the mirrored values in the forward direction. The left and right arrows are of course also still available. Furthermore, the number keys (with a modifier) could be used to jump to 10% ... 90% of the audio.
Another VI user was making this same point recently, so again your vote could be counted.

F1 to F6 and F11 are not available by default of course, unless you envisage shortcuts only working in this "transcription mode".
Robert J. H. wrote:
Gale Andrews wrote:How does this differ from arrowing down to the label track so you can then type in the label track without using CTRL + B first?

I am not working very much with labels, so it may well be that I am mistaken.
Maybe the procedure you are describing above can be used similarly. However, my entries always seem to start at 00:00:00 during playback. I should really take a look at the manual and study the label features... ;)
CTRL + M is used for adding a label at the current playback position.
Robert J. H. wrote:The actual difference is that you are continuously entering text which is automatically inserted at the current playback position. If one has entered a couple of words, these would afterwards snap to a regular (distance 2 s or so) or accurate label position.
Why would you want to regularly space arbitrary words?



Gale

Post by Robert J. H. » Thu Jan 03, 2013 2:51 pm

The words wouldn't be spaced regularly, only the "containers". However, it may be better to set the end manually by pressing enter. Like this:
- Start writing mode and playback
- Start typing (time stored)
- End sentence with Enter (label inserted at stored time)
- Listen to audio, using hotkeys to jump back (in intervals or to last label)
- Begin typing again at desired position (current time stored)
- End sentence with Enter (or jump keys, label inserted)
- . . .
- Leave mode with Escape
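The steps above can be sketched as a tiny state machine in Python. This is only a model of the idea, not anything that exists in Audacity: the class `WritingMode` and its methods are hypothetical, and committing unfinished text when Escape is pressed is my own assumption.

```python
from typing import List, Optional, Tuple


class WritingMode:
    """Collects typed text and turns it into (time, text) labels."""

    def __init__(self) -> None:
        self.labels: List[Tuple[float, str]] = []
        self.active = True
        self._buffer = ""
        self._start_time: Optional[float] = None

    def type_char(self, ch: str, playback_time: float) -> None:
        """Typing the first character of a sentence stores the current playback time."""
        if not self.active:
            return
        if self._start_time is None:
            self._start_time = playback_time
        self._buffer += ch

    def press_enter(self) -> None:
        """End the sentence: the label is inserted at the stored time."""
        text = self._buffer.strip()
        if self.active and text and self._start_time is not None:
            self.labels.append((self._start_time, text))
        self._buffer = ""
        self._start_time = None

    def press_escape(self) -> None:
        """Leave the mode; pending text is committed first (an assumption)."""
        self.press_enter()
        self.active = False
```

Jump keys are left out here; they would only change the playback time that is passed to `type_char` the next time typing begins.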

The other method would create the labels automatically every 5 s or so, if something was written during this time, otherwise discarded.
A label could also be closed after ",.?!:;" etc.
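Both automatic variants can be sketched in a few lines of Python, assuming the keystrokes arrive as (playback time, character) pairs. The names `grid_labels` and `punctuation_labels` are invented for the example, and the sketch does not try to keep words whole when they straddle two containers.

```python
from typing import Dict, List, Optional, Tuple

CLOSERS = set(",.?!:;")
Keystroke = Tuple[float, str]


def grid_labels(keystrokes: List[Keystroke], interval: float = 5.0) -> List[Tuple[float, str]]:
    """One label per regular container; containers without text are discarded."""
    containers: Dict[int, str] = {}
    for time, ch in keystrokes:
        slot = int(time // interval)  # which container this keystroke falls into
        containers[slot] = containers.get(slot, "") + ch
    return [
        (slot * interval, text.strip())
        for slot, text in sorted(containers.items())
        if text.strip()
    ]


def punctuation_labels(keystrokes: List[Keystroke]) -> List[Tuple[float, str]]:
    """Close the current label whenever a punctuation character is typed."""
    labels: List[Tuple[float, str]] = []
    text = ""
    start: Optional[float] = None
    for time, ch in keystrokes:
        if start is None:
            if ch.isspace():
                continue  # ignore spaces between two labels
            start = time
        text += ch
        if ch in CLOSERS:
            labels.append((start, text))
            text, start = "", None
    if start is not None and text.strip():
        labels.append((start, text.strip()))  # text left over at the end
    return labels
```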
It really depends on what you want to do or achieve, there are a bunch of possible behaviours ponderable.

steve
Site Admin
Posts: 81651
Joined: Sat Dec 01, 2007 11:43 am
Operating System: Linux *buntu

Post by steve » Thu Jan 03, 2013 7:27 pm

Robert J. H. wrote:It really depends on what you want to do or achieve, there are a bunch of possible behaviours ponderable.
I've been pondering :grin:
The problem that I keep coming up against is that navigating labels via the keyboard is so limited. I really think this needs improving - for example, being able to jump to the first label after the current play position, then tab forward/backward from there (rather than always jumping to the first label in the track).
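As a sketch of the navigation I have in mind, assuming the label start times are kept in a sorted list (Python used purely for illustration; `first_label_after` and `tab` are hypothetical names, not Audacity functions):

```python
from bisect import bisect_right
from typing import List, Optional


def first_label_after(label_times: List[float], play_position: float) -> Optional[int]:
    """Index of the first label strictly after the play position, or None if there is none."""
    i = bisect_right(label_times, play_position)
    return i if i < len(label_times) else None


def tab(index: int, count: int, forward: bool = True) -> int:
    """Move one label forward or backward, stopping at the first and last label."""
    step = 1 if forward else -1
    return min(max(index + step, 0), count - 1)
```

With labels at 1, 4 and 9 seconds and playback at 4 seconds, the first jump would land on the label at 9 seconds, and tabbing backward from there would reach the one at 4 seconds.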