Disappearing Labels

I’m using Audacity 2.0.4 as a transcription tool as part of my research in psycholinguistics. Basically, I used Audacity to record conversations between study participants, and I’m now attempting to transcribe their conversation word-for-word in a label track alongside the audio. After I have a recording fully transcribed, I export the label track to a txt file so that I can do some analysis on the number and length of the labels.

I’ve been running into a bug where, after I save the file and export the label track to txt, I notice while working with the txt file that several labels have gone missing. The missing labels are always the longer ones - usually 10+ seconds long. Frustratingly, they are always missing from both the label track in the .aup file and the exported txt file, meaning I generally have to go back in and re-transcribe anything that goes missing. Initially it seemed that it was a result of waiting too long between finishing transcription and exporting the labels, so as I’ve been going back and double-checking all of my previous transcripts for missing labels, I’ve been trying to make sure I always do them in one go. However, today I finished one and immediately noticed that several labels I just went over had already gone missing (some of them the same as the ones that were initially missing that I had just put back in, some of them new ones).

All the files involved are being synced between my home computer, laptop, and lab computer, if that’s likely to be causing the issue. However, turning off syncing doesn’t seem to fix anything, so I suspect it isn’t. It’s hard to say, though, because the issue appears to happen more or less at random - sometimes a file will turn up with a lot of missing labels, sometimes with few or none, and no obvious reason for it.

At this point though, since new labels have clearly gone missing after (and possibly as a result of!) my double-checking them, I don’t feel like I can trust that any of the files are accurate and complete after I save them, and I’m concerned for the integrity of my data. What can I do to stop these labels from disappearing??
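
A minimal sketch of the count-and-length check described above, which can also flag how many long labels survived an export. It assumes the exported txt file holds one label per line as start time, end time, and label text separated by tabs, with times in seconds. The function names and the 10-second threshold are illustrative and not part of Audacity.

```python
from statistics import mean


def parse_labels(text):
    """Parse exported label text into (start, end, title) tuples.

    Assumes one label per line: start<TAB>end<TAB>title, times in seconds.
    """
    labels = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t", 2)
        if len(parts) < 2:
            continue  # not a label line
        title = parts[2] if len(parts) > 2 else ""
        labels.append((float(parts[0]), float(parts[1]), title))
    return labels


def summarize(labels, long_threshold=10.0):
    """Return the label count, mean length, and labels at least long_threshold seconds long."""
    durations = [end - start for start, end, _ in labels]
    return {
        "count": len(labels),
        "mean_length": mean(durations) if durations else 0.0,
        "long_labels": [lab for lab in labels if lab[1] - lab[0] >= long_threshold],
    }
```

Running `summarize(parse_labels(open("s_5.txt", encoding="utf-8").read()))` straight after each export (the filename here is only an example) gives a count of long labels to compare against what was transcribed.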

The current Audacity version is 2.0.5, but what you describe is not a known problem.

Exactly how are you interacting with the TXT file, using what application?

Are you suggesting that labels disappear from the label track while the project is open?

I don’t understand how editing the TXT file could affect the AUP file. Is it something a synchronisation program does?


Gale

However, today I finished one and immediately noticed that several labels I just went over had already gone missing (some of them the same as the ones that were initially missing that I had just put back in, some of them new ones).

We have to build this in our imaginations as we go. So you put several labels into their proper positions on the timeline of a recording and then, while you were watching the screen, a label or two just vanishes? Just the text, or the text, tag, position and all, like it was never there?

I know this is going to be impossible, but do you like to put “special characters” in the labels that go missing? Is everything straight text, or do you use oddball characters like ?/][#%!_? Do you remember the exact text of a label that doesn’t “stick?”

Koz

I use only the following character set: letters, numbers, spaces, apostrophes, colons, periods, dashes, question marks, parens, curly braces, angle brackets, asterisks. I don’t remember the exact text of any of the labels that didn’t stick, but the main thing they seem to have in common is that they’re all longer than the others.

The labels disappear entirely, like they were never there. They disappear when I’m not looking.

I open the txt files in Notepad++, do a few regular expression search and replaces to change the formatting, then import them into Excel for my analysis.

I’m not saying that editing the txt file affects the aup file - I’m saying that I notice the problem while I’m editing the txt file. I don’t think the labels disappear while the project is open but I’m not sure, because they never disappear while I’m looking at them. They seem to disappear in between my adding them and my exporting them to txt, while I’m not looking. Maybe it’s when I save, maybe it’s when I export them, maybe it’s if I close the program before I export, maybe it happens while the program is still open. I’m not sure.

Oh, something else: I just realized that while my desktop has Audacity 2.0.4 installed, my lab computer and laptop both have 2.0.5. I don’t think it’s a version interaction problem, though, because I’ve had the problem occur even when I did all the work for a file on one computer and never opened it on the other.

Also, while my personal computers are both PCs, the lab computer is a Mac. I never actually interact with the files on the lab computer, though; it’s just one place the Dropbox syncs to. I figured I’d mention in case someone thinks of a way that it might be relevant.

Argh, I wish it were possible to edit my previous posts! “I don’t use the following characterset” should read “I use only the following characterset” (I started out saying I don’t use any weird characters, then changed to specify precisely which characters I do use, and must have missed the don’t.)

Then do it on a copy of the file. If you have a special program that is doing unwanted file synchronisations on unrelated files you may be able to fool it that way.

Does that mean it happens before you export them to text (which contradicts what you said) or after you export to text?

If you think there is an Audacity bug, please give exact steps to reproduce it 1, 2, 3, 4… . Please be completely specific, even if it takes a lot of words.


Gale

When you become an “established” user with more approved posts, you will be able to edit.

The following characters are illegal in file and folder names on Windows:

   /  :  *  ?  "  <  >  |

Gale

Colon is not legal for file and folder names on Mac.

Are you using Notepad++ on a Mac running Windows? If so, how are you running Windows?


Gale

I’m not sure how it contradicts what I said, but I guess I’m not being clear enough. Here’s an attempt to list what I did, so that maybe it could be reproduced.

  1. Use Audacity to record approximately an hour of two people conversing.
  2. Save the project to a Dropbox folder that syncs with other computers.
  3. Use labels to transcribe each turn of the conversation (turns being continuous sections of speech - they are broken up by a speaker pausing longer than one second or by a change in speaker).
  4. Save the file repeatedly as you go (but do not close it).
  5. Once you reach the end of the file, export the labels to text.
  6. Save and close the Audacity project.
  7. Edit the txt file and save the edited version under a different name in a different subfolder in the synced Dropbox folder.
  8. Import the edited txt file into Excel, run some analyses, and save that as an xlsx file.
  9. Somewhere between steps 5 and 8, notice from the txt or the Excel file that several turns you remember labeling have gone missing. (Or don’t notice, but worry that some may have gone missing without you noticing.)
  10. Reopen the Audacity project.
  11. See that indeed some labels have disappeared, like you never labeled them in the first place.
  12. Retranscribe the missing turns.
  13. Export labels to a new txt file.
  14. Save and close the Audacity project.
  15. While editing the txt file, notice that some new turns are missing, which were not missing last time.
  16. Reopen the Audacity project.
  17. Import the old labels from the first exported txt file into a second label track, alongside the most recent label track.
  18. Compare both label tracks to the audio and notice that each of the label tracks has gaps, and the gaps are in different places.
  19. Fill in the gaps on one of the label tracks.
  20. Export the filled in label track to yet another txt file.
  21. Edit that txt file and don’t notice any errors.
  22. Import that txt file to Excel and run analysis on it, hoping that you haven’t missed any new gaps.

Probably some of these steps are irrelevant, but this is a fairly typical example of how my workflow on these files has gone.
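
The comparison in steps 17 and 18 could also be done on the exported txt files themselves rather than by eye. The sketch below is one way to do it, assuming each export has one label per line as start, end, and text separated by tabs. The helper names and the 0.01-second matching tolerance are hypothetical choices, not anything Audacity provides.

```python
def load_labels(text):
    """Parse exported label text into (start, end, title) tuples."""
    labels = []
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) < 2 or not line.strip():
            continue  # skip blank or malformed lines
        title = parts[2] if len(parts) > 2 else ""
        labels.append((float(parts[0]), float(parts[1]), title))
    return labels


def missing_from(reference, candidate, tolerance=0.01):
    """Return labels in reference with no label in candidate at the same position.

    Two labels count as the same position when both their start and end
    times differ by at most tolerance seconds; the text is not compared.
    """
    return [
        ref for ref in reference
        if not any(
            abs(ref[0] - cand[0]) <= tolerance and abs(ref[1] - cand[1]) <= tolerance
            for cand in candidate
        )
    ]
```

Calling `missing_from(old, new)` and then `missing_from(new, old)` lists the gaps in each export relative to the other, which is the situation described in step 18.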

I am using notepad++ and Audacity on my Windows machines. I don’t interact with the files on a Mac at all, but they DO get synced to a Mac via Dropbox. I don’t think that that’s causing the problem, but I figured I should mention just in case.

The characterset I listed above is used only within the label text, not in the filenames. The filenames only use letters, numbers, and underscores. There is either a colon or an angle bracket in every label, so that doesn’t correspond to missing labels.

What happens if you don’t do that? Forget about syncing files. Do your steps, but save the project and export the label track to a local file that other applications don’t have access to?

If the steps then don’t reproduce, could it be a bug in your sync set up?
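
One way to tell whether a file changed on disk after Audacity finished with it is to record a fingerprint right after exporting and check it again later. This is only a sketch using Python's standard library. The function names are made up for illustration, and it counts non-blank lines on the assumption that the export has one label per line.

```python
import hashlib
from pathlib import Path


def snapshot(path):
    """Record a SHA-256 hash and the number of non-blank lines of a file."""
    data = Path(path).read_bytes()
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "labels": sum(1 for line in data.splitlines() if line.strip()),
    }


def changed_since(path, saved):
    """Return True if the file no longer matches a snapshot taken earlier."""
    return snapshot(path) != saved
```

If `changed_since` reports a change while Audacity is closed and nobody has edited the file, then something other than Audacity, such as the sync program, has rewritten it.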


Gale

What was the exact filename of a show that has failures? “MyLongVoiceFile.aup?”

Koz

Our shop had to be able to transfer audio and video work to all three computer types in all time zones using all available transfer tools, completely ad-lib. We adopted the safe characters upper and lower case letters, numbers, underscore and whichever dash mark is safe. We also gave up spaces in filenames. Most if not all the file transfer problems vanished, and most important, the calls stopped.
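
A rough sketch of that rule as a filename check. It assumes the "safe" dash is the plain ASCII hyphen and allows a single dot before an alphanumeric extension. Both are assumptions made for the example, and the function name is hypothetical.

```python
import re

# Letters, digits, underscore and ASCII hyphen, optionally followed by
# one ".ext" extension. No spaces or other punctuation.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_safe_name(name):
    """Return True if the whole filename uses only the safe characters."""
    return SAFE_NAME.fullmatch(name) is not None
```

With this check, `is_safe_name("s_5.aup")` passes, while a name containing a space or a colon does not.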

Koz

I’ll try it without the sync setup and see if I encounter problems.

The filenames follow the paradigm of eg “s_5.aup” where s stands for “subject” and the number corresponds to the experimental number. Every file (s_5 through s_16; 1-4 were discarded) has had this issue.