Not-well-formed (invalid token) errors with .aup files

edgar-rft
Posts: 347
Joined: Sun Jan 20, 2008 12:03 am
Operating System: Please select


Post by edgar-rft » Sat May 07, 2011 10:25 am

In the German Audacity support forum I am getting an increasing number of complaints, with Audacity 1.3.12 and 1.3.13 beta, that Audacity raises errors when reading .aup project files that it wrote itself earlier. The effect is that the complete project can no longer be opened.

Examples from the last week (the text is in German, of course):
By now, my main business is fixing German Audacity users' .aup files with a text editor. :(

The question is: why does Audacity not complain when writing the .aup-file with characters it then can't read anymore?

The problem seems to occur on Windows only, but I am not 100% sure.

I have found nothing in the Audacity bug-tracker about this. Is there anything else known?

Thanks,

- edgar

steve
Site Admin
Posts: 80693
Joined: Sat Dec 01, 2007 11:43 am
Operating System: Linux *buntu


Post by steve » Sat May 07, 2011 12:38 pm

There was a case that came up recently where an Audacity Project had been made using an ANSI build which would then not open on an (updated) Unicode build. The problem was that the AUP file contained extended ASCII characters (if I recall correctly it was an umlaut) stored as single bytes in an ANSI encoding, which are not valid UTF-8. Simply converting the character to the Unicode equivalent did not fix the problem as the file was still using ANSI character encoding. Opening the AUP file in GEdit (Linux) and changing the character encoding to Unicode (UTF-8) fixed the problem.
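A minimal sketch of how this kind of failure can be reproduced outside Audacity, using Python's binding to the expat XML parser. The one-line XML document is a hypothetical, heavily simplified stand-in for a real .aup file, and the assumption is that the file carries no encoding declaration, so the parser treats it as UTF-8. A lone byte such as 0xDC ("Ü" in ISO-8859-1) is not a valid UTF-8 sequence, so the parser rejects it.

```python
import xml.parsers.expat


def parse_error(data: bytes):
    """Return the expat error message for `data`, or None if it parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for a project file; real .aup files are far larger.
TEMPLATE = '<?xml version="1.0"?><project name="Übung"/>'

latin1_bytes = TEMPLATE.encode("iso-8859-1")  # umlaut stored as one byte
utf8_bytes = TEMPLATE.encode("utf-8")         # umlaut stored as two bytes

print("ISO-8859-1 bytes:", parse_error(latin1_bytes))
print("UTF-8 bytes:     ", parse_error(utf8_bytes))
```

The same text parses or fails depending only on how the umlaut was written to disk, which is consistent with the re-saving fix described in this post.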

In each of those files, the character encoding appears to be ANSI. GEdit is reporting it as ISO-8859 but I'm not sure if that is accurate, or just the closest character set that I have installed, but it's not Unicode. Re-saving them with UTF-8 encoding enables them to open in Audacity without an "invalid token" error.
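The re-saving step can also be done without an editor. The following is a rough sketch, not a tested repair tool: the function names are made up for illustration, and the source encoding is assumed to be ISO-8859-1 (a Windows "ANSI" file may really be Windows-1252 or another code page, in which case `source_encoding` must be changed). It also assumes the XML declaration in the file does not name a different encoding. Work on a copy of the project file.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return `raw` re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged, so running
    the function twice does not double-encode the umlauts.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: the file was written in `source_encoding`.
        return raw.decode(source_encoding).encode("utf-8")


def reencode_file(src: str, dst: str, source_encoding: str = "iso-8859-1") -> None:
    """Read `src`, convert it with to_utf8(), and write the result to `dst`."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

A call such as `reencode_file("broken.aup", "fixed.aup")` (both file names hypothetical) writes the converted copy next to the original.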

I'm not sure if this is a "bug", but I think it should be in the release notes/known issues that the Unicode build of Audacity cannot open projects created with the ANSI build if the AUP file contains extended characters.


Post by edgar-rft » Sat May 07, 2011 12:55 pm

The bug has nothing to do with Unicode vs. non-Unicode builds; it's always one and the same Audacity build that cannot read its own project files any more.

It's correct that non-Unicode Audacity builds write ISO-8859 encoded .aup files, which is, strictly speaking, wrong according to the XML specification, but non-Unicode builds simply cannot write Unicode files. In this case it's a bug to use a Unicode-dependent format for the project files as long as not all supported operating systems use Unicode.

But the bugs from above happen with Unicode as well as with ISO-8859-1 encoded .aup-files.


Post by steve » Sat May 07, 2011 1:05 pm

edgar-rft wrote: But the bugs from above happen with Unicode as well as with ISO-8859-1 encoded .aup-files.
When I downloaded the files, GEdit says that they are all ISO-8859 encoded. Why would a Unicode build of Audacity encode the AUP file that way, or is the encoding being changed due to uploading/downloading to/from the web server? Or are you saying that the Unicode build of Audacity, when running on an operating system that does not fully support Unicode, produces ANSI encoded AUP files that can then not be read by the same version of Audacity?


Post by edgar-rft » Sat May 07, 2011 1:40 pm

I don't know what GEdit says but I know that GEdit is crap. GEdit can handle UTF-8 only and everything else is pretty much a mess. That's why I don't use it.

Emacs says:
There is no 100% reliable way to determine a text file's encoding, but the last two are definitely not ISO-8859. The "Japanese Shift JIS Unix" is probably a wild-ass guess from Emacs because of the Asian multibyte characters.
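To illustrate why editors can only guess, here is a small heuristic sketch. It is not what Emacs or GEdit actually do, and the function name and return labels are invented for this example. It can reliably say that a file is pure ASCII or is valid UTF-8, but once the bytes are not valid UTF-8 it cannot tell one single-byte or legacy multibyte encoding from another.

```python
def guess_encoding(raw: bytes) -> str:
    """Very rough guess at the encoding of `raw` (illustrative only)."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16 (BOM found)"
    if all(byte < 0x80 for byte in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Could be ISO-8859-x, a Windows code page, Shift JIS, and so on.
        # The bytes alone do not say which.
        return "unknown legacy encoding"
    return "utf-8"


print(guess_encoding("Übung".encode("utf-8")))
print(guess_encoding("Übung".encode("iso-8859-1")))
```

Checking whether the bytes of a failing .aup file decode as UTF-8 at all is probably the most useful first test, since that is the one question this kind of heuristic can answer with certainty.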

But you are right: From now on I will ask if they use e.g. non-Unicode builds on Unicode-capable Windows versions. That's a good idea, thanks.

Gale Andrews
Quality Assurance
Posts: 41761
Joined: Fri Jul 27, 2007 12:02 am
Operating System: Windows 10


Post by Gale Andrews » Sat May 07, 2011 7:01 pm

edgar-rft wrote: The bug has nothing to do with Unicode vs. non-Unicode builds; it's always one and the same Audacity build that cannot read its own project files any more.

It's correct that non-Unicode Audacity builds write ISO-8859 encoded .aup files, which is, strictly speaking, wrong according to the XML specification, but non-Unicode builds simply cannot write Unicode files. In this case it's a bug to use a Unicode-dependent format for the project files as long as not all supported operating systems use Unicode.

But the bugs from above happen with Unicode as well as with ISO-8859-1 encoded .aup-files.
Edgar, I know there are reports like this, but if you believe the German Forum reports are legitimate (we cannot read German) please give steps to reproduce the issue.

Also see the topic Steve referred to:
http://forum.audacityteam.org/viewtopic ... 10#p137502

I cannot reproduce any issue on English Windows.

* If a project containing umlauts or East Asian characters is created in a Unicode build (including importing files with an umlaut in the name), it can be reopened with correct characters in Unicode Audacity. It can be reopened in ANSI Audacity with incorrect characters displayed.

* If a project containing umlauts or East Asian characters is created in an ANSI build (including importing files with an umlaut in the name), it can be reopened in Unicode Audacity with umlauts displayed correctly but East Asian characters displayed incorrectly. It can be reopened in ANSI Audacity with all characters displayed incorrectly.

If users are creating projects with other than Latin characters (including importing files with non-Latin file names or metadata) they should use Windows 2000 or later which supports Unicode and they should use the Audacity Unicode build meant for those versions of Windows. Anything else is user error.

Incidentally, how did German users manage with 1.2.6 and Windows 98?

If users are pasting characters directly into the .aup file (which is the only way I know to get an umlaut displayed as such in an ANSI-encoded .aup file) then it is user error.

We need to know more about how the German users are inputting umlauts into their project. Are they typing into the track name? With what input method? Are they importing files? Then we need an example file that creates the issue.

In any case, after 2.0 we do not expect to offer ANSI builds any longer.



Gale


Post by edgar-rft » Sat May 07, 2011 7:51 pm

A few hours later the next one, this time on a Mac:

http://www.audacity-forum.de/post/21795 - error: "Reference to invalid character in line 9"

@Gale: I will try to produce a broken Audacity project with detailed descriptions and attach it here in a post, so that you (or anyone else) can test it on English Audacity versions, please. I never had these problems on Debian Linux, so I must first find a Windows computer to produce a broken project.


Post by Gale Andrews » Sat May 07, 2011 8:36 pm

edgar-rft wrote: A few hours later the next one, this time on a Mac:

http://www.audacity-forum.de/post/21795 - error: "Reference to invalid character in line 9"
If they are on Audacity 1.2 they will get that problem (also if the folder containing the project has accented characters).

Incidentally on Mac, "Über Audacity" is under the "Audacity" menu.
edgar-rft wrote: @Gale: I will try to produce a broken Audacity project with detailed descriptions and attach it here in a post, so that you (or anyone else) can test it on English Audacity versions, please.
Thanks, that would be helpful.

Does the German Forum have a FAQ about this that stresses using Beta Audacity (Unicode)?



Gale
