Error: reference to invalid character number at line 6

Help for Audacity 2.x.x on Windows.

This forum is for Audacity 2.x.x on Windows.

  • Please state which version of Windows you are using, and the exact three-section version of Audacity from "Help menu > About Audacity".

  • Audacity 1.2.x and 1.3.x are obsolete and no longer supported. If you still have those versions, please upgrade at https://www.audacityteam.org/download/.
    The old forums for those versions are now closed, but you can still read the archives of the 1.2.x and 1.3.x forums.

Re: Error: reference to invalid character number at line 6

Posted by Yarn366 » Thu Aug 17, 2017 3:58 am

In case any devs see this, I posted a message about this and my proposed fix to audacity-devel about a month and a half ago here: https://sourceforge.net/p/audacity/mail ... /35931978/. I'm still waiting for a response.
Yarn366
 
Posts: 12
Joined: Sat May 20, 2017 11:31 pm
Operating System: Windows 7

Re: Error: reference to invalid character number at line 6

Posted by steve » Thu Aug 17, 2017 8:46 am

Yarn366 wrote:In case any devs see this, I posted a message about this and my proposed fix to audacity-devel about a month and a half ago here: https://sourceforge.net/p/audacity/mail ... /35931978/. I'm still waiting for a response.

There are some responses to your pull request: https://github.com/audacity/audacity/pull/197
9/10 questions are answered in the FREQUENTLY ASKED QUESTIONS (FAQ)
steve
Site Admin
 
Posts: 45092
Joined: Sat Dec 01, 2007 11:43 am
Operating System: Linux *buntu

Re: Error: reference to invalid character number at line 6

Posted by Yarn366 » Thu Aug 17, 2017 4:55 pm

steve wrote:There are some responses to your pull request: https://github.com/audacity/audacity/pull/197


Only one of those responses (not counting the "milestone") isn't mine, and I already explained in the post after that why the suggestion in that post would be an unnecessary complication. And it was after the fake milestone was applied to my pull request that I posted to audacity-devel, and apparently nobody noticed that post.
Yarn366
 

Re: Error: reference to invalid character number at line 6

Posted by steve » Mon Aug 21, 2017 12:03 pm

Yarn366 wrote:Only one of those responses (not counting the "milestone") isn't mine, and I already explained in the post after that why the suggestion in that post would be an unnecessary complication. And it was after the fake milestone was applied to my pull request that I posted to audacity-devel, and apparently nobody noticed that post.


There seems to be confusion from several angles.

One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker, and our main bugzilla guy is no longer with us (sadly he died a few weeks ago). So the first thing that we need to do is to get this properly logged on bugzilla. Are you able to provide a small Audacity project to demonstrate the problem? (I'm on Linux so I've never seen this bug first hand).

Secondly there appears to be some confusion about whether your current pull request is intended to be a full fix for the problem, or whether there remains a "much deeper problem" as your Git comment of June 9th suggests. Could you clarify that?
steve
Site Admin
 

Re: Error: reference to invalid character number at line 6

Posted by waxcylinder » Mon Aug 21, 2017 5:18 pm

steve wrote:
Yarn366 wrote:Only one of those responses (not counting the "milestone") isn't mine, and I already explained in the post after that why the suggestion in that post would be an unnecessary complication. And it was after the fake milestone was applied to my pull request that I posted to audacity-devel, and apparently nobody noticed that post.


There seems to be confusion from several angles.

One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker

I don't believe it ever got logged as a bug ...

We do however have a long-standing (very long) FAQ about it in the Audacity Manual:
http://manual.audacityteam.org/man/faq_errors.html#nwf

Peter.
________________________________________FOR INSTANT HELP: (Click on Link below)
* * * * * FAQ * * * * * Tutorials * * * * * Audacity Manual * * * * * Audacity Wiki * * * * *
waxcylinder
Forum Staff
 
Posts: 9050
Joined: Tue Jul 31, 2007 11:03 am
Location: Manchester, UK
Operating System: Windows 10

Re: Error: reference to invalid character number at line 6

Posted by Yarn366 » Sat Sep 23, 2017 7:51 pm

Sorry for taking so long to reply.

steve wrote:One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker, and our main bugzilla guy is no longer with us (sadly he died a few weeks ago). So the first thing that we need to do is to get this properly logged on bugzilla. Are you able to provide a small Audacity project to demonstrate the problem? (I'm on Linux so I've never seen this bug first hand).

Test project is attached. The project file is valid and loads fine on any platform. Its title contains a numeric character reference for U+1F3A7 (the headphone emoji, 🎧), which is a supplementary character. (I could have encoded the character directly and the file still would have been fine, but I chose to use the escape sequence.) However, if you resave it with the Windows version of Audacity, that character is written as a pair of references to its UTF-16 surrogates, "&#xd83c;&#xdfa7;", which is invalid in XML and won't load in any version of Audacity. (To clarify, the problem is with saving supplementary characters as "&#xd###;&#xd###;", not with failing to load files that contain characters encoded in that manner.)
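A minimal C++ sketch of how that kind of output can arise, assuming the title is held as UTF-16 code units (as a Windows wxString is) and assuming a hypothetical escaper that emits one reference per non-ASCII code unit. This is not Audacity's actual code; std::u16string stands in for wxString, and the names ToSurrogatePair and NaiveEscape are made up for illustration. U+1F3A7 is stored as the surrogate pair D83C DFA7, and escaping each half on its own yields references to lone surrogates.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Split a supplementary code point (U+10000..U+10FFFF) into the two UTF-16
// code units (high and low surrogate) that a UTF-16 string stores for it.
std::u16string ToSurrogatePair(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000;  // 20-bit offset above the BMP
    return std::u16string{static_cast<char16_t>(0xD800 + (v >> 10)),
                          static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}

// Hypothetical naive escaper (NOT Audacity's code): ASCII passes through
// unchanged (XML special characters are ignored here for brevity), and every
// other UTF-16 code unit becomes its own "&#x...;" reference. Surrogate
// halves are therefore escaped one at a time, which XML does not allow.
std::string NaiveEscape(const std::u16string& s) {
    std::string out;
    for (char16_t unit : s) {
        if (unit < 0x80) {
            out += static_cast<char>(unit);
        } else {
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%x;", static_cast<unsigned>(unit));
            out += buf;
        }
    }
    return out;
}
```

For example, NaiveEscape(u"A" + ToSurrogatePair(0x1F3A7)) gives "A&#xd83c;&#xdfa7;", the same pattern as the resaved project; a pair-aware escaper would instead produce the single reference "&#x1f3a7;".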

steve wrote:Secondly there appears to be some confusion about whether your current pull request is intended to be a full fix for the problem, or whether there remains a "much deeper problem" as your Git comment of June 9th suggests. Could you clarify that?

The fix that I provided should be enough to fix the problem.

The first change that I described in my "much deeper problem" post would make it unnecessary to check the size of wxUChar, but it would likely require major changes to Audacity; thankfully, it's not really necessary for fixing this problem. (And wxString appears to use a 2-byte character type on Windows anyway, so that change probably wouldn't do much good unless the string type is also changed.)
Attachments
supplementary_char.zip
(671 Bytes)
Yarn366
 

Re: Error: reference to invalid character number at line 6

Posted by steve » Wed Sep 27, 2017 4:46 pm

Yarn366 wrote:Sorry for taking so long to reply.

No problem, I've only recently returned from my vacation ;)

Thanks for the test project. I had some time today to test it on Windows 10, and test your proposed fix.
I've certainly got enough information about the problem now to log it as a bug, and your work gives a good lead-in to understanding the problem.

I don't personally have in-depth knowledge about wxWidgets XML / Unicode handling, so I don't know that your fix is the "right" way to fix it.

I can see how your fix prevents the problem from occurring, but I have a niggling feeling that there should be a better way to fix this. In particular, I don't understand why or how surrogate pairs are being created. If I'm reading your code correctly, your fix handles the surrogate pairs when they occur, but why / where / how do they occur in the first place? I thought that UTF-16 encoding should only happen when conversion to UTF-16 is explicitly called. :?
steve
Site Admin
 

Re: Error: reference to invalid character number at line 6

Posted by steve » Wed Sep 27, 2017 5:20 pm

steve
Site Admin
 

Re: Error: reference to invalid character number at line 6

Posted by Yarn366 » Fri Sep 29, 2017 3:44 am

I found an article in the official wxWidgets 3.0.2 documentation describing how wxString works:

http://docs.wxwidgets.org/3.0.2/overview_string.html#overview_string_internal

The section that concerns this issue is "Internal wxString Encoding," which I suggest reading thoroughly. Here are the important bits that I gathered from there:

  • wxString can store strings in UTF-8, UTF-16, or UTF-32 encoding, depending on the platform and compile-time flags.
  • By default, wxString uses UTF-16 under Windows, and either UTF-8 or UTF-32 under Linux and macOS (the article is a bit conflicting here, although it appears to suggest UTF-8 more strongly).
  • When wxString uses UTF-8 encoding, it indexes code points rather than code units (bytes in the case of UTF-8). It also handles encoding and decoding of multi-byte sequences automatically. This means that programs don't have to do anything special in this case; they can just treat each unit as being one character.
  • This is the most important part: When wxString uses UTF-16 encoding, it indexes code units, not code points, and it does absolutely nothing to handle surrogate pairs. This means that programs need to implement this handling themselves, at least when interfacing directly with wxString.
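To make the last point concrete, here is a small C++ sketch that uses std::u16string as a stand-in for a UTF-16 wxString (an assumption; it is not wxString itself, and the helper names are hypothetical). One emoji occupies two code units, so size() and indexing count units, and code that wants whole characters has to pair up the surrogates itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// High (leading) surrogates are U+D800..U+DBFF.
bool IsHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// Low (trailing) surrogates are U+DC00..U+DFFF.
bool IsLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: count code points in a UTF-16 string. size() counts
// code units; here a well-formed surrogate pair is counted once, and an
// unpaired surrogate is counted as a single (ill-formed) unit.
std::size_t CountCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (IsHighSurrogate(s[i]) && i + 1 < s.size() && IsLowSurrogate(s[i + 1]))
            ++i;  // skip the low half of the pair
        ++count;
    }
    return count;
}
```

For the string "A" followed by U+1F3A7, size() is 3 while CountCodePoints returns 2.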

All of this means that Audacity's XML-escape function still needs to handle surrogate pairs (unless, of course, I'm missing something important).
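The kind of handling this calls for could look roughly like the following. It is a minimal sketch, not the actual patch: it works on std::u16string rather than wxString, the names XmlEscapeUtf16 and AppendCharRef are hypothetical, and substituting U+FFFD for an unpaired surrogate (and dropping the C0 control characters that XML 1.0 forbids) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Append a numeric character reference such as "&#x1f3a7;" to out.
void AppendCharRef(std::string& out, char32_t cp) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "&#x%x;", static_cast<unsigned>(cp));
    out += buf;
}

// Hypothetical XML escaper for a UTF-16 string. Well-formed surrogate pairs
// are combined into one code point before escaping, so the output never
// contains references to U+D800..U+DFFF, which XML does not allow.
std::string XmlEscapeUtf16(const std::u16string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // High + low surrogate -> one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;  // consume the low surrogate as well
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: substitute U+FFFD (assumption)
        }
        switch (cp) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:
                if (cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r')
                    break;  // other C0 controls are not allowed in XML 1.0
                if (cp < 0x80)
                    out += static_cast<char>(cp);
                else
                    AppendCharRef(out, cp);
        }
    }
    return out;
}
```

With this approach the surrogate pair D83C DFA7 is written as the single, valid reference "&#x1f3a7;". On a build where wxString is not UTF-16, a well-formed string would simply never contain surrogate code units, so the extra branch would not be taken.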
Yarn366
 

Re: Error: reference to invalid character number at line 6

Posted by steve » Fri Sep 29, 2017 9:05 am

Yarn366 wrote:I found an article in the official wxWidgets 3.0.2 documentation describing how wxString works:
http://docs.wxwidgets.org/3.0.2/overvie ... g_internal

Excellent. That clearly answers my question about why the UTF-16 encoding is happening.

The bit that grabbed me was (emphasis mine):
Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of surrogate pairs himself.

Which, if I understand correctly, is what your patch does.

It would appear that an alternative solution would be to build wxWidgets on Windows with wxUSE_UNICODE_UTF8=1 so that UTF-8 encoding is used on all platforms.
Given Audacity's dependence on XML, and that (presumably) we want to allow all and any printable characters in all and any language, perhaps this would be a better solution (?)
The possible downside that I notice in that documentation is a performance hit when iterating over wxString characters (the "Iterating wxString Characters" section). Is that likely to be a significant issue for Audacity?
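The likely reason for that performance concern can be sketched: in UTF-8 a code point takes 1 to 4 bytes, so finding the n-th character means scanning from the start of the string instead of jumping straight to an offset. The helper below is hypothetical and uses a plain std::string rather than wxString; how much this actually costs Audacity depends on how wxString's UTF-8 build implements indexing and iteration, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte offset of the n-th code point (0-based) in a
// UTF-8 string. Because UTF-8 sequences are 1 to 4 bytes long, the only way
// to find it is to walk from the start, so each lookup is O(n), not O(1).
// Returns s.size() if the string has fewer than n + 1 code points.
std::size_t Utf8OffsetOfCodePoint(const std::string& s, std::size_t n) {
    std::size_t offset = 0;
    while (n > 0 && offset < s.size()) {
        ++offset;
        // Skip continuation bytes (10xxxxxx) to reach the next lead byte.
        while (offset < s.size() &&
               (static_cast<unsigned char>(s[offset]) & 0xC0) == 0x80)
            ++offset;
        --n;
    }
    return offset;
}
```

For "a", then U+1F3A7 (bytes F0 9F 8E A7), then "b", the third code point "b" starts at byte 5, and reaching it requires stepping over every byte before it.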

Have you tried building wxWidgets with wxUSE_UNICODE_UTF8=1?
I'm not likely to have time to try that until next week, but I think it would be worth testing; if nothing else, it could confirm that the problem is what we think it is.
steve
Site Admin
 
