Error: reference to invalid character number at line 6
Forum rules
This forum is for Audacity on Windows.
Please state which version of Windows you are using,
and the exact three-section version number of Audacity from "Help menu > About Audacity".
Audacity 1.2.x and 1.3.x are obsolete and no longer supported. If you still have those versions, please upgrade at https://www.audacityteam.org/download/.
The old forums for those versions are now closed, but you can still read the archives of the 1.2.x and 1.3.x forums.
Re: Error: reference to invalid character number at line 6
In case any devs see this, I posted a message about this and my proposed fix to audacity-devel about a month and a half ago here: https://sourceforge.net/p/audacity/mail ... /35931978/. I'm still waiting for a response.
Re: Error: reference to invalid character number at line 6
Yarn366 wrote: In case any devs see this, I posted a message about this and my proposed fix to audacity-devel about a month and a half ago here: https://sourceforge.net/p/audacity/mail ... /35931978/. I'm still waiting for a response.
There are some responses to your pull request: https://github.com/audacity/audacity/pull/197
9/10 questions are answered in the FREQUENTLY ASKED QUESTIONS (FAQ)
Re: Error: reference to invalid character number at line 6
steve wrote: There are some responses to your pull request: https://github.com/audacity/audacity/pull/197
Only one of those responses (not counting the "milestone") isn't mine, and I already explained in the post after that why the suggestion in that post would be an unnecessary complication. And it was after the fake milestone was applied to my pull request that I posted to audacity-devel, and apparently nobody noticed that post.
Re: Error: reference to invalid character number at line 6
Yarn366 wrote: Only one of those responses (not counting the "milestone") isn't mine, and I already explained in the post after that why the suggestion in that post would be an unnecessary complication. And it was after the fake milestone was applied to my pull request that I posted to audacity-devel, and apparently nobody noticed that post.
There seems to be confusion from several angles.
One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker, and our main bugzilla guy is no longer with us (sadly he died a few weeks ago). So the first thing that we need to do is to get this properly logged on bugzilla. Are you able to provide a small Audacity project to demonstrate the problem? (I'm on Linux so I've never seen this bug first hand).
Secondly there appears to be some confusion about whether your current pull request is intended to be a full fix for the problem, or whether there remains a "much deeper problem" as your Git comment of June 9th suggests. Could you clarify that?
waxcylinder
Re: Error: reference to invalid character number at line 6
steve wrote: There seems to be confusion from several angles. One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker
I don't believe it ever got logged as a bug ...
We do, however, have a long-standing (very long) FAQ about it in the Audacity Manual:
http://manual.audacityteam.org/man/faq_errors.html#nwf
Peter.
Re: Error: reference to invalid character number at line 6
Sorry for taking so long to reply.
steve wrote: One is that although this bug has been known about for a long time, no-one can find it logged on our bug tracker, and our main bugzilla guy is no longer with us (sadly he died a few weeks ago). So the first thing that we need to do is to get this properly logged on bugzilla. Are you able to provide a small Audacity project to demonstrate the problem? (I'm on Linux so I've never seen this bug first hand).
Test project is attached. The project file is valid and loads fine on any platform. Its title contains the escape sequence for "🎧", a supplementary character (the headphone emoji, U+1F3A7). (I could have encoded the character directly and the file still would have been fine, but I chose to use the escape sequence.) However, if you resave it with the Windows version of Audacity, that character is written as a pair of surrogate character references, which is invalid in XML and won't load in any version of Audacity. (To clarify, the problem is with saving each supplementary character as two separate surrogate references, not with failing to load files that contain characters encoded in that manner.)
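For anyone following along, here is a minimal sketch of how this kind of output can arise. It is not Audacity's actual code: `escapePerCodeUnit` and `isValidXmlChar` are hypothetical names, and the sketch assumes the writer emits one hexadecimal character reference per UTF-16 code unit. The validity check follows the `Char` production of XML 1.0, which excludes the surrogate range U+D800 to U+DFFF.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// XML 1.0 "Char" production: code points that may appear in a document,
// either literally or through a numeric character reference.
bool isValidXmlChar(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical naive escaper: treats every UTF-16 code unit as a whole
// character. ASCII is copied through unchanged (markup characters such as
// '&' and '<' are ignored here to keep the sketch short); everything else
// becomes its own character reference.
std::string escapePerCodeUnit(const std::u16string& s) {
    std::string out;
    for (char16_t unit : s) {
        if (unit < 0x80) {
            out += static_cast<char>(unit);
        } else {
            char buf[16];
            std::snprintf(buf, sizeof buf, "&#x%04x;",
                          static_cast<unsigned>(unit));
            out += buf;
        }
    }
    return out;
}
```

With the headphone emoji as input, this produces two references, one for each surrogate (0xD83C and 0xDFA7). Neither passes `isValidXmlChar`, while the real code point 0x1F3A7 does, which matches the behaviour described above.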
steve wrote: Secondly there appears to be some confusion about whether your current pull request is intended to be a full fix for the problem, or whether there remains a "much deeper problem" as your Git comment of June 9th suggests. Could you clarify that?
The fix that I provided should be enough to fix the problem.
The first change that I described in my "much deeper problem" post would make it unnecessary to check the size of wxUChar, but it would likely require major changes to Audacity; thankfully, it's not really necessary for fixing this problem. (And wxString appears to use a 2-byte character type on Windows anyway, so that change probably wouldn't do much good unless the string type is also changed.)
Attachments: supplementary_char.zip (671 Bytes)
Re: Error: reference to invalid character number at line 6
Yarn366 wrote: Sorry for taking so long to reply.
No problem, I've only recently returned from my vacation.
Thanks for the test project. I had some time today to test it on Windows 10, and test your proposed fix.
I've certainly got enough information about the problem now to log it as a bug, and your work gives a good lead-in to understanding the problem.
I don't personally have in-depth knowledge about wxWidgets XML / Unicode handling, so I don't know that your fix is the "right" way to fix it.
I can see how your fix prevents the problem from occurring, but I have a niggling feeling that there should be a better way to fix this. In particular, I don't understand why or how surrogate pairs are being created. If I'm reading your code correctly, your fix handles the surrogate pairs when they occur, but why / where / how do they occur in the first place? I thought that UTF-16 encoding should only happen when conversion to UTF-16 is explicitly called.
Re: Error: reference to invalid character number at line 6
Bug logged here: http://bugzilla.audacityteam.org/show_bug.cgi?id=1752
Re: Error: reference to invalid character number at line 6
I found an article in the official wxWidgets 3.0.2 documentation describing how wxString works:
http://docs.wxwidgets.org/3.0.2/overvie ... g_internal
The section that concerns this issue is "Internal wxString Encoding," which I suggest reading thoroughly. Here are the important bits that I gathered from there:
- wxString can store strings in UTF-8, UTF-16, or UTF-32 encoding, depending on the platform and compile-time flags.
- By default, wxString uses UTF-16 under Windows, and either UTF-8 or UTF-32 under Linux and macOS (the article is a bit conflicting here, although it appears to suggest UTF-8 more strongly).
- When wxString uses UTF-8 encoding, it indexes code points rather than code units (bytes in the case of UTF-8). It also handles encoding and decoding of multi-byte sequences automatically. This means that programs don't have to do anything special in this case; they can just treat each unit as being one character.
- This is the most important part: When wxString uses UTF-16 encoding, it indexes code units, not code points, and it does absolutely nothing to handle surrogate pairs. This means that programs need to implement this handling themselves, at least when interfacing directly with wxString.
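To make the last point concrete, here is a rough sketch of what "handling surrogate pairs yourself" looks like when walking over UTF-16 code units. It uses plain `std::u16string` rather than wxString, `decodeUtf16` is a hypothetical name, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch, not something taken from the wxWidgets documentation or from the pull request.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-16 code units into code points, combining each high/low
// surrogate pair into one supplementary code point. Unpaired surrogates
// are replaced with U+FFFD (an assumption of this sketch).
std::vector<std::uint32_t> decodeUtf16(const std::u16string& s) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::uint32_t unit = s[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        const bool nextIsLow = i + 1 < s.size() &&
                               s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow) {
            const std::uint32_t low = s[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed as well
        } else if (isHigh || isLow) {
            out.push_back(0xFFFD);  // lone surrogate: not a valid character
        } else {
            out.push_back(unit);    // ordinary BMP character
        }
    }
    return out;
}
```

A writer that escapes the decoded code points instead of the raw code units would emit a single reference for U+1F3A7, which is valid XML.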
Re: Error: reference to invalid character number at line 6
Yarn366 wrote: I found an article in the official wxWidgets 3.0.2 documentation describing how wxString works:
http://docs.wxwidgets.org/3.0.2/overvie ... g_internal
Excellent. That clearly answers my question about why the UTF-16 encoding is happening.
The bit that grabbed me was (emphasis mine):
"Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of surrogate pairs himself."
Which, if I understand correctly, is what your patch does.
It would appear that an alternative solution would be to build wxWidgets on Windows with wxUSE_UNICODE_UTF8=1 so that UTF-8 encoding is used on all platforms.
Given Audacity's dependence on XML, and that (presumably) we want to allow all and any printable characters in all and any language, perhaps this would be a better solution (?)
The possible downside that I notice in that documentation is a performance hit when Iterating wxString Characters. Is that likely to be a significant issue for Audacity?
Have you tried building wxWidgets with wxUSE_UNICODE_UTF8=1 ?
I'm not likely to have time to try that until next week, but I think it would be worth testing - if nothing else, it could confirm that the problem is what we think it is.
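On the performance question, a small sketch of why indexing by character can cost more in a UTF-8 string. This is not wxString's implementation (which may well use caching or other tricks); `utf8OffsetOfCodePoint` is a hypothetical helper that only shows the underlying issue: in a variable-width encoding, finding the n-th code point means scanning from the start.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the n-th code point (0-based) in a UTF-8 string, or
// std::string::npos if the string has fewer code points.
// Assumes the input is valid UTF-8.
std::size_t utf8OffsetOfCodePoint(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes have the form 10xxxxxx; any other byte
        // starts a new code point.
        const bool startsCodePoint =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (startsCodePoint) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}
```

For a string such as "a", the headphone emoji (4 bytes in UTF-8), then "b", the third character sits at byte offset 5 rather than 2, and each lookup by index is a linear scan. Whether that matters in practice for Audacity is exactly the open question.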