Unicode issue with Nyquist

I’m sorry to say it, but we have a serious Unicode problem. If I type:

Line 1: (print "äöü")

then in the Audacity text output window I get:

äöü

and in the “Debug” window I get:

Input Expression:
(print "äöü")         

"\303\244\303\266\303\274"

The problem is that the Audacity GUI is Unicode-aware, while XLISP is limited to single-byte ASCII characters. If I type a character that takes two bytes in UTF-8 (such as “ä”) in the Audacity GUI, XLISP reads it as two single-byte characters, producing nonsense letters.
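To see where the debug output comes from, here is a small Python sketch (any real fix would live in Audacity’s C++ code) that encodes the text as UTF-8, the way the GUI hands it over, and then prints it the way a byte-oriented printer would. The escaping rule (every byte outside printable ASCII 32-126 written as a backslash plus three octal digits) is an assumption that approximates XLISP’s string printer, and `xlisp_style_escape` is a hypothetical name.

```python
def xlisp_style_escape(text):
    """Print text the way a byte-oriented Lisp printer might see it.

    Assumption: every byte outside printable ASCII (32-126) is written as
    a backslash followed by three octal digits.
    """
    out = []
    for byte in text.encode("utf-8"):  # the GUI hands over UTF-8 bytes
        if 32 <= byte <= 126:
            out.append(chr(byte))  # plain ASCII passes through unchanged
        else:
            out.append("\\%03o" % byte)  # e.g. 0xC3 -> \303
    return "".join(out)


print(len("äöü"), len("äöü".encode("utf-8")))  # 3 6
print(xlisp_style_escape("äöü"))  # \303\244\303\266\303\274
```

Three characters become six bytes, and each pair (303 plus one more octal byte) is one umlaut.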

Roger has already answered on the audacity-nyquist mailing list, but I’m afraid that we need to implement a string parser at the Audacity C/C++ level that displays an error message window or something similar as soon as the user types non-ASCII letters. I still hope that a better solution can be found, though.

Grmblfx … - edgar

I’ve moved this to its own thread, as I presume that this is a potential issue for all Nyquist effects and not specific to the Nyquist Prompt / Nyquist Generate Prompt.

Would this be classed as a “bug”?

I’m not sure that the Audacity developers would class it as an “Audacity bug” if it is due to a lack of Unicode support in Xlisp/Nyquist, but it is certainly an important issue if we are looking to resolve paths on Windows with a localized “UserProfile”. It is also a potential issue for any Nyquist plug-in that uses text widgets.

The same thing happens if you try to create labels with Unicode characters, for example:

(list (list 1 2 "äöü"))

produces a label containing the text:

äöü
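The scrambled label text is exactly what you get when the six UTF-8 bytes of “äöü” are read back as six ISO-8859-1 (Latin-1) characters. The Python sketch below only demonstrates that byte-level effect; it is an assumption, not a claim about which conversion Audacity actually performs, and `misread_as_latin1` is a hypothetical helper.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


garbled = misread_as_latin1("äöü")
print(garbled, len(garbled))  # äöü 6

# As long as no byte was lost, reversing the two steps restores the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # äöü
```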

At least it is not necessarily an Audacity bug. The bug is that ASCII software was included in a Unicode project, though I’m not even sure whether Audacity had Unicode support at all around 1999 or 2000 (probably not).

A typical Edgar answer would read: “Everything would be much easier if Unicode had been invented first, and ASCII then.”

I know, in Germany it’s even worse. The scrambled labels are stored in the Audacity “.aup” project file, in some cases corrupting the file structure, so that the project cannot be opened again until the “.aup” file has been corrected in a text editor.

I have just written a more extensive explanation on the Audacity Nyquist list. I hope we will find a solution.

  • edgar

Just tested with the CMU Nyquist 3.03 Java IDE, exactly the same problem:

> (print "äöü")
"\303\244\303\266\303\274"
"\303\244\303\266\303\274"

As I have already written on the audacity-nyquist list, I never had problems using ASCII-only filenames in the past. It never even occurred to me to use non-ASCII letters in filenames with Nyquist, because I knew that XLISP can only handle ASCII 0-127; that’s why I never noticed this bug before.

But a naive Audacity user cannot know that Nyquist can’t handle Unicode characters, so we need to find some way to prevent Nyquist from reading or writing nonsense characters. I will take a closer look at the Audacity C/C++ code later on.

  • edgar

Notes about parsing Nyquist code with Audacity:

IMO plugin header lines should be allowed to contain Unicode characters in “info” texts. Since these lines are parsed only by the Audacity Nyquist interface and never reach the Nyquist interpreter, this should not be a real problem. The text in the Audacity Nyquist effect window currently cannot be localized via gettext, but this may happen at any time in the future.

In the Nyquist/XLISP/SAL code part of a “.ny” plugin file, only ASCII 0-127 characters are allowed. Since it cannot be guaranteed that a naive Nyquist programmer won’t type some disallowed characters in the plugin code, the Audacity Nyquist interface must make sure while parsing the plugin file that only single-byte ASCII 0-127 characters are passed to the Nyquist interpreter; otherwise multibyte Unicode characters will get scrambled.
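The split between header lines (Unicode allowed) and code lines (ASCII only) could be checked with a simple line scan. This Python sketch is only an illustration of the idea, not Audacity’s parser: it assumes that every line starting with “;” is a header or comment line that never has to be read as code, and `find_non_ascii_code_lines` is a hypothetical name.

```python
def find_non_ascii_code_lines(source):
    """Return (line number, text) for each non-header line that contains
    characters above ASCII 127.

    Assumption: every line starting with ';' is a header line (or a Lisp
    comment) that never has to be read as code by the interpreter.
    """
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith(";"):
            continue  # header line: Unicode allowed, e.g. in ;info texts
        if any(ord(ch) > 127 for ch in line):
            problems.append((number, line))
    return problems


plugin = (
    ";nyquist plug-in\n"
    ";info \"Grüße aus Deutschland\"\n"
    "(print \"äöü\")\n"
    "(print \"ok\")\n"
)
print(find_non_ascii_code_lines(plugin))  # [(3, '(print "äöü")')]
```

Comments that follow code on the same line are not handled by this simple scan.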

Two suggestions:

  • Display an error message when non-ASCII characters are found (annoying for the casual user)

Maybe better:

  • While the Audacity Nyquist interface parses the plugin file, convert all Unicode characters above ASCII 127 to a “?” (question mark) or a similar placeholder. In case of a Nyquist error, the ?-ified code then appears in the Audacity text output or debug window, letting the Nyquist programmer know that something went wrong with the code while simultaneously showing where the wrong characters were typed.

I have already found the lines in the Audacity C/C++ code where the plugin file is parsed, but I still need to find a way at the C/C++ or wxWidgets level to change characters above ASCII 127 into ?-characters, to test whether this works at all.
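The ?-ification itself is a one-line transformation, provided it runs on decoded characters rather than on raw bytes. A minimal Python sketch (`questionify` is a hypothetical name; the C++ version would have to do the same on already-decoded text, otherwise each umlaut turns into two placeholders):

```python
def questionify(text, placeholder="?"):
    """Replace every character above ASCII 127 with a placeholder.

    The input is decoded text, so a character that occupies two bytes in
    UTF-8 (such as 'ä') becomes a single '?', not two.
    """
    return "".join(ch if ord(ch) < 128 else placeholder for ch in text)


print(questionify('(print "äöü")'))  # (print "???")
```

Python’s built-in `text.encode("ascii", errors="replace")` gives the same result as bytes, which shows the design choice is a standard one: one placeholder per character keeps the ?-marks in the same positions as the characters the programmer typed.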

What do others think of this idea?

  • edgar

To keep the problem in perspective, I’ve not seen any user problems posted to this forum regarding Unicode characters in Nyquist, though I don’t monitor the Russian section other than for cleaning out spam.

Lol.

I think that may be better, as the “casual user” is probably not likely to check the debug window.
So displaying an error message will allow the user to seek out alternatives rather than assume that the plug-in is broken.

Yes, but only because the Audacity Nyquist text input widget never worked right so far.

The problem with the “error message” version is that all Nyquist plugins are parsed at Audacity start-up to determine the entries for the “Effect” menu. If plugin code (the .ny files) contains errors, the user gets overloaded with error messages at Audacity start-up.

But I think it is a different case when a user types non-ASCII characters into a text-input widget of a Nyquist effect. In this case the user must be informed as soon as possible.
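For the text-widget case, a check could run before the value is handed to Nyquist and tell the user which character is the problem. A minimal Python sketch; `first_non_ascii`, `validate_widget_input` and the wording of the message are all hypothetical. It counts characters, not bytes, so the reported position matches what the user sees in the widget.

```python
def first_non_ascii(text):
    """Return (index, character) of the first non-ASCII character, or None."""
    for index, ch in enumerate(text):
        if ord(ch) > 127:
            return index, ch
    return None


def validate_widget_input(text):
    """Hypothetical check run before a text-widget value is sent to Nyquist.

    Returns an error message for the user, or None if the input is safe.
    """
    hit = first_non_ascii(text)
    if hit is None:
        return None
    index, ch = hit
    return ("Nyquist can only handle ASCII characters. "
            f"Please replace '{ch}' (character {index + 1}).")


print(validate_widget_input("C:/Users/Jurgen/sound.wav"))  # None
print(validate_widget_input("C:/Users/Jürgen/sound.wav"))
```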

Preliminary conclusion: errors in .ny files are different from user input errors; only the “scrambled characters” effect is the same. A Nyquist programmer should know how to use the debug window, but a casual user should be spared from tracking down errors with debug tools.

Also, I still see no way to read or write a file with Nyquist when the file or directory name contains German umlauts.

  • edgar

The current SVN Audacity outputs error messages if plug-ins contain malformed ;control lines.

Example

;nyquist plug-in
;version 1
;type generate
;name "Bad..."
;action "Evaluating..."
;control bad "This is wrong" poo "" 0 0 100
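A check like the one producing these messages could tokenize the ;control line while honouring the quoted widget texts, then compare the type field against the known control types. The Python sketch below is not Audacity’s parser: the set of valid types (int, real, string, choice) is an assumption based on the plug-in format of that time, `check_control_line` is a hypothetical name, and using `shlex.split` assumes the widget texts contain no unbalanced quotes.

```python
import shlex

# Assumption: the control types accepted by Audacity at the time of this thread.
KNOWN_TYPES = {"int", "real", "string", "choice"}


def check_control_line(line):
    """Return an error message for a malformed ;control line, or None."""
    tokens = shlex.split(line)  # splits on whitespace, keeps "quoted texts" whole
    if len(tokens) < 4 or tokens[0] != ";control":
        return "not a ;control line: " + line
    var_name, widget_type = tokens[1], tokens[3]
    if widget_type not in KNOWN_TYPES:
        return f"control '{var_name}': unknown type '{widget_type}'"
    return None


print(check_control_line(';control bad "This is wrong" poo "" 0 0 100'))
# control 'bad': unknown type 'poo'
print(check_control_line(';control gain "Gain" real "dB" 0 -24 24'))  # None
```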



Agreed.

Agreed, which is why I think it better that invalid input (for whatever reason) should output an error message that is visible to the user rather than (only) an error message in the debug window.

Unfortunate :cry:
Probably the best we can do about that is to document it.

I’m adding a link to the thread in the Nyquist mailing list. It seems the issue may lie more in the Audacity-Nyquist interface than in Nyquist itself.



Gale