Swedish characters

When I export a file containing the Swedish characters “å”, “ä” or “ö”, they are changed to “a”, “a” and “o” in the file’s metadata. For example, “Över ån.wav” becomes “Over an.wav” in the metadata, although the file name itself remains correct. The same applies to Artist and Album: the artist name “Göran Åhman” becomes “Goran Ahman”.
I have selected Swedish as the language in Preferences. I have Windows 8 and Audacity 2.0.5. Is there any other setting I can change?

What audio format are you exporting to? In what application are you viewing the metadata after export? Is that application Unicode aware?

Both Audacity and Windows Explorer (on Windows 7) show the Artist Name “Göran Åhman” in an exported MP3, so I see no Audacity bug here.


Gale

Hi Gale!
I export to WAV format. If I look at the file’s properties in Windows (right-click the file and choose Properties), the file name is correct, while Title, Album and Artist are incorrect. The same applies to Windows Media Player. I can’t say whether these are Unicode aware.

I have not tried exporting to formats other than WAV.

In the FAQ “Why do my exported files not include all the metadata…” the answer is: “Metadata is well supported by many audio formats… but less supported in WAV”. Is that the answer to my problem?

I have ripped all my CDs with Windows Media Player in WAV format with no problems.

Audacity 2.0.5 does export all seven of its default metadata tags for WAV, but it is still the case that not all players will see these tags. iTunes does not accept WAV metadata at all.

For WAV, 2.0.5 exports each tag in both ID3 format and LIST INFO format. Windows only reads the LIST INFO version. The problem you’ve found is that when Audacity writes the LIST INFO version, extended ASCII characters (like an accented “A”) are replaced with unaccented characters. Also it seems that if you enter Unicode characters such as Chinese characters in a tag in Metadata Editor, Audacity does not write the LIST INFO tag at all.
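If you want to see both tag blocks in an exported file for yourself, here is a minimal Python sketch (standard library only) that walks the top-level RIFF chunks of a WAV file, prints the raw LIST INFO values, and reports whether an ID3 chunk is present. It assumes the ID3 block is stored in a chunk named `id3 ` (matched case-insensitively) whose payload starts with the usual `ID3` header; the function names are hypothetical, not part of Audacity or any library.

```python
import struct
from typing import Dict, Iterator, Optional, Tuple

def iter_chunks(data: bytes, start: int = 12,
                end: Optional[int] = None) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, payload) pairs between start and end.

    Each RIFF chunk is a 4-byte ID, a 4-byte little-endian size,
    the payload, and one pad byte if the size is odd.
    """
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the pad byte after odd sizes

def list_tag_chunks(data: bytes) -> Dict[str, object]:
    """Report the LIST INFO values and whether an ID3 chunk is present."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info: Dict[str, bytes] = {}
    has_id3 = False
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"LIST" and payload[:4] == b"INFO":
            # INFO subchunks (IART, INAM, IPRD, ...) follow the 4-byte list type.
            # Values stay as raw bytes, since their encoding is the question here.
            for sub_id, value in iter_chunks(payload, start=4):
                info[sub_id.decode("ascii", "replace")] = value.rstrip(b"\x00")
        elif chunk_id.lower() == b"id3 " and payload[:3] == b"ID3":
            has_id3 = True  # assumption: ID3v2 data lives in an "id3 " chunk
    return {"info": info, "id3": has_id3}
```

For example, `list_tag_chunks(open("Över ån.wav", "rb").read())` on a file exported by 2.0.5 would be expected to show the unaccented `IART` value alongside `'id3': True`.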

The LIST INFO format definitely should accept extended ASCII characters because I can add them in a hex editor and Windows sees them. I don’t know whether the Audacity problem is a limitation in the tag writing library we use, or if it is a mistake in the way the feature is coded. I will try to find out.
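The hex-editor experiment can be reproduced in code. Below is a minimal Python sketch that builds a LIST INFO chunk with 8-bit text. It assumes Windows-1252 (`cp1252`, the Western European Windows code page) is the 8-bit encoding Windows expects for these tags; that is an assumption, not something confirmed above, and the helper names are hypothetical.

```python
import struct

def info_subchunk(tag_id: str, text: str, encoding: str = "cp1252") -> bytes:
    """Build one INFO subchunk such as IART (artist), INAM (title) or IPRD (album).

    Assumption: Windows-1252 is the 8-bit encoding the reader expects.
    Raises UnicodeEncodeError for text the code page can't represent.
    """
    value = text.encode(encoding) + b"\x00"  # INFO strings are NUL-terminated
    chunk = tag_id.encode("ascii") + struct.pack("<I", len(value)) + value
    if len(value) % 2:
        chunk += b"\x00"  # RIFF pad byte, not counted in the size field
    return chunk

def list_info_chunk(tags: dict, encoding: str = "cp1252") -> bytes:
    """Wrap INFO subchunks in a LIST chunk whose list type is INFO."""
    body = b"INFO" + b"".join(info_subchunk(k, v, encoding) for k, v in tags.items())
    return b"LIST" + struct.pack("<I", len(body)) + body
```

In cp1252, `info_subchunk("IART", "Göran Åhman")` stores “ö” as the single byte 0xF6 and “Å” as 0xC5, which is the kind of extended ASCII a hex editor would show. Text the code page can’t represent, such as Chinese characters, raises `UnicodeEncodeError` in this sketch; that loosely mirrors Audacity omitting the LIST INFO tag in that case, though how Audacity’s own code handles it is not known here.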

In the meantime, the ID3 tags that Audacity writes for WAV retain the correct characters, so you could use Foobar2000 or dBpoweramp instead to view the ID3 tags in WAV files. Foobar2000 reads only ID3 in WAV, but dBpoweramp reads both ID3 and LIST INFO in WAV and integrates with the Windows shell. This means it will show you the correct characters in a popup over the file in Explorer, and in a special tab it adds to the Windows Properties sheet for files.


Gale

Hello!
Thanks for the very good and worthwhile information.

I downloaded Foobar2000 and it works great, but I would prefer to use Windows Media Player.

I can use Windows Media Player to correct the tags to Swedish characters, and Windows Media Player “remembers” that, but the change is not written to the file’s properties.

Regards
Kurt Lindström, Gällivare, Sweden.

To update on this, it appears the problem is a limitation in the string-encoding rules of the “LIST” chunk specification originally drawn up by IBM/Microsoft.

  • When no country code is present in the tags, only 7-bit ASCII is allowed. This essentially limits us to 128 characters, none of them accented. However, most European accented characters are converted by wxWidgets to the same letter without the accent, rather than the tag simply being dropped.
  • To use 8-bit characters, a country code must be present, located inside an extra “CSET” (Character Set) chunk.
  • Unicode encodings (UTF-8 and UTF-16) are not supported, so Unicode characters won’t appear in the LIST chunk. Unicode is supported in ID3v2.
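The accent-dropping fallback in the first bullet can be approximated with Python’s standard `unicodedata` module: decompose each character, discard the combining accent marks, and give up if anything non-ASCII remains (modelling the case where no LIST value is written). This is a minimal illustration using a common Unicode-decomposition technique, not the conversion wxWidgets actually performs; the function name is hypothetical.

```python
import unicodedata
from typing import Optional

def to_7bit_ascii(text: str) -> Optional[str]:
    """Strip accents, then return None if anything non-ASCII remains.

    Illustration only: a standard NFKD-based approach, not the
    conversion wxWidgets actually performs.
    """
    # NFKD splits "ö" into "o" followed by a combining diaeresis
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped if stripped.isascii() else None
```

With this sketch, `to_7bit_ascii("Göran Åhman")` returns `"Goran Ahman"`, matching what Kurt sees in Windows, while Chinese text returns `None`. Letters that have no decomposition, such as “ø” or “ß”, also return `None` here, whereas a real converter might map them differently.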

Audacity does not support the “CSET” chunk. Using CSET might also cause audio players that don’t understand it to not see the LIST tags at all.
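For reference, here is a minimal Python sketch of what writing a CSET chunk would involve. The layout (four little-endian 16-bit fields: code page, country code, language, dialect) is a reading of the original IBM/Microsoft RIFF specification and should be checked against it before use; the function name is hypothetical, and the actual code values come from tables in that specification which are not reproduced here.

```python
import struct

def cset_chunk(code_page: int, country: int, language: int, dialect: int) -> bytes:
    """Build a RIFF CSET chunk: four little-endian 16-bit WORDs (8 bytes).

    Field order is an assumption based on the RIFF specification;
    verify it against the spec before writing real files.
    """
    payload = struct.pack("<4H", code_page, country, language, dialect)
    return b"CSET" + struct.pack("<I", len(payload)) + payload
```

Because the payload is always 8 bytes, no pad byte is ever needed.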

Joel (who works on metadata issues for us when he has time) has found that if we use 8-bit encoding for LIST INFO, Explorer might display the characters correctly without CSET, but only if the ID3v2 tags are also written (which we would do). If we do not write ID3v2, Explorer will not see the 8-bit LIST tags without CSET.

This might be a reasonable solution if Explorer is in fact falling back to reading ID3v2 for WAV. However, an application that relies solely on LIST INFO for WAV would then probably not see the tags at all, even if they contained only unaccented a to z characters.

So Joel will experiment with some more options, but for now there is no immediate solution other than to read the tags with something other than Windows.


Gale