UTF8 bug

ptroev
Posts: 6
Joined: Sun Nov 16, 2008 2:47 am

UTF8 bug

Post by ptroev »

The non-Latin UTF-8 symbol 0xd098 (0x98d0) is translated into 0xd03f (0x3fd0)
when saving a file in newlisp-edit. Also, any regular expression throws an exception on that symbol.
Maybe this is a Java issue; any suggestions?
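
For reference, the byte values quoted above can be checked with a short sketch. Python is used here purely for illustration; it is not part of newLISP or newlisp-edit, and the variable names are made up for this example. It shows that the bytes 0xD0 0x98 are the UTF-8 encoding of a single Cyrillic letter, that the values in parentheses are the same two bytes read as a little-endian 16-bit word, and that the saved pair 0xD0 0x3F is no longer valid UTF-8.

```python
import struct
import unicodedata

original = bytes([0xD0, 0x98])  # bytes before saving, as reported
saved = bytes([0xD0, 0x3F])     # bytes after saving, as reported

# Decode the original pair as UTF-8 and identify the character.
ch = original.decode("utf-8")
code_point = ord(ch)
name = unicodedata.name(ch)

# The parenthesized values in the report are the same bytes in swapped order,
# i.e. the pair read as a little-endian 16-bit word.
original_le = struct.unpack("<H", original)[0]
saved_le = struct.unpack("<H", saved)[0]

# 0x3F is an ASCII '?', which cannot follow the lead byte 0xD0 in UTF-8.
try:
    saved.decode("utf-8")
    saved_is_valid_utf8 = True
except UnicodeDecodeError:
    saved_is_valid_utf8 = False

print(f"U+{code_point:04X} {name}")
print(hex(original_le), hex(saved_le), saved_is_valid_utf8)
```

The second byte being replaced by a literal '?' suggests a character-set conversion that could not map that byte, rather than random corruption.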

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

On Mac OS X I see Chinese (or Korean?) characters. On Ubuntu Linux and Windows XP the characters are displayed as a box, but they save and reload without change on all 3 platforms. I tried all characters from your post.

On Windows and Linux I also tried other UTF8 characters (Greek) which display fine and I don't see any change when saving and reloading.

Perhaps an error specific to your locale? What language are your Windows version and Java localized to?

Also, make sure you have replaced newlisp.exe with a UTF8-enabled executable from here: http://www.newlisp.org/downloads/UTF-8_win32/

ptroev
Posts: 6
Joined: Sun Nov 16, 2008 2:47 am

re

Post by ptroev »

that's strange, because i have the same issue on 2 computers (JRE 1.6.0_05 and 1.5.x)
newLISP v.10.0.1 on Win32 IPv4 UTF-8

that letter is the cyrillic symbol "И" (mirrored N) in utf8,
and it is the only one that is saved and regexed incorrectly.
during editing and saving it looks ok, but when the file is closed and reopened
it shows as a square and '?'.

maybe this is a regex issue, when highlighting

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

'regex' is fine for me on all three platforms. It must be a problem with the localization of either Windows or Java in your language, or just the font. If I understand you correctly, it's just one character showing these problems? If your language is Russian, there are several other users on this forum who may be able to comment.


ps: using Java 1.5 on Mac OS X and Java 1.6 on Windows

ptroev
Posts: 6
Joined: Sun Nov 16, 2008 2:47 am

re

Post by ptroev »

yes, just 1 character, i'm puzzled.

i don't think this is a localization (cp1251) or font problem; that char works in other apps,
and it's saved correctly in windows notepad in utf8

thanks anyway,
i'll try to contact those people, maybe there's a workaround

btw, i was mistaken, nothing is wrong with regex; i checked it, and it works with utf8 text files in russian.
it fails only if that symbol is in .lsp source, inside [text][/text]

ps: that's funny, according to google, there is something special with that symbol and utf8.
so maybe this is a problem with java textarea and/or base64 + utf8
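
One possible explanation, which nobody in the thread confirms, is that the UTF-8 bytes pass through the platform's default single-byte charset (cp1251 on a Russian Windows) somewhere on the way to disk. Byte 0x98 is the one position that cp1251 leaves unassigned, so a decode and re-encode through cp1251 cannot preserve it. The sketch below is only a simulation under that assumption: Python's `cp1251` codec with `errors="replace"` stands in for whatever conversion the editor's Java side might perform, and `roundtrip_via` is a hypothetical helper, not a newLISP or Java function.

```python
def roundtrip_via(codec: str, data: bytes) -> bytes:
    """Decode bytes with a single-byte codec and encode them back.

    Unmappable bytes become U+FFFD on decode, and U+FFFD becomes '?' on encode.
    """
    text = data.decode(codec, errors="replace")
    return text.encode(codec, errors="replace")


# The letter from the thread: UTF-8 bytes 0xD0 0x98.
utf8_i = "И".encode("utf-8")
damaged = roundtrip_via("cp1251", utf8_i)

# Check every Russian letter (U+0410..U+044F plus Ё and ё) the same way.
russian = [chr(cp) for cp in range(0x0410, 0x0450)] + ["Ё", "ё"]
affected = [
    ch for ch in russian
    if roundtrip_via("cp1251", ch.encode("utf-8")) != ch.encode("utf-8")
]

print(utf8_i.hex(), "->", damaged.hex())
print("letters changed by the round trip:", affected)
```

Under this assumption the simulation reproduces both symptoms reported above: the second byte turns into 0x3F, and among the Russian letters only this one is affected, because it is the only one whose UTF-8 encoding contains 0x98. Base64 transport on its own is byte-preserving, so it would only matter if the text is converted with the wrong charset before or after that step.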
