Upper-case, UTF-8 and Windows won't work together

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia


Post by Fritz »

I have a Russian Windows installation with native Win-1251 and DOS CP866 encodings. For some strange reason the "upper-case" function does not work:

Code: Select all

(println (upper-case "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))
Result:

Code: Select all

"ࡢ㤥¸槨骫쭮ﰱ㴵縹콾"
Expected:

Code: Select all

"АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
Screenshots:
http://img7.imageshost.ru/imgs/091025/a ... /e2e93.png
http://img7.imageshost.ru/imgs/091025/b ... /44605.jpg

Version: newLISP v.10.1.5 on Win32 IPv4 UTF-8

By the way, on Ubuntu "upper-case" works all right:
http://img7.imageshost.ru/imgs/091025/3 ... /31ca8.png

m35
Posts: 171
Joined: Wed Feb 14, 2007 12:54 pm
Location: California

Re: Upper-case, UTF-8 and Windows won't work together

Post by m35 »

According to nl-string.c in newLISP 10.1.5:

Code: Select all

/* Note that on many platforms towupper/towlower
do not work correctly for non-ascii unicodes */
Have you tried (upper-case) with the regular (non-UTF-8) build of newLISP?

Since Windows does not cater to UTF-8 by default, this may require a bit of platform-specific code.

Links for reference
http://msdn.microsoft.com/en-us/library ... 71%29.aspx
http://www.lingoport.com/gi/help/gihelp ... oupper.htm

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

Re: Upper-case, UTF-8 and Windows won't work together

Post by Fritz »

In the non-UTF-8 build, upper-case and lower-case simply do nothing with Russian letters: (lower-case "А") -> "А", (upper-case "а") -> "а".

But that is not really important: newLISP is comfortable enough that any encoding function can be written in the time it takes to drink a cup of coffee. I had to write my own functions anyway, to decode Russian letters in URLs, POST queries, RTF files, etc.

Code: Select all

;; 33 lower-case Cyrillic letters followed by their 33 upper-case counterparts
(set 'cyr-alphabet (list "а" "б" "в" "г" "д" "е" "ё" "ж" "з" "и" "й" "к" "л" "м" "н" "о" "п" "р" "с" "т" "у" "ф" "х" "ц" "ч" "ш" "щ" "ъ" "ы" "ь" "э" "ю" "я" "А" "Б" "В" "Г" "Д" "Е" "Ё" "Ж" "З" "И" "Й" "К" "Л" "М" "Н" "О" "П" "Р" "С" "Т" "У" "Ф" "Х" "Ц" "Ч" "Ш" "Щ" "Ъ" "Ы" "Ь" "Э" "Ю" "Я"))

;; Lower-case a string one character at a time
(define (cyr-low linea)
  (let (menudo "" letra "" idx nil)
    ;; pop returns "" once the string is exhausted
    (while (!= (set 'letra (pop linea)) "")
      (set 'idx (find letra cyr-alphabet))
      ;; indices 33..65 are upper-case; the lower-case form sits 33 places earlier
      (if (and idx (> idx 32))
        (push (cyr-alphabet (- idx 33)) menudo -1)
        (push letra menudo -1)))
    menudo))
Screenshot:
http://img7.imageshost.ru/imgs/091027/6 ... /812d6.jpg
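
The same table-lookup idea can be turned around to get upper-casing, which is what the original question asked for. This is only a minimal sketch: the names cyr-lower, cyr-upper and cyr-up are hypothetical and do not appear in the post above. It also assumes that explode and pop split the string into whole letters, which holds for a UTF-8 build reading UTF-8 source, or for a single-byte encoding such as Win-1251.

```newlisp
;; Hypothetical helper tables (not from the original post):
;; the lower-case and upper-case letters are kept in the same order,
;; so a letter's index in one list is its partner's index in the other.
(set 'cyr-lower (explode "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))
(set 'cyr-upper (explode "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"))

;; Upper-case the Cyrillic letters of a string, leaving everything else as is.
(define (cyr-up str)
  (let (result "" ch "" idx nil)
    ;; pop takes one character off the front; "" means the string is used up
    (while (!= (set 'ch (pop str)) "")
      (set 'idx (find ch cyr-lower))
      ;; replace a lower-case Cyrillic letter, copy anything else unchanged
      (push (if idx (cyr-upper idx) ch) result -1))
    result))

(println (cyr-up "привет"))
```

Latin letters, digits and punctuation pass through untouched, so the result can be fed to the built-in upper-case afterwards if ASCII letters need converting too.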
