UTF-8 - Unicode in development version 8.0.8

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Just posted the first development version that can be compiled with Unicode/UTF-8 support. I tested with Cyrillic, Greek, Hebrew, and Russian character sets, but could not test on a platform with keyboard support for those characters, or on platforms that rely heavily on multibyte characters (Chinese, Japanese, Indian, Arabic character sets) and also take input in them from the keyboard.

I believe JP (Jean Pierre) on this board is running Japanese Windows?

There could be (but shouldn't be) differences between running the Tcl/Tk frontend and running newlisp.exe or newlisp (the Linux binary) alone. The Tcl/Tk frontend switches fine on Linux, but I could not test it on Win32. It is not only a matter of display but also of the correct working of the UTF-8 versions of specific string functions (see the CHANGES file and the manual chapter about UTF-8), like 'trim', 'nth', 'upper-case', etc.

Any feedback about this is appreciated.

Lutz

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

Running Turtle.lsp with the UTF-8 EXE gives an error ' Bad screen distance "302,1612092" '.
Hans-Peter

HPW

Post by HPW »

When I use a German umlaut in a string with the trim command, I get a strange result.
> (trim "Höhe;;" ";")
"Hö¨¥–"
Do I need to give the trim command a UTF-8 string where the umlaut is encoded in a compatible way?
Hans-Peter

Lutz

Post by Lutz »

I don't think you should use the UTF-8 version on Win32 in Germany, where Windows is localized with German as a one-byte-character language, probably with code page ISO-8859.

Windows in Germany and other European countries will display Unicode in the notepad.exe application and others, but is otherwise not a Unicode-enabled OS.
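
To illustrate why a one-byte code page and a UTF-8 build do not mix well, here is a minimal Python sketch. It is not newLISP's actual implementation, and the helper names `utf8_seq_len` and `naive_split` are made up for this example. It assumes the string arrives as ISO-8859-1 bytes and is then walked by a routine that trusts UTF-8 lead bytes without validating them:

```python
def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence as announced by its lead byte (no validation)."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # stray continuation byte or invalid lead: treat as a single unit


def naive_split(data: bytes) -> list:
    """Split bytes into 'characters' by trusting each lead byte (assumed behavior)."""
    units = []
    i = 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        units.append(data[i:i + n])
        i += n
    return units


text = "Höhe;;"
as_utf8 = text.encode("utf-8")      # 'ö' becomes the two bytes C3 B6
as_latin1 = text.encode("latin-1")  # 'ö' becomes the single byte F6

utf8_units = naive_split(as_utf8)
latin1_units = naive_split(as_latin1)

# A strict decoder refuses the one-byte data outright.
try:
    as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(utf8_units)
print(latin1_units)
print(latin1_is_valid_utf8)
```

Under that assumption the single byte 0xF6 for 'ö' looks like the start of a four-byte sequence, so the three bytes after it get swallowed into one bogus "character". Whether this is the exact mechanism behind the garbled trim result reported above is not confirmed.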

Lutz

Lutz

Post by Lutz »

Just found out:

(trim "Höhe;;" ";") => "Höhe"

works fine in newlisp-utf8.exe when run in the command shell; it is only together with the Tcl/Tk frontend that it gets confused. I wonder if on Win32 Tcl/Tk has to be compiled as a Unicode application with unicows.dll/lib etc. On Linux it switches on startup.

Lutz

HPW

Post by HPW »

Thanks for the info.

There is a typo in the utf8 documentation:

'The utf8 function is used top convert from UCS-4 to UTF-8'

We all know that newLISP is top but I think it should read 'to'.
Hans-Peter

HPW

Post by HPW »

Maybe the trim example in the documentation could be clearer:

(trim "00012340" "0")            => "1234"
(trim "00012340" "0" "")         => "12340"
(trim "01234000" "" "0")         => "01234"
Hans-Peter

jp
Posts: 22
Joined: Sun Mar 21, 2004 5:21 am

newlisp-utf8.exe breaking UTF8 code

Post by jp »

Lutz

Strangely enough, I tried your newlisp-utf8.exe and found that it breaks code when used with UTF-8 strings, while the regular newLISP does not.
For example, if you run the strings:

(trim " 日本語が難しい ") ;; "Japanese is difficult" (UTF-8 in Japanese)
(trim " Er ist ein großer Schwätzer ") ;; Er ist ein grosser Schwaetzer (UTF-8 in German)

Both strings are broken by newlisp-utf8.exe but left intact by regular newLISP.

Jean-Pierre

jp

Post by jp »

Lutz wrote:I don't think you should use the UTF-8 version on Win32 in Germany, where Windows is localized with German as a one-byte-character language, probably with code page ISO-8859.

Lutz
Indeed, under XP with the default code page 437 (as reported by chcp), running the following at the command prompt
echo (trim "Höhe;;" ";") > test.txt
and then opening the file with notepad test.txt shows that it is an ANSI-coded file.

Jean-Pierre

Lutz

Post by Lutz »

It seems that Windows uses Unicode only internally but otherwise translates to one-byte-character code pages. But when loading a UTF-8 file into notepad.exe, it works correctly. You can also read this file in newlisp-utf8.exe, upper-case the string and write it back, and it will be fine in notepad.exe. 'upper-case' in newLISP converts the UTF-8 string to 4-byte Unicode, calls the Borland/Windows or Linux library function towupper(), then converts back to UTF-8. notepad.exe also has a save-as option for UTF-8.
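
The decode, convert, re-encode round trip described for 'upper-case' can be sketched in Python as follows. This is only an illustration of the three steps, not newLISP's C code: the function name `upper_case_utf8` is hypothetical, Python's `str.upper()` stands in for towupper(), and the rule of leaving a character alone when its upper-case form is longer than one character is an assumption that mimics a one-code-point-in, one-code-point-out function.

```python
def upper_case_utf8(data: bytes) -> bytes:
    # Step 1: decode the UTF-8 bytes into a list of code points
    # (the stand-in for the 4-byte Unicode representation).
    code_points = [ord(ch) for ch in data.decode("utf-8")]

    # Step 2: upper-case one code point at a time, as towupper() would.
    converted = []
    for cp in code_points:
        up = chr(cp).upper()
        # Assumption: keep the original if the result is not a single
        # character (for example 'ß' would become 'SS').
        converted.append(ord(up) if len(up) == 1 else cp)

    # Step 3: encode the code points back to UTF-8.
    return "".join(chr(cp) for cp in converted).encode("utf-8")


original = "Höhe".encode("utf-8")
result = upper_case_utf8(original)
print(result.decode("utf-8"))
```

Working on code points rather than bytes is what keeps the two bytes of 'ö' together during the conversion.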

I wonder if all you need is a UTF-8 compiled cmd.exe, as is the case on Linux with Xterm, and I thought that perhaps Japanese Windows would be like this. Did you try your experiment on US WinXP or on a Japanese localized version?

Lutz

jp

Post by jp »

The localization won’t matter under Win2k or XP, since all the internal representations are in Unicode (UTF-16LE). Strictly speaking, UTF-8 is not Unicode but an encoding that lends itself readily to conversion into the Unicode encodings. The disparity between newlisp-utf8.exe and its UNIX counterpart could come from the fact that under UNIX Unicode is not little-endian but big-endian, while Windows requires little-endian code; otherwise the subsequent conversion to UTF-8 gets messed up.
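
For reference, the byte-order point can be made concrete with a short Python sketch. It only shows how the encodings differ for one character and does not establish whether byte order was involved in the newlisp-utf8.exe behavior discussed here:

```python
ch = "ö"  # U+00F6

utf16_le = ch.encode("utf-16-le")  # low byte first
utf16_be = ch.encode("utf-16-be")  # high byte first
utf8 = ch.encode("utf-8")          # a byte sequence with no byte-order variants

# Converting either UTF-16 form to UTF-8 gives the same bytes,
# as long as the correct byte order is assumed when decoding.
from_le = utf16_le.decode("utf-16-le").encode("utf-8")
from_be = utf16_be.decode("utf-16-be").encode("utf-8")

# Decoding with the wrong byte order silently yields a different character.
misread = utf16_le.decode("utf-16-be")

print(utf16_le, utf16_be, utf8)
print(from_le == from_be == utf8)
print(hex(ord(misread)))
```

The last line shows the kind of damage a byte-order mix-up would cause: the two bytes are read as U+F600 instead of U+00F6 before any conversion to UTF-8 takes place.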

Jean-Pierre

jp

Post by jp »

HPW wrote:Running Turtle.lsp with the UTF-8 EXE gives an error ' Bad screen distance "302,1612092" '.
Running an equivalent program, the UTF-8 EXE was able to carry out all its calculations and display Japanese without any problems.
Jean-Pierre

========= Kame.lsp
;; Kame.lsp - graphics
;; written by Jean-Pierre Berard
;;
;; 1 rad = 180/3.1415927 = 57.29578 deg
;; 1 deg = 0.017453292 rad

(set! color "blue")
(set! width 500)
(set! height 500)

(define (convert angle) (mul angle 0.017453292))
(define (adjacent-cos angle hypo) (mul hypo (cos (convert angle))))
(define (adjacent-tan angle opposite) (div opposite (tan (convert angle))))
(define (hypo-sin angle opposite) (div opposite (sin (convert angle))))
(define (hypo-cos angle adjacent) (div adjacent (cos (convert angle))))
(define (opposite-sin angle hypo) (mul hypo (sin (convert angle))))
(define (opposite-tan angle adjacent) (mul adjacent (tan (convert angle))))
(define (outer inner-angle) (sub 180 inner-angle))

(define (rectangular angle radius)
  (set! x (adjacent-cos angle radius))
  (set! y (opposite-sin angle radius))
  (println "x=" x " y=" y)
  true
)

(define (polar x y)
  (set! angle (div (atan (div y x)) 0.017453292))
  (set! radius (root (add (pow x 2) (pow y 2)) 2))
  (println "angle=" angle " radius=" radius)
  true
)

(define (triangulation side side-size)
  (set! y (div side-size 2))
  (set! angle (div 360 side 2))
  (set! x (adjacent-tan angle y))
  (set! radius (hypo-sin angle y))
  (println "angle=" angle " radius=" radius)
  (println "x=" x " y=" y)
  (pen 'yellow)
  (forward y)
  (right 90)
  (forward x)
  (right (sub 180 angle))
  (forward radius)
  true
)

(define (pseudo-polygon side n)
  (set! ratio (div 360 side))
  (dotimes (x side)
    (forward n)
    (right ratio))
  (left ratio)
)

(define (polygon side n)
  (dotimes (x side)
    (forward n)
    (right (div 360 side))
  ))

(define (oval x y)
  (set! Y (sub lastY (div y 2)))
  (tk ".kw.canvas create oval "
    (join (map string (list lastX Y (add lastX x) (add Y y))) " ")
    " -outline " color)
  (round (div direction 0.017453292))
)

(define (circle n)
  (set! X 0)
  (set! x (round lastX))
  (set! y (round lastY))
  (set 'direction -1.570796327)
  (set! ratio (mul (div 57.29578 n) 2))
  (until (and (= x X) (= y (round lastY)))
    (set! X (round lastX))
    (forward 1)
    (right ratio))
)

(define (cercle n)
  (set! x (round lastX))
  (set! y (round lastY))
  (set! lastX (+ x n))
  (for (t 0 2 0.005) ;; from 0 to 2 rad
    (set! newX (mul n (cos (mul pi t))))
    (set! newY (mul n (sin (mul pi t))))
    (set! newX (add newX x))
    (set! newY (add newY y))
    (tk ".kw.canvas create line "
      (join (map string (list lastX lastY newX newY)) " ")
      " -fill " color)
    (set 'lastX newX)
    (set 'lastY newY))
  (set 'lastX x)
  (set 'lastY y)
  (round (div direction 0.017453292))
)

(define (rose clr)
  (set 'color clr)
  (dotimes (x 90)
    (pseudo-polygon 4 60)
    (right 2))
)

(define (square n)
  (dotimes (x 4)
    (forward n)
    (right 90))
)

(define (squirl n)
  (dotimes (x (/ n 3))
    (forward n)
    (right 90)
    (set! n (- n 2)))
  (round (div direction 0.017453292))
)

(define (dragon sign level)
  (if (= 0 level)
    (forward 4)
    (begin
      (dec 'level)
      (right (sign 45))
      (dragon - level)
      (left (sign 90))
      (dragon + level)
      (right (sign 45))
    )))

(define (dragon-curve n clr)
  (set 'color clr)
  (dragon + n)
)

(define (right d)
  (set 'direction (add direction (mul d 0.017453292)))
  (round (div direction 0.017453292)))

(define (left d)
  (set 'direction (sub direction (mul d 0.017453292)))
  (round (div direction 0.017453292)))

(define (forward d)
  (set 'newX (add lastX (mul (cos direction) d)))
  (set 'newY (add lastY (mul (sin direction) d)))
  (tk ".kw.canvas create line "
    (join (map string (list lastX lastY newX newY)) " ")
    " -fill " color)
  (tk "update idletasks")
  (set 'lastX newX)
  (set 'lastY newY)
  (round (div direction 0.017453292))
)

(define (backward d)
  (set! direction (mul -1 direction))
  (forward d)
)

(define (pen clr) (set! color (string clr)))

(define (clear) ;; upper left and lower right
  (tk ".kw.canvas create rectangle 0 0 "
    (join (map string (list width height)) " ")
    " -fill black -tag clear")
  (center)
)

(define (center)
  (set 'lastX (/ width 2))
  (set 'lastY (/ height 2))
  (set 'direction -1.570796327))

(define (start x y)
  (set 'lastX x)
  (set 'lastY y)
  (set 'direction -1.570796327))

(define (goto x y)
  (set 'lastX x)
  (set 'lastY y)
  (round (div direction 0.017453292))
)

(begin
  (set! today (parse (date (apply date-value (now)))))
  (println (car today) " " (cadr today) " " (caddr today))
  (set! nihongo {\u4e80\u3000\u4f5c\u56f3})
  (tk "if {[winfo exists .kw] == 1} {destroy .kw}")
  (tk "toplevel .kw")
  (tk "canvas .kw.canvas -width " width " -height " height " -bg black")
  (tk "pack .kw.canvas")
  (tk "wm geometry .kw +290+25")
  (tk "wm title .kw { Kame.lsp}")
  (tk "bind .kw exit")
  (start 50 450)
  (squirl 400)
  (rose "red")
  (tk ".kw.canvas create text 130 380 "
    "-fill white -font {Times 22 normal} -text " nihongo)
)

(define (help)
  (println "outer inner-angle")
  (println "adjacent-cos angle hypo")
  (println "adjacent-tan angle opposite")
  (println "hypo-sin angle opposite")
  (println "hypo-cos angle adjacent")
  (println "opposite-sin angle hypo")
  (println "opposite-tan angle adjacent")
  (println "triangulation side side-size")
  (println "rectangular angle radius")
  (println "polar x y")
  true
)

jp

Post by jp »

jp wrote:Running an equivalent program UTF-8 EXE was able to carry all its calculations and display Japanese without any problems
========= Kame.lsp

Sorry, the second statement after (begin,
(println (car today) " " (cadr today) " " (caddr today))
has to be replaced with:
(println (nth 0 today) " " (nth 1 today) " " (nth 2 today))

One also has to add the function:
(define (round n) (floor (add n 0.5)))
to make it a standard newLISP script.

Jean-Pierre

Lutz

Post by Lutz »

Thanks Jean-Pierre and Hans-Peter for all the input about the UTF-8 version on Win32. It seems that things are working ok, except for the UTF-8 version of 'trim'.

I found the problem with 'trim' and fixed it for the next development version, 8.0.9, which will probably be out tomorrow. I still want to retest all other UTF-8 enabled functions on Windows, which is a bit tedious, because I have to write all strings before and after manipulation to a file and then view them with notepad.exe to see whether they work correctly on character rather than byte boundaries and do not change things they shouldn't.
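
As an aside, the "character versus byte borders" check can also be automated instead of inspected by eye. The following Python sketch shows one way to do it; the helper names `is_char_boundary` and `is_valid_utf8` are invented for this example and are not part of newLISP or its test suite:

```python
def is_char_boundary(data: bytes, i: int) -> bool:
    """True if index i does not fall inside a multibyte UTF-8 sequence."""
    if i == 0 or i == len(data):
        return True
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (data[i] & 0xC0) != 0x80


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


s = "Höhe".encode("utf-8")  # 5 bytes: 'ö' occupies two of them

boundaries = [i for i in range(len(s) + 1) if is_char_boundary(s, i)]
cut_inside = s[:2]  # splits 'ö' in the middle
cut_clean = s[:3]   # ends right after 'ö'

print(boundaries)
print(is_valid_utf8(cut_inside), is_valid_utf8(cut_clean))
```

A string function that works on characters should only ever cut at one of the listed boundaries, so its output should always pass the validity check.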

Lutz

jp

Post by jp »

Lutz

Well done, everything seems to be fixed except for the best feature of all: the ability of newLISP to communicate directly with the clipboard. newlisp-utf8.exe seems to completely disable the clipboard on both Unicode-based and non-Unicode-based Windows (Win98/ME).
Also, newlisp-utf8.dll, and alas newlisp.dll as well, cannot run processes with the exec function; only the ! shell-out function works.

Jean-Pierre
