UTF considerations

For the Compleat Fan
pjot
Posts: 733
Joined: Thu Feb 26, 2004 10:19 pm
Location: The Hague, The Netherlands


Post by pjot

It appears that GTK widgets expect UTF-8 encoded text by default. I thought I could use the newLISP 'utf8' function to convert (extended) ASCII to UTF-8, but it did not work the way I expected: 'utf8' always assumes a UCS-4 encoded string (4 bytes per character).
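To illustrate what "assumes a UCS-4 string" means in practice: 8-bit text has to be widened to one 4-byte code unit per character before a UCS-4-to-UTF-8 converter can handle it. The sketch below is Python rather than newLISP so it can run standalone; the helper names are hypothetical, little-endian byte order is an assumption, and Python's UTF-32 codec only stands in for newLISP's utf8.

```python
import struct


def latin1_to_ucs4(data: bytes) -> bytes:
    # Hypothetical helper: widen every byte to one 4-byte code unit
    # (UCS-4, assumed little-endian); a byte 0-255 equals its code point.
    return struct.pack("<%dI" % len(data), *data)


def ucs4_to_utf8(ucs4: bytes) -> bytes:
    # Stand-in for a UCS-4 -> UTF-8 converter such as newLISP's utf8,
    # implemented here with Python's UTF-32 codec.
    return ucs4.decode("utf-32-le").encode("utf-8")


wide = latin1_to_ucs4(b"\xfc")       # ü in Latin-1
print(wide.hex())                    # fc000000
print(ucs4_to_utf8(wide).hex())      # c3bc
```

So an 8-bit string can go through a UCS-4 based converter, but only after the extra widening pass.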

What I was looking for was a function that converts a character to UTF-8.

So I wrote a small function myself that does this for a whole string, assuming byte values 0-255 (i.e. ISO-8859-1/Latin-1 input):


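(define (utf str)
  (set 't 0)                          ; current position in str
  (while (< t (length str))
    (begin
      (set 'x (nth t str))            ; current character as a one-char string
      (if (> (char x) 127)
        (begin
          ; encode values 128-255 as the two-byte sequence 110xxxxx 10xxxxxx
          (set 'b1 (+ (/ (& (char x) 192) 64) 192))   ; 0xC0 + top 2 bits
          (set 'b2 (+ (& (char x) 63) 128))           ; 0x80 + low 6 bits
          (set-nth t str (append (char b1) (char b2)))
          (inc 't)))                  ; skip the extra byte just inserted
      (inc 't)))
  str)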
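For readers less at home in newLISP, here is the same bit arithmetic transliterated to Python (a sketch, not a drop-in replacement; the function name simply mirrors the one above). As a worked check for ü = 252: 252 & 192 = 192, / 64 = 3, + 192 = 195 = 0xC3; and 252 & 63 = 60, + 128 = 188 = 0xBC, which is indeed the UTF-8 encoding of ü.

```python
def utf(data: bytes) -> bytes:
    """Python transliteration of the newLISP utf function: Latin-1 bytes -> UTF-8."""
    out = bytearray()
    for c in data:                 # iterating over bytes yields ints 0-255
        if c > 127:
            # Same arithmetic as b1/b2 above: 110xxxxx 10xxxxxx
            out.append((c & 192) // 64 + 192)  # 0xC0 + top 2 bits
            out.append((c & 63) + 128)         # 0x80 + low 6 bits
        else:
            out.append(c)          # ASCII is already valid UTF-8
    return bytes(out)


print(utf(b"\xfc").hex())  # c3bc
```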

This can probably be optimized, but converting character by character is very slow. Wouldn't it be convenient to have a built-in UTF-8 conversion command, like this:

(utf "Kein überraschung") -> "Kein überraschung"   ; ü: Latin-1 0xFC -> UTF-8 0xC3 0xBC

How about that?
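On the speed question: one common way to avoid per-character arithmetic is a 256-entry lookup table computed once, so each input byte costs a single table lookup. A Python sketch of that technique (the names are hypothetical, and this is not how any newLISP built-in is implemented):

```python
# Hypothetical lookup table: UTF-8 bytes for every possible input byte 0-255,
# treating each byte as a Latin-1 (ISO-8859-1) code point.
UTF8_OF_BYTE = [
    bytes([b]) if b < 128                              # ASCII passes through
    else bytes([0xC0 | (b >> 6), 0x80 | (b & 0x3F)])   # two-byte sequence
    for b in range(256)
]


def utf_fast(data: bytes) -> bytes:
    """Convert 8-bit Latin-1 bytes to UTF-8 with one table lookup per byte."""
    return b"".join(UTF8_OF_BYTE[b] for b in data)


print(utf_fast("Kein überraschung".encode("latin-1")))  # b'Kein \xc3\xbcberraschung'
```

In Python itself the same conversion is already built in as `data.decode("latin-1").encode("utf-8")`, which is the kind of single command suggested here.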
