Force nth to not use utf8?

dexter
Posts: 74
Joined: Fri Nov 11, 2011 12:55 am

Post by dexter »

Well, newLISP supports UTF-8.

(nth index utf8str) will return a whole UTF-8 character.

(slice utf8str 0 1) will return part of a UTF-8 string, let's say one byte.

Why does nth work like that?

And how can I force nth to work without the UTF-8 behaviour, like slice does, without recompiling newLISP?

dexter
Posts: 74
Joined: Fri Nov 11, 2011 12:55 am

Post by dexter »

I recompiled newLISP.

This is not the first time -DSUPPORT_UTF8 has made a mess for me.

I really don't think newLISP should support UTF-8 natively. It could rely on the system and leave UTF-8 to the system.

Isn't that better?

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Most people in the world speak languages which need Unicode characters, so UTF-8 should be the standard in newLISP rather than the exception ;-) But we also need to process binary information, so newLISP has functions for processing UTF-8 strings and others for processing binary string buffers. For each string processing function, the manual indicates which of the two it does.

All functions working on UTF-8 multi-byte characters are also marked with a “utf8” suffix in the reference section. And there is also a table collecting them all:

http://www.newlisp.org/downloads/newlis ... f8_capable
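
To make the distinction concrete, here is a minimal sketch, assuming a UTF-8 enabled build (utf8len only exists there) and a two-character sample string picked just for illustration. length and slice count bytes, while utf8len and nth count characters:

```newlisp
; assumes a UTF-8 enabled newLISP build
(set 'str "我能")

(length str)     ; byte count: 6 (two characters, three bytes each)
(utf8len str)    ; character count: 2

(nth 0 str)      ; character oriented: the whole first character "我"
(slice str 0 1)  ; byte oriented: only the first byte of that character
(slice str 0 3)  ; the three bytes that make up the first character
```

So when a string is treated as a binary buffer, slice is the function to reach for instead of nth, and no recompile is needed.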

dexter
Posts: 74
Joined: Fri Nov 11, 2011 12:55 am

Post by dexter »

I think giving these "utf8" functions names starting with "utf8_" would lessen the mess.

I know people speak different languages, but today most operating systems support multiple languages natively, and quite well.

If I split a UTF-8 string into bytes, say three bytes, and then print out these bytes, they will be shown as the UTF-8 string as normal.

Just a thought :)
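
The point about bytes can be checked directly. A minimal sketch, assuming a UTF-8 terminal and a one-character sample string picked for illustration: cutting the string into one-byte pieces with slice and joining them again gives back the original, because the byte sequence itself is unchanged.

```newlisp
(set 'str "我")

; cut the string into single-byte pieces (length counts bytes)
(set 'pieces (map (fn (i) (slice str i 1))
                  (sequence 0 (- (length str) 1))))

(length pieces)          ; 3, one piece per byte
(= (join pieces) str)    ; true, the bytes are the same as before
(println (join pieces))  ; a UTF-8 terminal shows the original character
```

Whether the output looks right depends only on the terminal decoding the bytes as UTF-8, not on how newLISP indexed the string.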

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Code: Select all

> (set  'str "我能吞下玻璃而不伤身体。")
"我能吞下玻璃而不伤身体。"

; split into 8-bit bytes

> (unpack (dup "b" (length str)) str)
(230 136 145 232 131 189 229 144 158 228 184 139 231 142 187 231 146 131 232 128 
 140 228 184 141 228 188 164 232 186 171 228 189 147 227 128 130)

; show a char for each 8-bit byte
> (map char (unpack (dup "b" (length str)) str))
("æ" "ˆ" "‘" "è" "ƒ" "½" "å" "" "ž" "ä" "¸" "‹" "ç" "Ž" "»" "ç" 
 "’" "ƒ" "è" "€" "Œ" "ä" "¸" "" "ä" "¼" "¤" "è" "º" "«" "ä" "½" 
 "“" "ã" "€" "‚")
> 

Or do this if you want a Unicode number for each Chinese character:

Code: Select all

> (explode str)
("我" "能" "吞" "下" "玻" "璃" "而" "不" "伤" "身" "体" "。")
> (map char (explode str))
(25105 33021 21534 19979 29627 29827 32780 19981 20260 36523 20307 12290)
> 

winger
Posts: 46
Joined: Wed Mar 14, 2012 7:31 am

Post by winger »

Code: Select all

> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉
" "

This trick is very cool!
Welcome to a newlisper home:)
http://www.cngrayhat.org

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

You mean this:

Code: Select all

> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉不吃 
" "

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC

Post by TedWalther »

Since UTF-8 should be the default: for every function that has a binary equivalent, does the manual mention what to use for binary mode instead of UTF-8 mode?

Instead of the utf8_ prefix proposal, I propose the opposite: a bin_ prefix for all the functions, for when you want them to work byte by byte instead of char by char.
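
As a rough illustration of what such a byte-wise variant could look like in user code today, here is a sketch built on slice. The name bin_nth is hypothetical and only follows the naming proposed above; it is not part of newLISP.

```newlisp
; bin_nth is a hypothetical helper, not a built-in function
(define (bin_nth idx str)
  (slice str idx 1))             ; slice indexes strings by byte

(set 'str "我能")

(bin_nth 0 str)                  ; a one-byte string: the first byte only
(unpack "b" (bin_nth 0 str))     ; (230), that byte as a number
(nth 0 str)                      ; the whole first character on a UTF-8 build
```
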

Knowing Lutz, he probably has something even better and more fun up his sleeve.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence. Nine months later, they left with a baby named newLISP. The women of the ivory towers wept and wailed. "Abomination!" they cried.
