Force nth to not use utf8?

Post by dexter » Fri Jan 03, 2014 5:04 am

Well, newLISP supports UTF-8.

(nth index utf8str) returns one UTF-8 character as a string.

(slice utf8str 0 1) returns a byte-based part of the string, in this case one byte.

Why does nth work like that?

And how can I force nth to work without the UTF-8 behaviour, like slice does, without recompiling newLISP?

Re: Force nth to not use utf8?

Post by dexter » Fri Jan 03, 2014 10:08 am

I recompiled newLISP.

This is not the first time -DSUPPORT_UTF8 has made a mess for me.

I really don't think newLISP should support UTF-8 natively. It could rely on the operating system and leave UTF-8 handling to it.

Isn't that better?

Re: Force nth to not use utf8?

Post by Lutz » Fri Jan 03, 2014 3:35 pm

Most people in the world speak languages which need Unicode characters, so UTF-8 should be the standard in newLISP rather than the exception ;-) But we also need to process binary information, so newLISP has functions that process UTF-8 characters and others that process binary string buffers. The manual indicates, for each string processing function, which of the two it does.

All functions working on UTF-8 multi-byte characters are also marked with a “utf8” suffix in the reference section. And there is also a table collecting them all:

http://www.newlisp.org/downloads/newlis ... f8_capable
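
As a rough illustration of that split, the following sketch contrasts character-oriented and byte-oriented calls on the same string. It assumes a UTF-8 enabled build (utf8len only exists there), the variable name s is just for this example, and the comments show what each expression should evaluate to.

```newlisp
; Assumes a UTF-8 enabled newLISP build; s is an illustrative name.
(set 's "我能")

(length s)                   ; byte count: 6 (three bytes per character here)
(utf8len s)                  ; character count: 2
(nth 0 s)                    ; character-oriented: "我"
(slice s 0 3)                ; byte-oriented: the three bytes that encode "我"
(0 3 s)                      ; implicit slicing on a string is byte-oriented too
(unpack "b" (slice s 0 1))   ; first byte as a number: (230)
```

So when a byte rather than a character is wanted, slice (or its implicit form) can stand in for nth without rebuilding newLISP.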

Re: Force nth to not use utf8?

Post by dexter » Fri Jan 10, 2014 7:58 am

I think that if these "utf8" functions had names starting with "utf8_", there would be less of a mess.

I know people speak different languages, but today most operating systems support multiple languages natively, and quite well.

If I split a UTF-8 string into bytes, three bytes per character in my case, and then print out these bytes, the result is shown as a normal UTF-8 string.

Just a thought :)
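
That observation can be checked with a small sketch: the bytes of a UTF-8 string, written back out unchanged, are the same string again, which is why a UTF-8 terminal shows them normally. The names s, bytes and rejoined are illustrative only.

```newlisp
; s, bytes and rejoined are illustrative names.
(set 's "我")

; split into 8-bit byte values: (230 136 145)
(set 'bytes (unpack (dup "b" (length s)) s))

; turn each value back into a one-byte string and concatenate them
(set 'rejoined (join (map (fn (b) (pack "b" b)) bytes)))

(= rejoined s)      ; true: the byte sequence is unchanged
(println rejoined)  ; a UTF-8 terminal displays the character again
```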

Re: Force nth to not use utf8?

Post by Lutz » Fri Jan 10, 2014 9:28 pm

Code:
> (set  'str "我能吞下玻璃而不伤身体。")
"我能吞下玻璃而不伤身体。"

; split into 8-bit bytes

> (unpack (dup "b" (length str)) str)
(230 136 145 232 131 189 229 144 158 228 184 139 231 142 187 231 146 131 232 128
 140 228 184 141 228 188 164 232 186 171 228 189 147 227 128 130)

; show a char for each 8-bit byte
> (map char (unpack (dup "b" (length str)) str))
("æ" "ˆ" "‘" "è" "ƒ" "½" "å" "" "ž" "ä" "¸" "‹" "ç" "Ž" "»" "ç"
 "’" "ƒ" "è" "€" "Œ" "ä" "¸" "" "ä" "¼" "¤" "è" "º" "«" "ä" "½"
 "“" "ã" "€" "‚")
>


Or do this if you want a Unicode number for each Chinese character:

Code:
> (explode str)
("我" "能" "吞" "下" "玻" "璃" "而" "不" "伤" "身" "体" "。")
> (map char (explode str))
(25105 33021 21534 19979 29627 29827 32780 19981 20260 36523 20307 12290)
>

Re: Force nth to not use utf8?

Post by winger » Sun Feb 09, 2014 6:24 am

Code:
> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉
" "

This trick is very cool!

Re: Force nth to not use utf8?

Post by Lutz » Sun Feb 09, 2014 3:17 pm

You mean this:
Code:
> (println(join  ( 0 4 (explode "好汉不吃眼前亏"))) " ")
好汉不吃
" "

Re: Force nth to not use utf8?

Post by TedWalther » Sun Feb 09, 2014 7:07 pm

Since UTF-8 should be the default: for every function that has a binary equivalent, does the manual mention what to use for binary mode instead of UTF-8 mode?

Instead of the utf8_ prefix proposal, I propose the opposite: a bin_ prefix for all the functions, for when you want them to work byte by byte instead of char by char.
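
To make the proposal concrete, here is a minimal sketch of what a byte-oriented counterpart to nth might look like when written with existing byte-oriented functions. The name bin-nth is hypothetical and not part of newLISP; it is only a thin wrapper around slice.

```newlisp
; bin-nth is a hypothetical helper, not a newLISP built-in.
; It returns the byte at position idx as a one-byte string,
; using slice, which works on byte boundaries.
(define (bin-nth idx str)
  (slice str idx 1))

(bin-nth 0 "我")                 ; first byte only, not the whole character
(unpack "b" (bin-nth 0 "我"))    ; its numeric value: (230)
(unpack "b" (bin-nth 2 "我"))    ; third byte: (145)
```

The argument order mirrors nth (index first, then the string) so the two could be swapped in place.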

Knowing Lutz, he probably has something even better and more fun up his sleeve.