How to do string like binary?

dexter · Post by **dexter** » Wed Nov 16, 2011 8:46 am

I set a string with CJK characters, like this:

(setq cn "中文abc")

It contains Chinese characters.

How can I cut this string cn into a binary (byte) array, like in C?

I need to putchar this string, but in newLISP,

if I use slice like this:

Code: Select all

> (char (slice cn 0 1))
16384
> (char (slice cn 1 1))
184
> (char (slice cn 2 1))
173
> (char (slice cn 3 1))
24576

I think these are not the right code values, right?

dexter · Post by **dexter** » Wed Nov 16, 2011 9:11 am

DONE

TURN OFF UTF8 SUPPORT
---------------------------------------------

Turn off UTF-8 support in the makefile and
rebuild newLISP without UTF-8.

You will see -DSUPPORT_UTF8 in
makefile_build
makefile_linuxLP64_utf8
....
I just deleted -DSUPPORT_UTF8.

Now if I (setq cn "中文"),
it will be:

Code: Select all

> (setq cn "中文")
"\228\184\173\230\150\135"

20013 or similar values will cause a putchar (FCGI_putchar) error.

The right codes for 中文 are the ones above, 228....

as Lutz said

:)

sunmountain · Post by **sunmountain** » Wed Nov 16, 2011 11:04 am

Could you please tell the rest of us what exactly you did?
BTW, the correct codes should be:

中 20013
文 25991
a 97
b 98
c 99

(verified by Python 2.7.2).
In Python 2 you have to explicitly mark a string as Unicode via u'the string' (this changed in Python 3.x, where
all strings are Unicode by default).

I'm asking because disabling unicode support while using unicode strings and then getting correct
results seems a bit strange.
Perhaps you could post the code you wrote.

Me wants to learn :-)
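
For reference, here is a minimal Python 3 sketch of the two sets of numbers in this thread, assuming the string is stored as UTF-8. `ord` gives the Unicode code points listed above, while encoding to UTF-8 gives the byte values dexter sees after rebuilding. The variable names are only illustrative.

```python
# Python 3: str holds Unicode code points; bytes holds the UTF-8 encoding.
cn = "中文abc"

code_points = [ord(ch) for ch in cn]    # one number per character
utf8_bytes = list(cn.encode("utf-8"))   # one number per byte

print(code_points)  # [20013, 25991, 97, 98, 99]
print(utf8_bytes)   # [228, 184, 173, 230, 150, 135, 97, 98, 99]
```

So both answers are "right": 20013 is the code point of 中, and 228 184 173 are the three bytes that encode it. A byte-oriented function such as putchar needs the second form.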

Lutz · Post by **Lutz** » Wed Nov 16, 2011 2:38 pm

In UTF-8 versions of newLISP indexing on strings works on character rather than single byte boundaries. Although 'slice' slices binary, 'char' will try to convert to Unicode on UTF-8 versions of newLISP. Use 'unpack':

Code: Select all

> (unpack (dup "b" (length cn)) cn)
(228 184 173 230 150 135 97 98 99)

In the manual, all functions working on UTF-8 character boundaries are marked with a utf8 after the red function name.

There is a list of all of these functions in this chapter:

http://www.newlisp.org/downloads/newlis ... icode_utf8

ps: run this to see how it works:

Code: Select all

(set 'str "中文abc")
(println (unpack (dup "b" (length str)) str))
(println (explode str))
(dotimes (i (utf8len str))
    (print (str i) " -> ")
    (println (char (str i))))

gives you this output:

Code: Select all

(228 184 173 230 150 135 97 98 99)
("中" "文" "a" "b" "c")
中 -> 20013
文 -> 25991
a -> 97
b -> 98
c -> 99
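
The odd values in the first post (16384 and 24576) are consistent with a UTF-8 decoder that is handed only the lead byte of a 3-byte sequence and keeps just that byte's 4 payload bits. This is an assumption about what 'char' does with a truncated sequence, not something confirmed from the newLISP source. The Python sketch below only reproduces the arithmetic, and `lead_byte_only` and `decode3` are hypothetical helper names.

```python
def lead_byte_only(byte):
    """Value contributed by a 3-byte UTF-8 lead byte when its two
    continuation bytes are missing (their payload treated as zero)."""
    assert byte & 0xF0 == 0xE0, "not a 3-byte UTF-8 lead byte"
    return (byte & 0x0F) << 12


def decode3(b0, b1, b2):
    """Decode one complete 3-byte UTF-8 sequence into a code point."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


print(lead_byte_only(228))     # 16384
print(lead_byte_only(230))     # 24576
print(decode3(228, 184, 173))  # 20013
print(decode3(230, 150, 135))  # 25991
```

The middle results from the first post (184 and 173) are simply the continuation bytes passed through unchanged.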

newlispfanclub.alh.net
