How to do string like binary?

dexter
Posts: 74
Joined: Fri Nov 11, 2011 12:55 am

How to do string like binary?

Post by dexter »

I set a string containing CJK characters, like this:

(setq cn "中文abc")

It contains Chinese characters.

How can I split this string into a binary (byte) array, like cn in C?

I need to putchar this string, but in newLISP,

if I use slice like this:

Code: Select all

> (char (slice cn 0 1))
16384
> (char (slice cn 1 1))
184
> (char (slice cn 2 1))
173
> (char (slice cn 3 1))
24576

I don't think these are the right code values, are they?

dexter
Posts: 74
Joined: Fri Nov 11, 2011 12:55 am

Re: How to do string like binary?

Post by dexter »

DONE

TURN OFF UTF8 SUPPORT
---------------------------------------------


Turn off UTF-8 support in the makefile and
rebuild newLISP without UTF-8.

You will see -DSUPPORT_UTF8 in
makefile_build
makefile_linuxLP64_utf8
....
I just deleted -DSUPPORT_UTF8.

Now if I (setq cn "中文"),
it will be:

Code: Select all

> (setq cn "中文")
"\228\184\173\230\150\135"

With UTF-8 support enabled the value is 20013 instead, which causes a putchar (FCGI_putchar) error.

The right codes for 中文 are the byte values above, starting with 228,

as Lutz said.

:)
Last edited by dexter on Thu Nov 17, 2011 9:14 am, edited 1 time in total.

sunmountain
Posts: 39
Joined: Tue Mar 15, 2011 5:11 am

Re: How to do string like binary?

Post by sunmountain »

Could you please tell the rest of us what exactly you did?
BTW, the correct codes should be:

中 20013
文 25991
a 97
b 98
c 99

(verified with Python 2.7.2).
There you have to explicitly mark a string as Unicode via u'the string' (this changed in Python 3.x, where
all strings are Unicode by default).
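
For anyone who wants to repeat that check, here is a minimal sketch in Python 3 (an assumption on my side; the check above was done with 2.7.2, where the literal would need the u prefix). It prints both the Unicode code points listed above and the UTF-8 byte values, which are the numbers a byte-wise putchar would need. The variable names are only illustrative.

```python
# Python 3: str is Unicode by default, so no u'...' prefix is needed.
text = "中文abc"

# One code point per character (the values listed above).
code_points = [ord(ch) for ch in text]
print(code_points)   # [20013, 25991, 97, 98, 99]

# One value per byte of the UTF-8 encoding.
utf8_bytes = list(text.encode("utf-8"))
print(utf8_bytes)    # [228, 184, 173, 230, 150, 135, 97, 98, 99]
```

So both sets of numbers are "correct"; they just answer different questions (characters versus bytes).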

I'm asking because disabling unicode support while using unicode strings and then getting correct
results seems a bit strange.
Perhaps you could post the code you wrote.

Me wants to learn :-)

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: How to do string like binary?

Post by Lutz »

In UTF-8 versions of newLISP indexing on strings works on character rather than single byte boundaries. Although 'slice' slices binary, 'char' will try to convert to Unicode on UTF-8 versions of newLISP. Use 'unpack':

Code: Select all

> (unpack (dup "b" (length cn)) cn)
(228 184 173 230 150 135 97 98 99)

In the manual all functions working on UTF-8 character boundaries are marked with a utf8 behind the red function name.

There is a list of all of these functions in this chapter:

http://www.newlisp.org/downloads/newlis ... icode_utf8
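
The odd values 16384 and 24576 in the first post are consistent with a single lead byte of a three-byte UTF-8 sequence being decoded with its continuation bytes missing. This is only a plausible reading of the numbers, not something confirmed in this thread or taken from the newLISP source. The sketch below redoes that arithmetic in Python (the language used for cross-checking earlier in the thread); lead_byte_alone is a hypothetical helper made up for illustration.

```python
data = "中文abc".encode("utf-8")   # bytes: 228 184 173 230 150 135 97 98 99

def lead_byte_alone(b):
    # Hypothetical helper: payload of a 3-byte UTF-8 lead byte (1110xxxx),
    # shifted into place as if both continuation bytes contributed zero.
    return (b & 0x0F) << 12

print(lead_byte_alone(data[0]))   # 16384, matches (char (slice cn 0 1))
print(lead_byte_alone(data[3]))   # 24576, matches (char (slice cn 3 1))

# Decoding all three bytes of the first character gives the real code point.
first = ((data[0] & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
print(first)                      # 20013
```

Under that reading, 184 and 173 are simply the two continuation bytes of 中 returned unchanged.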

ps: run this to see how it works:

Code: Select all

(set 'str "中文abc")
(println (unpack (dup "b" (length str)) str))
(println (explode str))
(dotimes (i (utf8len str))
    (print (str i) " -> ")
    (println (char (str i))))

gives you this output:

Code: Select all

(228 184 173 230 150 135 97 98 99)
("中" "文" "a" "b" "c")
中 -> 20013
文 -> 25991
a -> 97
b -> 98
c -> 99
