trim and utf-8 oddness

Post by cormullion »

I'm using newLISP UTF-8 on MacOS. Why does the \000 start appearing?

> (set 't "a hypothetical one-dimensional subatomic particle")
"a hypothetical one-dimensional subatomic particle"
> (trim t)
"a hypothetical one-dimensional subatomic particle"
> t
"a hypothetical one-dimensional subatomic particle"
> (trim t "e")
"a hypothetical one-dimensional subatomic particl"
> t
"a hypothetical one-dimensional subatomic particl\000"
> (trim t "a" "e")
" hypothetical one-dimensional subatomic particl"
> t
" hypothetical one-dimensional subatomic particl\000\000"
> 

Post by Lutz »

It has always been this way: you see the \000 only when a string is shown as a return value on the command line. When you print it, it's OK:

> (set 's "ABC\000")
"ABC\000"
> (println s)
ABC
"ABC\000"
> 

newLISP can work with binary content in strings, so for debugging purposes it is important to have a way to 'see' that binary content. Note that the string also appears quoted.
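
As a language-neutral illustration of the difference between the quoted console view and the data itself, here is a minimal Python sketch. It is not newLISP's implementation: the function name `console_repr` and the simplified rule (every byte outside printable ASCII becomes a three-digit decimal \nnn code) are assumptions made for the example.

```python
def console_repr(data: bytes) -> str:
    """Return a quoted view of data with non-printable bytes shown as \\nnn.

    Hypothetical helper: a simplified stand-in for how an interactive
    console might display a string return value.
    """
    parts = []
    for byte in data:
        if 32 <= byte <= 126:
            # Printable ASCII is shown unchanged.
            parts.append(chr(byte))
        else:
            # Everything else is shown as a three-digit decimal code.
            parts.append(f"\\{byte:03d}")
    return '"' + "".join(parts) + '"'


s = b"ABC\x00"
print(console_repr(s))  # the quoted view makes the trailing zero byte visible
print(len(s))           # the zero byte is part of the data either way
```

The zero byte is present in the data whether or not it is displayed; only the quoted view makes it visible.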

But there is another issue with characters > ASCII 127. Starting a few development versions back, these characters were also shown in \nnn format. This was not good for European users and for Windows users on the PC codepage 859, which carries special European characters, money symbols, other symbols and some graphical characters.

Starting with version 8.7.10, upper ASCII will only be shown as \nnn codes when the default "C" locale is specified. When any other locale is specified, upper ASCII will be displayed as characters, not codes.
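
The locale rule described above can be sketched roughly as follows, again in Python and only as an illustration. The function name `show_return_value`, the `locale_name` parameter and the choice of `cp850` as a single-byte code page are all assumptions, not taken from newLISP's source, and multi-byte UTF-8 sequences are deliberately not handled.

```python
def show_return_value(data: bytes, locale_name: str = "C",
                      codepage: str = "cp850") -> str:
    """Quoted view of data; bytes above 127 are escaped only in the "C" locale.

    Hypothetical helper: codepage is an assumed single-byte code page
    used to turn upper-ASCII bytes into characters.
    """
    parts = []
    for byte in data:
        if 32 <= byte <= 126:
            # Printable ASCII is always shown as a character.
            parts.append(chr(byte))
        elif byte > 127 and locale_name != "C":
            # Non-"C" locale: show the character from the code page.
            parts.append(bytes([byte]).decode(codepage, errors="replace"))
        else:
            # Control bytes, and upper ASCII in the "C" locale: decimal code.
            parts.append(f"\\{byte:03d}")
    return '"' + "".join(parts) + '"'


word = b"caf\x82"  # byte 130 is an accented e in cp850
print(show_return_value(word))                       # escaped in the "C" locale
print(show_return_value(word, locale_name="de_DE"))  # shown as a character
```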

But remember, all this discussion is only about strings displayed in the interactive newLISP console as return values. When displaying upper ASCII with 'print', 'println', etc., the character will be displayed as a '?' if it is not part of the current code page.

Lutz

PS: There is also an entirely different issue with 'trim', which seems to behave destructively in UTF-8. It should not, and this will be fixed in 8.9.10.

Post by cormullion »

Thanks. As you say, it only looks odd!
