char bug?

Post by **maq** » Fri Apr 18, 2008 10:56 pm

On version 9.3, I get the following:

newLISP v.9.3.0 on OSX UTF-8, execute 'newlisp -h' for more info.

> (char "")

string index out of bounds in function char
>

Prior to 9.3, this returned 0. Shouldn't that still be the case, given that an empty string by definition (at least in C, which I thought newLISP followed) contains only the null character (\0)?

Thanks,

--maq

Post by **ghfischer** » Sat Apr 19, 2008 7:57 pm

In nl-string.c change line 219 from

Code:

offset = adjustNegativeIndex(offset, len);

to

Code:

if ((offset != 0) || (len > 0)) offset = adjustNegativeIndex(offset, len);

This will produce the following behavior:

Code:

newLISP v.9.3.8 on OSX IPv4 UTF-8, execute 'newlisp -h' for more info.

> (char "")
0
> (char "" 0)
0
> (char "" 1)

string index out of bounds in function char
> (char "" -1)

string index out of bounds in function char
> (char "a")
97
> (char "a" 1)

string index out of bounds in function char
> (char "a" 0)
97
> (exit)
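
The effect of the one-line change can be modelled outside the interpreter. The sketch below is in Python, not the actual C source; `adjust_negative_index` is only an assumed stand-in for `adjustNegativeIndex` (negative offsets count from the end, anything outside the string is an error), and `patched_char` is a hypothetical name. It handles ASCII input only.

```python
def adjust_negative_index(index, length):
    """Assumed behaviour of adjustNegativeIndex; not copied from nl-string.c."""
    if index < 0:
        index += length  # -1 refers to the last character
    if index < 0 or index >= length:
        raise IndexError("string index out of bounds in function char")
    return index


def patched_char(text, offset=0):
    """Model of (char str offset) with the patched bounds check."""
    data = text.encode("ascii") + b"\x00"  # a C string: bytes plus terminating NUL
    length = len(data) - 1                 # string length without the terminator
    # The patched condition: skip the bounds check only for "" with offset 0.
    if offset != 0 or length > 0:
        offset = adjust_negative_index(offset, length)
    return data[offset]                    # for "", this reads the terminator


print(patched_char(""))      # reads the terminating NUL byte
print(patched_char("a"))
print(patched_char("a", -1))
```

Under these assumptions the model matches the transcript above: an empty string with offset 0 falls through to the terminator, and every other out-of-range offset is still rejected.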

Post by **Lutz** » Sun Apr 20, 2008 6:38 am

Note that for an empty or one-character string, indices 0 and -1 should always return the same result, because they refer to the same character: the first one, which is also the last.

The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.

Thinking a bit more about this I believe that

Code:

(char "") -> 0 ; previous to 9.3.0

is actually wrong. The 'char' function does not work on binary characters, so it should never return a 0, because zero is not a valid character in either ASCII or UTF-8. The "" string is empty and does not have any characters. The fact that C strings are terminated with 0 is a C issue.

So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?

ps: I deleted my previous post.

Post by **ghfischer** » Sun Apr 20, 2008 3:30 pm

I'm of the opinion that char should move from strings to integers and not introduce nil. I think the empty string is a valid special case where 0 should be returned.

That being said, I'm OK with char returning errors if the offset is out of bounds. I'd argue that any offset argument for an empty string should be an out-of-bounds error, but looking at the code it seems easier to just make offset=0 a special case rather than test offset for nil.

0 is a valid character in the ASCII set: it's the NUL character.
So if (char 0) --> "\000" then (char "\000") --> 0.
"" is simply shorthand for "\000".

Post by **xytroxon** » Sun Apr 20, 2008 9:12 pm

What do other languages do?

Python has the ord function...
http://docs.python.org/lib/built-in-funcs.html

>>> ord("A")
65

>>> ord("AB")
TypeError: ord() expected a character, but string of length 2 found

>>> ord("")
TypeError: ord() expected a character, but string of length 0 found

------------

Python's chr function behaves as follows:

>>> chr(-1)
ValueError: chr() arg not in range(256)

>>> chr(0)
'\x00'

>>> chr(123)
'{'

>>> chr(257)
ValueError: chr() arg not in range(256)
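
The transcripts above come from a Python 2 interpreter. As a rough cross-check, the sketch below runs the same calls on Python 3, where `ord("")` is still a TypeError but `chr()` accepts any Unicode code point, so `chr(257)` succeeds. `try_call` is just a helper name made up for this example.

```python
def try_call(func, arg):
    """Return ('ok', result) or ('error', exception class name) for func(arg)."""
    try:
        return ("ok", func(arg))
    except (TypeError, ValueError) as exc:
        return ("error", type(exc).__name__)


# The same inputs as in the transcripts above.
cases = [(ord, "A"), (ord, "AB"), (ord, ""),
         (chr, -1), (chr, 0), (chr, 123), (chr, 257)]

for func, arg in cases:
    print(func.__name__, repr(arg), try_call(func, arg))
```

Either way, Python treats the empty string as an error for `ord` rather than mapping it to 0.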

---------

Anyone use Ruby or other languages?

Post by **maq** » Mon Apr 21, 2008 5:42 pm

Lutz wrote:
The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.

I looked at the 9.3 release notes and did not see anything related to this, or to the similar new behavior on lists. Could you please direct me to an explanation of the new behavior? This change is breaking many things in a pre-9.3 codebase that I am working from, and I need a systematic way to identify all the affected functions and correct them.

Lutz wrote: So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?

I would argue that returning nil is better than raising an error, which requires additional code either to test that the input is within bounds (e.g. not empty) or to catch the error. It just seems that this one could go either way, so why not default to the implementation that results in less code?

Post by **newdep** » Mon Apr 21, 2008 5:51 pm

Just a mind set..

(get-char) returns "\x00"

(get-char "") returns 0

(char) returns "\000"

so I would assume (char "") to return 0

But logically speaking, (char "") is nil

(To be honest, I don't like "string index out of bounds in function char".)
I prefer consistent global returns of nil when something is not true or out of bounds, because (nil? ...) fits that perfectly, or (zero? ...).

Post by **Lutz** » Mon Apr 21, 2008 6:07 pm

In development version 9.3.10, tomorrow:

(char "") => nil
The empty string has no characters, and the 0 character is not defined as displayable in ASCII or UTF-8. Also, "" and "\000" are not the same: (= "" "\000") => nil. Now you can do: (if (char str) ...)

(char), (get-char)
will give a missing parameter error message (also get-int, get-float, get-string)

(get-char "") => 0
get-char gets the byte at the address of the empty string, which is a zero byte; this stays as is.
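
To make the announced rule for `char` concrete, here is a rough model in Python rather than newLISP. `char_code` is a hypothetical name, not a newLISP or Python built-in. The post does not say what happens for an empty string with an explicit index, so returning None for any index is an assumption of this sketch, as is keeping the out-of-bounds error for non-empty strings.

```python
def char_code(text, index=0):
    """Hypothetical model of (char str [index]) as announced for 9.3.10."""
    if text == "":
        # Assumption: an empty string yields None (nil), whatever the index.
        return None
    if index < 0:
        index += len(text)  # negative indices count from the end
    if not 0 <= index < len(text):
        raise IndexError("string index out of bounds in function char")
    return ord(text[index])


for s in ["", "a", "\x00"]:
    code = char_code(s)
    # Python treats 0 as false, so the check is written as "is not None".
    if code is not None:
        print(repr(s), "->", code)
    else:
        print(repr(s), "-> no character")
```

The model also reflects the point that "" and "\000" differ: the first has no character to report, while the second has one character whose code is 0.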

Post by **newdep** » Mon Apr 21, 2008 7:27 pm

I really must have missed 9.3.9.
I thought I had it, but there is only a 9.3.8 to be found.

9.3.9 must have been a ghost release if we are moving to 9.3.10 ;-)

Post by **cormullion** » Mon Apr 21, 2008 8:27 pm

I think the list index out-of-bounds change was discussed at length here and is also mentioned in the 9.3 release notes. Not sure about strings...

I had to fix a bit of my code too. I sympathize!

newlispfanclub.alh.net
