char bug?
Posted: Fri Apr 18, 2008 10:56 pm
by maq
On version 9.3, I get the following:
Code:
newLISP v.9.3.0 on OSX UTF-8, execute 'newlisp -h' for more info.
> (char "")
string index out of bounds in function char
>
Prior to 9.3, this returned 0. Shouldn't that still be the case, given that an empty string by definition (at least in C, which I thought newLISP also followed) is one that contains only the null char (\0)?
Thanks,
--maq
A temporary patch.
Posted: Sat Apr 19, 2008 7:57 pm
by ghfischer
In nl-string.c change line 219 from
Code:
offset = adjustNegativeIndex(offset, len);
to
Code:
if ((offset != 0) || (len > 0)) offset = adjustNegativeIndex(offset, len);
This will produce the following behavior:
Code:
newLISP v.9.3.8 on OSX IPv4 UTF-8, execute 'newlisp -h' for more info.
> (char "")
0
> (char "" 0)
0
> (char "" 1)
string index out of bounds in function char
> (char "" -1)
string index out of bounds in function char
> (char "a")
97
> (char "a" 1)
string index out of bounds in function char
> (char "a" 0)
97
> (exit)
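The session above can also be modeled outside the interpreter. This is a rough sketch in Python 3, not newLISP's C source: char_patched is a hypothetical name, and the index handling is an assumption about what adjustNegativeIndex does (add the length to a negative offset, then report anything still outside the string).

```python
def char_patched(s, offset=0):
    """Model of (char str offset) with the one-line patch applied.

    Assumption: this mirrors the observed session, not the real C code.
    """
    n = len(s)
    # The patch skips the index check only for offset 0 on an empty string.
    if offset != 0 or n > 0:
        if offset < 0:
            offset += n  # assumed: negative offsets count from the end
        if offset < 0 or offset >= n:
            raise IndexError("string index out of bounds in function char")
    if n == 0:
        return 0  # stands in for reading the terminating zero byte of the C string
    return ord(s[offset])


print(char_patched(""), char_patched("a"), char_patched("a", -1))  # 0 97 97
```

In this model (char "a" -1) gives 97, the same as offset 0, while any nonzero offset on "" still raises.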
Posted: Sun Apr 20, 2008 6:38 am
by Lutz
Note that 0 and -1 always should return the same result, because they refer to the same character: the first or the last.
The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.
Thinking a bit more about this I believe that
Code:
(char "") -> 0 ; previous to 9.3.0
is actually wrong. The 'char' function does not work on binary characters, so it should never return 0, because zero is not a valid character in either ASCII or UTF-8. The "" string is empty and does not have any characters. The fact that C strings are terminated with 0 is a C issue.
So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?
ps: I deleted my previous post.
Posted: Sun Apr 20, 2008 3:30 pm
by ghfischer
I'm of the opinion that char should move from strings to integers and not introduce nil. I think the empty string is a valid special case where 0 should be returned.
That being said, I'm OK with char returning errors if the offset is out of bounds. I'd argue that any offset argument for an empty string should be an out-of-bounds error, but looking at the code it seems easier to just make offset=0 a special case rather than test offset for nil.
0 is a valid character in the ASCII set; it's the NUL character.
So if (char 0) --> "\000" then (char "\000") --> 0.
"" is simply shorthand for "\000".
Posted: Sun Apr 20, 2008 9:12 pm
by xytroxon
What do other languages do?
Python has the ord function...
http://docs.python.org/lib/built-in-funcs.html
>>> ord("A")
65
>>> ord("AB")
TypeError: ord() expected a character, but string of length 2 found
>>> ord("")
TypeError: ord() expected a character, but string of length 0 found
------------
Python's chr function behaves as follows:
>>> chr(-1)
ValueError: chr() arg not in range(256)
>>> chr(0)
'\x00'
>>> chr(123)
'{'
>>> chr(257)
ValueError: chr() arg not in range(256)
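The sessions quoted here look like Python 2, where chr is limited to range(256). A quick way to rerun the comparison on Python 3 is sketched below; outcome is a hypothetical helper name, not a built-in. On Python 3, chr accepts code points up to 0x10FFFF, so chr(257) no longer raises, but ord("") is still a TypeError.

```python
def outcome(fn, arg):
    """Return fn(arg), or the exception class name if the call is rejected."""
    try:
        return fn(arg)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(outcome(ord, "A"))   # 65
print(outcome(ord, "AB"))  # TypeError
print(outcome(ord, ""))    # TypeError
print(outcome(chr, -1))    # ValueError
print(outcome(chr, 123))   # {
```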
---------
Anyone use Ruby or other languages?
Posted: Mon Apr 21, 2008 5:42 pm
by maq
Lutz wrote:
The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.
I looked in the 9.3 release notes and did not see anything related to this, or to the similar new behavior on lists. Could you please direct me to an explanation of the new behavior? This change is causing many things to break in a pre-9.3 codebase that I am working from, and I need a systematic way to identify all the affected functions and correct them.
Lutz wrote:
So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?
I would argue that returning nil is better than raising an error, which requires additional code either to test that the input is within bounds (e.g. not empty) or to catch the error. It just seems that this one could go either way, so why not default to the implementation that results in less code?
Posted: Mon Apr 21, 2008 5:51 pm
by newdep
Just a mind set..
(get-char) returns "\x00"
(get-char "") returns 0
(char) returns "\000"
So I would assume (char "") to return 0.
But logically speaking, (char "") is nil.
(To be honest, I don't like "string index out of bounds in function char".)
I prefer consistent, global returns of nil when a result is not true or is out of bounds, because (nil? ...) then fits perfectly, or (zero? ...).
Posted: Mon Apr 21, 2008 6:07 pm
by Lutz
In development version 9.3.10 tomorrow:
(char "") => nil
The empty string and the 0 character are not defined as displayable in ASCII and UTF-8. Also, "" and "\000" are not the same: (= "" "\000") => nil. Now you can do: (if (char str) ...)
(char), (get-char)
will give a missing parameter error message (also get-int, get-float, get-string)
(get-char "") => 0
get-char gets the byte at the address of the empty string, which is a zero byte, stays as is.
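As a rough illustration of why a nil result shortens calling code, here is a Python 3 sketch in which None stands in for nil. char_or_none is a hypothetical helper, not part of newLISP or Python, and it only models the empty-string case described above.

```python
def char_or_none(s):
    """Return the code point of the first character, or None for an empty string."""
    return ord(s[0]) if s else None


for text in ("abc", ""):
    code = char_or_none(text)
    # Rough counterpart of (if (char str) ...); Python treats 0 as false,
    # so the check is written against None explicitly.
    if code is not None:
        print(repr(text), "->", code)
    else:
        print(repr(text), "-> no character")
```

The caller needs neither a length check beforehand nor an error handler around the call.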
Posted: Mon Apr 21, 2008 7:27 pm
by newdep
I really must have missed 9.3.9..
I thought I had it but there is only a 9.3.8 to be found..
9.3.9 must have been a ghost release when we move to 9.3.10 ;-)
Posted: Mon Apr 21, 2008 8:27 pm
by cormullion
I think the list index out of bounds was discussed at length here and is also mentioned in the 9.3 release notes. Not sure about strings...
I had to fix a bit of my code too. I sympathize!