char bug?

maq
Posts: 7
Joined: Tue Jan 18, 2005 6:20 am

char bug?

Post by maq »

On version 9.3, I get the following:

Code: Select all

newLISP v.9.3.0 on OSX UTF-8, execute 'newlisp -h' for more info.

> (char "")

string index out of bounds in function char
>
Before 9.3, this returned 0. Shouldn't that still be the case, given that an empty string by definition (at least in C, which I thought newLISP also followed) is one that contains only the null char (\0)?

Thanks,

--maq

ghfischer
Posts: 14
Joined: Mon May 09, 2005 4:17 pm
Location: Austin, TX

A temporary patch.

Post by ghfischer »

In nl-string.c change line 219 from

Code: Select all

offset = adjustNegativeIndex(offset, len);
to

Code: Select all

if ((offset != 0) || (len > 0)) offset = adjustNegativeIndex(offset, len);
This will produce the following behavior:

Code: Select all

newLISP v.9.3.8 on OSX IPv4 UTF-8, execute 'newlisp -h' for more info.

> (char "")
0
> (char "" 0)
0
> (char "" 1)

string index out of bounds in function char
> (char "" -1)

string index out of bounds in function char
> (char "a")
97
> (char "a" 1)

string index out of bounds in function char
> (char "a" 0)
97
> (exit)
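For readers without the newLISP source at hand, the effect of the patch can be modelled in a few lines of Python. This is only a sketch: `adjust_negative_index`, `char_patched` and `StringIndexError` are hypothetical stand-ins, the real `adjustNegativeIndex` in nl-string.c may differ in detail, and only ASCII input is considered. The model assumes a C-style buffer with a terminating zero byte, which is what index 0 of the empty string ends up reading once the bounds check is skipped.

```python
class StringIndexError(Exception):
    """Stands in for newLISP's 'string index out of bounds' error."""


def adjust_negative_index(offset, length):
    # Assumed behaviour: negative offsets count from the end,
    # and anything outside 0..length-1 is an error.
    if offset < 0:
        offset += length
    if offset < 0 or offset >= length:
        raise StringIndexError("string index out of bounds in function char")
    return offset


def char_patched(text, offset=0):
    # Model the C buffer: the string's bytes followed by a terminating zero byte.
    buffer = text.encode("ascii") + b"\x00"
    length = len(text)
    # The patched condition: skip the bounds check only for offset 0 on "".
    if offset != 0 or length > 0:
        offset = adjust_negative_index(offset, length)
    return buffer[offset]


if __name__ == "__main__":
    # Replays the calls from the session above.
    for args in [("",), ("", 0), ("", 1), ("", -1), ("a",), ("a", 1), ("a", 0)]:
        try:
            print(args, "->", char_patched(*args))
        except StringIndexError as err:
            print(args, "->", err)
```

In this model, index 0 on the empty string succeeds while index -1 fails, which is the asymmetry the patch introduces.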


Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Note that 0 and -1 should always return the same result, because they refer to the same character: the first or the last.

The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.

Thinking a bit more about this, I believe that

Code: Select all

(char "") -> 0 ; previous to 9.3.0
is actually wrong. The 'char' function does not work on binary characters, so it should never return 0, because zero is not a valid character in either ASCII or UTF-8. The "" string is empty and does not have any characters. The fact that C strings are terminated with 0 is a C issue.

So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?

ps: I deleted my previous post.

ghfischer
Posts: 14
Joined: Mon May 09, 2005 4:17 pm
Location: Austin, TX

Post by ghfischer »

I'm of the opinion that char should map strings to integers and not introduce nil. I think the empty string is a valid special case where 0 should be returned.

That being said, I'm OK with char returning errors if the offset is out of bounds. I'd argue that any offset argument for an empty string should be an out-of-bounds error, but looking at the code it seems easier to just make offset=0 a special case rather than test offset for nil.

0 is a valid character in the ASCII set; it's the NUL character.
So if (char 0) --> "\000" then (char "\000") --> 0.
"" is simply shorthand for "\000".

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm

Post by xytroxon »

What do other languages do?

Python has the ord function...
http://docs.python.org/lib/built-in-funcs.html

>>> ord("A")
65

>>> ord("AB")
TypeError: ord() expected a character, but string of length 2 found

>>> ord("")
TypeError: ord() expected a character, but string of length 0 found

------------

Python's chr function behaves as follows:

>>> chr(-1)
ValueError: chr() arg not in range(256)

>>> chr(0)
'\x00'

>>> chr(123)
'{'

>>> chr(257)
ValueError: chr() arg not in range(256)
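
The messages above match Python 2, where chr() was limited to range(256). As a point of comparison, here is a small check of the same calls under Python 3, where chr() accepts any Unicode code point while ord() still rejects the empty string. The helper name `outcome` is made up for this illustration.

```python
def outcome(func, arg):
    """Return func(arg), or the exception type name if the call raises."""
    try:
        return func(arg)
    except (TypeError, ValueError) as err:
        return type(err).__name__


if __name__ == "__main__":
    print(outcome(ord, "A"))          # 65
    print(outcome(ord, "AB"))         # TypeError
    print(outcome(ord, ""))           # TypeError: an error, not 0
    print(ascii(outcome(chr, 0)))     # '\x00'
    print(ascii(outcome(chr, 257)))   # '\u0101': valid in Python 3
    print(outcome(chr, -1))           # ValueError
```

So Python, in both major versions, treats the empty string as an error for ord() rather than mapping it to 0 or to a "nothing" value.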

---------

Anyone use Ruby or other languages?
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976

maq
Posts: 7
Joined: Tue Jan 18, 2005 6:20 am

Post by maq »

Lutz wrote:
The new behavior in 9.3.0 occurred after introducing error reporting for out of range indices on strings.
I looked at the 9.3 release notes and did not see anything related to this, or to the similar new behavior on lists. Could you please direct me to an explanation of the new behavior? This change is causing many things to break in a pre-9.3 codebase that I am working from, and I need a systematic way to identify all the affected functions and correct them.
Lutz wrote: So 9.3.0 behavior will stay ;-), but perhaps return 'nil' on an empty string?
I would argue that returning nil is better than erroring out, which requires additional code either to test that the input is within bounds (e.g. not empty) or to catch the error. It just seems that this one could go either way, so why not default towards an implementation that results in less code?
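
To make the trade-off concrete, here is a rough Python sketch of the two calling styles. Both `char_strict` and `char_or_none` are hypothetical stand-ins rather than newLISP functions: the first raises on an empty string, as 9.3.0 does, and the second returns a "nothing" value, as a nil-returning char would.

```python
def char_strict(text):
    # Models the 9.3.0 behaviour: an empty string is an error.
    if not text:
        raise IndexError("string index out of bounds in function char")
    return ord(text[0])


def char_or_none(text):
    # Models the proposed behaviour: an empty string yields None (nil).
    return ord(text[0]) if text else None


def first_codes_strict(words):
    # Each call needs a guard or an exception handler around it.
    codes = []
    for word in words:
        try:
            codes.append(char_strict(word))
        except IndexError:
            pass
    return codes


def first_codes_or_none(words):
    # The result can be tested directly, with no handler.
    return [code for code in map(char_or_none, words) if code is not None]


if __name__ == "__main__":
    words = ["a", "", "b"]
    print(first_codes_strict(words))
    print(first_codes_or_none(words))
```

Both versions produce the same list; the difference is only in where the emptiness check lives and how much code each call site carries.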

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

Just a mind set..

(get-char) returns "\x00"

(get-char "") returns 0

(char) returns "\000"

So I would assume (char "") to return 0


But logically speaking, (char "") is nil



(To be honest, I don't like "string index out of bounds in function char".)
I prefer consistent, global returns of nil when something is not true or out of bounds, because (nil? ...) then fits perfectly, or (zero? ...)
-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

In development version 9.3.10, tomorrow:

(char "") => nil
The empty string has no characters, and the 0 character is not defined as displayable in ASCII or UTF-8. Also, "" and "\000" are not the same: (= "" "\000") => nil. You can now write: (if (char str) ...)

(char), (get-char)
will give a missing parameter error message (also get-int, get-float, get-string)

(get-char "") => 0
get-char gets the byte at the address of the empty string, which is a zero byte; this stays as is.
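
The distinction drawn here can be sketched in Python, with the caveat that this is a model and not newLISP code. `char` and `get_char` below are hypothetical stand-ins; the terminating zero byte is added by hand to mimic how a C string is laid out in memory, and index handling for non-empty strings is not modelled.

```python
def char(text):
    # Character-level view: "" has no characters, so there is nothing to return.
    if text == "":
        return None
    return ord(text[0])


def get_char(text):
    # Byte-level view: read the first byte at the string's address.
    # A C-style buffer always ends in a zero byte, even for "".
    buffer = text.encode("utf-8") + b"\x00"
    return buffer[0]


if __name__ == "__main__":
    print(char(""), get_char(""))   # None 0
    print("" == "\x00")             # False: the two strings differ
    if char("") is None:
        print("empty string")
```

Note that the Python check uses `is None` explicitly, because Python also treats 0 as false, so the model does not carry over the exact truthiness rules of `(if (char str) ...)`.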

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

I really must have missed 9.3.9..
I thought I had it but there is only a 9.3.8 to be found..

9.3.9 must have been a ghost release if we are moving to 9.3.10 ;-)
-- (define? (Cornflakes))

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

I think the list index out of bounds was discussed at length here and is also mentioned in the 9.3 release notes. Not sure about strings...

I had to fix a bit of my code too. I sympathize!
