integer from string

Sammo · Post by **Sammo** » Wed Feb 11, 2004 6:12 pm

Examples 2) and 4) seem counter-intuitive:

1) (integer "9") --> 9
2) (integer "09") --> 0
3) (integer "-9") --> -9
4) (integer "-09") --> 0

nigelbrown · Post by **nigelbrown** » Wed Feb 11, 2004 8:39 pm

Leading 0s should be OK, as they keep the symmetry of reading back what you write with %05d, where the 0 means "pad with leading zeros" (this padding is used in the txt2pdf code, for example), viz.:
> (format "%05d" -1)
"-0001"
> (integer (format "%05d" -1))
-1
>

However, the 9 seems to be broken, viz.:
> (integer "01")
1
> (integer "09")
0
>
and further:
> (dotimes (i 20) (print "<" (integer (format "%05d" (integer i))) ">"))
<0><1><2><3><4><5><6><7><0><0><8><9><10><11><12><13><14><15><1><1>">"
>
while:
> (dotimes (i 20) (print "<" (integer i) ">"))
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18><19>">"
>
Lutz?

Nigel

Sammo · Post by **Sammo** » Wed Feb 11, 2004 9:54 pm

Aha! This looks like the bite of octal notation, described under "Evaluating newLISP Expressions", where octal values are prefixed by 0 (zero). Since 8 and 9 aren't valid octal digits, (integer "09") stops at the 9 and returns 0, much as (integer "15b") --> 15 stops at the b in decimal. It then also makes sense that (integer "010") --> 8.

Thanks, Nigel, for the insight!

Lutz · Post by **Lutz** » Wed Feb 11, 2004 11:49 pm

The C function strtol() is used for the conversion; it also accepts hex, e.g.:

(integer "0xff") => 255

Values larger than the maximum integer are clamped to the maximum integer:

(integer 10e20) => 2147483647

or to the minimum integer:

(integer -10e20) => -2147483648

Lutz
PS: all this is also described in the manual.

nigelbrown · Post by **nigelbrown** » Thu Feb 12, 2004 1:14 am

Yes, it is all clear in the manual; apologies to Lutz for the 'broken' comment.

Perhaps integer could take the form (integer string defaultexpr base), to be as flexible as the real strtol(): in strtol, the base parameter lets you force "0009" to be read as 9 by specifying base 10, viz. the C code:

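#include <stdlib.h>
#include <stdio.h>

int main(void) {
    char number[] = "0009";   /* a proper NUL-terminated string */

    /* base 0: the leading 0 selects octal, so parsing stops at the '9' */
    printf("base is 0 -> %ld\n", strtol(number, NULL, 0));
    /* base 10: all four digits are decimal */
    printf("base is 10 -> %ld\n", strtol(number, NULL, 10));
    return 0;
}

prints: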
base is 0 -> 0
base is 10 -> 9

Nigel

PS: I had to look up the details of strtol:
strtol() and strtoll()
The strtol() function converts the initial portion of the string pointed to by str to a type long int representation.

The strtoll() function converts the initial portion of the string pointed to by str to a type long long representation.

Both functions first decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by isspace(3C)); a subject sequence interpreted as an integer represented in some radix determined by the value of base; and a final string of one or more unrecognized characters, including the terminating null byte of the input string. They then attempt to convert the subject sequence to an integer and return the result.

If the value of base is 0, the expected form of the subject sequence is that of a decimal constant, octal constant or hexadecimal constant, any of which may be preceded by a + or - sign. A decimal constant begins with a non-zero digit, and consists of a sequence of decimal digits. An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 to 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and letters a (or A) to f (or F) with values 10 to 15 respectively.

If the value of base is between 2 and 36, the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a + or - sign. The letters from a (or A) to z (or Z) inclusive are ascribed the values 10 to 35; only letters whose ascribed values are less than that of base are permitted. If the value of base is 16, the characters 0x or 0X may optionally precede the sequence of letters and digits, following the sign if present.

The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is empty or consists entirely of white-space characters, or if the first non-white-space character is other than a sign or a permissible letter or digit.

If the subject sequence has the expected form and the value of base is 0, the sequence of characters starting with the first digit is interpreted as an integer constant. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated. A pointer to the final string is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

In other than the POSIX locale, additional implementation-dependent subject sequence forms may be accepted.

If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of str is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

Lutz · Post by **Lutz** » Thu Feb 12, 2004 3:59 pm

With just a line or two of code, I could add the additional number-base parameter to 'integer':

(integer "1111" 0 2) => 15

It turns out you still have to put at least a numerical digit, or 0x, before the number:

(integer "ff" "won't work" 16) => "won't work"

(integer "0xff" "won't work" 16) => 255
(integer "0ff" "won't work" 16) => 255

But I think it's OK, and it also makes sense, because otherwise text would far too often be taken as a number. So the rule "must start with +/- or a digit" always applies.

And it works fine for octals without the leading 0:

(integer "77" 0 8) => 63

Also, regarding (seek 0): after compiling on Linux, it turns out that it returns -1 there as well. So far, only on BSD does 'ftell(stdout)' report the number of characters printed. I haven't tried Solaris or Mac OS X (BSD-based) yet.

Reading the section on seek/ftell in the GNU libc manual, I wonder why it doesn't work on Linux.

Lutz

nigelbrown · Post by **nigelbrown** » Sun Mar 14, 2004 11:52 pm

As part of an infix evaluator I'm writing, I revisited the newLISP conversion functions, comparing them to the underlying C functions (such as strtol). strtol can optionally return the remaining unconverted portion of the string, but currently (integer ...) and (float ...) don't give you the unconverted string fragment (in, say, $0).

You can do this with parse, viz.:

> (define (myfloat s) (begin (setq $0 (join (rest (parse s)))) (float s)))
(lambda (s)
(begin
(setq $0 (join (rest (parse s))))
(float s)))
> (myfloat "1.3D+3")
1.3
> $0
"D+3"
> (myfloat "1.3D+3/1.2*7")
1.3
> $0
"D+3/1.2*7"
>

I don't know what people think about making this a feature of float and integer; I'm happy with myfloat, but I thought I'd float the idea (no pun intended).

Lutz · Post by **Lutz** » Mon Mar 15, 2004 1:21 am

I just tried it (it was only one additional line of code), but it more than doubles the time the function needs, from 468 to 1000 microseconds, for a feature used relatively infrequently, so I didn't do it.

Lutz