bug? about float point number lexical analysis

psilwen · Post by **psilwen** » Wed Oct 01, 2014 10:02 am

I want to convert a big floating-point number to an integer.

> (int 999999999999999999999999999999999999999999.99)
9223372036854775807

I tried

Code: Select all

> (bigint 999999999999999999999999999999999999999999.99)
1000000000000000045259160000000000000000000L

The result is inexact because:

When converting from floating point, rounding errors occur going back and forth between decimal and binary arithmetic.
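The same rounding can be reproduced outside newLISP. The sketch below uses Python rather than newLISP and assumes IEEE 754 doubles. It suggests that the literal is already rounded to the nearest double when it is read, before any integer conversion happens, while an exact decimal type keeps every digit.

```python
from decimal import Decimal

# The literal from the post above, kept as text so nothing is rounded yet.
LITERAL = "999999999999999999999999999999999999999999.99"

# Reading the text as a double rounds it to the nearest representable value.
as_float = float(LITERAL)

# Decimal reads the same text exactly, digit for digit.
as_decimal = Decimal(LITERAL)

# int() truncates toward zero in both cases.
from_float = int(as_float)      # exact integer value of the rounded double
from_decimal = int(as_decimal)  # the integer part exactly as written

print(from_float)
print(from_decimal)
print(from_float - from_decimal)  # size of the rounding error
```

The difference printed on the last line is the error introduced by the float step alone.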

And

Code: Select all

> (bigint "1234567890123456789012345678901234567890.123456789")
1234567890123456789012345678901234567890L

This is what I expected.

It means that the number needs to be converted to a string first.

But,

Code: Select all

> (bigint (string 1234567890123456789012345678901234567890.123456789))
1L

> (string 1234567890123456789012345678901234567890.123456789)
"1.23456789012346e+039123456789"

not "1234567890123456789012345678901234567890.123456789"

Also, the result is incorrect!

Evidently, the literal 1234567890123456789012345678901234567890.123456789 is parsed as two parts: "1234567890123456789012345678901234567890." and "123456789"

Code: Select all

> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+039
123456789

More tests

Code: Select all

> (setq d 1234567890123456789012345678901234567890.123456789)

ERR: missing argument in function setf

Code: Select all

> 100000000000000000.9876543210123456789
1e+017
456789

> (length "100000000000000000.9876543210123")
32

> 1000000000000000000.9876543210123456789
1e+018
3456789
> 10000000000000000000.9876543210123456789
1e+019
23456789
> 100000000000000000000.9876543210123456789
1e+020
123456789
> 1000000000000000000000.9876543210123456789
1e+021
342391             <----- where does this come from?
89
> 10000000000000000000000.9876543210123456789
1e+022
10123456789
> 100000000000000000000000.9876543210123456789
1e+023
210123456789
> 1000000000000000000000000.9876543210123456789
1e+024
3210123456789
> 10000000000000000000000000.9876543210123456789
1e+025
43210123456789
> 100000000000000000000000000.9876543210123456789
1e+026
543210123456789
> 1000000000000000000000000000.9876543210123456789
1e+027
6543210123456789
> 10000000000000000000000000000.9876543210123456789
1e+028
76543210123456789
> 100000000000000000000000000000.9876543210123456789
1e+029
876543210123456789
> 1000000000000000000000000000000.9876543210123456789
1e+030
9876543210123456789L

> 10.1234567890123456789012345678901234567890
10.1234567890123
342391                  <----- where does this come from?
890
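A likely explanation for the 342391 marked above, assuming the leftover text is read as a fresh integer literal and that a leading 0 selects octal: the leftover `0123456789` is consumed as octal digits up to the first digit that is not 0-7, and the remaining `89` becomes a third number. The helper name `read_leading_zero_int` is hypothetical and only models this reading; it is not newLISP's source code. The arithmetic can be checked in Python:

```python
import re

def read_leading_zero_int(text):
    """Read an integer the way a C-style lexer might: a leading 0 means octal.

    Expects text that starts with a digit. Returns (value, leftover_text).
    """
    if len(text) > 1 and text[0] == "0":
        digits = re.match(r"[0-7]+", text).group()  # stops at the first 8 or 9
        return int(digits, 8), text[len(digits):]
    digits = re.match(r"[0-9]+", text).group()
    return int(digits), text[len(digits):]

# Leftover text after the first 32 characters of the 1e+021 example.
print(read_leading_zero_int("0123456789"))   # (342391, '89')

# Leftover text after the first 32 characters of the 10.1234... example.
print(read_leading_zero_int("01234567890"))  # (342391, '890')
```

Octal 1234567 is 342391 in decimal, which matches both marked lines, including the trailing `89` and `890`.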

Would you consider supporting a bigdecimal feature?

Lutz · Post by **Lutz** » Wed Oct 01, 2014 5:18 pm

Code: Select all

(bigint "1234567890123456789012345678901234567890.123456789")

and:

Code: Select all

(bigint (string 1234567890123456789012345678901234567890.123456789))

are not the same. When the large decimal-point number is parsed, it is converted to an IEEE 754 floating-point number with at most 16 digits of precision:

Code: Select all

; in version up to 10.6.1
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39
123456789
> 

> (setq d 1234567890123456789012345678901234567890.123456789)

ERR: missing argument in function setf

Up to and including v.10.6.1, only 32 characters are parsed for a decimal number, including a potential sign and the decimal point. The rest of the source is parsed as a different number, which also causes the syntax error in the setq statement. In version 10.6.2, up to 255 characters will be parsed in decimal numbers:

Code: Select all

; in version 10.6.2 and after (in progress)
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39

> (setq d 1234567890123456789012345678901234567890.123456789)
1.23456789012346e+39
> 
>

This float now gets converted to a string and that string parsed by bigint. Note that Python does the same conversion when parsing code:

Code: Select all

>>> str(1234567890123456789012345678901234567890.123456789)
'1.23456789012e+39'
>>>

When bigint parses a string, it expects an integer and stops parsing at any other character, such as the decimal point.

Code: Select all

> (bigint "1.23456789012346e+39")
1L
>
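To make the difference between the two inputs concrete, here is a rough Python model of "read integer digits and stop at the first other character". The function name `leading_integer` is made up for this sketch and is not how newLISP implements bigint. It also shows one way to turn the scientific-notation string into an integer by parsing it as an exact decimal first.

```python
import re
from decimal import Decimal

def leading_integer(text):
    """Return the integer prefix of text, stopping at the first character
    that cannot be part of an integer (decimal point, 'e', and so on)."""
    m = re.match(r"[+-]?\d+", text)
    if m is None:
        raise ValueError("text does not start with an integer")
    return int(m.group())

exact_text = "1234567890123456789012345678901234567890.123456789"
float_text = "1.23456789012346e+39"   # what converting the float to a string produced

print(leading_integer(exact_text))  # all integer digits before the point survive
print(leading_integer(float_text))  # only the "1" before the point is read

# Parsing the float's text as a decimal keeps its 15 significant digits.
print(int(Decimal(float_text)))
```

The last line recovers the magnitude but not the lost digits; those were already gone when the literal became a float.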

rickyboy · Post by **rickyboy** » Wed Oct 01, 2014 6:42 pm

Thank you, psilwen and Lutz!

psilwen · Post by **psilwen** » Fri Oct 03, 2014 2:10 am

only 32 characters are parsed for decimal number including a potential sign and the decimal point.

Code: Select all

> (length "1234567890123456789012345678901234567890.0")
42
> 1234567890123456789012345678901234567890.0
1.23456789012346e+039
0

It actually stops parsing at the first non-numeric character.

The rest of the source will be parsed as a different number

This strategy has potential problems; it is easily confusing.

Reporting it as an error might be better.

At least we would know about the error, rather than being puzzled by wrong results.
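As an illustration of that suggestion only, a reader that refuses an overlong literal instead of splitting it could look roughly like the Python sketch below. The function name, the regular expression, and the default limit of 32 are assumptions made for the example; none of this is taken from newLISP's source.

```python
import re

# One maximal run of number characters: sign, digits, optional fraction, optional exponent.
NUMBER_RUN = re.compile(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

def read_number_strict(source, limit=32):
    """Return (token, rest). Raise instead of silently splitting a literal."""
    m = NUMBER_RUN.match(source)
    if m is None:
        raise ValueError("no numeric literal at start of input")
    token = m.group()
    if len(token) > limit:
        raise ValueError(
            f"numeric literal of {len(token)} characters exceeds limit of {limit}"
        )
    return token, source[m.end():]

print(read_number_strict("12.5 123456"))

try:
    read_number_strict("1234567890123456789012345678901234567890.0 123456")
except ValueError as err:
    print("error:", err)
```

The cost is one length check per literal; whether that is acceptable for an interpreter is a design decision for its author.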

Code: Select all

> (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L
> (/   1234567890123456789012345678901234567890.0   123456)

ERR: division by zero in function /

> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+034
> (div 1234567890123456789012345678901234567890.0   123456)
inf
> (div 2   0)
inf

Code: Select all

> (%   10000000000000000000000000000000000000000000000.0 10)

ERR: division by zero in function %
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan
> (%   10000000000000000000000000000000000000000000000 10)
0L
> (mod 10000000000000000000000000000000000000000000000 10)
8

rickyboy · Post by **rickyboy** » Fri Oct 03, 2014 3:24 am

Hello psilwen,

Most of these issues you raise are not issues at all in version 10.6.2. I recommend you download and build that version on your machine and repeat these examples.

I've done just that and I only recall this one example still being an issue.

Code: Select all

>$ ./newlisp
newLISP v.10.6.2 64-bit on BSD IPv4/6 UTF-8, options: newlisp -h

> (mod 10000000000000000000000000000000000000000000000 10)
8

I think it should evaluate to 0.

This example yields the same result in 10.6.2 as in the version you are using.

Code: Select all

> (div 2   0)
inf

However, I believe it to be the correct behavior. I actually have something like the following in some of my old code.

Code: Select all

(define inf (div 1 0))

It's convenient to have a symbol that evaluates to "high values."

I hope this helps, and thank you very much for taking the time to check all of this (and making newLISP better).

psilwen · Post by **psilwen** » Fri Oct 03, 2014 3:53 am

In version 10.6.2 up to 255 characters will be parsed in decimal numbers

This cannot completely solve the problem.

The core of these issues is that the parsing approach is not right.

When the number exceeds 255 characters, the same issue will appear again.

rickyboy · Post by **rickyboy** » Fri Oct 03, 2014 4:29 am

psilwen,

Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?

AFAIK, no programming language allows this. There are always limits. Life is about dealing with "scarce" resources, and one of the major issues of software design is how to deal with that scarcity, while still meeting the goals you have in mind.

Given that, what do you propose should be the design for entering and storing numbers in newLISP? Curious.

Lutz · Post by **Lutz** » Fri Oct 03, 2014 4:37 am

I wrote the following while rickyboy was posting at the same time.

Many of these problems will disappear when longer numbers are allowed during parsing. In the following example, the second number with a decimal point will not be broken up in 10.6.2:

Code: Select all

; correct in all versions
> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+34

; in 10.6.0/1 the number splits at the point, so the second arg becomes 0
> (div 1234567890123456789012345678901234567890.0     123456)
inf

; in 10.6.2 same result when the decimal point is present
> (div 1234567890123456789012345678901234567890.0     123456)
1.0000063910409e+34
>

When using the integer division operator '/', the following happens:

Code: Select all

; correct in all versions
>  (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L

; in 10.6.1 number splits
> (/   1234567890123456789012345678901234567890.0   123456)
ERR: division by zero in function /

; in 10.6.2 expected result after integer overflow
>  (/   1234567890123456789012345678901234567890.0     123456)
74709791641190

In both cases the integer operator converts the first operand into the largest 64-bit signed integer, 9223372036854775807, and then does the division.

In 10.6.0/1 the number splits and the second operand is a 0 causing the ‘div by zero’ error. In 10.6.2 the correct result 74709791641190 is displayed.
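The number 74709791641190 can be checked by hand. The Python sketch below models the conversion as clamping to the signed 64-bit range; the helper `to_int64_saturating` is a hypothetical name for this illustration, not a newLISP function.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def to_int64_saturating(x):
    """Convert a float to an integer, clamping to the signed 64-bit range.

    NaN is not handled in this sketch (int() would raise for it).
    """
    if x >= 2**63:
        return INT64_MAX
    if x <= -(2**63):
        return INT64_MIN
    return int(x)

big = 1234567890123456789012345678901234567890.0

print(to_int64_saturating(big))            # 9223372036854775807
print(to_int64_saturating(big) // 123456)  # 74709791641190
```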

The following example:

Code: Select all

; in all versions mod forces conversion to floats
> (mod 10000000000000000000000000000000000000000000000 10)
8
; 10.6.0/1 big number gets split
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan

; 10.6.2 rounding error because of limited precision float conversion
> (mod 10000000000000000000000000000000000000000000000.0 10)
8
>

The floating point operator transforms the first operand into a float with at most 16 digits of precision, and the 8 is a rounding error from the conversion to an IEEE 754 double float. The same happens in other languages, e.g. Python:

Code: Select all

; in Python same rounding error because of limited precision float conversion
>>> 10000000000000000000000000000000000000000000000.0 % 10
8.0

instead of the correct result, 0.
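To see where the nonzero remainder comes from, it helps to print the value the double actually stores. This Python sketch assumes IEEE 754 doubles and uses the `decimal` module only to display exact values. The remainder is computed correctly, but on the stored value rather than on the text that was typed.

```python
from decimal import Decimal

# The literal from the example above, kept as text.
LITERAL = "10000000000000000000000000000000000000000000000.0"

as_float = float(LITERAL)    # nearest double to the decimal text
stored = Decimal(as_float)   # exact value that double holds

print(stored)                       # not a 1 followed by zeros
print(as_float % 10)                # remainder of the stored value
print(int(stored) % 10)             # same remainder, computed with exact integers
print(int(Decimal(LITERAL)) % 10)   # remainder of the number as written
```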

Some final comments:
====================

All of these examples are constructed test cases using number literals/constants, which have never occurred in practice, except in a post a few days ago where a Pi constant with about 50 digits after the decimal point was used. Except for that code, I have never seen this kind of problem in the real world.

newLISP chooses to keep integer and float arithmetic apart. Integer and float operators implicitly convert their arguments, causing integer overflows or floating point conversion rounding errors. newLISP also separates bigint from normal 64-bit int arithmetic. Both of these separations are done on purpose and are convenient when using newLISP in embedded systems, when interacting with hardware registers, or when doing integer arithmetic in other domains which are inherently of integer type.

Many of the problems will disappear when allowing higher precision decimal and float numbers while parsing, but the fact that these numbers are converted to IEEE 754 double floats will remain and cause some of the effects shown. These effects can be shown in any programming language using floats.

Ps: about the 32 decimal digit limit: The 32-length limit is not tested until the first non-digit character. I misspoke in my previous post.
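Read together with the transcripts earlier in the thread, that correction suggests the following model of the pre-10.6.2 behaviour: integer digits are consumed with no length check, the point is consumed, and fractional digits are consumed only while the token is shorter than 32 characters. The Python sketch below is inferred from the posted output only; the function name and structure are assumptions, not newLISP's actual code.

```python
def split_decimal_literal(text, limit=32):
    """Model of the splitting seen in the transcripts. Returns (token, leftover)."""
    pos = 0
    while pos < len(text) and text[pos].isdigit():
        pos += 1                      # integer part: no limit applied
    if pos < len(text) and text[pos] == ".":
        pos += 1                      # the point itself is always taken
        while pos < len(text) and text[pos].isdigit() and pos < limit:
            pos += 1                  # fraction: stops once limit characters are used
    return text[:pos], text[pos:]

# 18 integer digits: the token fills up to 32 characters, leftover is 456789.
print(split_decimal_literal("1" + "0" * 17 + ".9876543210123456789"))

# 40 integer digits: the token ends right after the point, leftover is 123456789.
print(split_decimal_literal("1234567890" * 4 + ".123456789"))

# 22 integer digits: leftover is 0123456789, which starts with a 0.
print(split_decimal_literal("1" + "0" * 21 + ".9876543210123456789"))
```

The model reproduces the leftovers shown for these inputs in the earlier posts.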

Ps: about (div 2 0) => inf, this is part of IEEE 754 compliance. See also the file
newlisp-10.x.x/qa-specific-tests/qa-float which tests for many of the IEEE compliance features.

psilwen · Post by **psilwen** » Sat Oct 04, 2014 2:38 am

rickyboy wrote: Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?

AFAIK, no programming language allows this. There are always limits.

I certainly know that.

What I mean is

Code: Select all

> newlisp
newLISP v.10.6.2 32-bit on Win32 IPv4/6 UTF-8, options: newlisp -h

> 12345678901234567890123456789012345678909999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
999999999999999999.123456789
1.23456789012346e+255
123456789

> (string 123456789012345678901234567890123456789099999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
9999999999999999999999999999999999999999999999999999999999999999999999999999999
99999999999999999999999999.123456789)
"1.23456789012346e+255123456789"

This result is very weird and counterintuitive.

Users never expect a completely legal single decimal number (consisting only of digits and a single dot) to be parsed into multiple numbers.

Extending the limit from 32 to 255 just covers up the issue; it does not really solve it.

In Python

Code: Select all

>>> a = 111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444.987654321
>>> a
1.1111111112222222e+308

>>> str(a)
'1.11111111122e+308'

When the number exceeds the limit

Code: Select all

>>> b = 111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999
        1111111111222222222233333333334444444445.987654321
>>> b
inf

it evaluated to inf, not two values 1.1111111112222222e+308 and 987654321.

AND

Code: Select all

>>> str(b)
'inf'

it evaluated to 'inf', not the string '1.1111111112222222e+308987654321'.
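The behaviour described here amounts to "take the longest run of number characters as one token, then convert". A minimal Python sketch of that idea follows; the regular expression and variable names are chosen for illustration and are not CPython's tokenizer.

```python
import re

# Sign, digits, and an optional fraction, matched as one maximal run.
NUMBER = re.compile(r"[+-]?\d+(?:\.\d*)?")

source = "1" * 400 + ".987654321 next"

token = NUMBER.match(source).group()   # the whole literal, however long
value = float(token)                   # too large for a double, so it becomes inf

print(len(token))
print(value)
print(repr(source[len(token):]))       # text after the literal is untouched
```

With this approach an oversized literal yields a single value, and the text after it is never reinterpreted as a second number.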

I think Python's behavior is reasonable and newLISP's is not.

This is my opinion.

Thank you for your attention.

Lutz · Post by **Lutz** » Sat Oct 04, 2014 2:11 pm

That would be overengineering, and costly in code size and speed. newLISP would be slower and much bigger if designed with this philosophy. I doubt the splitting of large floats would ever occur with a size limit of 255(*).

(*) now 1000: http://www.newlisp.org/downloads/develo ... 10.6.2.txt

newlispfanclub.alh.net
