bug? about float point number lexical analysis

I want to convert a big floating-point number to an integer.

```
> (int 999999999999999999999999999999999999999999.99)
9223372036854775807
```

I tried:

```
> (bigint 999999999999999999999999999999999999999999.99)
1000000000000000045259160000000000000000000L
```

The result is inexact because, when converting from floating point, rounding errors occur going back and forth between decimal and binary arithmetic.
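The same effect can be reproduced outside newLISP. This is a minimal Python sketch, assuming Python floats are IEEE 754 doubles (true on common platforms); the variable names are only illustrative:

```python
from decimal import Decimal

literal = "999999999999999999999999999999999999999999.99"

# Through a binary double: the nearest representable value is the same
# double as 1e42, and that double is not exactly 10**42.
as_float = float(literal)
via_float = int(as_float)

# Through a decimal type: every digit of the integer part survives.
via_decimal = int(Decimal(literal))

print(as_float == 1e42)      # True
print(via_float == 10**42)   # False
print(via_decimal)           # 42 nines
```

The integer obtained through the double differs from the decimal value well before the last digit, which is the kind of result `bigint` shows above.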

And:

```
> (bigint "1234567890123456789012345678901234567890.123456789")
1234567890123456789012345678901234567890L
```

This is what I expected.

It means that the number needs to be converted to a string first.

But:

```
> (bigint (string 1234567890123456789012345678901234567890.123456789))
1L
> (string 1234567890123456789012345678901234567890.123456789)
"1.23456789012346e+039123456789"
```

The string is not "1234567890123456789012345678901234567890.123456789",

and the result is also incorrect!

Evidently, the literal 1234567890123456789012345678901234567890.123456789 is parsed as two parts, "1234567890123456789012345678901234567890." and "123456789":

```
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+039
123456789
```

More tests
```
> (setq d 1234567890123456789012345678901234567890.123456789)
ERR: missing argument in function setf
```

```
> 100000000000000000.9876543210123456789
1e+017
456789
> (length "100000000000000000.9876543210123")
32
> 1000000000000000000.9876543210123456789
1e+018
3456789
> 10000000000000000000.9876543210123456789
1e+019
23456789
> 100000000000000000000.9876543210123456789
1e+020
123456789
> 1000000000000000000000.9876543210123456789
1e+021
342391             <----- where does this come from?
89
> 10000000000000000000000.9876543210123456789
1e+022
10123456789
> 100000000000000000000000.9876543210123456789
1e+023
210123456789
> 1000000000000000000000000.9876543210123456789
1e+024
3210123456789
> 10000000000000000000000000.9876543210123456789
1e+025
43210123456789
> 100000000000000000000000000.9876543210123456789
1e+026
543210123456789
> 1000000000000000000000000000.9876543210123456789
1e+027
6543210123456789
> 10000000000000000000000000000.9876543210123456789
1e+028
76543210123456789
> 100000000000000000000000000000.9876543210123456789
1e+029
876543210123456789
> 1000000000000000000000000000000.9876543210123456789
1e+030
9876543210123456789L
> 10.1234567890123456789012345678901234567890
10.1234567890123
342391                  <----- where does this come from?
890
```
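A possible explanation for the 342391 marked in the transcript, assuming the leftover digits are tokenized as a separate integer and that a leading 0 makes it an octal literal: the digits 01234567 are valid octal, and reading stops at 8, which is not an octal digit. The arithmetic can be checked in Python (the variable names are illustrative):

```python
# Text left over after the first 32 characters of
# 1000000000000000000000.9876543210123456789 have been consumed.
leftover = "0123456789"

# Take the longest prefix consisting only of octal digits (0-7).
octal_part = ""
for ch in leftover:
    if ch not in "01234567":
        break
    octal_part += ch

rest = leftover[len(octal_part):]
value = int(octal_part, 8)   # interpret the prefix in base 8

print(value, rest)           # 342391 89
```

The same reading accounts for the last example, where the leftover is 01234567890 and the outputs are 342391 and 890.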

Has supporting a bigdecimal feature been considered?
(reverse "newlisp")
psilwen

Posts: 21
Joined: Thu Jul 03, 2014 5:25 am

Re: bug? about float point number lexical analysis

```
(bigint "1234567890123456789012345678901234567890.123456789")
```

and:

```
(bigint (string 1234567890123456789012345678901234567890.123456789))
```

are not the same. When the large decimal-point number is parsed, it is converted to an IEEE 754 floating-point number with at most 16 digits of precision:

```
; in version up to 10.6.1
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39
123456789
> (setq d 1234567890123456789012345678901234567890.123456789)
ERR: missing argument in function setf
```

Up to and including v.10.6.1, only 32 characters are parsed for a decimal number, including a potential sign and the decimal point. The rest of the source is parsed as a different number, which also causes the syntax error in the setq statement. In version 10.6.2, up to 255 characters will be parsed in decimal numbers:

```
; in version 10.6.2 and after (in progress)
> 1234567890123456789012345678901234567890.123456789
1.23456789012346e+39
> (setq d 1234567890123456789012345678901234567890.123456789)
1.23456789012346e+39
>
```

This float now gets converted to a string, and that string is parsed by bigint. Note that Python does the same conversion when parsing code:

```
>>> str(1234567890123456789012345678901234567890.123456789)
'1.23456789012e+39'
>>>
```

When bigint parses a string it expects integer numbers and will stop parsing at any other character like the decimal point.
```
> (bigint "1.23456789012346e+39")
1L
>
```
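The "stop at the first non-integer character" rule can be illustrated with a small Python sketch. `leading_integer` is a hypothetical helper written for this illustration, not newLISP's implementation; the `Decimal` line shows one way to keep the exponent instead of discarding it:

```python
import re
from decimal import Decimal

def leading_integer(text):
    """Hypothetical helper: parse only the leading run of digits (with an
    optional sign) and ignore everything from the first other character on."""
    match = re.match(r"[+-]?\d+", text)
    return int(match.group()) if match else 0

# Scientific notation: parsing stops at the decimal point, leaving only "1".
print(leading_integer("1.23456789012346e+39"))   # 1

# A plain decimal string keeps its whole integer part.
print(leading_integer("1234567890123456789012345678901234567890.123456789"))

# A decimal type understands the exponent, so the magnitude is preserved.
print(int(Decimal("1.23456789012346e+39")))
```

This is why passing the original text as a string gives the expected integer, while passing the float's printed form gives 1.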
Lutz

Posts: 5276
Joined: Thu Sep 26, 2002 4:45 pm

Re: bug? about float point number lexical analysis

Thank you, psilwen and Lutz!
(λx. x x) (λx. x x)
rickyboy

Posts: 582
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Re: bug? about float point number lexical analysis

Lutz wrote: only 32 characters are parsed for decimal number including a potential sign and the decimal point.

```
> (length "1234567890123456789012345678901234567890.0")
42
> 1234567890123456789012345678901234567890.0
1.23456789012346e+039
0
```

It actually stops parsing at the first non-digit character.
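The splits reported in this thread can be reproduced with a toy model. This is only a sketch that matches the observed output; it is not taken from the newLISP source. `split_decimal_literal` and its `limit` parameter are hypothetical, and the model assumes that integer digits are read without a length check, that a following dot is consumed, and that fraction digits are read only while the token is shorter than the limit:

```python
def split_decimal_literal(text, limit=32):
    """Toy model of the splitting seen in this thread (an assumption, not
    newLISP's code). Returns (token, leftover)."""
    i = 0
    # Integer part: read every digit, no length check yet.
    while i < len(text) and text[i].isdigit():
        i += 1
    # The first non-digit is a dot: consume it, then apply the limit.
    if i < len(text) and text[i] == ".":
        i += 1
        while i < len(text) and text[i].isdigit() and i < limit:
            i += 1
    return text[:i], text[i:]

for literal in (
    "1234567890123456789012345678901234567890.123456789",
    "1234567890123456789012345678901234567890.0",
    "100000000000000000.9876543210123456789",
):
    token, leftover = split_decimal_literal(literal)
    print(float(token), leftover)
```

Under these assumptions the three literals leave 123456789, 0 and 456789 behind, matching the second value printed in each of the transcripts.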

Lutz wrote: The rest of the source will be parsed as a different number

This strategy is problematic and easily causes confusion.

Reporting it as an error might be better.

At least then we would know about the error, rather than being puzzled by wrong results.

```
> (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L
> (/   1234567890123456789012345678901234567890.0   123456)
ERR: division by zero in function /
> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+034
> (div 1234567890123456789012345678901234567890.0   123456)
inf
> (div 2   0)
inf
```

```
> (%   10000000000000000000000000000000000000000000000.0 10)
ERR: division by zero in function %
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan
> (%   10000000000000000000000000000000000000000000000 10)
0L
> (mod 10000000000000000000000000000000000000000000000 10)
8
```
(reverse "newlisp")
psilwen

Re: bug? about float point number lexical analysis

Hello psilwen,

Most of these issues you raise are not issues at all in version 10.6.2. I recommend you download and build that version on your machine and repeat these examples.

I've done just that and I only recall this one example still being an issue.

```
$ ./newlisp
newLISP v.10.6.2 64-bit on BSD IPv4/6 UTF-8, options: newlisp -h

> (mod 10000000000000000000000000000000000000000000000 10)
8
```

I think it should evaluate to 0.

This example yields the same result in 10.6.2 as in the version you are using.

```
> (div 2   0)
inf
```

However, I believe it to be the correct behavior. I actually have something like the following in some of my old code.

```
(define inf (div 1 0))
```

It's convenient to have a symbol that evaluates to "high values."

I hope this helps, and thank you very much for taking the time to check all of this (and making newLISP better).
(λx. x x) (λx. x x)
rickyboy

Re: bug? about float point number lexical analysis

Lutz wrote: In version 10.6.2 up to 255 characters will be parsed in decimal numbers

This cannot completely solve the problem.

The core of the issue is that the parsing approach itself is not right.

When a number exceeds 255 characters, the same issue will appear again.
(reverse "newlisp")
psilwen

Re: bug? about float point number lexical analysis

psilwen,

Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?

AFAIK, no programming language allows this. There are always limits. Life is about dealing with "scarce" resources, and one of the major issues of software design is how to deal with that scarcity, while still meeting the goals you have in mind.

Given that, what do you propose should be the design for entering and storing numbers in newLISP? Curious.
(λx. x x) (λx. x x)
rickyboy

Re: bug? about float point number lexical analysis

I wrote the following while rickyboy was posting at the same time.

Many of these problems disappear when longer numbers are allowed during parsing. In the following example, the second number with a decimal point will not be broken up in 10.6.2:

```
; correct in all versions
> (div 1234567890123456789012345678901234567890     123456)
1.0000063910409e+34

; in 10.6.0/1 number splits at point second arg gets 0
> (div 1234567890123456789012345678901234567890.0     123456)
inf

; in 10.6.2 same result when the decimal point is present
> (div 1234567890123456789012345678901234567890.0     123456)
1.0000063910409e+34
>
```

When using the integer division operator '/', the following happens:

```
; correct in all versions
>  (/   1234567890123456789012345678901234567890     123456)
10000063910409026608770296128995225L

; in 10.6.1 number splits
> (/   1234567890123456789012345678901234567890.0   123456)
ERR: division by zero in function /

; in 10.6.2 expected result after integer overflow
>  (/   1234567890123456789012345678901234567890.0     123456)
74709791641190
```

In both cases the integer operator converts the first operand to the largest signed 64-bit integer, 9223372036854775807, and then does the division.

In 10.6.0/1 the number splits and the second operand is 0, causing the 'division by zero' error. In 10.6.2 the correct result, 74709791641190, is displayed.
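The arithmetic behind 74709791641190 can be checked in Python. This is a sketch only: `to_int64_saturating` is a hypothetical helper that imitates the clamping described above, since Python integers do not overflow on their own:

```python
INT64_MAX = 2**63 - 1    # 9223372036854775807
INT64_MIN = -2**63

def to_int64_saturating(x):
    """Hypothetical helper: truncate a float toward zero and clamp the
    result to the signed 64-bit range. NaN is not handled."""
    if x >= INT64_MAX:
        return INT64_MAX
    if x <= INT64_MIN:
        return INT64_MIN
    return int(x)

first = to_int64_saturating(1234567890123456789012345678901234567890.0)
print(first)             # 9223372036854775807
print(first // 123456)   # 74709791641190
```

The float operand is far outside the 64-bit range, so it clamps to the maximum, and the integer division of that maximum by 123456 gives the value shown for 10.6.2.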

The following example:

```
; in all versions mod forces conversion to floats
> (mod 10000000000000000000000000000000000000000000000 10)
8

; 10.6.0/1 big number gets split
> (mod 10000000000000000000000000000000000000000000000.0 10)
nan

; 10.6.2 rounding error because of limited precision float conversion
> (mod 10000000000000000000000000000000000000000000000.0 10)
8
>
```

The floating-point operator converts the first operand into a float with at most 16 digits of precision, and the 8 is a rounding error from the conversion to an IEEE 754 double. The same happens in other languages, e.g. Python:

```
# in Python same rounding error because of limited precision float conversion
>>> 10000000000000000000000000000000000000000000000.0 % 10
8.0
```

instead of the correct result, 0.
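The difference between the exact integer and its double approximation can be made visible in Python, where integers have arbitrary precision. This is a minimal sketch assuming IEEE 754 doubles; the variable names are illustrative:

```python
exact = 10**46           # arbitrary-precision integer, exactly 1 followed by 46 zeros
approx = float(exact)    # nearest IEEE 754 double to that value

print(exact % 10)            # 0
print(approx % 10)           # the remainder of the rounded value, not of 10**46
print(int(approx) - exact)   # how far the double is from the intended number
```

The remainder operation itself introduces no error here; the nonzero result comes entirely from the literal having been rounded to a double before the operation ran.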

====================

All of these examples are constructed test cases using number literals/constants, which have never occurred in practice, except in a post a few days ago where a Pi constant with about 50 digits after the decimal point was used. Except for that code, I have never seen this kind of problem in the real world.

newLISP chooses to keep integer and float arithmetic apart. Integer and float operators implicitly convert their arguments, causing integer overflows or floating-point conversion rounding errors. newLISP also separates bigint from normal 64-bit int arithmetic. Both of these separations are deliberate and are convenient when using newLISP in embedded systems, when interacting with hardware registers, or when doing integer arithmetic in other domains that are inherently of integer type.

Many of the problems will disappear when higher-precision decimal and float numbers are allowed while parsing, but the fact that these numbers are converted to IEEE 754 doubles will remain and cause some of the effects shown. These effects can be shown in any programming language using floats.

Ps: about the 32 decimal digit limit: The 32-length limit is not tested until the first non-digit character. I misspoke in my previous post.

Ps: about (div 2 0) => inf, this is part of IEEE 754 compliance. See also the file
newlisp-10.x.x/qa-specific-tests/qa-float which tests for many of the IEEE compliance features.
Lutz

Re: bug? about float point number lexical analysis

rickyboy wrote: Are you suggesting that newLISP allow the user to enter a number of arbitrary length (as input) and then parse and store the internal representation of the number (as an exact representation of the input, i.e. arbitrarily large)?

AFAIK, no programming language allows this. There are always limits.

I certainly know that.

What I mean is

```
> newlisp
newLISP v.10.6.2 32-bit on Win32 IPv4/6 UTF-8, options: newlisp -h

> 1234567890123456789012345678901234567890999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999.123456789
1.23456789012346e+255
123456789
> (string 1234567890123456789012345678901234567890999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999.123456789)
"1.23456789012346e+255123456789"
```

This result is very weird and counterintuitive.

Users would never expect a single, completely legal decimal number (consisting only of digits and a single dot) to be parsed into multiple numbers.

Extending the limit from 32 to 255 just covers up the issue; it does not really solve it.

In Python

```
>>> a = 111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999111111111122222222223333333333444444444455555555556666666666777777777788888888889999999999111111111122222222223333333333444444444.987654321
>>> a
1.1111111112222222e+308
>>> str(a)
'1.11111111122e+308'
```

When the number exceeds the limit

```
>>> b = 1111111111222222222233333333334444444444555555555566666666667777777777888888888899999999991111111111222222222233333333334444444444555555555566666666667777777777888888888899999999991111111111222222222233333333334444444444555555555566666666667777777777888888888899999999991111111111222222222233333333334444444445.987654321
>>> b
inf
```

it evaluated to inf, not two values 1.1111111112222222e+308 and 987654321.

And:

```
>>> str(b)
'inf'
```

it evaluated to 'inf', not the string '1.1111111112222222e+308987654321'.
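The Python side of this comparison can also be reproduced without typing the long literals, by building them as strings. This is a minimal sketch assuming IEEE 754 doubles; the digit strings are simplified stand-ins for the literals above, not the same numbers:

```python
import math

# 309 integer digits: about 1.1e308, still below the largest double.
finite = float("1" * 309 + ".987654321")

# 310 integer digits: about 1.1e309, above the largest double.
overflow = float("1" * 310 + ".987654321")

print(math.isfinite(finite))   # True
print(math.isinf(overflow))    # True
print(str(overflow))           # inf
```

However long the digit string is, the conversion yields a single value, either a finite double or inf, and never a second number made from the leftover digits.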

I think Python's behavior is reasonable and newLISP's is not.

This is my opinion.

Thank you for your attention.
(reverse "newlisp")
psilwen

Re: bug? about float point number lexical analysis

That would be overengineering, and costly in code size and speed. newLISP would be slower and much bigger if designed with this philosophy. I doubt the splitting of large floats would ever occur with a 255(*) size limit.