qa-float crash

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

I could cheat by putting a signal trigger like sqrt(-1); inside the C code.
But that's not what I would like to see, and I'm also not sure whether the stacks are
in sync...
-- (define? (Cornflakes))

Post by newdep »

I don't see any workable way currently without cheating on the SIGFPE
.. perhaps you have an extra clue?

This is how the SIGFPE adjustment to 10.1.6 now looks ->

This is inside setupAllSignals ->

#ifdef OS2
setupSignalHandler(SIGFPE, signal_handler);
/* force a SIGFPE trigger when newlisp starts */
/* this is to activate the NaN Inf returns!   */
(sqrt (-1));
/**********************************************/

This is inside the signal_handler ->

#ifdef OS2
     /* SIGFPE must be forced for a NaN Inf */
	/* the longjmp returns 1 to setjmp when set */
	case SIGFPE:
		longjmp(errorJump,errorReg);
		break;
#endif

the output of qa-float is this ->

operation on NaN result in NaN                 
-----------------------------------------------
                     (NaN? (mul 1 aNan)) => true
                     (NaN? (div 1 aNan)) => true
                     (NaN? (add 1 aNan)) => true
                     (NaN? (sub 1 aNan)) => true
                       (NaN? (sin aNan)) => true
                       (NaN? (cos aNan)) => true
                       (NaN? (tan aNan)) => true
                      (NaN? (atan aNan)) => true

comparison with NaN is always nil              
-----------------------------------------------
                        (not (< 1 aNan)) => true
                        (not (> 1 aNan)) => true
                       (not (>= 1 aNan)) => true
                       (not (<= 1 aNan)) => true
                     (not (= aNan aNan)) => true

NaN is not equal to itself                     
-----------------------------------------------
                     (not (= aNan aNan)) => true

integer operations assume NaN as 0             
-----------------------------------------------
                        (= (- 1 aNan) 1) => true
                        (= (+ 1 aNan) 1) => true
                        (= (* 1 aNan) 0) => true
         (not (catch (/ 1 aNan) 'error)) => true
                         (= (>> aNan) 0) => true
                         (= (<< aNan) 0) => true

integer operations assume inf as max-int       
-----------------------------------------------
      (= (* 1 aInf) 9223372036854775807) => true
      (= (- aInf 1) 9223372036854775806) => true
     (= (+ aInf 1) -9223372036854775808) => true

FP division by inf results in 0                
-----------------------------------------------
                        (= (/ 1 aInf) 0) => true
                      (= (div 1 aInf) 0) => true

inf specials                                   
-----------------------------------------------
                           (= aInf aInf) => true
                  (NaN? (sub aInf aInf)) => true

retain sign of -0.0                            
-----------------------------------------------
        (= (set 'tiny (div -1 aInf)) -0) => true
                      (= (sqrt tiny) -0) => true

inf is signed too                              
-----------------------------------------------
                  (= aNegInf (div -1 0)) => true
                  (!= aNegInf (div 1 0)) => true

mod with 0 divisor is NaN                      
-----------------------------------------------
                       (NaN? (mod 10 0)) => true

% with 0 divisor throws error                  
-----------------------------------------------
           (not (catch (% 10 0) 'error)) => true

support of subnormals: (0 4.940656458e-324) => (0 4.940656458e-324)
machine epsilon: 1.110223025e-16 => 1.110223025e-16

-- (define? (Cornflakes))

Post by newdep »

I forgot the extra setjmp; I added this too: the extra errorReg = setjmp(errorJump); call.


if((errorReg = setjmp(errorJump)) != 0) 
    {
    if(errorReg && (errorEvent != nilSymbol) ) 
        executeSymbol(errorEvent, NULL, NULL);
    else exit(-1);
    goto AFTER_ERROR_ENTRY;
    }

errorReg = setjmp(errorJump);
setupAllSignals();

-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

#ifdef OS2
     /* SIGFPE must be forced for a NaN Inf */
   /* the longjmp returns 1 to setjmp when set */
   case SIGFPE:
      longjmp(errorJump,errorReg);
      break;
#endif

setjmp() will only return 1 if errorReg was 1, but on program start and after a reset it is set to 0, and I think 0 is what setjmp() returns when doing the longjmp(). If it made setjmp() return 1, we would see "Not enough memory" reported as the error, which is defined as 1.

Can you try this?

#ifdef OS2
   case SIGFPE:
      longjmp(errorJump,0);
      break;
#endif

I believe it will also work.

Post by newdep »

No, that results in the same "double" effect..
Also, changing it to 1 is of no use; there is always a mismatch
in the jmp_buf content.

How about sigsetjmp, siglongjmp and sigjmp_buf?

This is how I see the flow in newlisp now with the SIGFPE involved;
correct me here if I'm wrong ;-) it only helps finding the itch...

main()
   |
errorReg = 0
setjmp(errorJump) 
   |
setupAllSignals init (NOT SIGFPE, because it's only triggered on exception)
   |
(sqrt -1)  (on the newlisp console)
   |
SIGFPE trigger with errorReg = 0 (from the first fresh init)
longjmp(errorJump,errorReg)  (initial stack with errorReg = 0)
   |
errorReg = 1 (is always 1 when returning from longjmp!)
setjmp(errorJump) != 0
(no return on console for (sqrt -1) because errorReg is now 1, which is a NEW stack)
   |
errorReg = setjmp(errorJump)  (is now 1 because of longjmp)
setupAllSignals (no trigger for SIGFPE)
   |
(sqrt -1)
   |
SIGFPE trigger with errorReg = 1 (new errorReg value from previous setjmp)
   |
(return "nan") (because the jmp_buf stack is now in sync)

-- (define? (Cornflakes))

Post by newdep »

This is what I mean, except that in newlisp it stays at 1, it seems...
Is there extra memory management done somewhere?

#include <stdio.h>
#include <float.h>
#include <signal.h>
#include <math.h>
#include <setjmp.h>

/* testing NaN and Inf return */


/* store stack */
jmp_buf errorJump;

int errorReg = 0;

void signal_handler(int sig)
{
	/* re-install the handler */
	signal(SIGFPE, signal_handler);

	switch(sig)
	{
	case SIGFPE:
/*		signal(SIGFPE,SIG_DFL); */
		printf("%s", "SIGFPE!\n");
		longjmp(errorJump,errorReg);
		break;
	default: return;
	}
}

int main ()
{
	double nfloat = 1.0;	/* initialized so that nfloat /= 0 is well defined */

	printf("errorReg=%d\n", errorReg);

	errorReg = setjmp(errorJump);
	signal(SIGFPE, signal_handler); 
	printf("errorReg=%d\n", errorReg);

	nfloat /= 0;
	printf("div=%f\n",  nfloat ); 

	errorReg = setjmp(errorJump);
	printf("errorReg=%d\n", errorReg);
	
	nfloat = (sqrt (-1));
	printf("sqrt=%f\n", nfloat );

	errorReg = setjmp(errorJump);
	printf("errorReg=%d\n", errorReg);

	nfloat = (log (0));
	printf("log=%f\n", nfloat );

}


outputs ->

[E:\PROG\NL\newlisp-10.1.6]f
errorReg=0
errorReg=0
SIGFPE!
errorReg=1
div=inf
errorReg=0
sqrt=nan
errorReg=0
log=-inf
-- (define? (Cornflakes))

Post by Lutz »

This all doesn't make sense to me ;-)

The following

- longjmp() doesn't return anything; it is void longjmp()
- setjmp() returns whatever the second arg of longjmp() was
- errorReg is always set to the return value of setjmp()

I think what you are saying is that SIGFPE, when it occurs, needs a longjmp() to restore the stack environment saved previously with setjmp() in errorJump. That would then be the newlisp-reset-entry-point. So SIGFPE would always cause a reset and take newLISP to the command line.

Just send me your changed newlisp.c; perhaps then it will make more sense to me.

EDIT: I didn't see your last post while writing this; now I understand how you think errorReg is set.

But here is a totally different approach:
=======================

In setupAllSignals(void), where all signals are set up, do:

#ifdef OS2
setupSignalHandler(SIGFPE, SIG_IGN);
#endif

This tells OS/2 to simply ignore this exception. It may not let you do this override, but we can try. In this case you can remove all other OS/2-specific signal code.

Post by newdep »

Yes, that would fit the GNU systems' behaviour of SIGFPE.... let me try that...
Btw, I did try that previously, but with the SIGFPE still inside the signal_handler.. (aaaggg)

Let's see..
-- (define? (Cornflakes))

Post by newdep »

No.. it doesn't eat it...
The SIGFPE needs to be triggered to catch the NaNs and Infs, it seems..
-- (define? (Cornflakes))

Post by newdep »

Lutz wrote:
The following

- longjmp() doesn't return anything; it is void longjmp()
- setjmp() returns whatever the second arg of longjmp() was
- errorReg is always set to the return value of setjmp()

I'd actually make a correction to the above..

It's correct that longjmp doesn't return, but! ->

setjmp always returns != 0 when there was a previous longjmp,
not what longjmp had as second argument.
-- (define? (Cornflakes))

Post by Lutz »

Nope! From the man page of longjmp():

"Returns 0 after saving the stack environment. If setjmp() returns as a result of a longjmp() call, it returns the value argument of longjmp(), or if the value argument of longjmp() is 0, setjmp() returns 1."

That means setjmp() does return the second arg of longjmp(), except when that arg was 0, then setjmp() returns 1.

For that reason I wanted you to try longjmp(errorJump, 0) in the error handler earlier.

Also, when you do this:

if((errorReg = setjmp(errorJump)) != 0)
    {
    if(errorReg && (errorEvent != nilSymbol) )
        executeSymbol(errorEvent, NULL, NULL);
    else exit(-1);
    goto AFTER_ERROR_ENTRY;
    }

errorReg = setjmp(errorJump); <=== will suppress all error messages
setupAllSignals(); 

You effectively suppress all error messages, because the jump buffer is now set to that point, with no error treatment when an error occurs. It will then just drop into the command line without error messages.

That line has to go, we have to make OS/2 FP work without it.

Post by newdep »

Lutz wrote: Nope! From the man page of longjmp(): [...] That line has to go, we have to make OS/2 FP work without it.

Aha.. That dusty UNIX programming book of mine will now go into the bin! (it's from 1987... uhum...)
I'll stick with the man pages ;-)


I suspected that indeed with the extra setjmp; I don't like it that way either...
Yes, let's stick with the code as it is now...

The odd thing that keeps me awake is my C example versus the newlisp code.

What I could do is extract the SIGFPE from the generic handler in newlisp?
Perhaps that helps.. that's the only thing I did not do yet..
And newlisp has more longjmps and setjmps and an explicit check on 0 or 1 on the
setjmp..
-- (define? (Cornflakes))

Post by Lutz »

newdep wrote:
What I could do is extract the SIGFPE from the generic handler in newlisp?

Yes, this is done for other signals on Sun OS, Tru64 and IBM AIX. E.g.:

#ifdef OS2
void specialOS2_handler(int s)
{
/* Norman's OS/2 stuff */
}

setupSignalHandler(SIGFPE, specialOS2_handler);
#endif

Note that setupSignalHandler() is just signal() with error checking.

Then there is also this:

signal(SIGFPE, SIG_DFL);

It sets up some sort of OS-specific default handler.

Post by newdep »

... I stripped a long story here and will make it short..

It seems I keep running into a mix-up of setjmp/longjmp errorReg values.
(I tested this by printing all the errorRegs inside newlisp, see previous posts.)

From the SIGFPE handler's point of view there are 2 options:
* Use a longjmp, which works in my C-code example.
* Exit the application with a message; this works in both newlisp and the C-code example.

From the setjmp point of view:
* The very first call to setjmp results in a 0 return.
* All longjmp calls after the first 'direct' setjmp call make it return != 0.

From the longjmp point of view:
* Return to the stack state last set by setjmp, based on the env value.
* Make sure the initial function isn't finished before jumping.

Conclusion as it now is in newlisp:
* The C-code example works on Linux and on OS/2, gcc compiled.
* NaN and Inf only happen when SIGFPE is set.
* I.e. (div 0) initially does not return anything; errorReg = 0.
* I.e. (div 0) only returns the second time, as it seems the longjmp errorReg = 1.


I tried it all... I give myself a SIGSEGV...
-- (define? (Cornflakes))

Post by newdep »

OK, fresh day, fresh SIGHUP..

When starting newlisp freshly and entering (sqrt -1) I get SIGNAL 8;
errorReg = 0 and it returns to the code below, where only E5=1 is displayed.

I found that the longjmp inside the SIGFPE handler always returns to this point ->
(which is the first setjmp call).

/* ======================= main entry on reset ====================== */
printf("E4=%d\n", errorReg);
errorReg = setjmp(errorJump);
printf("E5=%d\n", errorReg);
setupAllSignals();

So again, from this test the SIGFPE initially (the very first time it's called)
does not have the same jmp_buf content; only the second time are they in sync. That's
why the result isn't displayed, at least that's what it looks like..

I have stripped down the signaling inside newlisp so it only does the C code from my
example. There is no signaling left other than SIGFPE.

Because it never displays the E4=, it jumps directly from the longjmp to the first setjmp.

If I enter a (/ 0 0) after newlisp has initially started, errorReg = 29 (integer division by zero);
then I enter the (sqrt -1), get SIGNAL 8 with errorReg = 29, and it returns to the same
code (see above) where E5=29. The next (sqrt -1) (no signal trigger, it's running already) then works.

..Duke Nukem would say... "Where is it!..."
-- (define? (Cornflakes))

Post by newdep »

This is how it will be in newlisp/2.

#ifdef OS2
/*
longjmp(errorJump,errorReg) and the signal handling of SIGFPE
are unreliable; the cause can't be traced. Could be Libc063 or gcc or ..??
(NaN? (..)) (Inf? (..)) don't work, nothing is returned regarding nan inf...
Therefore ERR: is returned with an exit. Not very charming.
*/
case SIGFPE:
	printErrorMessage(ERR_MATH, NULL, 0);
	exit(-1);
	break;
#endif
-- (define? (Cornflakes))

Post by newdep »

NaN and Inf are now returned by SIGFPE. Got it working. Finally.
Lutz, I have sent you the code.
-- (define? (Cornflakes))

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC
Contact:

Post by TedWalther »

Well, don't leave me hanging! Can you paste the diff in here?

Post by Lutz »

I uploaded the current newlisp-10.1.6.tgz to your place.

Post by newdep »

What kind of diff would you like? Just the changes? Because I did it on the 10.1.6 release, which isn't released yet..

But are you also building, or do you only put it in a tree? Then you need the official diff, which I don't have from 10.1.5 to 10.1.6...
-- (define? (Cornflakes))

Post by Lutz »

The version I uploaded to Ted contains all of Norman's modifications.

Post by TedWalther »

Thanks Lutz. I was just sort of excited by the way I saw the bug being chased down. I thought it would be a good climax to the detective novel to see the solution viz the diff. I'll check it out in the 10.1.6 tarball. Thank you!

Ted

Post by newdep »

To make a long story even longer..

It might make sense, it might not.. I'm sure I forgot most parts of the
story, but here is the ending.. at least the end of the ending.. not
the full ending.. I mean.. what's a start of it anyway..

Last week I had to come out and tell everyone on the internet:
"I have a problem, I want NaNs on my newlisp prompt in OS/2"

I use the OS/2 GCC port and the Klibc build and want NaNs.

I actually had 3 problems: (1) Why is there no default IEEE FP
return like NaN and Inf with GCC for OS/2? (2) Why does
newlisp need the input entered twice to display the result of an
FP exception? And (3) why do I need to dig into my
long-gone C knowledge to get this working..

First.. well, the simplest way is to point your finger.
I learned very quickly that this finger pointing is a
nice start to a 2-week struggle; actually I got very
frustrated with the whole gcc port in OS/2 trying to get this
little irritating thing working.

I'm using a P4 system here that runs OS/2.. fast enough
for coding and very nice for speed comparisons.

So why did newlisp not return the NaN or Inf? Could it
be GCC, could it be the P4? So I had to rule things out.
First the P4.. early bugs from Intel? I tested for these.
No problems on my P4. That left me with GCC and the
precompiled Libc. So I dug into the C code of gcc and
Libc. Well, it was not what I hoped to find; speaking
of programming, I have never seen so many exceptions
in a bunch of code as here. Tracing was the only option.

I came across the solution by pure luck, actually by
experimenting with floating point in C and newlisp.
From what I read, it seemed the original gcc on
GNU systems has this IEEE NaN/Inf return behaviour
by default and doesn't use SIGFPE for this.

As OS/2 isn't a pure GNU system but uses a gcc to
compile GNU applications, the results and methods are
actually in a dark area; it's not 'yet' well documented.
Also, I could not find a good description anywhere of
what a SIGFPE actually does in this OS/2 port, so I
assumed it all depends on the OS... period.

I could not find anywhere in the bug tracking system
of OS/2 gcc a problem with SIGFPE related to
triggering NaN results.. so I had to simulate it to prove
the concept.

The behaviour of SIGFPE on OS/2 gcc is:
- Set up the signal SIGFPE
- Trigger it
- Restore the stack, et voila.. your application has IEEE.

To get SIGFPE to return to operational mode inside your
application, the only safe way is the longjmp way.
(Thanks Ritchie, for the only decent C book).

From my tests I could see they all worked out of the
box, but not so inside newlisp.

(2) Why did they work in my C code and not in newlisp?

I now have a good idea of when the NaN is triggered,
but why doesn't it work in newlisp? Newlisp uses some
very simple but effective stack returns to recover from
errors.

I adjusted the newlisp code over and over, and when you
look at code long enough you finally don't see the
solution anymore. Finally I was on the right track:
a sync problem with the stacks created with jmp_buf.

Newlisp uses an initial stack at startup. SIGFPE, which
can occur anytime and is only triggered when the
exception happens, does a longjmp to that stack to
recover from the exception the FP caused and returns
after the longjmp to the first setjmp. (Officially there
is no return from longjmp, but I just call it this way.)
Now these stacks were not in sync. I did the longjmp
to the default stack all the time.

I was even at flag and co-processor level at one point,
far too deep! And I did not want to go there either.

I had to figure out the sequence newlisp uses in
error handling with the stack created with jmp_buf.
There was a problem at initialization which I had already
found very early on, but I did not
realize I was so close to the solution at that time.

Also, I still had to find a way to trigger that SIGFPE.
That wasn't easy, because newlisp has tight
error checking. So one wrongly placed fake sqrt(-1)
SIGFPE trigger inside the newlisp code and newlisp
would quit on me. I chose to fake the SIGFPE; there
was no other working way.

As SIGFPE does a longjmp and is initialized when the
exception occurs, it will always longjmp to the stack
newlisp started with. The problem only happens
the first time the SIGFPE happens. This stack does
not contain any SIGFPE signals/results/flags, so when
the SIGFPE happened the return to the newlisp console was
empty, which is correct because newlisp did just that:
restore the original stack. So I had to make sure that
the SIGFPE restored the correct stack when the exception
happened, but now including its own handler in the stack.
And that was the solution.

So there are still open questions, like: What does the
stack actually contain at setjmp init? Because
sigsetjmp takes care of the signals and flags, and it
isn't used here. Is the behaviour of the SIGFPE trigger
indeed a default way of catching the IEEE NaNs?

The most irritating thing about this SIGFPE I actually found
while searching the internet. There was not one good
answer on the whole internet regarding SIGFPE.
(Yes, not even on the big-name sites either..)
And 'no', I did not want to use siginfo or sigaction.
They all just touched the simple parts of SIGFPE.

It's like reading a bad technical book: all the stuff
you actually want to know is always in the back, in
the appendix or reference chapter, just too short
to learn from.

So yes.. Simple.. I know.. I'm a Wuzzy that can't
program.. a nothing, a rookie, a lame coder..
..yes folks I know I know.. But I got it running :-)

So now it's time for a beer!

-- (define? (Cornflakes))

Post by newdep »

Here is a quotation from the EMX OS/2 documentation that the GCC/2 version is leaning on..
Except for the use of _control87(), which did not do anything in GCC for me.

By default, all floating point exceptions are masked: The coprocessor will perform a 
default action (replace the result with a NaN, for instance) and continue without 
generating SIGFPE. Use _control87() to enable floating point exceptions. However, 
SIGFPE is not reliable
-- (define? (Cornflakes))

Post by newdep »

And I even found more insightful IBM info... finally some good documents... Perhaps
this SIGFPE stack swapping will be changed... I'm rechecking _control87()
-- (define? (Cornflakes))
