Segmentation Fault running NewLisp "Daemon mode"

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands


Post by newdep »

Hi Lutz,

Running newLISP 8009.

A segmentation fault occurs when connecting to newLISP while it is running
in daemon mode.

Example #1:

bash-2.05b$ newlisp -L -l -d 50000 &
[1] 705

--- Now I telnet to "localhost 50000"
--- Newlisp prompt
--- >(exit)

[1]+ Segmentation fault newlisp -L -l -d 50000
bash-2.05b$



Example #2:

bash-2.05b$ newlisp -L -l -d 50001 &
[1] 713

--- Now I telnet to "localhost 50001"
--- Newlisp prompt
--- >(exit)

bash-2.05b$ fg
newlisp -L -l -d 50001
Segmentation fault
bash-2.05b$



Hope you can catch the bugger...

Norman...
-- (define? (Cornflakes))


Still a Segmentation fault in 8010

Post by newdep »

Hello Lutz,


Also in version 8.0.10 the segmentation fault occurs on the daemon side.

When newLISP is running in -d mode (not in -p mode)
and the remote client connects and types ONLY (exit) at the first prompt,
newLISP dumps with a segmentation fault.

When the client first presses ENTER and THEN types (exit), it's OK.

(A little issue, I think.)


Norman.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

This bug has been there for a long time and does not occur on Win32 and BSD. It is not limited to doing an (exit) right away, but can also occur in other circumstances.

If you have any idea how to fix this, help would be appreciated ;)

Lutz

PS: use only one of the -L or -l options; if you use both, the last one wins.

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Works fine using the Linux 2.6.x kernel. Cannot remember the last digit. What kernel are you using?

Eddie


Post by Lutz »

I am not sure, I have to check; whatever Mandrake 9.2 uses. Sometimes you have to exit and reconnect from the client several times to provoke the error, as if it were a timing problem.

Lutz


Post by eddier »

OK, I see. After two tries I got the segmentation fault.

Mandrake's latest kernel is probably 2.4.x. I think all stable distributions, except maybe Turbolinux, use this kernel. On install, you can choose the 2.2.x or the 2.4.x kernel; it defaults to 2.2.x.

I'm using Debian testing. For a client this is OK. For the server side I would talk to someone who deals with security. However, I've noticed everything is much faster with the 2.6 kernel. EVERYTHING!

I've run 2.2.x (Mandrake), 2.4.x (Debian), FreeBSD, NetBSD and 2.6.x (Debian) on this same machine (AMD 2400+ with 512M memory). I noticed that the 2.6.x kernel has a much snappier response than 2.4.x and even FreeBSD. I wonder if that holds on Intel machines as well.

I like FreeBSD for a server and 2.6.x as a client.

Eddie


Post by newdep »

I'm running Slackware here, with kernels 2.4.20 / 2.4.26...

Let's see if I can find anything related to this issue in the code...

Regards, Norman.


Post by newdep »

OK, I did some tracing on my Linux machine. Here is the output: the first connection is OK and the second one dumps.

It stops after the file-control call (fcntl64) the second time. This is all one daemon session, by the way.

It looks like a file-control operation is being done on a variable which isn't there.

Perhaps you'll spot it quicker ;-)



### FIRST ###

accept(3, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34398), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333867
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\r\0"..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff570, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.10 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff700) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
write(-1, "\n", 1) = -1 EBADF (Bad file descriptor)
close(1) = -1 EBADF (Bad file descriptor)


### SECOND ###
accept(3, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34399), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333877
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
_llseek(1, 0, 0xbffff570, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40017000, 4096) = 0
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
write(-1, "\n> ", 3) = -1 EBADF (Bad file descriptor)
close(1) = -1 EBADF (Bad file descriptor)
accept(3, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(34400), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090333887
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
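In both runs the trace shows close(1) succeeding, followed by a write(-1, ...) and a second close(1) that both fail with EBADF; on the next connection the process dies. That pattern suggests the connection descriptor is still being used after it has been closed. A minimal C sketch of one defensive pattern against this, assuming the connection is tracked in a plain int variable; close_once() and write_if_open() are hypothetical helpers, not functions from the newLISP sources.

```c
#include <errno.h>
#include <unistd.h>

/* Close the descriptor held in *fd at most once, then mark the
   variable invalid so later code cannot reuse a stale number. */
static int close_once(int *fd)
{
    if (*fd < 0)
        return 0;            /* already closed: nothing left to do */
    int rc = close(*fd);
    *fd = -1;                /* invalidate the variable, not just the fd */
    return rc;
}

/* Write only while the descriptor is still valid; otherwise report
   EBADF without making the system call at all. */
static ssize_t write_if_open(int fd, const void *buf, size_t len)
{
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    return write(fd, buf, len);
}
```

With this pattern a second "close" becomes a no-op and a late prompt write is refused up front, instead of reaching the kernel with a stale or negative descriptor.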


Post by Lutz »

Thanks for the trace, Norman. I think I found the problem.

Lutz


Post by Lutz »

Version 8.0.11 in http://newlisp.org/downloads/development/ solves this problem.

Lutz


Post by newdep »

Lutz, thanks for the quick fix... but now in release 8.0.11 the -d option does not run as a daemon anymore; it drops out after one connection exits...

Now it looks like when the client does an (exit) from the daemon, the daemon re-binds
to the port too quickly (because it has closed it), fails, and exits...

But you're right, the segmentation fault is gone ;-)

PS: perhaps one hint for an enhancement: when one client is connected you can close the "listener" so no more clients can connect and you keep your current session. That way you don't "pressure" newLISP on the sockets, and you simply re-open the listener when the client has exited...
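Norman's suggestion can be sketched in C roughly as follows. This is an illustration under assumptions (IPv4 on the loopback address, one client at a time), and open_listener(), serve_one_client() and prompt_session() are hypothetical names, not newLISP code. The important detail is that the listener is reopened after the session; closing it alone leaves nothing to accept on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1:port (port 0 lets the kernel pick).
   SO_REUSEADDR allows the port to be bound again right after a session,
   while the old connection may still be shutting down. */
static int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 1) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical session: just send a prompt, standing in for the REPL. */
static void prompt_session(int conn)
{
    ssize_t n = write(conn, "\n> ", 3);
    (void)n;
}

/* One client at a time: accept, stop listening while the session runs,
   then reopen the listener on the same port for the next client.
   Returns the new listener fd, or -1 if it could not be reopened. */
static int serve_one_client(int listen_fd, unsigned short port,
                            void (*session)(int conn))
{
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof peer;   /* in/out argument: set every time */
    int conn = accept(listen_fd, (struct sockaddr *)&peer, &peer_len);
    close(listen_fd);                   /* further clients are refused */
    if (conn >= 0) {
        session(conn);
        close(conn);
    }
    return open_listener(port);
}
```

A daemon would call serve_one_client() in a loop, feeding each returned listener back in, and stop only if it returns -1.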


Post by Lutz »

This is crazy: on my system, Mandrake Linux 9.2 with kernel 2.4.22, it is working OK.

I then added

deleteInetSession(sock);
close(sock);

after:

connection = accept(sock, (struct sockaddr *) &dest_sin, &dest_sin_len);

in the function FILE * serverFD(int port, int reconnect) in file nl-sock.c, to close the listen socket after accepting a connection. Now it also exits right away on my side, and it breaks on BSD too, which doesn't make much sense :(
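One possible reason the daemon exits right away with that change: unless serverFD() opens a fresh listen socket before the next accept(), that accept() runs on a descriptor that has already been closed and fails immediately. The sketch below only demonstrates that effect in isolation; accept_on_closed_listener() is a hypothetical helper, and the actual control flow of serverFD() is not shown in this thread.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mimics "close(sock) right after accept()": once the listen socket is
   closed, the next accept() on the same descriptor fails.
   Returns that accept()'s errno, 0 if it unexpectedly succeeded,
   or -1 if the setup itself failed. */
static int accept_on_closed_listener(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = 0;                          /* any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(sock, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(sock, 1) != 0) {
        close(sock);
        return -1;
    }

    close(sock);                                /* the listener is gone */

    struct sockaddr_in dest_sin;
    socklen_t dest_sin_len = sizeof dest_sin;
    if (accept(sock, (struct sockaddr *)&dest_sin, &dest_sin_len) < 0)
        return errno;                           /* EBADF expected */
    return 0;
}
```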

Lutz


Post by Lutz »

Norman, I wonder if the version in http://newlisp.org/downloads/development/Norman/

makes any difference in -d mode on your Linux system?

Lutz


Post by newdep »

Hi Lutz,

Did a quick test but no change... in -d mode, after the first client connects
and disconnects, the daemon still quits (nicely) but does not keep running as a daemon...

Norman.


Post by eddier »

Doesn't break on Debian 2.6.x

Eddie


Post by Lutz »

This is hard for me to fix, as I cannot reproduce it on my Linux installation. Can you find out with a trace where it exits?

The only exit I see in the program logic is when the accept() call falls through and leaves the 'connection' variable with a NULL, but then you would get the message "newLISP server setup on port xxx failed", and you don't seem to get this. So I think it is bombing out somewhere else?
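As a debugging aid, it can help to print the errno of a failed accept() before giving up, since a generic "setup failed" message hides the cause; the later strace output in this thread shows accept() returning EINVAL. A minimal sketch, assuming IPv4; accept_verbose() is a hypothetical wrapper, not part of nl-sock.c.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* accept() wrapper that reports *why* it failed, rather than only a
   generic "server setup failed" message. errno is preserved so the
   caller can still act on it. */
static int accept_verbose(int listen_fd, struct sockaddr_in *peer)
{
    socklen_t len = sizeof *peer;   /* must be set before each call */
    int conn = accept(listen_fd, (struct sockaddr *)peer, &len);
    if (conn < 0) {
        int saved = errno;
        fprintf(stderr, "accept() on fd %d failed: %s\n",
                listen_fd, strerror(saved));
        errno = saved;
    }
    return conn;
}
```

Calling it on a socket that was never put into listening state, for example, prints "Invalid argument" (EINVAL) instead of failing silently.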

Lutz


Post by Lutz »

Thanks for testing, Eddie. On my side I am running kernel 2.4.22 on Mandrake 9.2. Norman, what Linux are you running?

Lutz

PS: you already mentioned it, Norman: Slackware 2.4


Post by newdep »

Hello Lutz,

Running Linux 2.4.20.

Here is my trace; this time it pops out at "semget" or at the _exit(1) = ?, see below...


accept(3, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(35464), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090390017
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\r\0"..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff4c0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff650) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
close(1) = -1 EBADF (Bad file descriptor)
accept(3,
bash-2.05b$
bash-2.05b$ fg
strace newlisp -d 5001
0x8068bf8, [4294967295]) = -1 EINVAL (Invalid argument)
getpeername(-1, 0xbffff630, [16]) = -1 EBADF (Bad file descriptor)
time(NULL) = 1090390074
semget(1, 0, 0x5|04) = -1 ENOSYS (Function not implemented)
_exit(1) = ?


Post by Lutz »

Thanks Norman, when restarting the server the accept() call seems to fail on slackware, perhaps because the listen socket is invalid.

In http://newlisp.org/downloads/development/Norman/

you will find a version which tries to reopen the port for listening.

Lutz

PS: perhaps Steve knows how to use CreateProcess in Win32?


Post by newdep »

Hi Lutz,

I just tested the 8.0.12Norman release, but unfortunately no change. Here
is the trace; I hope it gives a little insight...

The second connect is refused and the daemon exits... strange it is...


accept(3, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 1
getpeername(1, {sin_family=AF_INET, sin_port=htons(32819), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 0
time(NULL) = 1090447188
open("/etc/localtime", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=1074, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
read(4, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\r\0"..., 4096) = 1074
close(4) = 0
munmap(0x40015000, 4096) = 0
fcntl64(1, F_GETFL) = 0x2 (flags O_RDWR)
fstat64(1, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40015000
_llseek(1, 0, 0xbffff4a0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
munmap(0x40015000, 4096) = 0
write(1, "newLISP v8.0.12 Copyright (c) 20"..., 70) = 70
ioctl(1, SNDCTL_TMR_TIMEBASE, 0xbffff630) = -1 EINVAL (Invalid argument)
write(1, "\n> ", 3) = 3
read(1, "(", 1) = 1
read(1, "e", 1) = 1
read(1, "x", 1) = 1
read(1, "i", 1) = 1
read(1, "t", 1) = 1
read(1, ")", 1) = 1
read(1, "\r", 1) = 1
read(1, "\n", 1) = 1
close(1) = 0
accept(3, 0x8068d38, [4294967295]) = -1 EINVAL (Invalid argument)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 1
bind(1, {sin_family=AF_INET, sin_port=htons(5001), sin_addr=inet_addr("0.0.0.0")}}, 16) = -1 EADDRINUSE (Address already in use)
close(1) = 0
semget(1, 0, 0x5|04) = -1 ENOSYS (Function not implemented)
_exit(1) = ?
[1]+ Exit 1 newlisp -d 5000
bash-2.05b$


Post by Lutz »

Strange, it will not let me do accept() on the old listen socket, but also not let me bind the new one -> "Address already in use".
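In the trace above, the new socket's bind() fails with EADDRINUSE while the original listen socket (fd 3) is apparently never closed. On Linux, SO_REUSEADDR does not allow binding an address that still has an active listening socket (see socket(7)), so the old listener has to be closed before the port is bound again. A small sketch demonstrating this on the loopback interface; try_listen() and bound_port() are hypothetical helpers.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind and listen on 127.0.0.1:port with SO_REUSEADDR set.
   Returns the listening fd, or -errno on failure. */
static int try_listen(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 5) != 0) {
        int err = errno;            /* save before close() can change it */
        close(fd);
        return -err;
    }
    return fd;
}

/* Port a socket is bound to, in host byte order (0 on error). */
static unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Binding the same port while the first listener is open fails with EADDRINUSE; after close() on the old descriptor, the same call succeeds.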

I will keep on trying ...

Lutz


Post by Lutz »

Version 8.0.14 in http://newlisp.org/downloads/development/ fixes this. Tested on MinGW, Debian, RedHat, Mandrake, FreeBSD, OpenBSD, Mac OS X, AMD64 and Solaris. I am confident that with Norman testing it, we can add Slackware to the list.

The bug wasn't very sophisticated, just uninitialized data structures (shame on me ;-) ). On the SourceForge compiler farm I found an OS which also showed the problem, and then I was able to fix it.
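Lutz does not say exactly which structure was uninitialized. One plausible candidate, judging from the traces, is the address-length argument of accept(): the failing calls show it as [4294967295], which is -1 as a signed int, and Linux's accept() rejects a negative length with EINVAL. A sketch of the usual precaution, reusing the dest_sin / dest_sin_len names from Lutz's serverFD() snippet; open_loopback_listener() and accept_client() are hypothetical helpers, not the actual fix.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listener on 127.0.0.1 with a kernel-chosen port; -1 on failure. */
static int open_loopback_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);          /* also sets sin_port = 0 */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* The third argument of accept() is in/out: on entry it must hold the
   size of the address buffer, on return the actual address size.
   Initialising it on every call avoids a garbage length such as the
   [4294967295] seen in the traces. */
static int accept_client(int sock, struct sockaddr_in *dest_sin)
{
    socklen_t dest_sin_len = sizeof *dest_sin;
    memset(dest_sin, 0, sizeof *dest_sin);
    return accept(sock, (struct sockaddr *)dest_sin, &dest_sin_len);
}
```

Because the length lives inside accept_client(), every accept() in a daemon loop starts from a valid value, no matter how many clients have come and gone before.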

Lutz


Post by newdep »

Hello Lutz,

Thanks for the fix and enhancements;
it's running now under Slackware ;-) Great...


Regards, Norman.
