newLISP speed

Posted: Thu May 06, 2004 1:49 pm
by HPW
Today a co-worker complained about the speed of newLISP running on our Solaris host. He had started a parser process and waited around 4 minutes for it to return.

I asked him for details and learned that he parses a 40 MB file with 648000 lines, where every line is parsed and a 7200-line result file is written.

So I think that is quite good for an interpreter, and it is running rock-solid.

:-)

Posted: Thu May 06, 2004 4:37 pm
by Lutz
Actually, I think it could be much faster, but I don't know what the code really does, so this is just a guess.

Lutz

Posted: Thu May 06, 2004 5:17 pm
by HPW
You may have a look at the code; maybe you will see some points where optimisation is possible.

Code:

(setq starttime (now))
;-----------------------------------------------------------------------
;Repeats str num times and returns the result
;-----------------------------------------------------------------------
(define (repstr str num    newstr)
	(setq newstr "")(dotimes(x num)(setq newstr(append newstr str))))
;-----------------------------------------------------------------------
;Main
;-----------------------------------------------------------------------
(if (=(length(main-args))4)
(begin
 (setq in-file(open (nth 2(main-args))"read"))
 (setq out-file(open (nth 3(main-args))"write"))
 (if (>(last(sys-info))5)
	(setq lineendstr "")					;WIN = 6
	(setq lineendstr "\r")					;Solaris = 4
 )
(write-line "Starte Konvertierung Protokoldatei zu CSV")
(setq	NextLineHpos nil
	NewOutString nil
	KombiEndCount 0
	UposCount 0
	MaxUpos 40
)
(while (setq linestr(read-line in-file))
	(setq linelst (parse linestr " "))
	(if NextLineHpos
		(setq	hposlst (replace " " linestr  "")
			hposlst (parse hposlst "|")
			NewOutString (string NewOutString (nth 0 hposlst)";"
							(nth 1 hposlst)";"
							(nth 2 hposlst)";")
			NextLineHpos nil
		)
	)
	(if (=(string(nth 0 linelst))"UPos")
		(begin
		(if upospreisneeded
		(setq	uposlst (replace "[ ]+" linestr "|" 1)
			uposlst (parse uposlst "|")
			NewOutString (string NewOutString "-99999;"
							(nth 2 uposlst)";"
							(nth 4 uposlst)";"
							(nth 6 uposlst)";"))
		(setq	uposlst (replace "[ ]+" linestr "|" 1)
			uposlst (parse uposlst "|")
			NewOutString (string NewOutString (nth 2 uposlst)";"
							(nth 4 uposlst)";"
							(nth 6 uposlst)";"))
		)
		(inc 'UposCount)
		(setq upospreisneeded true)
		)
	)
	(if (=(string(nth 0 linelst))"UPos-Einzelpr.=")
		(setq	uposlst (replace "[ ]+" linestr "|" 1)
			uposlst (parse uposlst "|")
			NewOutString (string NewOutString (replace "."(nth 1 uposlst)",")";")
			upospreisneeded nil
		)
	)
	(if (and(=(string(nth 0 linelst))"MbiKatexFctFwUndRundung:")
		(=(string(nth 1 linelst))"Input-Preis=")
		(=(string(last linelst))"bKombi=1")
		(= KombiEndCount 0))
		(setq KombiEndCount 1)
	)
	(if (and(=(string(nth 0 linelst))"MbiKatexFctFwUndRundung:")
		(=(string(nth 1 linelst))"Output-Preis=")
		(= KombiEndCount 1))
		(setq	uposlst (replace "[ ]+" linestr "|" 1)
			uposlst (parse uposlst "|")
			NewOutString (string NewOutString
				(repstr ";"(*(- MaxUpos UposCount)4)))
			NewOutString (string NewOutString (replace "."(last uposlst)",")";")
			KombiEndCount 2
		)
	)
	(if (and(=(string(nth 0 linelst))"HPos")(=(string(last linelst))"|tstik_varid"))
		(begin
		(setq	NextLineHpos true
			KombiEndCount 0
			UposCount 0
		)
		(if NewOutString
			(write-line	(append
					NewOutString
					lineendstr
					)
			out-file
			)
		)
		(setq NewOutString "")
		)
	)
)
(if NewOutString
	(write-line	(append
			NewOutString
			lineendstr
			)
	out-file
	)
)
(close out-file)
(close in-file)
(write-line "Ende Konvertierung Protokoldatei zu CSV")
(setq endtime (now))
(write-line (string starttime))
(write-line (string endtime))
(exit)
)
(begin
(write-line "Aufruf: newlisp parsepreisprot.lsp in.txt out.csv")
(exit)
)
)

Posted: Thu May 06, 2004 6:42 pm
by Lutz
You definitely should replace (repstr ...) with this much faster version:

Code:

(define (repstr str num , lst)
   (dotimes(x num) (push str lst))
   (join lst))

When num is a few thousand, this routine is a hundred times faster.

Letting a string grow by repeatedly appending to it is !very! expensive. It is much faster to push the pieces onto a list and then join them.
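
As a rough illustration of the point above, here is a minimal sketch that puts the two approaches side by side. The function names repstr-append and repstr-join are made up for this comparison, and the use of the built-in dup assumes a newLISP version that provides it. The iteration count is arbitrary, and the timings will vary by machine.

```newlisp
; Repeat a string by growing it with append (the slow pattern).
(define (repstr-append str num)
  (let ((newstr ""))
    (dotimes (x num)
      (setq newstr (append newstr str)))
    newstr))

; Repeat a string by collecting pieces in a list and joining once.
(define (repstr-join str num)
  (let ((lst '()))
    (dotimes (x num)
      (push str lst))
    (join lst)))

; All three variants should build the same string.
(println (= (repstr-append ";" 160)
            (repstr-join ";" 160)
            (dup ";" 160)))

; time returns the elapsed milliseconds for evaluating the expression.
(println "append: " (time (repstr-append ";" 20000)) " ms")
(println "join:   " (time (repstr-join ";" 20000)) " ms")
(println "dup:    " (time (dup ";" 20000)) " ms")
```

For a small num such as 160 the difference is unlikely to be measurable, so this only matters when the repeat count gets large.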

Otherwise it looks OK to me.

Lutz

Posted: Thu May 06, 2004 7:01 pm
by HPW
Thanks Lutz,

I will test it in the morning when I am back in the office.
I will post the speedup.
(This time num was at most 160.)

Posted: Thu May 06, 2004 8:28 pm
by HPW
I made a test file by hand on my home system with a similar amount of data.
(This time 648000 lines in 42 MB)

Surprise, no time difference. Both 'repstr' versions take around 110 sec on my 1.8 GHz WIN XP machine. Tomorrow I will compare on Solaris. But I think it will be slower, because it only uses 1 of its 8 processors for the process, and that one is maybe a bit slower than my home PC.

Finally I took repstr out completely and still got exactly the same time.
So it takes no significant time to use it.

Posted: Fri May 07, 2004 12:42 am
by Lutz
Looks like (repstr ...) is taking only a little time in the overall process. Still, keep this kind of optimization in mind for the future; it has saved me many times when doing repeated appends on a string.

The only other suggestion I have is to shorten the code doing the regex work, but I am not sure how far this is applicable in your case, and complex regular expressions can also take their time ...

Lutz

Posted: Fri May 07, 2004 6:11 am
by HPW
Now tested in the office (same file and code):

Solaris host 850 MHz (using 1 of 8 CPUs): 164 sec
WIN XP 2.8 GHz: 58 sec

Then I stripped the loop down so that it does nothing, on WIN XP: 47 sec
So just reading the lines accounts for most of the time.
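
A measurement like this can be sketched as follows: time only the read loop, with no per-line work, and compare it with the full run. The file name in.txt and the counter line-count are placeholders for illustration, not part of the script above.

```newlisp
; Time only the line-by-line read, with no parsing at all.
; "in.txt" is a hypothetical file name.
(setq in-file (open "in.txt" "read"))
(if in-file
    (begin
      (setq line-count 0)
      ; time returns the elapsed milliseconds for the loop
      (setq ms (time (while (read-line in-file)
                       (setq line-count (+ line-count 1)))))
      (close in-file)
      (println line-count " lines read in " ms " ms"))
    (println "cannot open in.txt"))
```

If this number is close to the runtime of the full script, the parsing itself is not where the time goes.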

Posted: Fri May 07, 2004 8:58 pm
by Lutz
Interesting, looks like I/O is the limiting factor in this case, not the processing itself. You could of course try:

(dolist (line (parse (read-file "thedatafile") "\r\n"))
(process-the-line line))

This might speed up I/O but costs a lot of memory.

Lutz

Posted: Sat May 08, 2004 7:38 am
by HPW
Interesting idea:

Home WIN XP 1.8 GHz/512 MB RAM with 42 MB test file

With line-read: 104 sec with MEM-load < 1 MB
With file-read: 29 sec with MEM-load 109 MB

So when the file fits in memory this way, file-read is preferable.
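
One way to act on this trade-off is to pick the strategy from the file size. The sketch below is only an illustration under assumptions: max-bytes is a made-up threshold, process-the-line is a stand-in for the real per-line work, and the file is assumed to exist (file-info returns nil otherwise). Since the 42 MB file needed about 109 MB when read whole, the threshold should leave generous headroom.

```newlisp
; Hypothetical threshold: files below this size are read in one piece.
(setq max-bytes 50000000)

; Stand-in for the real per-line work; here it only counts lines.
(define (process-the-line line)
  (setq line-count (+ line-count 1)))

(define (process-file path)
  (setq line-count 0)
  ; the first element of file-info is the file size in bytes
  (if (< (first (file-info path)) max-bytes)
      ; small enough: read the whole file and split it in memory
      ; (with "\n" as the separator, lines from a Windows file keep
      ; a trailing "\r", and a final newline yields one extra empty
      ; string at the end of the list)
      (dolist (line (parse (read-file path) "\n"))
        (process-the-line line))
      ; too large: fall back to reading line by line
      (let ((in-file (open path "read")))
        (while (setq linestr (read-line in-file))
          (process-the-line linestr))
        (close in-file)))
  line-count)
```

Calling (process-file "in.txt") with a hypothetical file name would then use file-read for small files and line-read for large ones.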

Posted: Sat May 08, 2004 12:52 pm
by Lutz
Almost 4 times speed improvement ... pretty good ...

And if it is true what they say about Sun workstations, it should perform even better on those, because they are supposed to be much faster at I/O transfer than PCs. Perhaps a Sun machine can swallow that 45 Mbyte chunk in 10 secs; that would make your Solaris folks happy :)

Lutz