Benchmarking
newLisp gurus,
What would be the best code, if possible one-liner, to benchmark the performance of newLisp?
Greetings
Peter
Peter wrote: ...if possible one-liner, to benchmark the performance of newLisp?

short answer
============
sorry, there is no such thing.
long answer
===========
For any one-liner, you will get a language ranking which doesn't say anything about the language. And if you change the hardware it is running on, or only the OS, it turns the results on their head.
Even benchmark collections like this:
http://www.newlisp.org/benchmarks/
can bring completely different results when changing the platform or OS.
In the source distribution you find a file qa-bench, which measures performance for most of the built-in functions and consolidates the result into one performance index. This index is calibrated to 1.0 on Mac OS X on a MacMini 1.83 GHz with 1 GB of memory.
You run it like this:
~> newlisp qa-bench
2363 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
~>
~> newlisp qa-bench report
!= 9 ms
$ 9 ms
% 9 ms
& 9 ms
* 9 ms
+ 9 ms
- 9 ms
/ 9 ms
< 9 ms
<< 9 ms
<= 10 ms
>= 9 ms
>> 9 ms
NaN? 9 ms
^ 9 ms
abs 8 ms
acos 8 ms
acosh 9 ms
add 9 ms
address 9 ms
amb 9 ms
and 10 ms
append 9 ms
apply 10 ms
args 9 ms
array 10 ms
array-list 10 ms
array? 10 ms
asin 9 ms
asinh 9 ms
assoc 9 ms
atan 9 ms
atan2 10 ms
atanh 8 ms
atom? 9 ms
base64-dec 10 ms
base64-enc 10 ms
bayes-query 8 ms
bayes-train 9 ms
begin 9 ms
beta 10 ms
betai 9 ms
bind 10 ms
binomial 9 ms
bits 10 ms
case 9 ms
catch 10 ms
ceil 8 ms
char 10 ms
chop 10 ms
clean 10 ms
cond 9 ms
cons 9 ms
constant 9 ms
context 9 ms
context? 8 ms
copy 9 ms
cos 8 ms
cosh 8 ms
count 10 ms
cpymem 9 ms
crc32 11 ms
crit-chi2 10 ms
crit-z 11 ms
curry 9 ms
date 12 ms
date-value 10 ms
debug 10 ms
dec 10 ms
def-new 11 ms
default 10 ms
define 9 ms
define-macro 9 ms
delete 11 ms
det 9 ms
difference 10 ms
div 9 ms
do-until 9 ms
do-while 10 ms
doargs 9 ms
dolist 10 ms
dostring 11 ms
dotimes 10 ms
dotree 11 ms
dump 9 ms
dup 10 ms
empty? 10 ms
encrypt 9 ms
ends-with 11 ms
env 12 ms
erf 9 ms
error-event 9 ms
eval 10 ms
eval-string 11 ms
exists 10 ms
exp 8 ms
expand 9 ms
explode 10 ms
factor 10 ms
fft 9 ms
filter 10 ms
find 11 ms
find-all 10 ms
first 10 ms
flat 9 ms
float 9 ms
float? 8 ms
floor 15 ms
flt 9 ms
for 10 ms
for-all 10 ms
format 11 ms
fv 9 ms
gammai 9 ms
gammaln 8 ms
gcd 9 ms
get-char 9 ms
get-float 9 ms
get-int 10 ms
get-long 9 ms
get-string 9 ms
global 8 ms
global? 9 ms
if 11 ms
if-not 8 ms
ifft 9 ms
import 8 ms
inc 11 ms
index 9 ms
int 9 ms
integer? 8 ms
intersect 10 ms
invert 10 ms
irr 10 ms
join 9 ms
lambda? 8 ms
last 11 ms
last-error 9 ms
legal? 10 ms
length 13 ms
let 10 ms
letex 10 ms
letn 9 ms
list 12 ms
list? 9 ms
local 9 ms
log 8 ms
lookup 10 ms
lower-case 10 ms
macro? 8 ms
main-args 9 ms
map 11 ms
mat 10 ms
match 9 ms
max 10 ms
member 10 ms
min 10 ms
mod 9 ms
mul 9 ms
multiply 15 ms
name 9 ms
new 12 ms
nil? 10 ms
normal 15 ms
not 10 ms
now 10 ms
nper 9 ms
npv 10 ms
nth 9 ms
null? 10 ms
number? 9 ms
or 10 ms
pack 10 ms
parse 10 ms
pmt 9 ms
pop 10 ms
pop-assoc 10 ms
pow 9 ms
pretty-print 9 ms
primitive? 8 ms
prob-chi2 9 ms
prob-z 8 ms
protected? 9 ms
push 9 ms
pv 9 ms
quote 8 ms
quote? 8 ms
rand 9 ms
random 9 ms
randomize 9 ms
read-expr 10 ms
ref 9 ms
ref-all 9 ms
regex 11 ms
regex-comp 9 ms
replace 10 ms
rest 10 ms
reverse 10 ms
rotate 10 ms
round 10 ms
seed 9 ms
select 11 ms
sequence 10 ms
series 11 ms
set 9 ms
set-locale 11 ms
set-ref 10 ms
set-ref-all 10 ms
setf 10 ms
setq 11 ms
sgn 9 ms
sin 8 ms
sinh 9 ms
slice 11 ms
sort 10 ms
source 11 ms
sqrt 8 ms
starts-with 10 ms
string 11 ms
string? 9 ms
sub 9 ms
swap 11 ms
sym 9 ms
symbol? 9 ms
symbols 13 ms
sys-error 11 ms
sys-info 10 ms
tan 8 ms
tanh 8 ms
throw 11 ms
throw-error 10 ms
time 8 ms
time-of-day 9 ms
title-case 11 ms
transpose 10 ms
trim 9 ms
true? 9 ms
unify 10 ms
unique 10 ms
unless 9 ms
unpack 10 ms
until 9 ms
upper-case 9 ms
uuid 10 ms
when 9 ms
while 10 ms
write-buffer 10 ms
write-line 9 ms
xml-parse 10 ms
xml-type-tags 9 ms
zero? 11 ms
| 9 ms
~ 10 ms
2443 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
~> newlisp qa-bench report 10
!= 98 ms
$ 91 ms
...
...
xml-parse 100 ms
xml-type-tags 93 ms
zero? 94 ms
| 93 ms
~ 98 ms
24792 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
Running this under Linux on the same CPU completely changes the picture. Some functions suddenly perform twice as fast or twice as slow.
Lutz wrote: And if you change the hardware it is running on, or only the OS, it turns the results on their head.

I am running all benchmarks on the same system under the same OS.
Lutz wrote: Running this under Linux on the same CPU completely changes the picture. Some functions suddenly perform twice as fast or twice as slow.

Good remark. This means that benchmarks should run for a longer time, like 15 or 30 minutes.
So maybe we have to look at it the other way around: instead of running a program and see how long it takes to complete, run a program for some time, and then see how many actions were performed.
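The proposal (fix the time, count the operations) can be sketched in a few lines. This is an illustration in Python rather than newLisp, purely to show the shape of the measurement; the window length of 0.5 seconds is an arbitrary choice:

```python
import time

def ops_in_window(seconds=0.5):
    """Count how many float additions complete in a fixed wall-clock window."""
    t = 0.0
    count = 0
    deadline = time.monotonic() + seconds
    # Check the clock only once per 1000 additions so the timing call
    # itself does not dominate the measurement (otherwise you end up
    # benchmarking the time function instead of the addition).
    while time.monotonic() < deadline:
        for _ in range(1000):
            t += 0.1
        count += 1000
    return count

print(ops_in_window(), "additions in 0.5 s")
```

The same sketch, ported to each language under test, gives an operations-per-fixed-time figure that can be compared directly.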
Peter wrote: So maybe we have to look at it the other way around: instead of running a program and seeing how long it takes to complete, run a program for some time, and then see how many actions were performed.

...Uuuhhhh...

If I know that in a text file of 2 GB there are 100000 vowels, both programs come to the same result finally... So where is the advantage of doing this without measuring it against a comparative target, like time?

PS: The number of actions does not always translate into a faster or more efficient result.
-- (define? (Cornflakes))
Go tease some sheep, you compleat fan! ;-)
But the idea is not so difficult? Suppose we check the (add) statement. Let's run a newLisp program continuously adding 0.1 starting from 0, and let's run that program for 5 minutes.
Now, let's do the same thing in another language.
After those 5 minutes, we can see which value was reached, right?
Suppose newLisp reached 1000 and some other language X reached 500, we may safely conclude that newLisp is faster when it comes to adding floats? If we take newLisp as reference, it means language X is 50% slower?
So let me give an example. This compiled BASIC program runs for 10 seconds, adding 0.0001 to a variable:

DECLARE t TYPE double
t = 0
start = SECOND(NOW)
end = start + 10
WHILE SECOND(NOW) NE end DO
  t = t + 0.0001
WEND
PRINT "Result is: ", t
END

The equivalent of this BASIC program in newLisp looks like this (and correct me if I can implement it more efficiently):

(set 't 0.0)
(set 'start (apply date-value (now)))
(set 'end (+ start 10))
(while (not (= (apply date-value (now)) end))
  (set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)

When I run the compiled BASIC program, the result is:

peter@solarstriker:~/programming$ ./benchmark
Result is: 574.7542999

When I run the newLisp program, the result is:

peter@solarstriker:~/programming$ newlisp benchmark.lsp
Result is: 373.0229

Both programs run on the same machine in the same operating system, and to me the results indicate that the BASIC compiler is faster. Again, maybe there is an optimization possible for the newLisp program? What do you folks say about it?

Peter
If this is compiled Basic (it seems to be, judging from the type declarations), then it looks pretty good for newLISP.
But my main point is that languages should not be compared by testing just one or two things, in this case floating point addition and retrieval of system time.
But still comparing compiled vs dynamic languages is comparing apples and oranges.
It is also not clear what this example really measures. Probably not floating point addition but rather internal time functions, or both.
By just changing the way time is measured, newLISP is twice as fast, doing more than double the floating point additions and beating compiled BASIC:
(set 't 0.0)
(set 'start (time-of-day))
(set 'end (+ start 10000))
(while (not (= (time-of-day) end))
(set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)
Result is: 925.3656998 ; versus 412.2727 using 'date-value' on Mac Mini 1.83 Ghz
The net is full of this type of toy comparison, doing just some little thing. They make for lots of hits on a blog post but really don't say anything about the programming languages involved.
The best way to benchmark is either to benchmark lots of well-defined specific operations (similar to what qa-bench does) or to benchmark well-defined real-world tasks, big enough to exercise a broader area of the language's function repertoire.
Lutz wrote: If this is compiled Basic, then it looks pretty good for newLISP.

It is compiled BASIC all right and indeed, newLisp runs very well!

Lutz wrote: But still comparing compiled vs dynamic languages is comparing apples and oranges.

In this case I am particularly interested in newLisp versus any compiled language. I do want to see how well newLisp performs compared with a compiled binary. One of the traditional objections against interpreted languages is that they are slow. I have already observed very good performance with newLisp programs, but how well does newLisp really perform?

Lutz wrote: It is also not clear what this example really measures. Probably not floating point addition but rather internal time functions, or both.

Fully agreed. This will always be a problem of benchmarks. Maybe we should say: a similar program with the exact same functionality.

Lutz wrote: But my main point is, that languages should not be compared by just testing one or two things, in this case floating point addition and retrieval of system time.

Obviously not! This was just an example. I was already thinking of multiple tests.
In the end one will never get the exact performance. Nevertheless, some sort of global indication is sufficient for me.
Your code indeed improves the performance tremendously. If I also improve the BASIC code in a similar way, with compile optimizations (-fnative) then these are the results:
DECLARE t TYPE double
t = 0
end = NOW + 10
WHILE NOW < end DO
t = t + 0.0001
WEND
PRINT "Result is: ", t
#!/bin/newlisp
(set 't 0.0)
(set 'end (+ (time-of-day) 10000))
(while (< (time-of-day) end)
(set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)
So newLisp runs 94.66% slower than the compiled BASIC binary with the same functionality.
Admittedly, the actual test is blurry, so I will run more tests to see the difference. The performance on lists, for example, will be much better than similar functionality in BASIC (arrays?). Probably there are more typical Lisp areas where even a BASIC compiler will be beaten.
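A note on the percentage: a figure like "94.66% slower" is usually computed from the two operation counts, relative to the faster program. A quick sketch (the counts below reuse Peter's earlier hypothetical 1000-vs-500 example, not the actual BASIC/newLisp results):

```python
def slowdown_pct(ref_count, other_count):
    """Percentage by which 'other' is slower than 'ref', relative to ref's rate."""
    return (ref_count - other_count) / ref_count * 100.0

# Peter's earlier example: reference language reaches 1000 additions,
# language X reaches 500 in the same time window
print(f"language X is {slowdown_pct(1000, 500):.2f}% slower")
```

With 1000 vs 500 this gives the 50% figure from the earlier post; the same formula applied to the measured counts yields numbers like the 94.66% quoted above.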
Peter
A good native compiler will always be at least 50 times faster than an interpreted language. Here are some interesting comparisons for Fibonacci:
http://dada.perl.it/shootout/fibo.html
and here for other algorithms showing different rankings:
http://dada.perl.it/shootout
It is interesting to see how well JIT (Just In Time) compilation is doing for Java on number crunching tasks.
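For anyone who wants to reproduce the flavor of those shootout numbers locally, here is the usual doubly recursive Fibonacci micro-benchmark, sketched in Python; the timing will of course vary wildly by machine and language, which is exactly the caveat made above:

```python
import time

def fib(n):
    # Deliberately naive doubly recursive definition,
    # the same shape the shootout pages benchmark
    return n if n < 2 else fib(n - 1) + fib(n - 2)

start = time.perf_counter()
result = fib(25)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fib(25) = {result} in {elapsed_ms:.1f} ms")
```

Because every language implements the same doubly recursive definition, this mostly measures function-call overhead, which is why JIT-compiled and native languages dominate this particular ranking.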
I thought I'd start adding to the list you started, Lutz:
I'll add more results for my motley collection of computers when I get a Roundtoit. I expect to hit the 5 second mark later... :)
2282 ms ; pr: 0.9 ; Mac OS X 2.0 GHz Intel Core 2 Duo
1658 ms ; pr: 0.7 ; FreeBSD at NFSHOST (no idea what CPU - ?)
Joe: would you like to swap your machine with mine? :)
edit: I found a supercomputer upstairs you might prefer:

12606 ms ; pr: 5.0 ; Mac OS X 700 MHz PowerPC G3

that's from 2001, that machine...
Last edited by cormullion on Sat Aug 01, 2009 4:12 pm, edited 2 times in total.
cormullion wrote: Joe: would you like to swap your machine with mine? :)

Joe wrote: The program reported the wrong hardware. It is actually:
2 x 3.2 GHz Quad-Core Intel Xeon
But you probably won't want it. It has some fingerprints and dust on it.

;) That 'wrong hardware' is actually the reference machine used (i.e. Lutz's machine is a "Mac OS X, 1.83 GHz Intel Core 2 Duo"). It would be cool to be able to find out easily what's running underneath, but I don't think there's a way.
34928 ms on a Nokia N810 armv61
performance ratio: 13.7
-- (define? (Cornflakes))
newdep wrote: 34928 ms on a Nokia N810 armv61, performance ratio: 13.7
PS: well, it's 14 times smaller than a PC too.. so a small excuse is permitted ;-)

My old country home net-surfer over a 26400 baud phone line has got you all beat!
Intel Pentium 120 - P54CQS - 120MHz - cache (none) - 11.9 W
Win 95 (1996 BIOS copyright), upgraded to Win 98 with a 16 GB HD in 1999; no USB, but it has holes in the printed circuit board for when USB hardware becomes more widely available ;0)
You've come a long way Intel baby since Win 95/98 "daze"!
!= 330 ms
$ 440 ms
% 280 ms
& 380 ms
* 330 ms
+ 280 ms
- 330 ms
/ 270 ms
< 280 ms
<< 270 ms
<= 270 ms
= 280 ms
> 270 ms
>= 220 ms
>> 330 ms
NaN? 550 ms
^ 390 ms
abs 330 ms
acos 490 ms
acosh 390 ms
add 320 ms
address 330 ms
amb 330 ms
and 220 ms
append 270 ms
apply 270 ms
args 390 ms
array 270 ms
array-list 270 ms
array? 390 ms
asin 980 ms
asinh 1100 ms
assoc 330 ms
atan 390 ms
atan2 380 ms
atanh 390 ms
atom? 330 ms
base64-dec 600 ms
base64-enc 440 ms
bayes-query 220 ms
bayes-train 770 ms
begin 270 ms
beta 610 ms
betai 600 ms
bind 390 ms
binomial 600 ms
bits 600 ms
case 440 ms
catch 60 ms
ceil 380 ms
char 940 ms
chop 330 ms
clean 50 ms
cond 280 ms
cons 270 ms
constant 220 ms
context 280 ms
context? 270 ms
copy 280 ms
cos 380 ms
cosh 500 ms
count 330 ms
cpymem 380 ms
crc32 170 ms
crit-chi2 760 ms
crit-z 500 ms
curry 160 ms
date 1820 ms
date-value 270 ms
debug 610 ms
dec 430 ms
def-new 330 ms
default 170 ms
define 270 ms
define-macro 330 ms
delete 1600 ms
det 330 ms
difference 330 ms
div 320 ms
do-until 330 ms
do-while 440 ms
doargs 330 ms
dolist 390 ms
dostring 380 ms
dotimes 390 ms
dotree 380 ms
dump 380 ms
dup 270 ms
empty? 330 ms
encrypt 330 ms
ends-with 490 ms
env 720 ms
erf 330 ms
error-event 280 ms
eval 270 ms
eval-string 440 ms
exists 60 ms
exp 430 ms
expand 280 ms
explode 330 ms
factor 330 ms
fft 490 ms
filter 60 ms
find 440 ms
find-all 330 ms
first 330 ms
flat 330 ms
float 980 ms
float? 280 ms
floor 440 ms
flt 330 ms
for 330 ms
for-all 50 ms
format 1700 ms
fv 330 ms
gammai 550 ms
gammaln 440 ms
gcd 220 ms
get-char 390 ms
get-float 490 ms
get-int 1320 ms
get-long 440 ms
get-string 380 ms
global 220 ms
global? 280 ms
if 270 ms
if-not 330 ms
ifft 500 ms
import 270 ms
inc 390 ms
index 50 ms
int 610 ms
integer? 220 ms
intersect 330 ms
invert 330 ms
irr 440 ms
join 330 ms
lambda? 280 ms
last 330 ms
last-error 440 ms
legal? 380 ms
length 440 ms
let 330 ms
letex 330 ms
letn 270 ms
list 280 ms
list? 220 ms
local 330 ms
log 380 ms
lookup 330 ms
lower-case 330 ms
macro? 220 ms
main-args 330 ms
map 330 ms
mat 330 ms
match 330 ms
max 380 ms
member 440 ms
min 330 ms
mod 330 ms
mul 390 ms
multiply 330 ms
name 440 ms
new 330 ms
nil? 380 ms
normal 490 ms
not 280 ms
now 2750 ms
nper 430 ms
npv 330 ms
nth 330 ms
null? 270 ms
number? 330 ms
or 390 ms
pack 490 ms
parse 390 ms
pmt 440 ms
pop 380 ms
pop-assoc 330 ms
pow 330 ms
pretty-print 440 ms
primitive? 270 ms
prob-chi2 500 ms
prob-z 330 ms
protected? 220 ms
push 380 ms
pv 330 ms
quote 330 ms
quote? 220 ms
rand 660 ms
random 380 ms
randomize 330 ms
read-expr 440 ms
ref 390 ms
ref-all 330 ms
regex 440 ms
regex-comp 490 ms
replace 220 ms
rest 440 ms
reverse 380 ms
rotate 390 ms
round 1430 ms
seed 110 ms
select 380 ms
sequence 330 ms
series 270 ms
set 280 ms
set-locale 990 ms
set-ref 380 ms
set-ref-all 330 ms
setf 170 ms
setq 330 ms
sgn 330 ms
sin 440 ms
sinh 430 ms
slice 380 ms
sort 60 ms
source 3460 ms
sqrt 330 ms
starts-with 550 ms
string 7630 ms
string? 440 ms
sub 330 ms
swap 390 ms
sym 380 ms
symbol? 390 ms
symbols 490 ms
sys-error 490 ms
sys-info 330 ms
tan 440 ms
tanh 390 ms
throw 50 ms
throw-error 330 ms
time 9450 ms
time-of-day 3510 ms
title-case 390 ms
transpose 440 ms
trim 220 ms
true? 330 ms
unify 380 ms
unique 330 ms
unless 330 ms
unpack 440 ms
until 330 ms
upper-case 330 ms
uuid 2200 ms
when 270 ms
while 440 ms
write-buffer 440 ms
write-line 440 ms
xml-parse 380 ms
xml-type-tags 330 ms
zero? 280 ms
| 440 ms
~ 1590 ms
127470 ms
performance ratio: 50.0 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
The funny thing is, that at work, I can't read web pages any faster on a newer machine ;o)
Lookup your evil, power hungry, global warming Intel beast chip here:
http://www.processor-comparison.com/power.html
-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976
A little bit too late, but here are my two cents on the topic of benchmarks (I did a lot of benchmarking for the Lhogho compiler).
1. I think it is impossible to write a satisfactory one-liner for benchmarking.
2. A true benchmark does not produce a scalar (i.e. a single value) but a vector - i.e. many individual performance values measuring different aspects. For example, there are scores for integer arithmetic, for floating-point arithmetic, for string manipulation, for memory management, for passing parameters, for looking up local variables, etc.
3. Apart from a vector of many specialized mini-benchmarks, there is the option to use a more complex system which measures a predefined set of functions. For example, a test calculating a recursive math function spends time on the real calculations, on passing parameters, on low-level function invocation, and so on.
4. Simple/short benchmarks can be seriously affected by the operating system (especially by what it is doing in the background while you run the benchmark).
5. The way the code is compiled can also lead to drastically different results. For example, whether newLisp is compiled with or without optimizations may lead to a noticeable difference in performance.
6. What does one-liner mean in a language where newlines are just whitespace?
7. For a one-liner that measures integer arithmetic, you can try the 3x+1 problem.
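Point 7 can be made concrete: the 3x+1 (Collatz) iteration is pure integer arithmetic (divide, multiply, add, compare), which is why it makes a reasonable integer micro-benchmark. A sketch in Python; the range 1..9999 is an arbitrary choice:

```python
import time

def collatz_steps(n):
    """Number of 3x+1 steps needed to reach 1 - pure integer arithmetic."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

start = time.perf_counter()
total = sum(collatz_steps(n) for n in range(1, 10000))
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"total steps for 1..9999: {total} ({elapsed_ms:.1f} ms)")
```

Printing the total as well as the time guards against a compiler optimizing the whole loop away, a classic pitfall when porting micro-benchmarks to compiled languages.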