Benchmarking
newLisp gurus,
What would be the best code, if possible one-liner, to benchmark the performance of newLisp?
Greetings
Peter
Peter wrote: ...if possible one-liner, to benchmark the performance of newLisp?

short answer
============
sorry, there is no such thing.
long answer
===========
For any one-liner, you will get a language ranking which doesn't say anything about the language. And if you change the hardware it is running on, or only the OS, it turns the results on their head.
Even benchmark collections like this:
http://www.newlisp.org/benchmarks/
can bring completely different results when changing the platform or OS.
In the source distribution you find a file qa-bench, which measures performance for most of the built-in functions and consolidates the result into one performance index. This index is calibrated to 1.0 on Mac OS X on a MacMini 1.83 GHz with 1 GB of memory.
You run it like this:
~> newlisp qa-bench
2363 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
~>
~> newlisp qa-bench report
!= 9 ms
$ 9 ms
% 9 ms
& 9 ms
* 9 ms
+ 9 ms
- 9 ms
/ 9 ms
< 9 ms
<< 9 ms
<= 10 ms
>= 9 ms
>> 9 ms
NaN? 9 ms
^ 9 ms
abs 8 ms
acos 8 ms
acosh 9 ms
add 9 ms
address 9 ms
amb 9 ms
and 10 ms
append 9 ms
apply 10 ms
args 9 ms
array 10 ms
array-list 10 ms
array? 10 ms
asin 9 ms
asinh 9 ms
assoc 9 ms
atan 9 ms
atan2 10 ms
atanh 8 ms
atom? 9 ms
base64-dec 10 ms
base64-enc 10 ms
bayes-query 8 ms
bayes-train 9 ms
begin 9 ms
beta 10 ms
betai 9 ms
bind 10 ms
binomial 9 ms
bits 10 ms
case 9 ms
catch 10 ms
ceil 8 ms
char 10 ms
chop 10 ms
clean 10 ms
cond 9 ms
cons 9 ms
constant 9 ms
context 9 ms
context? 8 ms
copy 9 ms
cos 8 ms
cosh 8 ms
count 10 ms
cpymem 9 ms
crc32 11 ms
crit-chi2 10 ms
crit-z 11 ms
curry 9 ms
date 12 ms
date-value 10 ms
debug 10 ms
dec 10 ms
def-new 11 ms
default 10 ms
define 9 ms
define-macro 9 ms
delete 11 ms
det 9 ms
difference 10 ms
div 9 ms
do-until 9 ms
do-while 10 ms
doargs 9 ms
dolist 10 ms
dostring 11 ms
dotimes 10 ms
dotree 11 ms
dump 9 ms
dup 10 ms
empty? 10 ms
encrypt 9 ms
ends-with 11 ms
env 12 ms
erf 9 ms
error-event 9 ms
eval 10 ms
eval-string 11 ms
exists 10 ms
exp 8 ms
expand 9 ms
explode 10 ms
factor 10 ms
fft 9 ms
filter 10 ms
find 11 ms
find-all 10 ms
first 10 ms
flat 9 ms
float 9 ms
float? 8 ms
floor 15 ms
flt 9 ms
for 10 ms
for-all 10 ms
format 11 ms
fv 9 ms
gammai 9 ms
gammaln 8 ms
gcd 9 ms
get-char 9 ms
get-float 9 ms
get-int 10 ms
get-long 9 ms
get-string 9 ms
global 8 ms
global? 9 ms
if 11 ms
if-not 8 ms
ifft 9 ms
import 8 ms
inc 11 ms
index 9 ms
int 9 ms
integer? 8 ms
intersect 10 ms
invert 10 ms
irr 10 ms
join 9 ms
lambda? 8 ms
last 11 ms
last-error 9 ms
legal? 10 ms
length 13 ms
let 10 ms
letex 10 ms
letn 9 ms
list 12 ms
list? 9 ms
local 9 ms
log 8 ms
lookup 10 ms
lower-case 10 ms
macro? 8 ms
main-args 9 ms
map 11 ms
mat 10 ms
match 9 ms
max 10 ms
member 10 ms
min 10 ms
mod 9 ms
mul 9 ms
multiply 15 ms
name 9 ms
new 12 ms
nil? 10 ms
normal 15 ms
not 10 ms
now 10 ms
nper 9 ms
npv 10 ms
nth 9 ms
null? 10 ms
number? 9 ms
or 10 ms
pack 10 ms
parse 10 ms
pmt 9 ms
pop 10 ms
pop-assoc 10 ms
pow 9 ms
pretty-print 9 ms
primitive? 8 ms
prob-chi2 9 ms
prob-z 8 ms
protected? 9 ms
push 9 ms
pv 9 ms
quote 8 ms
quote? 8 ms
rand 9 ms
random 9 ms
randomize 9 ms
read-expr 10 ms
ref 9 ms
ref-all 9 ms
regex 11 ms
regex-comp 9 ms
replace 10 ms
rest 10 ms
reverse 10 ms
rotate 10 ms
round 10 ms
seed 9 ms
select 11 ms
sequence 10 ms
series 11 ms
set 9 ms
set-locale 11 ms
set-ref 10 ms
set-ref-all 10 ms
setf 10 ms
setq 11 ms
sgn 9 ms
sin 8 ms
sinh 9 ms
slice 11 ms
sort 10 ms
source 11 ms
sqrt 8 ms
starts-with 10 ms
string 11 ms
string? 9 ms
sub 9 ms
swap 11 ms
sym 9 ms
symbol? 9 ms
symbols 13 ms
sys-error 11 ms
sys-info 10 ms
tan 8 ms
tanh 8 ms
throw 11 ms
throw-error 10 ms
time 8 ms
time-of-day 9 ms
title-case 11 ms
transpose 10 ms
trim 9 ms
true? 9 ms
unify 10 ms
unique 10 ms
unless 9 ms
unpack 10 ms
until 9 ms
upper-case 9 ms
uuid 10 ms
when 9 ms
while 10 ms
write-buffer 10 ms
write-line 9 ms
xml-parse 10 ms
xml-type-tags 9 ms
zero? 11 ms
| 9 ms
~ 10 ms
2443 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
~> newlisp qa-bench report 10
!= 98 ms
$ 91 ms
...
...
xml-parse 100 ms
xml-type-tags 93 ms
zero? 94 ms
| 93 ms
~ 98 ms
24792 ms
performance ratio: 1.00 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
Running this under Linux on the same CPU completely changes the picture. Some functions suddenly perform twice as fast or twice as slow.
Lutz wrote: And if you change the hardware it is running on, or only the OS, it turns the results on their head.

I am running all benchmarks on the same system under the same OS.
Lutz wrote: Running this under Linux on the same CPU completely changes the picture. Some functions suddenly perform twice as fast or twice as slow.

Good remark. This means that benchmarks should run for a longer time, like 15 or 30 minutes.
So maybe we have to look at it the other way around: instead of running a program and see how long it takes to complete, run a program for some time, and then see how many actions were performed.
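The proposal (fix the time, count the operations) can be sketched in a few lines. This is an illustration in Python rather than newLisp, purely to show the shape of the measurement; the window length of 0.5 seconds is an arbitrary choice:

```python
import time

def ops_in_window(seconds=0.5):
    """Count how many float additions complete in a fixed wall-clock window."""
    t = 0.0
    count = 0
    deadline = time.monotonic() + seconds
    # Check the clock only once per 1000 additions so the timing call
    # itself does not dominate the measurement (otherwise you end up
    # benchmarking the time function instead of the addition).
    while time.monotonic() < deadline:
        for _ in range(1000):
            t += 0.1
        count += 1000
    return count

print(ops_in_window(), "additions in 0.5 s")
```

The same sketch, ported to each language under test, gives an operations-per-fixed-time figure that can be compared directly.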
Peter wrote: So maybe we have to look at it the other way around: instead of running a program and seeing how long it takes to complete, run a program for some time, and then see how many actions were performed.

...Uuuhhhh...

If I know that in a text file of 2 GB there are 100000 vowels, both programs come to the same result finally... So where is the advantage of doing this without measuring it against a comparative target, like time?

PS: The number of actions does not always translate into a faster or more efficient result.
-- (define? (Cornflakes))
Go tease some sheep, you compleat fan! ;-)
But the idea is not so difficult? Suppose we check the (add) statement. Let's run a newLisp program continuously adding 0.1 starting from 0, and let's run that program for 5 minutes.
Now, let's do the same thing in another language.
After those 5 minutes, we can see which value was reached, right?
Suppose newLisp reached 1000 and some other language X reached 500, we may safely conclude that newLisp is faster when it comes to adding floats? If we take newLisp as reference, it means language X is 50% slower?
So let me give an example. This compiled BASIC program runs for 10 seconds, adding 0.0001 to a variable:

DECLARE t TYPE double
t = 0
start = SECOND(NOW)
end = start + 10
WHILE SECOND(NOW) NE end DO
  t = t + 0.0001
WEND
PRINT "Result is: ", t
END

The equivalent of this BASIC program in newLisp looks like this (and correct me if I can implement it more efficiently):

(set 't 0.0)
(set 'start (apply date-value (now)))
(set 'end (+ start 10))
(while (not (= (apply date-value (now)) end))
  (set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)

When I run the compiled BASIC program, the result is:

peter@solarstriker:~/programming$ ./benchmark
Result is: 574.7542999

When I run the newLisp program, the result is:

peter@solarstriker:~/programming$ newlisp benchmark.lsp
Result is: 373.0229

Both programs run on the same machine in the same operating system, and to me the results indicate that the BASIC compiler is faster. Again, maybe there is an optimization possible for the newLisp program? What do you folks say about it?

Peter
If this is compiled Basic (it seems to be, judging from the type declarations), then it looks pretty good for newLISP.
But my main point is that languages should not be compared by testing just one or two things, in this case floating point addition and retrieval of system time.
But still comparing compiled vs dynamic languages is comparing apples and oranges.
It is also not clear what this example really measures. Probably not floating point addition but rather internal time functions, or both.
By just changing the way time is measured, newLISP is twice as fast, doing more than double the floating point additions and beating compiled BASIC:
(set 't 0.0)
(set 'start (time-of-day))
(set 'end (+ start 10000))
(while (not (= (time-of-day) end))
(set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)
Result is: 925.3656998 ; versus 412.2727 using 'date-value' on Mac Mini 1.83 Ghz
The net is full of this type of toy comparison, doing just some little thing. They make for lots of hits on a blog post but really don't say anything about the programming languages involved.
The best way to benchmark is either to benchmark lots of well-defined specific operations (similar to what qa-bench does) or to benchmark well-defined real-world tasks, big enough to exercise a broader area of the language's function repertoire.
Lutz wrote: If this is compiled Basic, then it looks pretty good for newLISP.

It is compiled BASIC all right and indeed, newLisp runs very well!

Lutz wrote: But still comparing compiled vs dynamic languages is comparing apples and oranges.

In this case I am particularly interested in newLisp versus any compiled language. I do want to see how well newLisp performs compared with a compiled binary. One of the traditional objections against interpreted languages is that they are slow. I have already observed very good performance with newLisp programs, but how well does newLisp really perform?

Lutz wrote: It is also not clear what this example really measures. Probably not floating point addition but rather internal time functions, or both.

Fully agreed. This will always be a problem of benchmarks. Maybe we should say: a similar program with the exact same functionality.

Lutz wrote: But my main point is, that languages should not be compared by just testing one or two things, in this case floating point addition and retrieval of system time.

Obviously not! This was just an example. I was already thinking of multiple tests.
In the end one will never get the exact performance. Nevertheless, some sort of global indication is sufficient for me.
Your code indeed improves the performance tremendously. If I also improve the BASIC code in a similar way, with compile optimizations (-fnative) then these are the results:
DECLARE t TYPE double
t = 0
end = NOW + 10
WHILE NOW < end DO
t = t + 0.0001
WEND
PRINT "Result is: ", t
#!/bin/newlisp
(set 't 0.0)
(set 'end (+ (time-of-day) 10000))
(while (< (time-of-day) end)
(set 't (add t 0.0001))
)
(println "Result is: " t)
(exit)
So newLisp runs 94.66% slower than the compiled BASIC binary with the same functionality.
Admittedly, the actual test is blurry, so I will run more tests to see the difference. The performance on lists, for example, will be much better than similar functionality in BASIC (arrays?). Probably there are more typical Lisp areas where even a BASIC compiler will be beaten.
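A note on the percentage: a figure like "94.66% slower" is usually computed from the two operation counts, relative to the faster program. A quick sketch (the counts below reuse Peter's earlier hypothetical 1000-vs-500 example, not the actual BASIC/newLisp results):

```python
def slowdown_pct(ref_count, other_count):
    """Percentage by which 'other' is slower than 'ref', relative to ref's rate."""
    return (ref_count - other_count) / ref_count * 100.0

# Peter's earlier example: reference language reaches 1000 additions,
# language X reaches 500 in the same time window
print(f"language X is {slowdown_pct(1000, 500):.2f}% slower")
```

With 1000 vs 500 this gives the 50% figure from the earlier post; the same formula applied to the measured counts yields numbers like the 94.66% quoted above.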
Peter
A good native compiler will always be at least 50 times faster than an interpreted language. Here are some interesting comparisons for Fibonacci:
http://dada.perl.it/shootout/fibo.html
and here for other algorithms showing different rankings:
http://dada.perl.it/shootout
It is interesting to see how well JIT (Just In Time) compilation is doing for Java on number crunching tasks.
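For anyone who wants to reproduce the flavor of those shootout numbers locally, here is the usual doubly recursive Fibonacci micro-benchmark, sketched in Python; the timing will of course vary wildly by machine and language, which is exactly the caveat made above:

```python
import time

def fib(n):
    # Deliberately naive doubly recursive definition,
    # the same shape the shootout pages benchmark
    return n if n < 2 else fib(n - 1) + fib(n - 2)

start = time.perf_counter()
result = fib(25)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fib(25) = {result} in {elapsed_ms:.1f} ms")
```

Because every language implements the same doubly recursive definition, this mostly measures function-call overhead, which is why JIT-compiled and native languages dominate this particular ranking.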
I thought I'd start adding to the list you started, Lutz:
I'll add more results for my motley collection of computers when I get a Roundtoit. I expect to hit the 5 second mark later... :)
2282 ms ; pr: 0.9 ; Mac OS X 2.0 GHz Intel Core 2 Duo
1658 ms ; pr: 0.7 ; FreeBSD at NFSHOST (no idea what CPU - ?)
Joe: would you like to swap your machine with mine? :)
edit: I found a supercomputer upstairs you might prefer:

12606 ms ; pr: 5.0 ; Mac OS X 700 MHz PowerPC G3

that's from 2001, that machine...
Last edited by cormullion on Sat Aug 01, 2009 4:12 pm, edited 2 times in total.
cormullion wrote: Joe: would you like to swap your machine with mine? :)

Joe wrote: The program reported the wrong hardware. It is actually:
2 x 3.2 GHz Quad-Core Intel Xeon
But you probably won't want it. It has some fingerprints and dust on it.

;) That 'wrong hardware' is actually the reference machine used (i.e. Lutz's machine is a "Mac OS X, 1.83 GHz Intel Core 2 Duo"). It would be cool to be able to find out easily what's running underneath, but I don't think there's a way.
34928 ms on a Nokia N810 armv61
performance ratio: 13.7
-- (define? (Cornflakes))
newdep wrote: 34928 ms on a Nokia N810 armv61, performance ratio: 13.7
PS: well, it's 14 times smaller than a PC too.. so a small excuse is permitted ;-)

My old country home net-surfer over a 26400 baud phone line has got you all beat!
Intel Pentium 120 - P54CQS - 120MHz - cache (none) - 11.9 W
Win 95 (1996 BIOS copyright), upgraded to Win 98 with a 16 GB HD in 1999; no USB, but it has holes in the printed circuit board for when USB hardware becomes more widely available ;0)
You've come a long way Intel baby since Win 95/98 "daze"!
!= 330 ms
$ 440 ms
% 280 ms
& 380 ms
* 330 ms
+ 280 ms
- 330 ms
/ 270 ms
< 280 ms
<< 270 ms
<= 270 ms
= 280 ms
> 270 ms
>= 220 ms
>> 330 ms
NaN? 550 ms
^ 390 ms
abs 330 ms
acos 490 ms
acosh 390 ms
add 320 ms
address 330 ms
amb 330 ms
and 220 ms
append 270 ms
apply 270 ms
args 390 ms
array 270 ms
array-list 270 ms
array? 390 ms
asin 980 ms
asinh 1100 ms
assoc 330 ms
atan 390 ms
atan2 380 ms
atanh 390 ms
atom? 330 ms
base64-dec 600 ms
base64-enc 440 ms
bayes-query 220 ms
bayes-train 770 ms
begin 270 ms
beta 610 ms
betai 600 ms
bind 390 ms
binomial 600 ms
bits 600 ms
case 440 ms
catch 60 ms
ceil 380 ms
char 940 ms
chop 330 ms
clean 50 ms
cond 280 ms
cons 270 ms
constant 220 ms
context 280 ms
context? 270 ms
copy 280 ms
cos 380 ms
cosh 500 ms
count 330 ms
cpymem 380 ms
crc32 170 ms
crit-chi2 760 ms
crit-z 500 ms
curry 160 ms
date 1820 ms
date-value 270 ms
debug 610 ms
dec 430 ms
def-new 330 ms
default 170 ms
define 270 ms
define-macro 330 ms
delete 1600 ms
det 330 ms
difference 330 ms
div 320 ms
do-until 330 ms
do-while 440 ms
doargs 330 ms
dolist 390 ms
dostring 380 ms
dotimes 390 ms
dotree 380 ms
dump 380 ms
dup 270 ms
empty? 330 ms
encrypt 330 ms
ends-with 490 ms
env 720 ms
erf 330 ms
error-event 280 ms
eval 270 ms
eval-string 440 ms
exists 60 ms
exp 430 ms
expand 280 ms
explode 330 ms
factor 330 ms
fft 490 ms
filter 60 ms
find 440 ms
find-all 330 ms
first 330 ms
flat 330 ms
float 980 ms
float? 280 ms
floor 440 ms
flt 330 ms
for 330 ms
for-all 50 ms
format 1700 ms
fv 330 ms
gammai 550 ms
gammaln 440 ms
gcd 220 ms
get-char 390 ms
get-float 490 ms
get-int 1320 ms
get-long 440 ms
get-string 380 ms
global 220 ms
global? 280 ms
if 270 ms
if-not 330 ms
ifft 500 ms
import 270 ms
inc 390 ms
index 50 ms
int 610 ms
integer? 220 ms
intersect 330 ms
invert 330 ms
irr 440 ms
join 330 ms
lambda? 280 ms
last 330 ms
last-error 440 ms
legal? 380 ms
length 440 ms
let 330 ms
letex 330 ms
letn 270 ms
list 280 ms
list? 220 ms
local 330 ms
log 380 ms
lookup 330 ms
lower-case 330 ms
macro? 220 ms
main-args 330 ms
map 330 ms
mat 330 ms
match 330 ms
max 380 ms
member 440 ms
min 330 ms
mod 330 ms
mul 390 ms
multiply 330 ms
name 440 ms
new 330 ms
nil? 380 ms
normal 490 ms
not 280 ms
now 2750 ms
nper 430 ms
npv 330 ms
nth 330 ms
null? 270 ms
number? 330 ms
or 390 ms
pack 490 ms
parse 390 ms
pmt 440 ms
pop 380 ms
pop-assoc 330 ms
pow 330 ms
pretty-print 440 ms
primitive? 270 ms
prob-chi2 500 ms
prob-z 330 ms
protected? 220 ms
push 380 ms
pv 330 ms
quote 330 ms
quote? 220 ms
rand 660 ms
random 380 ms
randomize 330 ms
read-expr 440 ms
ref 390 ms
ref-all 330 ms
regex 440 ms
regex-comp 490 ms
replace 220 ms
rest 440 ms
reverse 380 ms
rotate 390 ms
round 1430 ms
seed 110 ms
select 380 ms
sequence 330 ms
series 270 ms
set 280 ms
set-locale 990 ms
set-ref 380 ms
set-ref-all 330 ms
setf 170 ms
setq 330 ms
sgn 330 ms
sin 440 ms
sinh 430 ms
slice 380 ms
sort 60 ms
source 3460 ms
sqrt 330 ms
starts-with 550 ms
string 7630 ms
string? 440 ms
sub 330 ms
swap 390 ms
sym 380 ms
symbol? 390 ms
symbols 490 ms
sys-error 490 ms
sys-info 330 ms
tan 440 ms
tanh 390 ms
throw 50 ms
throw-error 330 ms
time 9450 ms
time-of-day 3510 ms
title-case 390 ms
transpose 440 ms
trim 220 ms
true? 330 ms
unify 380 ms
unique 330 ms
unless 330 ms
unpack 440 ms
until 330 ms
upper-case 330 ms
uuid 2200 ms
when 270 ms
while 440 ms
write-buffer 440 ms
write-line 440 ms
xml-parse 380 ms
xml-type-tags 330 ms
zero? 280 ms
| 440 ms
~ 1590 ms
127470 ms
performance ratio: 50.0 (1.0 on Mac OS X, 1.83 GHz Intel Core 2 Duo)
The funny thing is, that at work, I can't read web pages any faster on a newer machine ;o)
Lookup your evil, power hungry, global warming Intel beast chip here:
http://www.processor-comparison.com/power.html
-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976
A little bit too late, but here are my two cents on the topic of benchmarks (I did a lot of benchmarking for the Lhogho compiler).
1. I think it is impossible to write a satisfactory one-liner for benchmarking.
2. A true benchmark does not produce a scalar (i.e. a single value) but a vector - i.e. many individual performance values measuring different aspects. For example, there are scores for integer arithmetic, for floating-point arithmetic, for string manipulation, for memory management, for passing parameters, for looking up local variables, etc.
3. Apart from a vector of many specialized mini-benchmarks, there is the option to use a more complex system which measures a predefined set of functions. For example, a test calculating a recursive math function spends time on the real calculations, on passing parameters, on low-level function invocation, and so on.
4. Simple/short benchmarks can be seriously affected by the operating system (especially by what it is doing in the background while you run the benchmark).
5. The way the code is compiled can also lead to drastically different results. For example, whether newLisp is compiled with or without optimizations may lead to a noticeable difference in performance.
6. What does one-liner mean in a language where newlines are just whitespace?
7. For a one-liner that measures integer arithmetic, you can try the 3x+1 problem.
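Point 7 can be made concrete: the 3x+1 (Collatz) iteration is pure integer arithmetic (divide, multiply, add, compare), which is why it makes a reasonable integer micro-benchmark. A sketch in Python; the range 1..9999 is an arbitrary choice:

```python
import time

def collatz_steps(n):
    """Number of 3x+1 steps needed to reach 1 - pure integer arithmetic."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

start = time.perf_counter()
total = sum(collatz_steps(n) for n in range(1, 10000))
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"total steps for 1..9999: {total} ({elapsed_ms:.1f} ms)")
```

Printing the total as well as the time guards against a compiler optimizing the whole loop away, a classic pitfall when porting micro-benchmarks to compiled languages.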