Difficult one

For the Compleat Fan
Locked
Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Difficult one

Post by Jeff »

I wrote a little US-style proper name parser for some common name formats (to help with some large lists of data I needed to import with various name formats). I wrote one in python, which is not very optimized for this sort of thing, and one in newlisp, which I believe should be, and the one in newlisp is running much slower.

The problem seems to be with my matching algorithm. It runs through around ten regexp patterns on each name (after 'normalizing' the name) until it hits one that matches. I've arranged the patterns as best as I can to match the most efficiently. However, there is a huge speed difference between the normalization function, which does a bunch of replaces, and the parse-name function, which does a bunch of regex searching.

After running through nearly 8000 names, the python version has about a half a second lead on my quad core g5. They both use nearly the same algorithm to do matching and replacing (in fact, I copied and pasted most of the regexps).

Any ideas?

Code: Select all

#!/usr/bin/env newlisp

(define (int->roman num , result)
  (set 'result "")
  (let ((numerals '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X") (40 "XL")
                  (50 "L") (90 "XC") (100 "C") (400 "CD") (500 "D") (900 "CM")
                  (1000 "M"))))
       (dolist (numeral-set (reverse numerals))
               (let ((value (first numeral-set)) (numeral (last numeral-set)))
                    (while (>= num value)
                      (begin
                        (set 'result (join (list result numeral)))
                        (set 'num (- num value))))))) result)

(constant '*patterns*
	'(({^\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b(\w\.)\s\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b(\w\.)\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,}\b\s\b[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.\s\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,}\b\s\b[\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b([\w\-\']{2,}\b\s\b[\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$} (first-name middle-name last-name))))

(constant '*romans-list* (map 'int->roman (sequence 1 25)))
(constant '*romans* (format {\s(?P<suffix>%s)$} 
  (join (map (lambda (i) 
                     (format "(?:%s)" i)) 
        *romans-list*) "|")))
(constant '*suffixes* {\b(?P<suffix>[JjSs][Rr])\.*\b})
(set '*parse-errors* '())

(define (normalize nm)
  (let ((comma-matches (find-all {,(?!(\s*[JjSs][Rr]\b\.))} nm)))       
       (cond ((= (length comma-matches) 2)
              (replace {^\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*$} nm (format "%s %s %s" $3 $1 $2) 0))
             ((= (length comma-matches) 1)
              (replace {^\s*(.+?)\s*,\s*(.+?)\s*$} nm (format "%s %s" $2 $1) 0)))
       (replace "," nm "")
       (replace {(\.)\s*} nm ". " 0)
       (replace {\b(\w)\b[^$\.]} nm (format "%s. " (upper-case $1)) 0)
       (replace {\b([\w\-\']+?)\b} nm (title-case $1 true) 0)
       (replace {\b([JjSs][Rr])\b\.*} nm (format "%s." 0))
       (replace *romans* nm (format " %s" (upper-case $1)) 1)
       (set 'nm (trim nm))
       (replace {\s+} nm " " 0)) nm)

(define (match-pattern str pattern , p-match)
  (set 'data '((first-name nil) (middle-name nil) (last-name nil)))
  (set 'p-match (regex (first pattern) str))
  (if (not (nil? p-match))
      (begin
        (dolist (x (last pattern))
          (replace-assoc x data (list x (p-match (* 3 (+ 1 $idx))))))) nil))

(define (parse-name nm , data original-name)
  (set 'original-name nm)
  (set 'nm (normalize nm))
  (set 'data '((first-name nil) (middle-name nil)
              (last-name nil) (suffix nil)))
  (let ((suffix-match (regex {\b(?P<suffix>[JjSs][Rr])\.*\b} nm))) ;("Jr" 18 2 "Jr" 18 2)
       (if (not (nil? suffix-match))
           (begin
             (replace-assoc 'suffix data (list 'suffix (suffix-match 3)))
             (replace (format " %s\.*" (last (assoc 'suffix data))) nm "" 1))
           (begin
             (set 'suffix-match (regex *romans* nm 1))
             (if (not (nil? suffix-match))
                 (begin
                   (replace-assoc 'suffix data (list 'suffix (suffix-match 3)))
                   (replace (format " %s" (last (assoc 'suffix data))) nm "" 1))))))
  (let ((good-match (catch
                      (dolist (pattern *patterns*)
                        (let ((possible-match (match-pattern nm pattern)))
                             (if (not (nil? possible-match))
                               (throw possible-match)))))))
       (if (nil? good-match)
           (push original-name *parse-errors* -1)
           (dolist (cell good-match)
                   (replace-assoc (first cell) data cell)))) data)

(load "sample-names.lsp")

(map parse-name nm-list)
(println *parse-errors*)
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

I ask these questions sometimes, too. The usual outcome is that if you write Python (or Perl) in newLISP, it doesn't perform quite as well as Python written in Python, Perl in perl, or newLISP in newLISP. Having said that, though, the problem comes when there's no other obvious way to write things.

If i get time I'd like to run through it - perhaps you could show a bit of sample data... newLISP has a few useful timing functions...

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

I re-did the Debian language shootout regex benchmark on my MacMini G4 and it shows newLISP beeing faster in regex pattern matching. So I suppose the slower behaviour you see in newLISP is not caused by regex pattern matching but by something else. The code is too much and the time to little ;-), for me to analyze this further, but there must be a way to make this faster.

time ./regexmatch.python 1000 < regexmatch

real 0m0.676s
user 0m0.443s
sys 0m0.076s


time ./regexmatch.newlisp 1000 < regexmatch.input

real 0m0.373s
user 0m0.311s
sys 0m0.008s


The benchmarks from here: http://newlisp.org/benchmarks/ are for Intel CPUs on Win32, but they give a few hints where the differences in peformance are for Perl, Python and newLISP.


Lutz

nigelbrown
Posts: 429
Joined: Tue Nov 11, 2003 2:11 am
Location: Brisbane, Australia

Post by nigelbrown »

Perhaps a function timing would show the slow point?
These links suggest approaches
Modify function defs to collect timing data when called : http://www-users.cs.umn.edu/~gini/lisp/metering.cl

Interrupt program and record what function it is in : http://wwwcgi.rdg.ac.uk:8081/cgi-bin/cg ... lp/profile and http://www.franz.com/support/documentat ... filing.htm
Maybe this could be done in newlisp using the (timer ) function if there was a way of checking what function has been interrupted?

Nigel

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

Corm: It's not very python-esque code. In python, I tend to rely on list-comprehensions and then use pretty lispy code, because lispy code tends to outperform (except in PHP, where function overhead is huge and you can't pass functions as args [only strings that get eval'd into functions or PHP's crippled lambda, which is not subject to garbage collection]). The ^technique^ is the same - read the file in, normalize the names so matching is consistent (capitalization, periods after initials, order of names, etc).

Lutz: I know, I've done my own tests and it is much faster at both reading in file data and regexp matching. That's what is so frustrating about the problem.

Nigel: I know where the problem is. newLisp outperforms python in the first stages of reading in the file and then normalizing the names with regexp replacements. The problem is in the short-circuit logic that runs through the series of regexps in *patterns* and stops at the first good match. It is taking significantly longer in newLisp than in python - enough that it kills the performance gains on the first section.

I'm pretty sure the problem is not in the technique (although parsing by whitespace and lopping off slices would be faster, it is not extensible or as elegant as a solution using pattern recognition; besides, emulating how my brain recognizes a name is much more lispy ;)). The problem, I think, is in the implementation.

In common lisp, I would use lots of nested let statements and recursion, but newlisp's docs say that often, dolist and the other iterators are faster than recursion (since they may be using recursion internally) so I end up using do and begin. I also have a difficult time in newlisp deciding when to use nil params in the function def or let statements. Something about how I wrote parse-name is completely inefficient :(

I just noticed - in match-pattern, I did not declare 'data anywhere. It is reassigning the same global symbol over and over. That couldn't be causing the slowdown, could it?
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

Several recommendations to speed up the code and other comments:

instead of:

Code: Select all

(if (not (nil? suffix-match)) ...
say:

Code: Select all

if( (not suffix-match) ...)
; or even faster
(unless suffix-match ...)
instead of:

Code: Select all

(last (assoc 'suffix data))
say:

Code: Select all

(lookup 'suffix data)
'lookup' will always pick the last in the association (if not given an index)

instead of:

Code: Select all

 (if (nil? good-match) ...)
try also 'not', which will trigger on 'nil' and (), while 'nil?' means strictly 'nil'. Not sure about the logic here and the impact is probably minimal, but its better English :-)

Code: Select all

(if (not good-match) ...)

I also have a difficult time in newlisp deciding when to use nil params in the function def or let statements.
All parameters in p1,p2,p3 (define (foo p1 p2 p3) ...) are set to 'nil' if not filled in by the caller.

In a 'let' statement if you want to have the params imitialized to 'nil' then better use 'local' (much less overhead/faster):

instead of:

Code: Select all

(let (param nil) ....)
do:

Code: Select all

(local (param) ...)
It is reassigning the same global symbol over and over. That couldn't be causing the slowdown, could it?
If that global variable always gets the same value, I would set it only once. But which var/statement in match-pattern do you mean?

I am not sure how much the above changes will influence the speed, but in any case they will make the code shorter and more readable.

You seem to compose the name records bye doing 'replace-assoc' in the 'data' list:

Code: Select all

 (set 'data '((first-name nil) (middle-name nil) (last-name nil)))
Would it not be better to compose the whole name records using list' statements instead of replace-assoc statements? (perhaps I am not understanding the code well)

To understand the code better it would be helpful if you can give us a little "sample-names.lsp" with just a few records. Perhaps then we can better understand the logic of 'parse-name' and 'match-pattern' and one could try out those functions in an isolated manner to understand what is going on and suggest a more efficient algorithm.

Lutz

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

Lutz,

Thanks for the advice. Particularly with local and lookup. As far as assigning data at the entrance of match-pattern, that function evolved quite a bit and I didn't clean it up properly. In the original form, I needed to have it whether or not the rest of the function proceeded as a shortcut for regexps that did not have one of the elements (i.e. no middle name/initial).

The sample names file is just as it sounds, a list of strings, representing names in various formats:

Code: Select all

(set 'nm-list '("Some P. Body", "Guy, Some Other", "John Q Programmer", "Adams, Sweet Fanny", "I. P. Freely", "Name That Won't Match", "Name That Matches, Jr."))
The idea is to first normalize them, so that they all are properly capitalized, abbreviations end in periods, and are listed in First name or initial, middle name or initial, last name order. That is the normalize function.

The parse-name function then attempts to use a series of regexp patterns to recognize the parts of the name. I wanted to use regexp so that it could be easily expanded later and because when writing lisp, it's always clever to try and emulate the human mind (although not always efficient ;)).

parse-name normalizes the name, then uses the function match-pattern on the result, which tries to match the name using each regexp in *patterns* until one matches, short circuiting at this point.

My purpose in splitting parse-name and match-pattern was to lower the overhead of any one function, since parse-name would be used in iteration.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

I've gotten it down to matching the speed of python:

Code: Select all

#!/usr/bin/env newlisp

(define (int->roman num , result)
  (set 'result "")
  (let ((numerals '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X") (40 "XL")
                  (50 "L") (90 "XC") (100 "C") (400 "CD") (500 "D") (900 "CM")
                  (1000 "M"))))
       (dolist (numeral-set (reverse numerals))
               (let ((value (first numeral-set)) (numeral (last numeral-set)))
                    (while (>= num value)
                      (begin
                        (set 'result (join (list result numeral)))
                        (set 'num (- num value))))))) result)

(constant '*patterns*
	'(({^\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b(\w\.)\s\b([\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b(\w\.)\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,}\b\s\b[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b(\w\.\s\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,}\b\s\b[\w\-\']{2,})\b\s\b(\w\.)\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))
	  ({^\b([\w\-\']{2,})\b\s\b([\w\-\']{2,}\b\s\b[\w\-\']{2,})\b\s\b((?:St\. )*[\w\-\']{2,})\b$}
	    (first-name middle-name last-name))))

(constant '*romans-list* (map 'int->roman (sequence 1 25)))
(constant '*romans* (format {\s(?P<suffix>%s)$} 
  (join (map (lambda (i) 
                     (format "(?:%s)" i)) 
        *romans-list*) "|")))
(constant '*suffixes* {\b(?P<suffix>[JjSs][Rr])\.*\b})
(set '*parse-errors* '())

(define (normalize nm)
  (let ((comma-matches (find-all {,(?!(\s*[JjSs][Rr]\b\.))} nm)))       
       (cond ((= (length comma-matches) 2)
              (replace {^\s*(.+?)\s*,\s*(.+?)\s*,\s*(.+?)\s*$} nm
                (format "%s %s %s" $3 $1 $2) 0))
             ((= (length comma-matches) 1)
              (replace {^\s*(.+?)\s*,\s*(.+?)\s*$} nm
                (format "%s %s" $2 $1) 0)))
       (replace "," nm "")
       (replace {(\.)\s*} nm ". " 0)
       (replace {\b(\w)\b[^$\.]} nm (format "%s. " (upper-case $1)) 0)
       (replace {\b([\w\-\']+?)\b} nm (title-case $1 true) 0)
       (replace {\b([JjSs][Rr])(\.*)} nm (format "%s." $1) 1)       
       (replace *romans* nm (format " %s" (upper-case $1)) 1)
       (set 'nm (trim nm))
       (replace {\s+} nm " " 0)) nm)

(define (match-pattern str pattern name-if-error) ; name-if-error can be orig.
  (if pattern                                     ; if suffix parsed off str
    (map list (last pattern) (rest (filter string? 
                                   (regex (first pattern) str))))
    (begin
      (push (or name-if-error str) *parse-errors* -1)
      (list '(first-name nil) '(middle-name nil) '(last-name nil)))))

(define (parse-name str , data original-str)
  (set 'original-str str)
  (set 'str (normalize str))
  (push (list 'suffix (if (regex {\b(?P<suffix>[JjSs][Rr]\.*)$} str) $1 nil)) 
        data -1)
  (append (match-pattern (replace {\s+\b(?P<suffix>[JjSs][Rr]\.*)$} str "" 0)
            (exists (lambda (pattern) (regex (first pattern) str)) *patterns*) 
                    original-str) data))

(define (format-name name-list , str)
  (if (string? name-list) (set 'name-list (parse-name name-list)))
  (if (lookup 'first-name name-list)
    (begin
      (set 'str (lookup 'first-name name-list))
      (if (lookup 'middle-name name-list) (write-buffer str 
        (format " %s" (lookup 'middle-name name-list))))
      (write-buffer str (format " %s" (lookup 'last-name name-list)))
      (if (lookup 'suffix name-list) (write-buffer str 
        (format ", %s" (lookup 'suffix name-list))))) nil))

(define (format-name name-list)
  (join (filter string? (map last
    (if (string? name-list) (parse-name name-list) name-list))) " "))

(load "sample-names.lsp")

(map parse-name nm-list)
(println *parse-errors*)
However, if you run (map format-name nm-list), python beats the pants off of newlisp. I also tried name-format as a (format) expression (with conditionals to avoid extra spaces (and errors) in the case of nil values), but that took *much* longer (about 3/4 of a second over 8000 names on a quad core g5) than just joining them.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

Who wins at (map parse-name nm-list)? Python or newLISP?
(λx. x x) (λx. x x)

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

newLisp by a hair (about .1-.2 seconds over 8000 names on a quad-core g5).
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

So there's something in the rest of format-name which causes "python [to beat] the pants off of newlisp" when running (map format-name nm-list)?

BTW, are you using map in Python or list comprehensions?

--Rick
(λx. x x) (λx. x x)

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

The python code to format a name is slightly more complex:

Code: Select all

def format(name_obj):
	if isinstance(name_obj, str):
		name_obj = parse(name_obj)
	formatted = ' '.join([format_first(name_obj), format_last(name_obj)])
	return formatted

def format_first(name_obj):
	if name_obj['middle']:
		return ' '.join([name_obj['first'], name_obj['middle']])
	else:
		return name_obj['first']

def format_last(name_obj):
	if name_obj['suffix']:
		if name_obj['suffix'] in ROMANS_LIST:
			return ' '.join([name_obj['last'], name_obj['suffix']])
		else:
			return ', '.join([name_obj['last'], name_obj['first']])
	else:
		return name_obj['last']
I separated out first and last in the python (which is my production version atm) so I could more easily compose records with first and last name sections separated. In newlisp, though, it seems like there is a proportionally higher function overhead than in python - I've found that python speeds up when you break operations up, but sometimes newlisp can get bogged down if you factor a function too finely.

Here is the code I am comparing against:

Code: Select all

def main():
	from sample_names import name_list
	
	for name in name_list:
		try:
			parse(name)
		except ParseError, e:
			print e
Using a list comprehension, like [parse(name) for name in name_list] actually makes it quite a bit faster, but doesn't allow for graceful exception handling, since a bad parse throws a ParseError.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

Jeff wrote:In newlisp, though, it seems like there is a proportionally higher function overhead than in python - I've found that python speeds up when you break operations up, but sometimes newlisp can get bogged down if you factor a function too finely.
Yes, I've noticed this too. If I could have low lambda overhead and tail call optimization, newLISP might be my favorite language of all time! :-)

On the original subject, I don't then see anything beyond lambda overhead that could be the cause of the problem. I *was* gonna say that newLISP might be slower because your newLISP calling code was list-building and your Python's wasn't, but you said that the list comprehension call was actually faster. :-(
(λx. x x) (λx. x x)

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

In newlisp, though, it seems like there is a proportionally higher function overhead than in python
The Fibonacci benchmark here: http://newlisp.org/benchmarks/

which measures mostly call overhead would suggest the opposite.

To figure really out where the speed differences are between 2 languages is difficult when using a relative complex program as Jeff's name parser. One has to get down to very small code snippets and measure execution time of those to nail down where the differences are.

Regarding "tail call optimization":

Tail call optimization is good if you prefer expressing algorithms via recursion versus iteration. But speed performance wise iteration can be implement more efficiently compared to tail recursion optimization, which usually is implemented using continuations.

The paradox about tail recursion optimization is, that you can only optimize the tail part of the recursion, which at the same time is the part easy to express or transform into iteration (because it is the last call in the function).

In the end tail-recursion-optimization versus iteration comes down to a programmer preference of rather using one versus the other method when expressing algorithms.

Lutz

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

Lutz wrote:In the end tail-recursion-optimization versus iteration comes down to a programmer preference of rather using one versus the other method when expressing algorithms.
Absolutely! This is the reason I prefer tail-recursion over iteration -- it's not an implementation issue for me, it's an expression issue. The biggest reason for me is that recursive expressions are *much* easier to reason about, concerning both proofs of program correctness and getting as much of the context of the program loaded into programmer headspace as possible. These are also the reasons I like using FP HOFs also: if I can prove the correctness of 'map' or 'fold', any function that uses 'map' or 'fold' has a relatively easy proof of correctness. And the expression is concise and clear.

Since newLISP doesn't support tail call optimization, I have coded my HOFs iteratively -- I just have to live with that (*in my opinion*) ugliness. But I wall them off into well-used HOF definitions; the functions, at a higher level in the system, that use the HOFs can be devoid of iterative code, thank God. :-)
(λx. x x) (λx. x x)

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

I like recursion because I think it is an elegant, DRY solution. However, I do realize it's often not the most efficient way to do things.

In comparing language speeds, another thing to think about is that newLisp always gets a head start due to its interpreter size. On just a few hundred iterations, Python will never come close to newLisp because the actions take less time than loading its interpreter.

If the program runs for several seconds or more, though, then you can start to marginalize the load time. Python seems to run pretty well once it is loaded.

The problem I'm trying to solve is that newLisp should be killing Python in every aspect of this program. When testing feature for feature, newLisp beats the pants off of Python for regexp, list assembly, et al. Therefore, it must be the programmer, not the language, and I'm trying to figure out where I could be writing better code.

Edit: After thinking about the last few posts, I tried testing the speed of (dolist (nm nm-list) (parse-name nm)), and it was about 15% faster.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

I haven't got anything useful to say (above my head, most of it) but - just for interest - what speed differences are you expecting and noticing?

(Recently I did a blog entry trying to match a 75ms Perl script, starting off with a newLISP version which started out taking 1.2 seconds. Obviously a factor of 10 is worth looking into...! Not that I'm a speed freak - just that discrepancies are intriguing.)

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

here is another suggestion for Jeff to speedup the name parser code:

Code: Select all

> (time (format "%s %s %s" "hi" "ho" "hum") 1000000)
4818
> (time (append  "hi" " "  "ho" " " "hum") 1000000)
1723
> 
when using 'format' for simple string appending with filling in text in between, 'append' can be a lot faster.

This is for the '(format "%s %s %s" $3 $1 $2)' pieces in the normalize function.

Lutz

ps: timings on MacMini G4

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

Lutz wrote:This is for the '(format "%s %s %s" $3 $1 $2)' pieces in the normalize function.
I wonder if, also, the formats in format-name should be switched to appends or anything else faster. Jeff said his speed discrepancy was noted when running format-name under map. Just a thought.
(λx. x x) (λx. x x)

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

Using append, I've gotten execution over 7,896 names down to 1,075 ms, excluding interpreter load time. In python, the same number of names is around 975 ms without counting the interpreter. If I change from iteration to list assignment (using map or a list comprehension), I add only about 50-100 ms to either.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Locked