counting words quickly

For the Compleat Fan
Locked
cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

counting words quickly

Post by cormullion »

An unfamiliar sight, perhaps. A Perl script to cound the number of 'and's in a novel:

Code: Select all

#!/usr/bin/perl -w
@ARGV = ("/Users/me/study-in-scarlet.txt");
my $line = 0;
$count = 0;
$word = 0;
while (<>) {
    foreach $word (m/\band/ig) {
      ++$count;
      }
    }
print $count,"\n";
I want a newLISP script that runs as faster than this Perl one and uses a similar approach. But I can't do it. My best attempt is 3 times slower. What am I doing wrong?

Code: Select all

(set 'path "/Users/me/study-in-scarlet.txt")

(set 'count_ 0)
(dolist (w (parse (read-file path) "\\s" 1))
  (if (= w "and")
    (inc 'count_)))
(println count_)
(exit)

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

Perl is hard to beat if you do it the Perl way. Do it the newLISP way and you are more than 2 times faster:

Code: Select all

#!/usr/bin/newlisp

(set 'argv "/Users/Lutz/mrdemo/Sherlock.txt") 
(set 'cnt (length (find-all "(?i)\\band" (read-file argv))))

(println cnt)
(exit)
I used http://www.gutenberg.org/dirs/etext99/advsh12.txt for Sherlock.txt.

Code: Select all

~> time ./andcount.pl
3065

real    0m0.136s
user    0m0.090s
sys     0m0.008s

~> time ./andcount.lsp
3065

real    0m0.059s
user    0m0.015s
sys     0m0.009s
~> 
Lutz
Last edited by Lutz on Tue Apr 10, 2007 7:36 pm, edited 1 time in total.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

:-) Yes. That's a nice version.

Why do you think my version was slower, though...? Is parse working harder than find-all - I suppose it is returning much more info...

ino-news
Posts: 17
Joined: Sat Jan 13, 2007 3:20 am
Location: germany

Post by ino-news »

cormullion wrote: Why do you think my version was slower, though...? Is parse
working harder than find-all - I suppose it is returning much more
info...
i guess find-all lets the pcre-lib do most of the work. --clemens
regards, clemens

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

Code: Select all

(set 'path "/Users/me/study-in-scarlet.txt")

(set 'count_ 0)
(dolist (w (parse (read-file path) "\\s" 1))
  (if (= w "and")
    (inc 'count_)))
(println count_)
(exit) 
My take: what takes so long is that you first expand the file to a list using parse, this assigns a huge number of strings (which according to the description of the memory management on the site, is not what newlisp is optimized for) to the list, then iterates over it. This means that the internal dolist macro (or built in c function or whatever) is having to index its way through the list and then test for equality on each item against another string that is being assigned each iteration: "and". Then, it has to increment 'count_, although that is probably pretty easy on resources.

Lutz's is only assigns the regexp string once and probably is not using dolist for iteration. The best way to write lisp (at least from my experience with older lisps) is to think of the code in terms of the expansion/evaluation of each list element.

This is what makes lisps both more cryptic and more interesting to use. The inside-out evaluation allows you to do things and express code in ways that other languages would never permit.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

Hi Jeff - thanks for the explanations!

Glad to see your recent contributions, too. What do you think of newLISP, given your experience with other languages?

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Post by rickyboy »

Also, did you notice that Lutz's newlisp program is much more concise than the perl program? It beats perl at it's own game there too. He, he, he. :-)

It can also be made more concise by removing the unneeded symbol 'set'ting; however, I don't think you can get away with the same thing in the perl version (but I may be wrong). So, chalk up another one for newlisp! --Ricky
(λx. x x) (λx. x x)

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

I like it. It's neat to have a lisp that does regex natively (and more cleanly than clisp) and abstracts so many things that are a real pain in older lisps. And, of course, I think it's much more elegant and clean than most interpreted imperative languages.

I do have a bit of trouble, though, because all variables and symbols that are not explicitly declared appear to evaluate to nil. This has huge effects on recursion, although I understand that recursion isn't always the preferred method with newLISP.

Not that I am a lisp expert; I just think it's a better way of doing things. I started on Perl and came to lisp last and with no degree in CS, so it's often difficult for me to pick up some things in the language.

One thing I do miss from common lisp, though, are plists (a la (:foo bar :blah yadda) rather than ((foo bar) (blah yadda))). They made things much more concise, in my (albeit uneducated) opinion.

The scoping is also a bit odd for me, especially where contexts are concerned. I'm used to using symbols rather than strings where possible and only converting them to strings when output is needed. If I do this in a separate context, that context needs to know which context is calling it in order to use symbols passed to an argument. For example:

Code: Select all

(context 'FOO)
(define (hello person) (println "Hello " (string person)))
(context 'BAR)
(FOO:hello 'Jeff) => "Hello BAR:Jeff"
This can be important if you are trying to make a module that can be used from anywhere.

Lutz, is there something I'm missing on this?

EDIT: Fixed the code to demonstrate what I was intending to. Wrote too quickly and forgot what I was trying to illustrate.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

You could do this:

Code: Select all

(context 'FOO) 
(define (hello person) 
    (println "Hello " (name person) " from "  (name person true)))
(context 'BAR)

(FOO:hello 'Jeff) => Hello Jeff from BAR
See: http://newlisp.org/downloads/newlisp_manual.html#name

Lutz

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio
Contact:

Post by Jeff »

Thanks, I missed that boolean option on the end of name my first time through the docs. That helps a ton!
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Locked