wp in newLISP

Q&A's, tips, howto's
m i c h a e l
Posts: 394
Joined: Wed Apr 26, 2006 3:37 am
Location: Oregon, USA

wp in newLISP

Post by m i c h a e l »

Dear Club,

I came across this listing of a word count program implemented in various languages (see README.txt for more info). As you can probably guess, there was no newLISP version!

So I wrote the following, which seems to work:

Code:

#!/usr/bin/newlisp

(new Tree 'counts)

(while (read-line)
	(dolist (word (parse (current-line) " "))
		(counts word (inc (counts word)))
	)
)

(dolist (each (sort (counts))) 
	(println (each 0) " " (each 1))
)

(exit)
Does anyone see room for improvement? If not, I'll submit this to Felix in a few days' time.

m i c h a e l

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Here is a variation with some explanations:

Code:

(while (read-line)
    (bayes-train (parse (current-line) "\\s+" 0) 'Counts))


(dolist (each (Counts))
    (println (each 0) " " (each 1 0)))

; try it out

newlisp counter.lsp < sometext.txt
The program uses the same hash-tree data structure as yours, but relies on newLISP's built-in 'bayes-train' function to do the counting. Just like the hash function, it prepends an underscore to each symbol, but hides it when the word list is retrieved with (Counts).

The example programs in the link mostly split on white space, which would be "\\s+", but I prefer "\\W+", which does a better job of rejecting funny characters, e.g. when parsing HTML text.
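To make the difference between the delimiters concrete, here is a small sketch. The sample strings are made up for illustration; the point is only to compare a literal single-space delimiter with the two regular expressions mentioned above.

```newlisp
; Sample input (hypothetical) with runs of several spaces.
(set 'line "the  quick brown   fox")

; Literal one-space delimiter: each extra space yields an empty string.
(println (parse line " "))

; Regex delimiter (the trailing 0 is the regex option): runs of white space
; are treated as one separator.
(set 'words (parse line "\\s+" 0))
(println words)

; "\\W+" splits on any run of non-word characters, so markup and punctuation
; act as separators too. Inspect the result for empty strings at the ends.
(println (parse "it's <b>bold</b>, isn't it?" "\\W+" 0))
```

Printing the three results side by side shows why the choice of delimiter changes the word counts.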

The print function is almost the same as yours. Sorting is really not necessary, because the list produced by (Counts) is already sorted. I need an extra index 0, because 'bayes-train' parenthesizes the counts (there could be more counts in a list). As a bonus you have the total of all words in Counts:total.
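A minimal sketch of the parenthesized counts and the total, on a tiny hand-made word list. Demo is a hypothetical context name, and the context is created with (new Tree ...) first, as in the scripts elsewhere in this thread; treat it as an illustration rather than a definitive recipe.

```newlisp
; Demo is a made-up context name used only for this illustration.
(new Tree 'Demo)

; Train on a single word list, so every count list has one element.
(set 'totals (bayes-train '("the" "cat" "saw" "the" "dog") Demo))

(println (Demo "the"))   ; the count list stored for "the"
(println Demo:total)     ; the total kept by bayes-train

; Same print loop as above: index 0 unwraps the single count.
(dolist (each (Demo))
    (println (each 0) " " (each 1 0)))
```

With more than one list passed to 'bayes-train', each word would carry one count per list, which is why the counts come wrapped in a list.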

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W

Post by cormullion »

What's this then?

Code:

(new Tree 'counts) 

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »


cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W

Post by cormullion »

Ah, OK. The same as (define counts:counts). Must have gone to sleep when reading that page...! :)
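For anyone else who dozed off on that page, a short sketch of what the default functor gives you. Fruit is a hypothetical name chosen for the example.

```newlisp
; Create a context whose default functor is empty, so it can act as a hash.
; Fruit is a made-up name for illustration.
(define Fruit:Fruit)

(Fruit "apple" 3)            ; store a value under a string key
(println (Fruit "apple"))    ; fetch it again
(println (Fruit "pear"))     ; a missing key gives nil
(println (Fruit))            ; all (key value) pairs as a list
```

This is the same access pattern the word-count script uses with (counts word ...).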

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm

Post by xytroxon »

cormullion wrote:Ah, OK. The same as (define counts:counts). Must have gone to sleep when reading that page...! :)
Don't feel bad...
(counts word (inc (counts word)))
Gave me (and all Lispers) pause... ;)

Should we be using .nl instead of .lsp file extensions for our "newLISP" programs?

-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W

Post by cormullion »

Yes, it looks pretty idiomatic...

Perhaps a capital letter for the hash thingy would look slightly better:

Code:

(Counts word (inc (Counts word)))
The thing that looks slightly out of step at first glance is the double access: (Counts word) appears twice.

Interesting suggestion for ".nl" as the suffix!

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm

Post by xytroxon »

cormullion wrote: Interesting suggestion for ".nl" as the suffix!
I have to use .nl if I want to use it on my personal (Abyss) server with another Lisp installed. The server uses the extension to know which interpreter to run.

And also .nl can then be expanded...

.nl runs on all machines
.nlm runs on Mac only
.nlx runs on Linux only
.nlw runs on Windows only
etc...

-- xytroxon

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

What runs only on 64-bit machines with UTF-8 enabled? ;-)
-- (define? (Cornflakes))

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W

Post by cormullion »

.nl appears to be used only by "Norton Disk Library" something... It could be a good idea to have an extension that doesn't clash with other Lisps... I think I'll try it out.

Cyril
Posts: 183
Joined: Tue Oct 30, 2007 6:27 pm
Location: Moscow, Russia

Re: wp in newLISP

Post by Cyril »

m i c h a e l wrote:I came across this listing of a word count program implemented in various languages (see README.txt for more info). As you can probably guess, there was no newLISP version!
That's fine! I made a similar listing of my own long ago; you can find it here. And, of course, I do have newLISP! ;-) I also have some languages not found in the listing you found. I solve a slightly different task, though: I sort the output alphabetically by word, not by count. One caveat: all the readme-like text is in Russian. But you can still read the source code.
With newLISP you can grow your lists from the right side!
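The two orderings can be compared with a small sketch. It assumes a plain hash with integer counts (not the 'bayes-train' variant, where each count is wrapped in a list); Freq and by-count are hypothetical names.

```newlisp
; Freq is a made-up hash name; the counts are plain integers here.
(new Tree 'Freq)
(dolist (w '("b" "a" "b" "c" "b" "a"))
    (Freq w (+ 1 (or (Freq w) 0))))

; (Freq) already lists the (word count) pairs in alphabetical order.
(println (Freq))

; Sorting with a comparison function orders the pairs by descending count.
(set 'by-count (sort (Freq) (fn (x y) (> (x 1) (y 1)))))
(dolist (pair by-count)
    (println (pair 0) " " (pair 1)))
```

With the 'bayes-train' version the comparison would need (x 1 0) and (y 1 0) instead, because of the extra list around each count.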

m i c h a e l
Posts: 394
Joined: Wed Apr 26, 2006 3:37 am
Location: Oregon, USA

Post by m i c h a e l »

Cyril wrote:That's fine! I made a similar listing of my own long ago; you can find it here.
Yes, I did miss your page! Google's Language Tools did a fair job of translating it, so I was able to read it, no problem. Your newLISP implementation matches the changes Lutz recommended above, too.

With Lutz's changes (and a switch from counts to Words), we get:

Code:

#!/usr/bin/newlisp

(new Tree 'Words)

(while (read-line)
	(bayes-train (parse (current-line) {\s+} 0) Words)
)

(dolist (each (Words)) 
	(println (each 0) " " (each 1 0))  ; implicit indexing
)

(exit)
m i c h a e l
