5 Cent tip for today [ Decode me ]

newdep · Post by **newdep** » Sun Oct 17, 2004 3:33 pm

Just playing with some form of decoding the example below could
become pretty complex if you would enhance it more...

If you like working with lists and newlisp then here, i guess, a nice riddle to break just
you mind over it for a couple of minutes...

you have two lists available :-) which is enough to decode...

http://www.nodep.nl/downloads/newlisp/decodeme.lsp

Decoder function solutions can be posted here...(keep it as small as posisble ;-)

enjoy...Norman.

newdep · Post by **newdep** » Sun Oct 17, 2004 6:32 pm

Hi Lutz,

Actualy i need some help on the code listed above,
the current routine (in the above source coded code) is very very slow...

Ill be back with the decoded version if noone lists it befor i do ;-)

Regards, Norman.

Sammo · Post by **Sammo** » Sun Oct 17, 2004 6:49 pm

(set 'a (-1 (length (flat coordinates))))
(set 'x (map (fn (x) (first (ref x coordinates))) (sequence 0 a)))
(print (join (map (fn (y) (nth y individuals)) x)))

Code: Select all

;;
;; And this is what you see when it worked.
;;
;; Playing with different ways of displaying code content
;; the result below is a variation using lists.
;;
;; Enjoy, Norman.
;;

(pretty-print 65535)

(set 'original '())
(set 'coordinates '())
(set 'individuals '())

; make it a character based list from original file
(set 'original (explode (set 'infile (read-file "./index.lsp") )))

; get the unique characters from the list
(set 'individuals (unique original))

; count per unique character the amount it appears in the originial file
(set 'amount (count individuals original))

; create per unique character an index and create an indexed list

(dolist (y individuals)
 (define (member? x) (= y x ))
 (println y) 
 (push (index member? original) coordinates -1))
 

(save "./coordinated.db" 'individuals 'coordinates)

newdep · Post by **newdep** » Sun Oct 17, 2004 7:51 pm

You wanted me to continue with my question to Lutz? ;-) ..hahahaha..

Well pretty quick done sammo ! nice nice...

newdep · Post by **newdep** » Sun Oct 17, 2004 7:56 pm

Oke now Sammo has made his statement ;-)
Im able to continue with my question to Lutz on the above point. :)

I was seeking for a very quick function that was able to return index numbers
from a list...the only buildin function for this job seems to be 'index.

However. the 'index function has a little (in some cases) drawback and
that is that the indexed numbers it returns come from a define...

Taking an example file of i.e. 500 or 700 Kb then the function in the
source-code above (dolist......) is very very slow..takes over 5 minutes...
(No problem at all ...its a hard job...But indeed i wonder if 'map is quicker?)

Besides.. I dont like the (dolist....) function at all in my example because
it does a double check on someting i already have..and thats the individuals...
Actualy it would be nice if this could work -> (index individuals original)

Still im seeking for a very fast way to return indexed numbers from a list?
So..what is the fasted alternative? (im already reverse-enigering Sammo's example)

Regards, Norman.

Sammo · Post by **Sammo** » Mon Oct 18, 2004 3:11 am

Hi Norman,

It seems that pushing those character indices onto the end of lists gets very, very expensive when the lists get large. The following uncommented solution (I hope you can read it) was running as slowly as yours (or even more slowly) until I tried pushing onto the front of the lists -- that is, (push value list-of-stacks stack-index 0) instead of (push value list-of-stacks stack-index -1). When I made this change, I can index the newlisp_manual.html file (455KB) in about 8 seconds -- including writing the resulting 3.2MB file to the disk.

Code: Select all

(set 'time-begin (time-of-day))

(set 'infile "c:/newlisp/newlisp_manual.html")
(set 'outfile "c:/newlisp/test-coords.txt")
(set 'orig (explode (read-file infile)))
(set 'indiv (unique orig))
(set 'coord (dup '() (length indiv)))
(set 'ilist (map (fn (x) (find x indiv)) orig))
(set 'x -1)
(dolist (i ilist) (push (inc 'x) coord i 0))
(save outfile 'indiv 'coord) 

(println (- (time-of-day) time-begin))

Pushing the character indicies onto the front of the lists is okay and doesn't need correcting later on; the only thing that matters is that each character's index get pushed onto the correct stack.

If you really want the indicies in ascending order in each stack, the following one-line modification to the above does the trick:

Code: Select all

(set 'infile "c:/newlisp/newlisp_manual.html")
(set 'outfile "c:/newlisp/test-coords.txt")
(set 'orig (explode (read-file infile)))
(set 'indiv (unique orig))
(set 'coord (dup '() (length indiv)))
(set 'ilist (map (fn (x) (find x indiv)) orig))
(set 'x -1)
(dolist (i ilist) (push (inc 'x) coord i 0))
(set 'coord (map reverse coord))
(save outfile 'indiv 'coord)

On a big file (e.g., newlisp_manual.html), reversing the stacks after populating them took about six percent more time on my WIN2K 1.4GHz laptop.

A couple of more trial runs suggests that this scales pretty well.

62KB file --> 0.75 seconds
445KB file --> 8.5 seconds
1.17MB file --> 20.8 seconds

newdep · Post by **newdep** » Mon Oct 18, 2004 10:28 am

Hello Sammo,

Thanks for the reply, i like (set 'ilist (map (fn (x) (find x indiv)) orig)) ,
which overrules 'index.

The reverse must somehow be abandoned lets see if i can workaround it...
Ill get back on this... Thanks..

Regards, Norman.

Lutz · Post by **Lutz** » Mon Oct 18, 2004 1:13 pm

Sam's solution seems to be the best, but there is one other potential bottle neck in:

(set 'ilist (map (fn (x) (find x indiv)) orig))

which is the: (find x indiv)

It doesn't matter in Norman's task because 'indiv' is such a short list. But imagine you deal with an 'indiv' with thousands of members.

This is where newLISP arrays come in handy:

Code: Select all

; old code
; (set 'ilist (map (fn (x) (find x indiv)) orig))

; new code
(set 'ivec (array 256))
(set 'pos -1)
(dolist (i indiv) (nth-set (char i) ivec (inc 'pos)))

(set 'ilist (map (fn (x) (nth (char x) ivec)) orig))

This builds an array vector where each character is at the position of its value. So at "A" index 65 you have the index position of "A" in indiv.

Now the sequential (find x indiv) is replaced with (nth (char x) ivec) which does random access.

In Norman's example this saves only about 0.5% of time, but it could be worth doing it in other situations with very long ndexing vectors.

Lutz

newdep · Post by **newdep** » Mon Oct 18, 2004 3:06 pm

Hello Lutz,

Thanks for the addon...Actualy speed seems to be more important in
my tool im currently building then expected. It will for sure be eating
MB's greater then 50 MB..or more...

So that leaves me with two remaining questions...

Should I use 'index above 'find or always switch to 'array's when it comes
to performance and what will 'map do on performace ?

Regards, Norman.

Lutz · Post by **Lutz** » Mon Oct 18, 2004 3:31 pm

'map' is pretty good performance wise, becusae it goes sequentially thru the list one by one, so 'map' will always scale well. The problem with 'find' is that it searches sequentially, so if the list is very long it becomes an issue.

I would use 'index' only if you are really interested to get various positions back in a list. But that was not the case in your riddle, because every letter occured only once in the individual vector and 'find' returns as soon as it found one. 'index' keeps on searching hoping to find more until the end of the list.

The array versus list issue is difficult to decide. In your riddle it did not make a real difference and Sam's pure list code was much shorter. I always start coding as a list and only goto arrays when time seems to get a problem. As a thumb rule I would say when the list doesn't exceed a few hundred elements its not worth thinking about arrays. But then again in an individual case things may be different.

Lists are very good when going thru it sequentially (like map), but as soon as you have to jump around in it randomly it can get a problem.

When you are using 'nth-set' or 'set-nth' be also aware of the difference between the two. 'nth-set' is much faster becuase it doesn't need to return the whole list or array.

Lutz

newdep · Post by **newdep** » Mon Oct 18, 2004 7:51 pm

Sammo how long have you been programming lisp?

Its so dammm simple and soo far away sometimes...

This is the first time i see someone using "-1" in an inc..!!!
Clever ?! well yes ! i have never ever used it and its sooo damm simple...
after 15 years of programming i just realised that I was always using
the wrong tools! Pffffffff... I need a coffie.. :-)