sorting

Q&A's, tips, howto's
Locked
eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

sorting

Post by eddier »

Lutz,

In a particular CGI program I am writing the following problem has come up.

I will have to write a sort based on a comparison function. For example, I need to ignore case of letters and I need to sort by a different element of lists of lists as in

Code: Select all

(sort (("12" "Bb") ("32" "ab")) f)
where f is a function to compare the 2nd elements without case.
I was able to

Code: Select all

(define (sort-by L col)
  (map (fn (y) (swap 0 col y))
       (sort
	(map (fn (x) (swap 0 col x))
	     L))))
but I still have the problem of a case sensitive sort. If I code a sort myself in newlisp, I have the problem of setting the stack. Because lists may get rather long, how do I force a cgi to make newlisp change it's stack size?
Maybe I should write a c function store it in a library and use it?

Eddie

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

you can put comandline parameters for newlisp in the first line of your script:

#!/usr/bin/newlisp -s 10000

or

#!/usr/bin/newlisp -s10000

newLISP can parse both, but it also depends on your platform, in my experiments BSD and Linux work fine with both. I always use this as a test script:

#!/usr/bin/newlisp -s10000

(println (main-args))
(println (sys-info))

Lutz

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Thanks, then I have the following solution. I know merge sort uses a bit of memory but it works for now.

Code: Select all

(define (merge-sort X f)
  (let (h (/ (length X) 2))
    (if (= (length X) 1) 
	X
      (merge
       (merge-sort (slice X 0 h) f)
       (merge-sort (slice X h) f))))
  
(define (merge X Y f)
  (cond ((empty? X) Y)
	((empty? Y) X)
	((f X Y)
	 (cons (first X) (merge (rest X) Y f)))
	(true (cons (first Y) (merge X (rest Y) f)))))
where f is a comparison function. I kind of like to send a (fn (x y) ...)

Eddie

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

After testing I have come to no conclusion. Which is faster (= (length X) 1) or (empty? (rest X))?

Sometimes the former is faster sometimes the latter is faster.

Eddie

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

Nice merge sort! In my measurements (empty? (rest X)) is about 10% faster than (= (length X) 1).

Lutz

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Thanks Lutz! Everytime I've tried to time the two I get such different results that I couldn't figure out which was faster.

The previous sort had a problem with the contents passed to the compare function. The following fixes the problem and adds the faster compare.

Code: Select all

(define (merge-sort X f)
  (println X)
  (let (h (/ (length X) 2))
    (if (empty? (rest X)) 
	X
      (merge
       (merge-sort (slice X 0 h) f)
       (merge-sort (slice X h) f)))))
  
(define (merge X Y)
  (cond ((empty? X) Y)
	((empty? Y) X)
	((f (first X) (first Y))
	 (cons (first X) (merge (rest X) Y)))
	(true (cons (first Y) (merge X (rest Y))))))
After trying it out, I've found that this is too slow and uses too much memory for my CGI application. I'm going to have to make a library sort and import it. I can use one of the already tweeked quick sorts with a lisp comparison operation (I'll loose the benifits of the tweek from the lisp comparison). I'll get back to this after class.

Eddie

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Oops! leave the (println X) out.

Eddie

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

You could dothe following:

(set 'data '(("12" "Bb") ("32" "ab") ("43" "bx")) )

(1) extract the upper-cases sort-term together with an index into the list

=> (("BB" 0) ("AB" 1) ("BX" 2))

(2) sort the converted list:

=> (("AB" 1) ("BB" 0) ("BX" 2))

(3) extract the indices:

=> (1 0 2)

(4) select in sort order:

(select data '(2 0 3)) => (("32" "ab") ("12" "Bb") ("43" "bx"))

Lutz

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Thanks. I will see if this gives me the performance and flexibility I need.

Eddie

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Your solution works great! Yep this small and fast enough :)

Code: Select all

(define (sort-by L col)
  (let (k -1)
    (select L
	   (map last (sort 
		      (map (fn (x) (list (upper-case (nth col x)) (inc 'k)))
			   L))))))
Eddie

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Lutz, The CGI works pretty fast connected straight to our server. I would like to know how fast it responds offsite. I'm not sure if this is intuitive to use so please feel free to suggest improvements.

[/url]http://www.bmc.edu/cgi-bin/alum.cgi[/url]

Eddie

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

I have a 300 Kbyte per second downstream line.
Here are the 10x results:


> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
4797
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3345
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3755
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3375
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3715
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
4236
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
4727
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
4727
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3985
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
3274


For a speed improvement you can create the page in a buffer
and then display the buffer in 1 step to the user, instead of
building it while downloading the page. But perhpas that is
just what the point is ;-)

Pretty good function ;-)

Regards, Norman.
-- (define? (Cornflakes))

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Thanks Norman :)

Connected directly to the network, Im getting

Code: Select all

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi")) 
409
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
403
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
428
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
399
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
399
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
413
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi"))
397
But I notice from home things are much slower.
For a speed improvement you can create the page in a buffer
and then display the buffer in 1 step to the user, instead of
building it while downloading the page. But perhpas that is
just what the point is ;-)
Exactly! I build the page on the fly because I have different programs on two other clients that update the data and I'm going to use this same prototype CGI (with a few modifications) to genereate all the table listings on our cite. I'll have to modify it a small bit for context so that it will know which data to grab and make it a bit more generic.

The main part of the code is to sort data by the column title clicked and generate an index for quick scrolling to the data items. I'm not sure I like the way graduation dates are indexed. Click the column with "Class of" and notice how it does the hyphenated ones. What do you think about ?

Eddie[/quote]

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

Hi Eddie,

Its a pretty nice CGI composition..quiet fast, but i think not fast enough because i have a fast internet connection and others perhpas dont. If its
for internal network use only its fast enough..

The "bottle-neck" looks to be the collection of the data by the scripts
behind the cgi script. The CGI scripts itself only generates the webpage (which is fast). If you have a problem with the script running behind the cgi-script
then the webuser needs to wait and wait.. i guess..If you can generate the output webpage in a memory buffer in newlisp (using [text] [/text]) and then dump it to the webuser is much faster.. it is possible doing this with dynamic content also..

Another option is perhpas building a little Cashing inside the CGI script..
So befor you display you must i.e. already have loaded 2 KB of data from
you database befor displaying..This way you get bigger steps in displaying the content to the webuser.

Regards, Norman.
-- (define? (Cornflakes))

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

The way data is collected is through the (load mechanism). The client programs create a list of lists from the a database and then ftp the file to the server. I'm now thinking of changing their scripts to just dump 4 html pages onto the server instead of going through CGI. That way the response will be nearly instantaneous.

Eddie

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

I like the working of the script!
You still can use the CGI script to create partly the HTML pages based
on the selection from the webuser.. You already have the data so actualy
you can build anything with it...
-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

This is what I get in Florida on a DSL

> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
2672
> (time (get-url "http://www.bmc.edu/cgi-bin/alum.cgi" ))
2750
> (time (get-url "http://www.bmc.edu/" ))
625
> (time (get-url "http://www.bmc.edu/" ))
656

The page generated by the CGI is about 97K

Lutz

ps: completely intuitive to use

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

Thanks Norman and Lutz.

The Alumnae requested that the columns be able to sort. And of course they also wanted the index. And many wanted to be able to print the table.

Good about being intuitive to use. I was going to use a scrolling table but then I have to add an extra part to the script to be able to print correctly. Then, I thought about frames to get the scrolling columns but there are never going to be enough columns to get mixed up about what each column stands for so, "Simple is Better."

Thanks for the comparison of the html page speed from the Web site and from the cgi. There is quite a bit of speed difference. Since a large number of users are still using dialup, I'm going to take Norman's advice and pre-build the pages. It will be quite easy to combine the newLISP scripts already on the client machines to collect the data and ftp with the script to generate the html.

Thanks again guys for you help.

Eddie

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

No more CGI. I combined the two scripts and just generate the pages and ftp them to the site. If there had been many columns then I would have buffered the pages.

Eddie

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

I wonder if prebuilt pages really would have make a difference, the size of the page was 97Kbyte, that is about 1 sec if your server has an upload bandwidth of ~ 1 Mbit/sec, add to this the work the browser has to do and it seems that the CGI processing perhaps was only a small time in the whole measurement.

Lutz

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

I have no clue. None from this side. But, I think creating the pages on a client machine and dumping them on the server means less security risks and then I'm guaranteed that it will be fast enough. I'm just going to copy the script to the other client and add a header file for each type of directory or list that is created.

Eddie

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

In the next version 8.4.5 you can specify any operator, function or anonymous function as comparison function in the second parameter of 'sort'

Lutz

eddier
Posts: 289
Joined: Mon Oct 07, 2002 2:48 pm
Location: Blue Mountain College, MS US

Post by eddier »

This is the way the various sorts in MzScheme's libraries work. I have actually had three occasions that I recall that a sort like this was needed. Thanks Lutz!

Eddie

Locked