Web Crawler

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Web Crawler

Post by kanen »

Has anyone written a web crawler in newLISP?

I have looked, but cannot find such a beast.

Any pointers would be greatly appreciated.
. Kanen Flowers http://kanen.me .

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Re: Web Crawler

Post by cormullion »

I'd start looking here...

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

Re: Web Crawler

Post by Fritz »

kanen wrote: Has anyone written a web crawler in newLISP?
Late in 2009 I wrote a crawler of sorts to gather information from one big government site.

It is a pretty simple thing: just several hundred lines of code, "cgi.lsp" + regular expressions + a lot of cookie wrangling. If you are going to make a crawler without cookies, I think a simple one can be developed in one evening.
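
For readers who want to see the shape of such a cookie-free crawler, here is a minimal sketch of the general technique: fetch a page, pull links out with a regular expression, queue the ones not yet seen. It is not Fritz's code, and it is written in Python rather than newLISP purely for illustration. The names `fetch`, `extract_links`, `crawl` and the `max_pages` limit are all hypothetical, and the regular expression is deliberately naive.

```python
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

# Naive on purpose: good enough for a quick crawler, but it misses
# unquoted href values. Fragments ("#...") are dropped by the pattern.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\'#]+)', re.IGNORECASE)


def fetch(url):
    """Download a page and return its text (no cookies, no retries)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(base_url, html):
    """Return absolute URLs for every href found in html."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from start_url; returns URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        visited.append(url)
        for link in extract_links(url, html):
            # Only follow web links; mailto:, javascript: etc. are ignored.
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://example.com/")` would walk outward from that page until `max_pages` pages have been fetched. The sketch does not stay on one site, honour robots.txt, or throttle requests, all of which a real crawler would need.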

kanen

Re: Web Crawler

Post by kanen »

I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.

I did indeed write the simple crawler in one evening and it turns out to be quite fast.

hilti
Posts: 140
Joined: Sun Apr 19, 2009 10:09 pm
Location: Hannover, Germany

Re: Web Crawler

Post by hilti »

Hi Kanen

I wrote some simple tools for analysing websites in newLISP and used curl for fetching URLs, because (get-url) doesn't support "https". You can simply invoke curl through the exec function.

Parsing the returned HTML into SXML would then be the easiest approach.
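
The two steps described above (shell out to curl for the download, then turn the HTML into a nested-list tree and walk it) might look roughly like this. It is a sketch only, in Python rather than newLISP, and not hilti's tooling. `fetch_with_curl`, `TreeBuilder`, `parse_html` and `find_attr` are hypothetical names, and the `[tag, {attributes}, children...]` layout only loosely imitates SXML. It assumes a `curl` binary is on the PATH.

```python
import subprocess
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be left "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def fetch_with_curl(url):
    """Fetch url (http or https) by running the external curl program."""
    result = subprocess.run(
        ["curl", "-sL", url],  # -s: no progress meter, -L: follow redirects
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


class TreeBuilder(HTMLParser):
    """Build a nested-list tree: [tag, {attributes}, child, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["*TOP*", {}]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = [tag, dict(attrs)]
        self.stack[-1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].append(text)


def parse_html(html):
    """Parse html and return the root node of the nested-list tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_attr(node, tag, attr):
    """Collect attr from every <tag> element in the tree, in document order."""
    found = []
    if node[0] == tag and attr in node[1]:
        found.append(node[1][attr])
    for child in node[2:]:
        if isinstance(child, list):
            found.extend(find_attr(child, tag, attr))
    return found
```

With these pieces, `find_attr(parse_html(fetch_with_curl(url)), "a", "href")` would list the link targets on a page. Working on a tree rather than on raw text with regular expressions tends to cope better with attributes that appear in an unexpected order or quoting style.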

Cheers
Hilti
--()o Dragonfly web framework for newLISP
http://dragonfly.apptruck.de
