Web Crawler

Posted: Thu Apr 15, 2010 8:57 pm
by kanen
Has anyone written a web crawler in newLISP?

I have looked, but cannot find such a beast.

Any pointers would be greatly appreciated.

Re: Web Crawler

Posted: Thu Apr 15, 2010 9:32 pm
by cormullion
I'd start looking here...

Re: Web Crawler

Posted: Sat Apr 17, 2010 9:02 am
by Fritz
kanen wrote:Has anyone written a web crawler in newLISP?
Later in 2009, I wrote a kind of crawler to gather information from one big government site.

Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to make a crawler without cookies, I think a simple crawler can be developed in one evening.
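A minimal sketch of that kind of one-evening crawler, assuming newLISP's built-in get-url and regex functions (the starting URL is just a placeholder, and real crawlers need cookie handling, politeness delays, and visited-set tracking on top of this):

```newlisp
;; Sketch: fetch one page and extract the href targets with a regex.
;; find-all with a pattern returns each capture ($1) for every match.
(define (extract-links page)
  (find-all {href="([^"]+)"} page $1 0))

(define (crawl url)
  (let (page (get-url url))
    ;; get-url returns the page body, or a string starting with "ERR:"
    (if (starts-with page "ERR:")
        '()
        (extract-links page))))

(println (crawl "http://example.com/"))
```

Regex-based link extraction is fragile on messy HTML, but for a single known site it is usually good enough.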

Re: Web Crawler

Posted: Sat Apr 17, 2010 2:54 pm
by kanen
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.

I did indeed write the simple crawler in one evening and it turns out to be quite fast.
Fritz wrote:
kanen wrote:Has anyone written a web crawler in newLISP?
Later in 2009, I wrote a kind of crawler to gather information from one big government site.

Pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are going to make a crawler without cookies, I think a simple crawler can be developed in one evening.

Re: Web Crawler

Posted: Wed Apr 28, 2010 6:01 am
by hilti
Hi Kanen

I wrote some simple tools for analysing websites in newLISP and used curl for fetching URLs, because (get-url) doesn't support "https". You can simply invoke curl via the exec function.
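A small sketch of that workaround, assuming curl is installed on the system (the -s flag just silences curl's progress output; the URL is a placeholder):

```newlisp
;; get-url lacks HTTPS support, so shell out to curl instead.
;; exec returns the command's standard output as a list of lines,
;; which we join back into a single page string.
(define (fetch-https url)
  (join (exec (format "curl -s %s" url)) "\n"))

(set 'page (fetch-https "https://example.com/"))
```

Note that exec blocks until curl finishes, so fetching many pages this way is sequential unless you add your own process handling.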

Then the easiest approach would be to use SXML for parsing the returned HTML.
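A sketch of the SXML route using newLISP's built-in xml-parse, assuming the fetched page is well-formed XML/XHTML (real-world HTML is often not, and may need to be tidied first; the parse-option flags shown are one common combination for SXML-style output):

```newlisp
;; Suppress the default type tags so xml-parse emits plain SXML-style lists.
(xml-type-tags nil nil nil nil)

;; Options: 1 = drop whitespace text, 2 = drop empty attribute lists,
;; 16 = SXML-style tags. A hypothetical well-formed fragment for illustration:
(set 'sxml
  (xml-parse "<p><a href=\"x\">link</a></p>" (+ 1 2 16)))

(println sxml)
```

Once the page is an SXML tree, walking it with ref-all or match is much more robust than regexes against raw markup.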

Cheers
Hilti