Web Crawler
Posted: Thu Apr 15, 2010 8:57 pm
by kanen
Has anyone written a web crawler in newLISP?
I have looked, but cannot find such a beast.
Any pointers would be greatly appreciated.
Re: Web Crawler
Posted: Thu Apr 15, 2010 9:32 pm
by cormullion
I'd start looking here...
Re: Web Crawler
Posted: Sat Apr 17, 2010 9:02 am
by Fritz
kanen wrote:Has anyone written a web crawler in newLISP?
In late 2009 I wrote a crawler of sorts to gather information from one big government site.
It was a pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are building a crawler that doesn't need cookies, I think a simple one can be developed in one evening.
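Here is a minimal sketch of what such a cookie-free crawler might look like. The start URL, the 50-page limit, and the link regex are illustrative assumptions, not taken from Fritz's actual code:

```newlisp
; minimal breadth-first crawler sketch (no cookies, http only)
(define start-url "http://www.example.com/")
(define visited '())
(define queue (list start-url))

(while (and queue (< (length visited) 50))
  (let (url (pop queue))
    (unless (member url visited)
      (push url visited)
      (let (page (get-url url 5000))        ; 5-second timeout
        ; collect absolute http links found in the page
        (dolist (lnk (find-all {href="(http[^"]+)"} page $1))
          (unless (member lnk visited) (push lnk queue -1)))))))

(println (length visited) " pages visited")
```

Real pages would also need error handling (get-url returns an "ERR:" string on failure) and URL normalization, but the queue-and-visited-list loop above is the whole idea.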
Re: Web Crawler
Posted: Sat Apr 17, 2010 2:54 pm
by kanen
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.
I did indeed write the simple crawler in one evening and it turns out to be quite fast.
Fritz wrote:kanen wrote:Has anyone written a web crawler in newLISP?
In late 2009 I wrote a crawler of sorts to gather information from one big government site.
It was a pretty simple thing, just several hundred lines of code: "cgi.lsp" + regular expressions + a lot of cookie handling. If you are building a crawler that doesn't need cookies, I think a simple one can be developed in one evening.
Re: Web Crawler
Posted: Wed Apr 28, 2010 6:01 am
by hilti
Hi Kanen
I wrote some simple tools for analysing websites in newLISP and used cURL for fetching URLs, because (get-url) doesn't support "https". You can simply invoke cURL with the exec function.
Then parsing the returned HTML with SXML would be the easiest approach.
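A sketch of that approach, assuming the curl binary is on the PATH; the URL and function name are placeholders:

```newlisp
; fetch an https page by shelling out to curl;
; exec returns a list of output lines, so rejoin them
(define (fetch-https url)
  (join (exec (string "curl -s " url)) "\n"))

(define html (fetch-https "https://www.example.com/"))

; parse into SXML-style nested lists:
; option 31 = suppress whitespace, empty attributes and comments,
; translate tags to symbols and use SXML attribute format
(xml-type-tags nil nil nil nil)
(define sxml (xml-parse html 31))
```

Note that xml-parse expects well-formed markup, so on sloppy real-world HTML it can return nil; in that case falling back to regular expressions (as Fritz did) is the pragmatic route.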
Cheers
Hilti