Has anyone written a web crawler in newLISP?
I have looked, but cannot find such a beast.
Any pointers would be greatly appreciated.
Web Crawler
. Kanen Flowers http://kanen.me .
Re: Web Crawler
I'd start looking here...
Re: Web Crawler
kanen wrote: Has anyone written a web crawler in newLISP?
Late in 2009 I wrote a crawler of sorts to gather information from one big government site.
It was a pretty simple thing: just several hundred lines of code, "cgi.lsp" plus regular expressions plus a lot of cookie handling. If you are going to write a crawler that doesn't need cookies, I think a simple one can be developed in one evening.
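For illustration only, here is a minimal sketch of what a cookie-free crawler along those lines might look like. It is not the code described above: the function names extract-links and crawl, the regular expression, and the 5000 ms timeout are all assumptions made for the example. It uses only the built-in get-url and find-all, follows absolute "http://" links only, and does not resolve relative links.

```newlisp
;; Hypothetical helper: collect href targets from an HTML string.
;; The regex is deliberately naive; option 1 makes it case-insensitive.
(define (extract-links html)
  (unique (find-all {href="([^"]+)"} html $1 1)))

;; Hypothetical breadth-first crawl: visit at most max-pages URLs,
;; starting from start-url, and return the list of URLs visited.
(define (crawl start-url max-pages)
  (let (queue (list start-url) seen '() url nil page nil)
    (while (and queue (< (length seen) max-pages))
      (setq url (pop queue))
      (unless (member url seen)
        (push url seen -1)
        ;; get-url returns the page as a string, or a string
        ;; starting with "ERR:" on failure (which yields no links)
        (setq page (get-url url 5000))
        (dolist (link (extract-links page))
          ;; keep absolute plain-http links only
          (if (starts-with link "http://")
              (push link queue -1)))))
    seen))
```

A call such as (crawl "http://example.com/" 20) would return the URLs it visited. A real crawler would also want politeness delays and robots.txt handling, which this sketch leaves out.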
Re: Web Crawler
I had forgotten, after using Ruby and Python (and, of course, C) for a few years, just how fetching awesome newLISP is.
I did indeed write the simple crawler in one evening and it turns out to be quite fast.
Fritz wrote:
kanen wrote: Has anyone written a web crawler in newLISP?
Late in 2009 I wrote a crawler of sorts to gather information from one big government site.
It was a pretty simple thing: just several hundred lines of code, "cgi.lsp" plus regular expressions plus a lot of cookie handling. If you are going to write a crawler that doesn't need cookies, I think a simple one can be developed in one evening.
. Kanen Flowers http://kanen.me .
Re: Web Crawler
Hi Kanen
I wrote some simple tools for analysing websites in newLISP and used curl for fetching URLs, because (get-url) doesn't support "https". You can simply invoke curl with the exec function.
After that, parsing the returned HTML into SXML would be the easiest approach.
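As a rough sketch of that approach (not the actual tools mentioned above), something like the following could work. The names fetch-https and parse-sxml are made up for the example, and it assumes curl is installed and on the PATH. The URL is pasted straight into a shell command, so it should only be used with trusted input.

```newlisp
;; Hypothetical helper: fetch a page through the curl command line tool.
;; exec returns the command's output as a list of lines.
;; -s silences the progress meter, -L follows redirects.
(define (fetch-https url)
  (join (exec (string "curl -s -L " url)) "\n"))

;; Suppress the default "ELEMENT", "TEXT" etc. type tags so that
;; xml-parse produces SXML-style output.
(xml-type-tags nil nil nil nil)

;; Hypothetical helper: parse markup into SXML.
;; Options: 1 drop whitespace text, 2 drop empty attribute lists,
;; 4 drop comments, 8 turn tag names into symbols,
;; 16 emit attributes as (@ ...) lists.
(define (parse-sxml markup)
  (xml-parse markup (+ 1 2 4 8 16)))

;; Usage
(setq sxml (parse-sxml (fetch-https "https://example.com/")))
(if sxml
    (println sxml)
    (println "parse failed: " (xml-error)))
```

Note that xml-parse expects well-formed markup and returns nil otherwise, so pages with sloppy HTML may need cleaning up first, or a regular-expression fallback.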
Cheers
Hilti
--()o Dragonfly web framework for newLISP
http://dragonfly.apptruck.de