
get-url -> ERR: HTTP document empty

Posted: Tue Mar 26, 2013 10:52 pm
by Darth.Severus
I'm using some code in a program to get the links of a website, including subpages. But it's not working; I often get "ERR: HTTP document empty". I put an until loop into the code so that it retries several times, waiting a few minutes between attempts. My first thought was that I'm blocked by the server, but that doesn't seem to be the case: if I open a newLISP shell and write (get-url url), I get the site, while I still have the same IP.

Code:

(define (pick-links url)
	(setq page (get-url url))
	(println (1 20 page))       ; testing: show the first 20 characters
	(write-file "page" page)    ; also testing: dump the raw page to disk
	; retry every 10 minutes while get-url keeps returning the error string
	(while (starts-with page "ERR: HTTP document empty")
		(sleep 600000)
		(setq page (get-url url)))
	; collect all <a href=...>...</a> tags, one per line
	(setq linklist (join (find-all "<a href=([^>]+)>([^>]*)</a>" page) "<br>\n"))
	; replace double quotes so the list is easier to handle later
	(setq linklist (replace {"} linklist "`")) ;"
	(setq parsedlist (parse linklist "\n"))
	(setq page nil))
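
For reference, here is a minimal sketch (untested; the function name, retry cap, and pause are my own choices, not from the original code) of the same retry idea written with while and a limit on the number of attempts, so it cannot loop forever if the server keeps returning errors:

Code:

(define (fetch-with-retries url (max-tries 5) (pause 600000))
	(let (page (get-url url) tries 1)
		; keep retrying while get-url reports an error, up to max-tries attempts
		(while (and (starts-with page "ERR:") (< tries max-tries))
			(sleep pause)             ; wait before the next attempt
			(setq page (get-url url))
			(inc tries))
		page))                        ; returns the page text or the last error string

Something like (setq page (fetch-with-retries "http://example.com")) would then either return the page or give up after the last attempt.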

Re: get-url -> ERR: HTTP document empty

Posted: Thu Mar 28, 2013 2:41 pm
by cormullion
It worked on the first 6 sites I tried it on. (I removed the 'write-file' line.) Perhaps it's site-specific...?

Re: get-url -> ERR: HTTP document empty

Posted: Fri Mar 29, 2013 2:31 am
by Darth.Severus
Ahhh, my usual mistake: thinking in complex terms instead of trying the obvious thing first. I always tested it with the same website... -> facepalm.

However, it works now. I'm using the dump option of the w3m browser to save the sites to my disk before I process them.

Code:

; build the w3m command string and run it directly, no eval-string needed
(exec (string "w3m " url " -dump_source -T text/html > temppage.html"))
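
The dumped file can then be read back in and run through the same link regex as before; something like this (untested sketch) should do it:

Code:

; read the page w3m saved to disk and extract the links from it
(setq page (read-file "temppage.html"))
(setq links (find-all "<a href=([^>]+)>([^>]*)</a>" page))
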
I think the problem was a security measure on the website; maybe it blocks non-browser clients when they try to fetch more than two pages.
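
If the site really does block non-browser clients, another thing that might be worth trying (untested; the header value and timeout are only examples) is sending a browser-like User-Agent header straight from get-url, which can take a custom header string after the timeout argument (check the manual for your newLISP version):

Code:

; fetch with a 10-second timeout and a browser-like User-Agent header;
; each line in the custom header string must end with "\r\n"
(setq page
	(get-url url 10000 "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"))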

Thanks, anyway.