get-url -> ERR: HTTP document empty

Darth.Severus
Posts: 14
Joined: Mon Sep 17, 2012 3:28 am

get-url -> ERR: HTTP document empty

Post by Darth.Severus »

I'm using some code in a program to get the links of a website, including its subpages. But it's not working; I often get "ERR: HTTP document empty". I put an until loop into the code so that it retries several times, always after a few minutes. My first thought was that I'm being blocked by the server, but that doesn't seem to be the case: if I open a newlisp shell and write (get-url url), I get the site, while I still have the same IP.

Code: Select all

(define (pick-links url)
    (setq page (get-url url))
    (println (1 20 page))        ; testing: show the first 20 characters
    (write-file "page" page)     ; also testing
    ; retry every 10 minutes while get-url keeps returning the error string
    (until (not (starts-with page "ERR: HTTP document empty"))
        (and (sleep 600000) (setq page (get-url url))))
    ; collect all anchor tags and join them, one per line
    (setq linklist (join (find-all "<a href=([^>]+)>([^>]*)</a>" page) "<br>\n"))
    (setq linklist (replace {"} linklist "`")) ;"
    (setq parsedlist (parse linklist "\n"))
    (setq page nil))
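
A quick diagnostic from the shell can show how the server actually responds (a sketch only: it assumes get-url accepts a "header"/"debug" option and a millisecond timeout as described in the manual, and the URL is just a placeholder):

Code: Select all

; sketch: fetch only the response header to check the server's reaction
; assumes get-url's "header" option and millisecond timeout work as in the manual
(println (get-url "http://www.example.com/" "header" 5000))
; the "debug" option is supposed to echo the raw request and response
;(get-url "http://www.example.com/" "debug" 5000)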

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Re: get-url -> ERR: HTTP document empty

Post by cormullion »

It worked on the first 6 sites I tried it on. (I removed the 'write-file' line.) Perhaps it's site-specific...?

Darth.Severus
Posts: 14
Joined: Mon Sep 17, 2012 3:28 am

Re: get-url -> ERR: HTTP document empty

Post by Darth.Severus »

Ahhh, my usual mistake: overcomplicating things instead of trying the obvious first. I had always tested it with the same website... facepalm.

However, it works now. I'm using the dump option of the w3m browser to save the pages to disk before I process them.

Code: Select all

; shell out to w3m and save the raw page source locally
(exec (string "w3m " url " -dump_source -T text/html > temppage.html"))
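
To feed the saved dump back into the link extraction, something like this should work (read-file is standard newLISP; the regex is the same one used in pick-links):

Code: Select all

; read the local copy back in and reuse the same link regex on it
(setq page (read-file "temppage.html"))
(setq links (find-all "<a href=([^>]+)>([^>]*)</a>" page))
(println (length links) " links found")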

I think the problem was a security measure on the website; maybe it blocks non-browsers when they request more than a couple of pages.
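
If it really is filtering on the client, a header tweak might also work without leaving newLISP (a sketch only: it assumes get-url takes an optional millisecond timeout followed by an extra request-header string, as I read the manual; the URL and User-Agent values are just examples):

Code: Select all

; sketch: send a browser-like User-Agent with the request
; assumes (get-url url timeout-ms header-string) as described in the manual
(setq url "http://www.example.com/")
(setq page (get-url url 10000 "User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"))
(if (starts-with page "ERR:")
    (println "still no document: " page)
    (println (1 60 page)))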

Thanks, anyway.
