XML parsing problems

hilti
Posts: 140
Joined: Sun Apr 19, 2009 10:09 pm
Location: Hannover, Germany

Post by hilti »

Hi!

I'm trying to implement a simple RSS and Atom reading function, but I'm having problems parsing this kind of element.

Example from NYTimes.com RSS feed

Code:

<dc:creator>By ROBERT D. McFADDEN</dc:creator>
This one doesn't work

Code:

					; TODO: Problems in parsing media:content - ask the forum
					;(lookup '(media:content @ url) item) "<br/>"
You can check out my current efforts on this page: http://www.rundragonfly.com/dragonfly_feeds

Cheers!
Hilti
--()o Dragonfly web framework for newLISP
http://dragonfly.apptruck.de

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

The XML you posted is parsed like this:

Code:

> (set 'xml "<dc:creator>By ROBERT D. McFADDEN</dc:creator>")
"<dc:creator>By ROBERT D. McFADDEN</dc:creator>"

> (xml-type-tags nil nil nil nil)
(nil nil nil nil)

> (set 'sxml (xml-parse xml 31))
((dc:creator "By ROBERT D. McFADDEN"))

> (lookup (sym "dc:creator") sxml)   ;  <- the correct way to do it
"By ROBERT D. McFADDEN"
> 
You cannot write:

Code:

(lookup 'dc:creator sxml) ;=> nil  <- wrong way
In the 'lookup' statement, newLISP takes 'dc' as a namespace (context) qualifier because of the colon, so the source text dc:creator is read as the symbol 'creator' in context 'dc'. 'xml-parse', on the other hand, created a single symbol named "dc:creator", with a colon in the name that would be illegal in source code. Using (sym "dc:creator") you can still create that symbol and do the lookup.

You could also do this:

Code:

> (set 'creator (sym "dc:creator"))
dc:creator

> (lookup creator sxml)
"By ROBERT D. McFADDEN"
> 
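
For a whole feed item rather than a single element, the same idea can be sketched as below. The <item> string and the names 'fields' and 'creator-tag' are made up for illustration and are not taken from the NYTimes feed; the point is only that plain tags can be quoted directly while colon tags go through 'sym'.

```newlisp
; Sketch only: the item below is a hypothetical example, not real feed data.
(xml-type-tags nil nil nil nil)

(set 'xml "<item><title>Example title</title><dc:creator>By ROBERT D. McFADDEN</dc:creator></item>")
(set 'sxml (xml-parse xml 31))

; drop the enclosing item tag, leaving an association list of child elements
(set 'fields (rest (first sxml)))

; plain tag names can be written as quoted symbols
(println (lookup 'title fields))

; tag names containing a colon must be built with sym
(set 'creator-tag (sym "dc:creator"))
(println (lookup creator-tag fields))
```

Building the colon symbols once and reusing them keeps the lookups short when there are several namespaced tags.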

Post by hilti »

Thanks Lutz! Parsing for elements like

Code:

<dc:creator>By ALISSA J. RUBIN</dc:creator>
is working great. See it here: http://www.rundragonfly.com/dragonfly_feeds

But one thing I can't get working - the media elements, e.g. NYTimes.com RSS feed

Code:

<media:content url="http://graphics8.nytimes.com/images/2009/11/08/world/08basra_CA0/thumbStandard.jpg" medium="image" height="75" width="75"/>
I don't know how to access the "url" and "medium" attributes.

Thanks for help!
Hilti (slowly becoming an RSS parsing expert ;-)

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

If you can get as far as this:

Code:

(set 'mc '(media:content 
    (@ 
      (url "http://graphics8.nytimes.com/images/2009/11/08/world/08peru_CA0/thumbStandard.jpg")  
      (medium "image")  
      (height "75")  
      (width "75"))))
then

Code:

(lookup 'url (rest (first (rest mc))))
will get

http://graphics8.nytimes.com/images/200 ... andard.jpg
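
One way to get that far is to parse with the SXML attribute option switched on (16, which is already part of the 31 used earlier in this thread). The following is a sketch under that assumption; the example.com URL is a placeholder, and the names 'mc' and 'attrs' are only for illustration.

```newlisp
; Sketch only: placeholder URL, not the real feed entry.
(xml-type-tags nil nil nil nil)

(set 'xml {<media:content url="http://example.com/thumb.jpg" medium="image" height="75" width="75"/>})

; 31 = 1 + 2 + 4 + 8 + 16; the 16 turns attributes into an (@ ...) list
(set 'sxml (xml-parse xml 31))

; the first parsed element is the media:content list
(set 'mc (first sxml))

; skip the tag name, take the (@ ...) list, then skip the @ itself
(set 'attrs (rest (first (rest mc))))

(println (lookup 'url attrs))
(println (lookup 'medium attrs))
```

This relies on the attribute list being the first thing after the tag name, which is what the structure shown above suggests.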

Post by Lutz »

Another method uses the 'ref' function:

Code:

(set 'mc '(media:content
    (@
      (url "http://graphics8.nytimes.com/images/.../thumbStandard.jpg") 
      (medium "image") 
      (height "75") 
      (width "75"))))

(last (mc (ref '(url *) mc match))) 

;=> "http://graphics8.nytimes.com/.../thumbStandard.jpg"
The advantage here is that 'ref' reaches into all nesting levels of a list, so the above solution does not need to change if the structure of the XML changes. As long as there is a (url *) element in it, 'ref' will find it.
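
Applied to parsed XML instead of a hand-written list, this might look like the sketch below. The <item> string and its example.com URL are made up, and the check on 'idx' is an addition for the case where no url attribute exists.

```newlisp
; Sketch only: hypothetical item with a nested media:content element.
(xml-type-tags nil nil nil nil)

(set 'xml {<item><title>T</title><media:content url="http://example.com/a.jpg" medium="image"/></item>})
(set 'sxml (xml-parse xml 31))

; ref returns the index vector of the first sublist matching (url *)
(set 'idx (ref '(url *) sxml match))

; index the parsed list with that vector only if something was found
(set 'url-value (if idx (last (sxml idx))))

(println (or url-value "no url attribute found"))
```

Keep in mind that the pattern also matches a plain <url> element if the feed contains one, since after parsing it has the same (url "...") shape as an attribute.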

'ref-all' is a version of 'ref' which returns all index vectors found, not only the first one:

Code:

(set 'mc '(( x y z (url "A.com")) (q ( z (url "B.com"))) (url "C.com")))

(set 'vectors (ref-all '(url *) mc match)) ;=> ((0 3) (1 1 1) (2))

(map last (map 'mc vectors)) ;=> ("A.com" "B.com" "C.com")
What few people know is that list indexing can itself be mapped: the data structure is applied like a function to each entry in the list of index vectors. In this case 'mc' is a complex nested data structure with the URLs at different nesting levels, but all of them are found. Note that 'mc' must be quoted in the mapping expression.
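
For anyone who finds the quoted form hard to read, an equivalent spelling with an explicit function is sketched here. The name 'feed' is just a stand-in for 'mc'; the data is the same as in the ref-all example.

```newlisp
(set 'feed '((x y z (url "A.com")) (q (z (url "B.com"))) (url "C.com")))

; implicit indexing accepts separate indices or a single index vector
(println (feed 0 3))
(println (feed '(0 3)))

; index with each vector from ref-all, then take the last element of each hit
(set 'urls (map (fn (v) (last (feed v))) (ref-all '(url *) feed match)))
(println urls)
```

Inside the function the list is indexed directly with each index vector, so the quoting question does not come up.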

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm

Post by itistoday »

Lutz wrote:

Code:

(map last (map 'mc vectors)) ;=> ("A.com" "B.com" "C.com")
What few people know is that list indexing can itself be mapped: the data structure is applied like a function to each entry in the list of index vectors. In this case 'mc' is a complex nested data structure with the URLs at different nesting levels, but all of them are found. Note that 'mc' must be quoted in the mapping expression.
That is nifty! :-)
Get your Objective newLISP groove on.
