Newbie question

paweston2003
Posts: 2
Joined: Sun Jan 27, 2013 12:13 am

Newbie question

Post by paweston2003 »

I'm new to newLISP and to LISP in general.

I am trying to return a slice of an xml-parsed nested list, but I can't work out the syntax. Essentially, I'm trying to isolate everything between the first and last horizontal rules. I have made a kludge which seems to work, but I would like to know how to do this correctly.

Here is what I have come up with so far:

Code:

;Try to open the file given as the last command-line argument
(if (not (set 'myxml (read-file (main-args -1))))
	(begin (println (rest (sys-error))) (exit 1)))

;Suppress the type tags (TEXT, CDATA, COMMENT, ELEMENT) that xml-parse would otherwise add to each node
(xml-type-tags nil nil nil nil)

;Parse the text into an s-expression list. The number passed to xml-parse is a sum of option flags:
;1 = drop whitespace-only text, 2 = drop empty attribute lists, 8 = turn tag names into symbols.
(if (not (set 'myxml (xml-parse myxml (+ 1 2 8))))
	(begin (println (first (xml-error))) (exit 1)))

;Isolate the <body>
(set 'myxml (myxml 0 2))

;Discard everything before the first <HR>
(set 'myxml (slice myxml (inc (first (ref 'HR myxml)))))

;Discard the next <HR> and everything after it
(set 'myxml (slice myxml 0 (first (ref 'HR myxml))))

;Output to console to be piped or redirected
(println myxml)

(exit)
Any help, or direction to help, would be appreciated.

-Peter Weston

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: Newbie question

Post by Lutz »

Looks like the right approach to me (using 'ref' and manipulating multidimensional indices), but there are others with more hands-on XML experience than I have.

In any case: Welcome to newLISP
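To make the 'ref'/index idea a bit more concrete, here is a minimal sketch that takes a parsed body element and keeps only the children between the first and the last rule. It assumes tidied XHTML parsed with options 1, 2 and 8, so tags become lowercase symbols such as 'hr. The helper name between-rules is hypothetical, and it uses 'index' on the body's direct children (rather than 'ref-all' over the whole tree) so that rules nested inside other elements are ignored.

```newlisp
;; Minimal sketch (hypothetical helper): return the direct children of an
;; element that lie between its first and last (tag ...) child.
;; Assumes xml-type-tags nil and xml-parse options (+ 1 2 8), so elements
;; look like (hr) or (p "text") and tag names are symbols.
(xml-type-tags nil nil nil nil)

(define (between-rules elem tag)
  ;; index collects the positions of children that are (tag ...) lists;
  ;; position 0 holds the element's own name symbol, which list? skips
  (let (rules (index (fn (c) (and (list? c) (= (first c) tag))) elem))
    (if (< (length rules) 2)
        '()   ; fewer than two rules: nothing lies between them
        (slice elem (+ (first rules) 1)
                    (- (last rules) (first rules) 1)))))

;; Usage: locate <body> with ref instead of a hard-coded index like (0 2).
;; (ref 'body doc) points at the symbol itself, so chop drops the trailing 0
;; to get the index vector of the whole (body ...) element.
(set 'doc (xml-parse "<html><body><p>intro</p><hr/><p>one</p><p>two</p><hr/><p>outro</p></body></html>" (+ 1 2 8)))
(set 'body-el (nth (chop (ref 'body doc)) doc))
(println (between-rules body-el 'hr))
```

Taking the first and last positions from the same 'index' result avoids the second 'ref' pass in the original script, and it still works if there are more than two rules on the page.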

hilti
Posts: 140
Joined: Sun Apr 19, 2009 10:09 pm
Location: Hannover, Germany

Re: Newbie question

Post by hilti »

Welcome Peter!

Would you please post an example of your XML? It's easier to help then.

Thanks
- Marc
--()o Dragonfly web framework for newLISP
http://dragonfly.apptruck.de

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Re: Newbie question

Post by cormullion »

Well, if it works reliably on your test examples, it's 'good enough' code! :)

I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?

paweston2003

Re: Newbie question

Post by paweston2003 »

cormullion wrote: Well, if it works reliably on your test examples, it's 'good enough' code! :)

I think there's no real alternative to carefully slicing it up like you're doing. Although by slicing the XML into pieces presumably it's no longer valid XML when it comes out...?
That's right. I was thinking of outputting some basic Markdown and then importing it into the Retext program. After I posted this, I thought of chopping up the text before feeding it into xml-parse. However, it will definitely be malformed then. I will see if that works when I get home from work. I can always paste on html and body tags to re-form it.
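The string-level idea could look something like this minimal sketch. It assumes the rules are written literally as "<HR>" (uppercase, no attributes), because 'parse' splits on the exact break string; the helper name text-between-rules is hypothetical. As noted above, the result can still be malformed XML if the cut-out text contains unclosed tags (including any inner <HR> that gets restored), so xml-parse may still reject it.

```newlisp
;; Hypothetical sketch: cut the raw page text between the first and last
;; "<HR>" and wrap it in fresh html/body tags before calling xml-parse.
;; Assumes the rule is spelled exactly "<HR>" (the split is case-sensitive).
(define (text-between-rules text)
  ;; parse with a break string returns (before seg-1 ... seg-n after)
  (let (pieces (parse text "<HR>"))
    (if (< (length pieces) 3)
        nil   ; fewer than two rules found
        (append "<html><body>"
                ;; rejoin the middle segments, restoring any inner rules
                (join (slice pieces 1 (- (length pieces) 2)) "<HR>")
                "</body></html>"))))

(println (text-between-rules "<P>intro<HR><P>chapter<HR><P>footer"))
```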
hilti wrote: Welcome Peter!

Would you please post an example of your XML? It's easier to help then.

Thanks
- Marc
It is the "Introduction to Scheme" at uTexas. I saw that it was in the "garbage" folder there. I want to read it, but the format is awful; Retext will output it as PDF. I found it while trying to figure out the "(+1- var)" syntax. I definitely prefer "dec".

ftp://ftp.cs.utexas.edu/pub/garbage/cs3 ... html#SEC12

Thanks-
Peter

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Re: Newbie question

Post by rickyboy »

Obliquely related: Instapaper seems to do a nice job on it. (I picked the narrowest margin setting, from the top-right corner button, to get the code snippets out at the right width. You might have to set that too.)

Check it out: http://www.instapaper.com/text?u=ftp%3A ... ml%23SEC12
(λx. x x) (λx. x x)

rickyboy

Re: Newbie question

Post by rickyboy »

About the approach, Lutz is right: that's basically the approach I and others use. One comment, though, and you've probably seen this already: xml-parse will balk at almost all HTML that predates XHTML, like this page, because such HTML is usually not well-formed XML (for example, <HR> tags that are never closed). In general, it's a good idea to tidy your input first for this reason.

In the case of this Scheme book snippet, I could NOT get xml-parse to eat it, as is.

Code:

> (xml-parse (read-file "schintro_12.html") (+ 1 2 8))
nil
> (xml-error)
("closing tag doesn't match" 3228)
As you no doubt found out, xml-parse hit the </BODY> tag while scanning the input and said, "Hey! You're closing the BODY element, but I have two HR elements that need to be closed first. Game over, man!" :)

However, after popping off to the command line and running tidy on the input, I had me some XHTML which xml-parse happily processed. (Of course, the <HR>s were converted to <hr />s, and all was well with the world.)
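Rather than popping off to the command line by hand, the tidy step can be driven from newLISP. This is a minimal sketch, assuming the HTML Tidy binary is installed as tidy and is on the PATH (-q and -asxhtml are Tidy's quiet and convert-to-XHTML options). The name parse-tidied is hypothetical, and the file name is handed to the shell unquoted, so it should not contain spaces. Since tidied XHTML uses lowercase tags, later lookups would search for 'hr and 'body rather than 'HR and 'BODY.

```newlisp
;; Hypothetical helper: run HTML Tidy over an HTML file and xml-parse its output.
;; Assumes a `tidy` binary on the PATH; stderr (tidy's warnings) is discarded.
(xml-type-tags nil nil nil nil)

(define (parse-tidied file-name)
  ;; exec runs the command through the shell and returns stdout as a list of lines
  (let (lines (exec (append "tidy -q -asxhtml " file-name " 2>/dev/null")))
    (if lines
        (xml-parse (join lines "\n") (+ 1 2 8))
        nil)))   ; no output: tidy missing, or the file could not be read

;; Usage, with the file name from the example above:
;; (set 'doc (parse-tidied "schintro_12.html"))
;; (unless doc (println (xml-error)))
```

Tidy typically exits with a non-zero status when it only has warnings, so the sketch looks at whether any output came back rather than at the exit code.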
