Possible bug in xml-parse

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio

Post by Jeff

Lutz,

When using xml-parse to read XHTML (which is valid XML), any attribute that contains quotation marks causes an error, even if the quotes are escaped or inside a CDATA block:

<a href="/some/path" title="something" onclick="alert(\"Hello world\")">Looky here</a>

This should validate. It would be impossible to make it render correctly using anything other than single quotes inside the attribute, and that would mean translating every double quote into a single quote, which might corrupt some JavaScript (JavaScript can use both kinds of quotes in the same string). The following example should be valid XML and demonstrates the problem:

<a href="/some/path/" title="something" onclick="alert(\"Hello\" + 'world')">Looky here</a>

I could get it to validate using the HTML entity, but then it would not work when rendered as a string. Escaping the quotes should be valid; I can't find anything in the XHTML spec that forbids it.

PS: Excuse the visible entities. I can't figure out any other way to display the contents of an HTML tag, since the board seems to strip any attributes off.
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz

When you use XML without DTD validation (xml-parse does no validation; it only checks that the XML is well-formed), the XML spec does not allow the following characters:

greater-than (>), entity &gt;
less-than (<), entity &lt;
ampersand (&), entity &amp;
quotation mark ("), entity &quot;
apostrophe ('), entity &apos;

Use these entities to encode them.
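A minimal sketch of that advice follows. It uses Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree) as a stand-in, not newLISP's xml-parse, so it illustrates the XML rules rather than newLISP's exact behaviour; encode_attr is a hypothetical helper name. It entity-encodes the onclick script from the first post and shows that a conforming parser turns the entities back into literal quotes, so the application receives the original JavaScript with both kinds of quotes intact.

```python
# Sketch only: Python's XML tools stand in for newLISP's xml-parse here.
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_attr(value: str) -> str:
    """Hypothetical helper: encode the five special characters for an attribute."""
    # escape() replaces &, < and > (ampersand first, so nothing is double-encoded);
    # the extra mapping then encodes both kinds of quote.
    return escape(value, {'"': "&quot;", "'": "&apos;"})


script = "alert(\"Hello\" + 'world')"
markup = '<a href="/some/path/" onclick="%s">Looky here</a>' % encode_attr(script)
print(markup)

# The parser decodes the entities, so the attribute value is the original script.
element = ET.fromstring(markup)
print(element.get("onclick"))
```

Encoding both quote characters means the result is safe inside either a single-quoted or a double-quoted attribute.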

With CDATA, newLISP processes these characters correctly:

>  (xml-type-tags nil nil nil nil)
(nil nil nil nil)
> (xml-parse {<data><![CDATA[<>&"']]></data>} 15)
((data "<>&\"'"))
> 
However, XSLT will translate all special characters in CDATA into entities, so there is no safe way to use special characters in a CDATA block. The best approach is to base64-encode all CDATA strings.
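The base64 suggestion can be sketched the same way, again in Python as a stand-in for newLISP, with wrap_payload and unwrap_payload as hypothetical helper names. Because base64 output uses only letters, digits, +, / and =, the encoded text contains none of the special characters (not even the ]]> sequence that would end a CDATA section), so there is nothing for a tool such as XSLT to rewrite into entities. The sketch puts the encoded text directly in the element; wrapping it in CDATA as well would also work.

```python
# Sketch only: carry arbitrary text as base64 so no XML special character
# ever appears in the markup.
import base64
import xml.etree.ElementTree as ET


def wrap_payload(tag: str, text: str) -> str:
    """Hypothetical helper: return <tag>BASE64</tag> for a UTF-8 string."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "<%s>%s</%s>" % (tag, encoded, tag)


def unwrap_payload(markup: str) -> str:
    """Hypothetical helper: parse the element and decode its base64 text."""
    element = ET.fromstring(markup)
    return base64.b64decode(element.text).decode("utf-8")


original = "<>&\"' and even ]]> survive"
markup = wrap_payload("data", original)
print(markup)
print(unwrap_payload(markup) == original)
```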

Lutz

PS: Note that xml-parse is an XML parser, not an XHTML parser with HTML DTD validation.

Post by Jeff

I don't need DTD validation; I wasn't going that far with it. Checking for well-formed markup is all I am after.
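Since only a well-formedness check is needed, here is a sketch of one in Python, with the expat-based xml.etree.ElementTree standing in for newLISP's xml-parse; check_well_formed is a hypothetical name. In XML 1.0 a backslash has no special meaning inside an attribute value, so the first " after alert( ends the value, and a conforming parser rejects the backslash-escaped markup from the first post, while the entity-encoded version is accepted.

```python
# Sketch only: a well-formedness check with no DTD validation.
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def check_well_formed(markup: str) -> Tuple[bool, Optional[Tuple[int, int]]]:
    """Hypothetical helper: (True, None) if well-formed, else (False, (line, column))."""
    try:
        ET.fromstring(markup)
        return True, None
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) where parsing failed.
        return False, err.position


backslash = '<a href="/some/path" title="something" onclick="alert(\\"Hello world\\")">Looky here</a>'
entities = '<a href="/some/path" title="something" onclick="alert(&quot;Hello world&quot;)">Looky here</a>'
print(check_well_formed(backslash))
print(check_well_formed(entities))
```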
Jeff