The TAGS module parses structured, tagged text such as HTML. It copes with unclosed tags, and each tag is defined by a regular expression.
For example, given this HTML (note the unclosed <tr> and <td> tags):
<html><body>
test
<table align=center><tr><td>test1</td><tr><td>test2<td>test3</table>
</body></html>
and these syntax rules:
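; tag format: (tag-sym tag-pattern tag-open|tag-close|tag-self (closed-tag ...))
;   the last element lists the open tags that this tag closes
; tag-open  - closes the listed tags, then opens a new sublist headed by the tag
; tag-close - closes the listed tags; the tag itself is dropped
; tag-self  - closes the listed tags; the tag itself is kept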
(set 'html-tags '(
(table "<table(| [^>]*)>" tag-open ())
(table/ "</table>" tag-close (table th tr td))
(tr "<tr(| [^>]*)>" tag-open (tr th td))
(tr/ "</tr>" tag-close (tr th td))
(th "<th(| [^>]*)>" tag-open (th td))
(th/ "</th>" tag-close (th td))
(td "<td(| [^>]*)>" tag-open (th td))
(td/ "</td>" tag-close (th td))
(br "<br>" tag-self ())
(hr "<hr(| [^>]*)>" tag-self ())
(p "<p>" tag-self ())))
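To make the rule format concrete, here is a small sketch that reads the rule table the way the description above suggests: the second element is the regexp pattern, the third is the tag kind, and the last is the list of open tags it closes. The helpers `tag-matches?`, `tag-kind` and `closes?` are hypothetical and not part of the TAGS module; only the tr and td rules are copied so the snippet runs on its own.

```newlisp
;; Hypothetical helpers for illustration; they are NOT part of the TAGS module.
;; Rule layout as described above: (tag-sym tag-pattern kind (closed-tag ...))
(set 'rules '(
  (tr  "<tr(| [^>]*)>" tag-open  (tr th td))
  (tr/ "</tr>"         tag-close (tr th td))
  (td  "<td(| [^>]*)>" tag-open  (th td))
  (td/ "</td>"         tag-close (th td))))

(define (tag-matches? rule-list tag text)
  ;; does TAG's regexp pattern (2nd element of its rule) match TEXT?
  (true? (regex (nth 1 (assoc tag rule-list)) text)))

(define (tag-kind rule-list tag)
  ;; 3rd element of the rule: tag-open, tag-close or tag-self
  (nth 2 (assoc tag rule-list)))

(define (closes? rule-list new-tag open-tag)
  ;; last element of the rule: the open tags that NEW-TAG closes
  (true? (find open-tag (last (assoc new-tag rule-list)))))

(println (tag-matches? rules 'td "<td align=left>"))  ; true
(println (tag-matches? rules 'td "<tdx>"))            ; nil
(println (tag-kind rules 'td))                        ; tag-open
(println (closes? rules 'td 'td))                     ; true
(println (closes? rules 'td 'tr))                     ; nil
```

The `(| [^>]*)` group is what lets `<td align=left>` match while `<tdx>` does not; and because a td closes only th and td, a new `<td>` ends the previous cell but leaves the enclosing row open.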
TAGS:parse-tags first splits the text into a flat list of text strings and tag symbols:
> (set 'htm (TAGS:parse-tags TAGS:html-tags (read-file "example.html")))
("<html><body>\ntest\n" TAGS:table TAGS:tr TAGS:td "test1" TAGS:td/ TAGS:tr
TAGS:td "test2" TAGS:td "test3" TAGS:table/ "\n</body></html>\n")
TAGS:structure-tags then nests that flat list according to the tag-open, tag-close and tag-self rules:
> (TAGS:structure-tags TAGS:html-tags htm)
(TAGS:data "<html><body>\ntest\n"
(TAGS:table (TAGS:tr (TAGS:td "test1"))
(TAGS:tr (TAGS:td "test2") (TAGS:td "test3")))
"\n</body></html>\n")
With a nested list like this, extracting data from HTML tables becomes fairly straightforward.
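As a rough illustration, the sketch below walks the structure-tags result from above and collects the text of each cell, row by row. The helper names (`sublists-of`, `cell-text`, `table-rows`) are hypothetical, not part of TAGS, and plain symbols (table, tr, td) stand in for TAGS:table, TAGS:tr and TAGS:td so the snippet runs without the module loaded.

```newlisp
;; Copy of the structure-tags result shown above, with plain symbols
;; standing in for the TAGS:-qualified ones (an assumption for this sketch).
(set 'doc '(data "<html><body>\ntest\n"
             (table (tr (td "test1"))
                    (tr (td "test2") (td "test3")))
             "\n</body></html>\n"))

(define (sublists-of tag lst)
  ;; keep only the nested lists headed by TAG, skipping plain strings
  (filter (fn (x) (and (list? x) (= (first x) tag))) lst))

(define (cell-text cell)
  ;; concatenate the string children of a td cell
  (join (filter string? (rest cell))))

(define (table-rows tbl)
  ;; one list of cell strings per tr
  (map (fn (row) (map cell-text (sublists-of 'td row)))
       (sublists-of 'tr tbl)))

(println (table-rows (first (sublists-of 'table doc))))
;; (("test1") ("test2" "test3"))
```

With real TAGS output, the comparisons would use TAGS:table, TAGS:tr and TAGS:td instead. Filtering on `list?` also skips any whitespace strings that may sit between tags in less tidy HTML.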