getting nL symbols into the find-all regex?

Post by **joejoe** » Tue Jun 05, 2012 3:16 am

Hi -

This is my working code to grab titles from my page:

(set 'titles (find-all {<h2>.*</h2>} page $0))

Sometimes my titles are not directly inside <h2> tags.

Sometimes they are inside <a href=...> tags that are themselves inside <h2> tags. Tough! :0)

I have defined two symbols, enclosure-tag-open and enclosure-tag-close.

Now I am trying to use these symbols in my find-all. I have tried all of these:

(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))

I have even tried to make my own find-all string and then somehow run it:

(println "find-all {"(println enclosure-tag-open)".*"(println enclosure-tag-close)"} page $0")
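For what it's worth, the build-a-string-and-run-it idea can work: eval-string parses a string and evaluates it as newLISP code. A minimal sketch, with made-up sample values standing in for the real symbols and page; it breaks if a tag contains an unbalanced { or }, and building only the pattern with string (as suggested in the replies) is simpler.

```newlisp
; Hypothetical sample values standing in for the poster's definitions
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page "<h2>One</h2><p>text</p><h2>Two</h2>")

; Build the source text of a whole find-all call as a string...
(set 'call-text (string "(find-all {" enclosure-tag-open ".*?" enclosure-tag-close "} page)"))
(println call-text)  ; (find-all {<h2>.*?</h2>} page)

; ...then parse and run that text as newLISP code
(set 'titles (eval-string call-text))
(println titles)     ; ("<h2>One</h2>" "<h2>Two</h2>")
```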

Is there an easy way to have symbols replace the <h2> and </h2> below:

(set 'titles (find-all {<h2>.*</h2>} page $0))

Thanks very much!

Post by **Lutz** » Tue Jun 05, 2012 7:15 am

You could use format to embed the variable enclosure strings in the pattern:

http://www.newlisp.org/downloads/newlis ... tml#format
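A minimal sketch of the format approach, with made-up sample values for the two symbols and the page. The (.*?) group is an addition here: it captures just the text between the tags, which find-all returns through $1.

```newlisp
; Hypothetical sample values; in the thread these symbols are defined elsewhere
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page "<h2>First</h2> some text <h2>Second</h2>")

; format replaces each %s with the next argument's value,
; giving the pattern string "<h2>(.*?)</h2>"
(set 'pat (format "%s(.*?)%s" enclosure-tag-open enclosure-tag-close))

; $1 holds the text captured by (.*?) for each match
(set 'titles (find-all pat page $1))
(println titles)  ; ("First" "Second")
```

Because format inserts the tag text verbatim, a tag containing regex metacharacters such as . or ? would still be read as regex syntax.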

Post by **cormullion** » Tue Jun 05, 2012 2:28 pm

(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

By putting the symbol names enclosure-tag-open and enclosure-tag-close inside a string, you've prevented newLISP from seeing that there is anything special about them.

(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

... similarly, here you've put characters that look like newLISP code into a string. But newLISP treats them as ordinary characters; nothing inside a string is evaluated.

(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))

... again, these characters are just part of a string, and newLISP treats them as such.
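To see what the first of these patterns actually searches for, here is a small sketch with made-up input strings. It also shows a second problem: [.*] is a character class, so it matches exactly one . or * character rather than "any run of characters".

```newlisp
; Hypothetical values; the pattern below never consults them
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")

; Everything inside {...} is literal text, so this pattern looks for the
; symbol NAMES, with exactly one "." or "*" between them ([.*] is a class)
(set 'literal-pattern {enclosure-tag-open[.*]enclosure-tag-close})

; The real tags are never searched for, so this finds no match
(println (find-all literal-pattern "<h2>Title</h2>"))

; ...but the literal symbol names do match
(println (find-all literal-pattern "enclosure-tag-open*enclosure-tag-close"))
; -> ("enclosure-tag-open*enclosure-tag-close")
```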

What you need is something like this:

(set 'titles (find-all (string enclosure-tag-open {.*?} enclosure-tag-close) page))

where the string function evaluates its arguments and joins their values (the 'contents' of the symbols) into a single string.
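A short sketch of that line in use, with a made-up page and tag values. Because string reads whatever the symbols hold at the moment it is called, the same expression can switch from <h2> to <h3> titles just by changing the symbols.

```newlisp
; Hypothetical sample page
(set 'page "<h2>One</h2><h3>Sub</h3><h2>Two</h2>")

(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
; string evaluates each argument and joins the resulting values
(println (string enclosure-tag-open {.*?} enclosure-tag-close))  ; <h2>.*?</h2>
(set 'h2-titles (find-all (string enclosure-tag-open {.*?} enclosure-tag-close) page))
(println h2-titles)  ; ("<h2>One</h2>" "<h2>Two</h2>")

; Change the symbol values and build the pattern the same way
(set 'enclosure-tag-open "<h3>")
(set 'enclosure-tag-close "</h3>")
(set 'h3-titles (find-all (string enclosure-tag-open {.*?} enclosure-tag-close) page))
(println h3-titles)  ; ("<h3>Sub</h3>")
```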

I'm just wondering: are you using an editor that does syntax highlighting? If you were, you could see that your symbols were being treated as plain old strings once inside string delimiters (e.g. as you can see in Wikibooks with the new syntax highlighting). If you're not currently using syntax highlighting, life will be much easier once you do...

Post by **joejoe** » Tue Jun 05, 2012 6:04 pm

string - !!!

I see, yes. Super! Thank you cormullion!

I am using the Geany text editor with its LISP filetype.

I attached a screenshot of what it looks like on my screen. The symbols enclosure-tag-open and enclosure-tag-close are defined further up the page, but I do not see them highlighted.

Are you saying the symbols should be highlighted in some color w/ a good editor?

Post by **xytroxon** » Tue Jun 05, 2012 6:16 pm

I'm at lunch, so here is a quick and dirty post!

Notes:
the process-tags function strips the tags from each match and collects the cleaned titles in the clean_tags list
add a "magic" \ before the / of closing tags in the regex: <\/h2>
add ? (.*?) to make the regex stop at the first closing tag it finds
use find-all option 5 (caseless + dot matches newline) to match upper- or lower-case tags and any newlines between the opening and closing tags
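The last two notes can be seen in isolation with a small sketch; the two page strings are made up for illustration.

```newlisp
; Made-up sample strings
(set 'page1 "<h2>One</h2><p>x</p><h2>Two</h2>")
(set 'page2 "<H2>Three</H2>\n<h2>Four\nFive</h2>")

; Greedy .* runs on to the LAST closing tag: one big match
(set 'greedy (find-all {<h2>.*</h2>} page1))
(println greedy)    ; ("<h2>One</h2><p>x</p><h2>Two</h2>")

; Lazy .*? stops at the FIRST closing tag: one match per title
(set 'lazy (find-all {<h2>.*?</h2>} page1))
(println lazy)      ; ("<h2>One</h2>" "<h2>Two</h2>")

; Option 1 (caseless) finds <H2>, but . still cannot cross a newline
(set 'caseless (find-all {<h2>.*?</h2>} page2 $0 1))
(println caseless)  ; ("<H2>Three</H2>")

; Option 5 = 1 (caseless) + 4 (dot also matches newline): both blocks
(set 'both (find-all {<h2>.*?</h2>} page2 $0 5))
(println (length both))  ; 2
```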

(setq page
[text]
<html>
<head><title>Test HTML</title></head>
<body>

<h1>Header H1</h1>
    blah blah blah blah
<h2>Header H2</h2>
  blah blah blah blah
<h3>Header H3</h3>
          blah blah blah blah
<h4>Header H4</h4>
 blah blah blah blah
<h5>Header H5</h5>
blah blah blah blah
<h6>Header H6</h6>

<H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
<h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

</body>
</html>
[/text]
)

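(define (process-tags str)
	; show each raw match
	(println "tag: " str)
	; option 5 = caseless (1) + dot matches newline (4), so <H2> and </H2> are removed too
	(replace "<h2>" str "" 5)
	(replace "</h2>" str "" 5)
	; remove any <a ...> opening tag and its closing </a>
	(replace "<a.*?>" str "" 5)
	(replace "</a>" str "" 5)
	(setq str (trim str))
	; append the cleaned title to the global list clean_tags
	(push str clean_tags -1)
)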

; (println page)

; pass every <h2>...</h2> block (lazy .*?, option 5) to process-tags
(find-all {<h2>.*?<\/h2>} page (process-tags $0) 5)
(println)
(println clean_tags)
(exit)

>"c:\program files (x86)\newlisp\newlisp.exe" "C:\Users\Programming\zx.nl"
tag: <h2>Header H2</h2>
tag: <H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
tag: <h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

("Header H2" "newLISP Main Page" "newLISP Forum")
>Exit code: 0

-- xytroxon

Post by **joejoe** » Tue Jun 05, 2012 7:09 pm

xytroxon,

You got me in business! :0)

I will combine the find-all w/ nL symbols technique cormullion taught above with your process-tags, [a.k.a. the laundromat :]
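One way the two could be combined, sketched under assumptions: extract-titles is a hypothetical helper name, the sample page is abbreviated from xytroxon's test page, and the \Q...\E quoting is a PCRE feature (the regex engine newLISP uses) swapped in so that any regex metacharacters in the tag strings are matched literally. Unlike process-tags, it returns the titles instead of pushing them onto a global list.

```newlisp
; extract-titles is a hypothetical helper, not a newLISP built-in
(define (extract-titles page open-tag close-tag)
  ; \Q...\E tells PCRE to match the tag text literally, so characters
  ; such as . ? ( or [ inside a tag are not treated as regex syntax
  (let (pat (string {\Q} open-tag {\E(.*?)\Q} close-tag {\E}))
    ; option 5 = caseless + dot matches newline; $1 is the text between tags
    (find-all pat page
      (let (s $1)
        (replace {<[^>]*>} s "" 0) ; drop inner tags such as <a href=...>
        (trim s))                  ; drop surrounding spaces
      5)))

(set 'sample {<h2>Header H2</h2> <H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>})

(println (extract-titles sample "<h2>" "</h2>"))
; -> ("Header H2" "newLISP Main Page")
```

Called with the enclosure-tag-open and enclosure-tag-close symbols as the last two arguments, it picks up whatever tags those symbols currently hold.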

Big kickin! :D

newlispfanclub.alh.net
