getting nL symbols into the find-all regex?

joejoe
Posts: 173
Joined: Thu Jun 25, 2009 5:09 pm
Location: Denver, USA

Post by joejoe »

Hi -

This is my working code to grab titles from my page:

Code:

(set 'titles (find-all {<h2>.*</h2>} page $0))

Sometimes it is not the <h2> tags themselves that I have to look into to find my titles.

Sometimes they are <a href=...> tags that are inside of <h2> tags. Tough! :0)

I have defined two symbols, enclosure-tag-open and enclosure-tag-close.

Now I am trying to use these symbols in my find-all. I have tried all of these:

Code:

(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))

I have even tried to make my own find-all string and then somehow run it:

Code:

(println "find-all {"(println enclosure-tag-open)".*"(println enclosure-tag-close)"} page $0")

Is there an easy way to have symbols replace the <h2> and </h2> below:

Code:

(set 'titles (find-all {<h2>.*</h2>} page $0))

Thanks very much!

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: getting nL symbols into the find-all regex?

Post by Lutz »

You could use format to embed variable enclosure strings:

http://www.newlisp.org/downloads/newlis ... tml#format
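
A minimal sketch of the format approach, assuming both symbols hold plain tag strings. The tag values, the sample page and the name pattern are made up for illustration; the thread never shows the real definitions.

```newlisp
; hypothetical values, not taken from the original page
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page "<h2>First</h2><p>text</p><h2>Second</h2>")

; format replaces each %s with the string value of the evaluated symbol
(set 'pattern (format "%s.*?%s" enclosure-tag-open enclosure-tag-close))

; the pattern is now an ordinary string, so find-all can use it directly
(set 'titles (find-all pattern page))
(println titles)
```

If a tag ever contained regex metacharacters, they would probably need escaping before being placed into the pattern.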

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Re: getting nL symbols into the find-all regex?

Post by cormullion »

Code:

(set 'titles (find-all {enclosure-tag-open[.*]enclosure-tag-close} page $0))

By putting the names 'enclosure-tag-open' and 'enclosure-tag-close' inside a string, you've prevented newLISP from seeing that there is anything special about them.

Code:

(set 'titles (find-all {(eval enclosure-tag-open).*(eval enclosure-tag-close)} page $0))

... similarly, here you've put characters that might be newLISP code into a string. But newLISP will treat the characters as ordinary strings.

Code:

(set 'titles (find-all {(println enclosure-tag-open)[.*](println enclosure-tag-close)} page $0))

... again, these characters are just strings. newLISP treats them as such.

What you need is something like this:

Code:

(set 'titles (find-all (string enclosure-tag-open {.*?} enclosure-tag-close) page))

where the string function evaluates symbols and builds a string from their 'contents'.
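
To make the difference visible, here is a small sketch that prints both forms. The tag values and the sample page are assumptions chosen for illustration only.

```newlisp
; hypothetical values for the two symbols and the page
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page "<h2>One</h2> filler <h2>Two</h2>")

; inside braces the symbol names are just literal characters
(println {enclosure-tag-open.*?enclosure-tag-close})

; string evaluates each argument and joins the results into one string
(set 'pattern (string enclosure-tag-open {.*?} enclosure-tag-close))
(println pattern)

(set 'titles (find-all pattern page))
(println titles)
```

The first println shows the symbol names verbatim, while the second shows the pattern that find-all actually needs.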

I'm just wondering whether you're using an editor that does syntax highlighting? If you were, you could see that your symbols were being treated as plain old strings once inside string delimiters (e.g. as you see in wikibooks with the new syntax highlighting). Life will be much easier with it if you're currently not using syntax highlighting...

joejoe
Posts: 173
Joined: Thu Jun 25, 2009 5:09 pm
Location: Denver, USA

Re: getting nL symbols into the find-all regex?

Post by joejoe »

string - !!!

I see, yes. Super! Thank you cormullion!

I am using the Geany text editor, and it has a LISP document filetype, which I use.

I attached what it looks like on my screen. I do not see it highlighting the symbols enclosure-tag-open and enclosure-tag-close, which were already defined further up the page.

Are you saying the symbols should be highlighted in some color w/ a good editor?
Attachment: highlight.png

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm

Re: getting nL symbols into the find-all regex?

Post by xytroxon »

I'm at lunch, so here is a quick and dirty post!

Notes:
The process-tags function collects the cleaned-up titles it finds into the clean_tags list.
Add a "magic" \ before the / in the regex for closing tags: <\/h2>
Add ? after .* to make the regex stop at the first closing tag it matches.
Use find-all option 5 (1 = ignore case, plus 4 = dot matches newline) to handle upper- or lower-case tags and newlines that may occur between tags.
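
To see what the ? and the 5 in these notes change, here is a small sketch on a made-up sample string (sample, greedy, lazy and plain are names invented for this illustration):

```newlisp
; two headings: one in upper case, one spanning two lines
(set 'sample "<H2>One</H2>\n<h2>Two\nlines</h2>")

; greedy .* runs on to the last closing tag, giving a single match
(set 'greedy (find-all {<h2>.*</h2>} sample $0 5))

; non-greedy .*? stops at the first closing tag, giving one match per heading
(set 'lazy (find-all {<h2>.*?</h2>} sample $0 5))

; without option 5 the match is case-sensitive and . does not cross newlines
(set 'plain (find-all {<h2>.*?</h2>} sample))

(println (length greedy) " " (length lazy) " " (length plain))
```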

Code:

(setq page
[text]
<html>
<head><title>Test HTML</title></head>
<body>

<h1>Header H1</h1>
    blah blah blah blah
<h2>Header H2</h2>
  blah blah blah blah
<h3>Header H3</h3>
          blah blah blah blah
<h4>Header H4</h4>
 blah blah blah blah
<h5>Header H5</h5>
blah blah blah blah
<h6>Header H6</h6>

<H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
<h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

</body>
</html>
[/text]
)

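; Strip the <h2> and <a> markup from one match and collect the bare title.
(define (process-tags str)
	(println "tag: " str)
	; option 5 = ignore case (1) + dot matches newline (4)
	(replace "<h2>" str "" 5)
	(replace "</h2>" str "" 5)
	(replace "<a.*?>" str "" 5)
	(replace "</a>" str "" 5)
	(setq str (trim str))
	; clean_tags is a global list; the first push creates it
	(push str clean_tags -1)
)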

; (println page)

(find-all {<h2>.*?<\/h2>} page (process-tags $0) 5)
(println)
(println clean_tags)
(exit)

Code:

>"c:\program files (x86)\newlisp\newlisp.exe" "C:\Users\Programming\zx.nl"
tag: <h2>Header H2</h2>
tag: <H2><a href="http://www.newlisp.org">   newLISP Main Page  </a></H2>
tag: <h2><a href="http://newlispfanclub.alh.net/forum/"> newLISP Forum</a></H2>

("Header H2" "newLISP Main Page" "newLISP Forum")
>Exit code: 0

-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976

joejoe
Posts: 173
Joined: Thu Jun 25, 2009 5:09 pm
Location: Denver, USA

Re: getting nL symbols into the find-all regex?

Post by joejoe »

xytroxon,

You got me in business! :0)

I will combine the find-all w/ nL symbols technique cormullion taught above with your process-tags, [a.k.a. the laundromat :]
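
A minimal sketch of how that combination might look, assuming the two symbols hold plain tag strings. The sample page and the helper name clean-title are made up here; clean-title is a cut-down stand-in for process-tags that returns the title instead of pushing it onto a global list.

```newlisp
; hypothetical values for the two symbols and the page
(set 'enclosure-tag-open "<h2>")
(set 'enclosure-tag-close "</h2>")
(set 'page {<h2>Plain</h2> <H2><a href="http://www.newlisp.org"> Linked </a></H2>})

; remove the enclosing tags and any <a ...> or </a> tag, then trim spaces
(define (clean-title str)
  (replace enclosure-tag-open str "" 5)
  (replace enclosure-tag-close str "" 5)
  (replace {</?a.*?>} str "" 5)
  (trim str))

; string builds the pattern from the symbols; find-all collects each cleaned match
(set 'titles
  (find-all (string enclosure-tag-open {.*?} enclosure-tag-close)
            page (clean-title $0) 5))
(println titles)
```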

Big kickin! :D
