regex help

joejoe
Posts: 173
Joined: Thu Jun 25, 2009 5:09 pm
Location: Denver, USA

Post by joejoe »

I've tried for the last three hours to get this one. :0)

I *am* able to get a regex to work with this:

Code: Select all

find-all {<title>([^.]+)<\/title>}
on the XML code below, but I tried this (and about 60 other variations):

Code: Select all

find-all {<item>([^.]+)<\/item>}
to get the entire value of the text between the <item> and </item> tags.

Is there not a simple way to say: look for the <item> tag and then get everything until you see the </item> tag?

Any suggestions would really be appreciated. Thanks very much! :0)

Code: Select all

<item><title>Very Cute Solar Powered Plant Decoration Flower Random Color </title><description><![CDATA[<table border='0' cellpadding='8'> <tr><td> <a href= 'http://rover.ebay.com/rover/1/711-53200-19255-0/1?ff3=2&toolid=10039&campid=5337070286&item=120898233743&vectorid=229466&lgeo=1' target='_blank'><img src='http://thumbs4.ebaystatic.com/pict/1208982337434040_1.jpg' border='0'/></a></td><td><strong>$1.99</strong><br>End Date: Friday May-18-2012 18:43:55 PDT<br>Buy It Now for only: $1.99<br><a href='http://rover.ebay.com/rover/1/711-53200-19255-0/1?ff3=2&toolid=10039&campid=5337070286&item=120898233743&vectorid=229466&lgeo=1' target='_blank'>Buy It Now</a> | <a href='http://rover.ebay.com/rover/1/711-53200-19255-0/1?ff3=4&toolid=10039&campid=5337070286&vectorid=229466&lgeo=1&mpre=http%3A%2F%2Fcgi1.ebay.com%2Fws%2FeBayISAPI.dll%3FMfcISAPICommand%3DMakeTrack%26item%3D120898233743%26ssPageName%3DRSS%3AB%3ASRCH%3AUS%3A104' target='_blank'>Add to watch list</a></td></tr> </table>]]></description><pubDate>18 Apr 2012 18:53:18 GMT-07:00</pubDate><guid>120898233743</guid><link>http://rover.ebay.com/rover/1/711-53200-19255-0/1?ff3=2&toolid=10039&campid=5337070286&item=120898233743&vectorid=229466&lgeo=1</link><e:BidCount></e:BidCount><e:CurrentPrice>1.99</e:CurrentPrice><e:ListingType>FixedPrice</e:ListingType><e:BuyItNowPrice></e:BuyItNowPrice><e:ListingEndTime>2012-05-19T01:43:55.000Z</e:ListingEndTime><e:ListOrder>120898233743</e:ListOrder><e:PaymentMethod>PayPal</e:PaymentMethod></item>

m i c h a e l
Posts: 394
Joined: Wed Apr 26, 2006 3:37 am
Location: Oregon, USA

Post by m i c h a e l »

Hi joejoe!

Will this work for you?

Code: Select all

find-all {<item>(.+?)<\/item>}
Hope this helps.

m i c h a e l
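
For anyone wondering why the first pattern worked on <title> but not on <item>: [^.]+ means "one or more characters that are not a literal dot". The title text contains no dot, but the item text does (in the URLs and in $1.99), so the match can never reach </item>. The pattern .+? matches any character, as few as possible, so it stops at the first </item>. Below is a minimal sketch of the difference, written in Python rather than newLISP so it can be run on its own. The sample string is a shortened, made-up stand-in for the feed above, and it is an assumption that these patterns behave the same way under newLISP's PCRE-based find-all.

```python
import re

# Shortened, made-up stand-in for the eBay feed above (two items).
sample = (
    "<item><title>Solar Flower</title><e:CurrentPrice>1.99</e:CurrentPrice></item>"
    "<item><title>Garden Lamp</title><e:CurrentPrice>4.50</e:CurrentPrice></item>"
)

# [^.]+ excludes the literal dot, so it cannot get past "1.99" to reach </item>.
no_dot = re.findall(r"<item>([^.]+)</item>", sample)

# The same character class works for <title> because the titles contain no dot.
titles = re.findall(r"<title>([^.]+)</title>", sample)

# .+? takes as little as possible, so each match stops at the first </item>.
lazy = re.findall(r"<item>(.+?)</item>", sample)

# A greedy .+ runs on to the last </item> and merges both items into one match.
greedy = re.findall(r"<item>(.+)</item>", sample)

print(no_dot)                   # empty list: no item matched
print(titles)                   # both titles
print(len(lazy), len(greedy))   # two lazy matches, one greedy match
```

The greedy variant is included only to show why the question mark matters once the text holds more than one item.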

Post by joejoe »

m i c h a e l -

It sure does! I gotta remember KISS when I tackle regexes. :0)

Thanks big, friend!

joejoe

Post by joejoe »

One further request on this, if I may:

I have a list containing many text blocks like the large one above, all similarly structured.

What is the best approach to extracting the same fields from each item in the list while keeping them associated with the item they came from?

I want to create an xpath file, so that in the end I have something similar to this:

Code: Select all

;; <eb>
;;   <ebitem>
;;     <ebtitle>ebtitle</ebtitle>
;;     <ebprice>ebprice</ebprice>
;;     <eblink>eblink</eblink>
;;     <ebimage>ebimage</ebimage>
;;   </ebitem>
;;   <ebitem>
;;     <ebtitle>ebtitle</ebtitle>
;;     <ebprice>ebprice</ebprice>
;;     <eblink>eblink</eblink>
;;     <ebimage>ebimage</ebimage>
;;   </ebitem>
;; </eb>
Do I use a dolist and, within that dolist, find-all the desired info and push the matches into my list template?

Or is it easier to replace the unnecessary pieces of each <item> and insert the xpath tags around what remains?

I guess what I am not sure how to do is operate on the items in a list: perform several steps on one item, then go to the next item in the list and do the same steps on that one.

What would be the suggested approach to get the data out of <item> and into the xpath template?

I am new to newLISP (nL) and to programming, and would just like some direction so I can carry out the work.

Thanks kindly again.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

Perhaps something like this:

Code: Select all

; html-text is assumed to hold the feed text as one string
(dolist (item (find-all {<item>(.+?)</item>} html-text))
    ; each find-all evaluates its last argument per match; $1 is the captured text
    (find-all {<title>(.+?)</title>} item (set 'title $1))
    (find-all {<e:CurrentPrice>(.+?)</e:CurrentPrice>} item (set 'price $1))
    (find-all {<link>(.+?)</link>} item (set 'link $1))
    (find-all {<img src='(.+?\.jpg)'} item (set 'image $1))
    ; fill the four fields into the output template for this item
    (println
      (format {
        <ebitem>
          <ebtitle>%s</ebtitle>
          <ebprice>%s</ebprice>
          <eblink>%s</eblink>
          <ebimage>%s</ebimage>
        </ebitem>}
        title price link image)))
although this will fail if the information is not found by the regexen...
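
One way around the missing-information case is to give every field a fallback value before formatting. Here is a minimal sketch of that idea, again in Python rather than newLISP so it runs standalone. The names FIELDS, extract_items, and to_xml, the empty-string default, and the sample feed are all hypothetical and not part of the code above.

```python
import re

# Hypothetical field table: output tag name -> pattern with one capture group.
FIELDS = {
    "ebtitle": r"<title>(.+?)</title>",
    "ebprice": r"<e:CurrentPrice>(.+?)</e:CurrentPrice>",
    "eblink": r"<link>(.+?)</link>",
    "ebimage": r"<img src='(.+?\.jpg)'",
}

def extract_items(feed, default=""):
    """Return one dict per <item>, using `default` where a field is missing."""
    records = []
    # Split into items first so each field search stays inside one item.
    for body in re.findall(r"<item>(.+?)</item>", feed, flags=re.DOTALL):
        record = {}
        for name, pattern in FIELDS.items():
            match = re.search(pattern, body, flags=re.DOTALL)
            record[name] = match.group(1) if match else default
        records.append(record)
    return records

def to_xml(records):
    """Wrap the records in <eb>/<ebitem> tags like the template in the question."""
    lines = ["<eb>"]
    for record in records:
        lines.append("  <ebitem>")
        for name, value in record.items():
            # Values are inserted as-is, without XML escaping.
            lines.append(f"    <{name}>{value}</{name}>")
        lines.append("  </ebitem>")
    lines.append("</eb>")
    return "\n".join(lines)

# Made-up two-item feed; the second item has no price, link, or image.
feed = (
    "<item><title>Solar Flower</title><link>http://example.com/1</link>"
    "<img src='http://example.com/1.jpg' border='0'/>"
    "<e:CurrentPrice>1.99</e:CurrentPrice></item>"
    "<item><title>Garden Lamp</title></item>"
)

records = extract_items(feed)
print(to_xml(records))
```

Searching each item body separately keeps the four values tied to the item they came from. It also means a pattern that finds nothing falls back to the default instead of matching text in the next item.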

Post by joejoe »

cormullion,

To the rescue, in character, thank you!

I see now how to get data out of a dolist loop and similarly now know how to wield the incredibly useful format function.

Many big thanks for that!
