Find-all + Replace = Crash

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

Post by Fritz »

I ran into some problems while attempting to create an (html-parse) function. Here are two functions that should do the same thing: both are supposed to extract data from "td" tags.

Code:

(define (sacar-td linea)
  (set 'alveolos (find-all "(<td)(.*?)(</td>)" linea $0 1))
  (map (fn (x) (replace "</?td(.*?)>" x "" 1)) alveolos))

(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (replace "</?td(.*?)>" $0 "" 1) 1))

(set 'testrow "<tr><td class='kin'>Alpha</td><td>Gamma</td></tr>")

The longer one, "(sacar-td testrow)", works fine. The shorter one, "(crash-td testrow)", crashes the shell:

> (sacar-td testrow)
("Alpha" "Gamma")
> (crash-td testrow)
*** glibc detected *** /usr/bin/newlisp: double free or corruption (fasttop): 0x080cc808 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7e31a85]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7e354f0]
...

That is the first strange thing.

The second problem: my regexps work only if I replace every "\n" with " " before searching.

Update: I accidentally found a solution to the second problem: the "(?s)" option. Now my "parse-html" function works:

Code:

; Usage (parse-html (get-url "http://www.newlisp.org/downloads/newlisp_manual.html"))
(define (parse-html texto)
  (map sacar-table (find-all "(?s)(<table)(.*?)(</table>)" texto $0 1)))

(define (sacar-td linea)
  (set 'alveolos (find-all "(<t[dh])(.*?)(</t[dh]>)" linea $0 1))
  (map (fn (x) (replace "</?t[dh](.*?)>" x "" 1)) alveolos))

(define (sacar-table linea)
  (map sacar-td (find-all "(?s)(<tr)(.*?)(</tr>)" linea $0 1)))
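
A minimal sketch of what "(?s)" changes, assuming a made-up cell whose content spans two lines (the names cell, plain and dotall are only for illustration). Without the option, "." does not match "\n", so the lazy group can never reach the closing tag:

```newlisp
; Assumed sample input: one cell whose content spans two lines.
(set 'cell "<td>Alpha\nBeta</td>")

; Without (?s), "." stops at the newline, so the pattern cannot reach </td>.
(set 'plain (find-all "<td>(.*?)</td>" cell $1 1))

; With (?s), "." also matches "\n", so the whole cell body is captured.
(set 'dotall (find-all "(?s)<td>(.*?)</td>" cell $1 1))

(println (length plain) " match(es) without (?s)")
(println (length dotall) " match(es) with (?s)")
```

The first search should find nothing and the second should find the single cell, which is why replacing "\n" with " " beforehand also made the patterns work.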

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Do it this way:

Code:

(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (replace "</?td(.*?)>" (copy $0) "" 1) 1))


> (set 'testrow "<tr><td>Alpha</td><td>Gamma</td></tr>")

> (crash-td testrow)
("Alpha" "Gamma")
>

'replace' is trying to make a replacement in $0 while at the same time copying the found piece into it. This will throw a protection error in the future.

Code:

(define (mangle str)
    (replace "</td>" str "" 1))

(define (crash-td linea)
    (find-all "(<td)(.*?)(</td>)" linea (mangle $0) 1))

Use (copy $0) or (copy $it).
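
As an alternative sketch (not taken from the posts above), the cell text can be captured in its own subexpression and returned as $1, so that no 'replace' on $0 is needed at all. The function name cells-td is hypothetical, and the test row is the one from the first post:

```newlisp
; Hypothetical helper: return the text of every <td> cell in a row.
; $1 is the first parenthesized group, i.e. the text between the tags.
(define (cells-td linea)
    (find-all "<td.*?>(.*?)</td>" linea $1 1))

(set 'testrow "<tr><td class='kin'>Alpha</td><td>Gamma</td></tr>")
(println (cells-td testrow))
```

For this test row the result should be the same list that (sacar-td testrow) returns. Like the original patterns, it only handles cells on a single line unless "(?s)" is added.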
Last edited by Lutz on Sun Oct 18, 2009 6:32 pm, edited 2 times in total.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

Lutz wrote: In a future version $0 only the anaphoric system variable $it will contain the found piece. Trying to change $it will cause a protection error. You would then use (copy $it). Today both $0 and $it contain the found piece.

Not sure what this means? Are you proposing to change the operation of $0 in replace?

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Sorry, I mistyped; it is now corrected.

Nothing will change for 'replace', 'find', or any other function that uses regular expressions.

Currently in 'set-ref' and 'set-ref-all', both $0 and $it are set to the found item. For the next version I mention only $it for these functions and have taken $0 out of their documentation. $0 will still work there, but it is deprecated, and its usage in 'set-ref' and 'set-ref-all' will be removed in 10.2 or 10.3, sometime in 2010 or 2011. This will be mentioned in the deprecation chapter (2) of the manual.

When doing 'replace' on $0 this can cause a crash and will be flagged with a protection error in the future.

In other words, in the future the usage of $0 to $15 will be limited to regular expression searches; all other situations will use the anaphoric $it.

There is one other usage of $0, as a count in 'replace' and 'read-expr', and I haven't decided yet whether this is good or not. Perhaps a more descriptive $count should be introduced?
Last edited by Lutz on Sun Oct 18, 2009 6:35 pm, edited 2 times in total.

Fritz
Posts: 66
Joined: Sun Sep 27, 2009 12:08 am
Location: Russia

Post by Fritz »

Thank you, now it is a bit shorter:

Code:

(define (parse-html texto)
  (map (fn (x) (map (fn (y)
    (find-all "(?si)(<t[dh])(.*?)(</t[dh]>)" y
      (replace "(?si)</?t[dh](.*?)>" (copy $it) "")))
    (find-all "(?si)(<tr)(.*?)(</tr>)" x)))
  (find-all "(?si)(<table)(.*?)(</table>)" texto)))
