regex-comp ?BUG?

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

Using regex-comp under Windows.
(I didn't test it under Unix.)


Btw, I think the manual should provide a better example of the use of regex-comp.. I only know how this works from Python..


This works ->

Code:

> (find-all {\d+} "1 2 3 4 numbers upto 5 6 7 8 in it." )
("1" "2" "3" "4" "5" "6" "7" "8")


This doesn't ->

Code:

(setq p1 (regex-comp {\d+} )) 
"ERCP1\000\000\000\000\000\000\004\000\000\000\000\000\000\000\000d\000\000\000(\000\003\000\000\000\000\000\000\000\000\000\000\000\000\000P\000\005,\006B\000\005\000"

> (find-all p1 "1 2 3 4 numbers upto 5 6 7 8 in it." 0x10000 )
()

And replace does work directly with a regex-comp pattern..

Code:

> (replace p1 "1 2 3 4 numbers upto 5 6 7 8 in it." "+" 0x10000 )
"+ + + + numbers upto + + + + in it."

To recap: either only replace works with a compiled pattern, or my example is wrong..
Anyway.. It took me far too much time to get a simple find-all working with regex-comp.. I actually got irritated by it because I needed something
quickly.. Spending 1 hour on this isn't really quick.. Is it a bug?
-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Your syntax is not correct: there is a translation expression before the options number:

http://www.newlisp.org/downloads/newlis ... l#find-all

Code:

> (find-all p1 "1 2 3 4 numbers upto 5 6 7 8 in it." $0 0x10000 )
("1" "2" "3" "4" "5" "6" "7" "8")
> 
Also using a precompiled pattern in the above example is redundant, as newLISP caches the compilation of the last pattern automatically.

http://www.newlisp.org/downloads/newlis ... regex-comp
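A minimal sketch of both calls side by side, reusing the sentence from the thread (the names `text`, `from-string` and `from-compiled` are only illustrative). With a pre-compiled pattern, `find-all` needs both optional arguments: `$0` in the processing-expression slot, which returns each whole match unchanged, and `0x10000` in the option slot. In the original call `(find-all p1 ... 0x10000)`, the number presumably landed in the expression slot instead, so no option was passed and the binary `p1` string was treated as an ordinary pattern, which matches nothing.

```newlisp
; Sample text from the thread
(setq text "1 2 3 4 numbers upto 5 6 7 8 in it.")

; Plain string pattern: no optional arguments needed
(setq from-string (find-all {\d+} text))

; Pre-compiled pattern: $0 fills the processing-expression slot,
; so 0x10000 reaches the option slot and flags p1 as compiled
(setq p1 (regex-comp {\d+}))
(setq from-compiled (find-all p1 text $0 0x10000))

(println from-string)
(println from-compiled)
```

Both calls should return the same list of digit strings; only the compiled form needs the extra `$0`.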

newdep

Post by newdep »

..Yes.. I actually still don't get it ;-)

The regex-comp section of the manual mentions that the 0x10000 option
must always be given to mark a compiled regex..

The normal (find-all {\d+} ...) does work without the $0.

From the find-all section of the manual I still don't see any link between the use of
$0 and the compiled regex...

If (find-all {\d+} "131numbers234" )
results in a list, then logically I don't have to think any further and just use

(find-all p1 "1234more2341" 0x10000)

or even

(find-all p1 "1234more2341")

to get it working with compiled regex..

..And actually, right here, at the point where it stops being logical (in my case),
it is crucial for the user..


I just think the regex-based functions don't return results the way you would
logically expect in the default setup.

Working on complex functions automatically creates a mind-set of complex/simple results.

But a simple function calls for simple handling and a simple result..

Now the question is "what is logical" ;-) I'm just writing from the user's
perspective..


edited ->

Btw.. even this works out of the box ->
> (setq p2 {\d+})
"\\d+"
> (find-all p2 "1 2 3 4 numbers upto 5 6 7 8 in it.")
("1" "2" "3" "4" "5" "6" "7" "8")

Whether p is a compiled regex or a string, it should simply make no difference to newLISP..

Lutz

Post by Lutz »

'find-all' uses regular expressions by default, which makes sense, because a non-regex 'find-all' would just return the same element multiple times in a list.

So using a regex option is relatively rare. It is much more frequent that you want to process the found element, so the first optional parameter in 'find-all' is not a regex option number but a processing expression.

So the most frequent use, "just return the found results in a list", also gets the simplest syntax. The first optional argument is then an expression to process each result.

I believe if you start using 'find-all' more often, you will agree ;-)
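To illustrate the point, a small sketch of the processing expression at work (the sample strings are made up for this example): `$0` holds the current match, and whatever the expression returns goes into the result list. When a regex option is also needed, it follows the expression; `1` is taken here to be the PCRE caseless flag from the newLISP manual's option table.

```newlisp
(setq text "1 2 3 4 numbers upto 5 6 7 8 in it.")

; Processing expression: convert every match ($0) to an integer
(setq nums (find-all {\d+} text (int $0)))

; Processing expression followed by a regex option:
; 1 = caseless, so [a-z]+ also matches upper-case letters
(setq words (find-all {[a-z]+} "Foo BAR baz" (upper-case $0) 1))

(println nums)
(println words)
```

This returns integers instead of strings in the first call and upper-cased words in the second, without any separate `map` over the results.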

newdep

Post by newdep »

It's just that newLISP isn't written in newLISP, otherwise I would have rebuilt this ;-)
But you leave me no other choice than fighting regex with regex.. Yes, I agree.. I need to use find-all more, but only when I'm in need of finding it all more ;-)

newdep

Post by newdep »

Lutz,

Wondering.. How easy is it to de-compile the regex-comp result?

If someone provided me with a compiled regex for use in newLISP but I
wanted to adjust it, then it would be useful to be able to de-compile it.
Or even if you just want to know what's inside..

because.. for now ..nobody can read this

Code:

"ERCP\160\000\000\000\016\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000(\000\003\000\000\000\000\000\000\000\000\000\000\000\000\000P\000t\019\021\"\021t\021h\021i\021s\021 \021i\021s\021 \021a\021 \021h\021i\021d\021d\021e\021n\021 \021m\021e\021s\021s\021a\021g\021e\021 \021n\021o\021t\021 \021b\021e\021i\021n\021g\021 \021d\021e\021-\021c\021o\021m\021p\021i\021l\021e\021d\021 \021f\021o\021r\021 \021n\021o\021w\021\"B\000t\000"



edited..

Actually there is a way to decode this one in newLISP, and it's logical, but only for this regex-comp... others I'm unable to decode..;-)

--> (find-all {[a-zA-Z' '\"]+} secret) <--

Lutz

Post by Lutz »

Perhaps what is needed is a program translating a regex source pattern into plain English. Regex patterns on the net are all published as source, and they are hard enough to read already in source format.

There are relatively few cases where you would want to pre-compile regex patterns. newLISP compiles patterns automatically and already caches the last one. So you need repeatedly alternating patterns to really take advantage of the 'regex-comp' function.
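A sketch of the case described above, where two patterns alternate on every line, so caching only the last pattern would not help (the input lines and the names `p-digit`, `p-space` and `result` are hypothetical). Each pattern is compiled once before the loop, and `0x10000` tells `replace` that it is getting a compiled pattern, as in the `replace` call earlier in the thread.

```newlisp
; Hypothetical input lines
(setq lines '("a1 b2" "c3  d4"))

; Compile each pattern once, outside the loop
(setq p-digit (regex-comp {\d}))
(setq p-space (regex-comp {\s+}))

(setq result '())
(dolist (ln lines)
  ; setq copies the string, so only s is modified below
  (setq s ln)
  ; two different patterns alternate on every line;
  ; 0x10000 marks each pattern as pre-compiled
  (replace p-digit s "#" 0x10000)
  (replace p-space s "_" 0x10000)
  ; append the rewritten line to the result list
  (push s result -1))

(println result)
```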

I wonder if "ERCP" in the compiled string means "PCRE" backwards? In the PCRE source this string doesn't occur once, and the function pcre_compile() is about 1200 lines of code (one! function).
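One likely answer, stated as an assumption: classic PCRE begins each compiled pattern with a 32-bit magic number, 0x50435245, whose bytes are the ASCII codes of "PCRE". On a little-endian machine such as x86 those bytes are stored in reverse order, which reads "ERCP". This sketch packs that number in little-endian order and prints it next to the first four bytes of a freshly compiled pattern for comparison.

```newlisp
; 0x50435245 = ASCII "P" "C" "R" "E" read as a big-endian 32-bit number
; "<lu" = unsigned 32-bit value, forced to little-endian byte order
(setq magic (pack "<lu" 0x50435245))

; First four bytes of a pattern compiled on this machine
(setq header (slice (regex-comp {\d+}) 0 4))

(println magic)
(println header)
```

If the assumption holds, both lines print the same four characters on a little-endian build.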

I wouldn't worry too much about pre-compiling regex patterns, except if you have to address a specific performance bottleneck.

newdep

Post by newdep »

Perhaps what is needed is a program translating a regex source pattern into plain English

..Perhaps not fully, but this is what I was trying to work out..
I want a more natural search inside my code, but there are different ways
of doing this actually: using a 'define-macro or a FOOP-style class.
Then there is of course the content handling, yes, regex.

Using this has the advantage of not needing to eval the args.
..but that's also a disadvantage.

Code:

(define (return:return) (find-all (string (args 1) (args 0)) (args -1)))

Then there is the question of "readability"..

Code:


Is this more readable?

    (return only letters from line)

or this
    
    (return:letters line)



Now an example..

Code:


(setq line "This is a line with letters, digits 23141234 and numbers 12354125")
(setq machine "you have $10.00 in deposit")

(constant 'only "+")
(constant 'all only)
(constant 'numbers "[0-9]")
(constant 'letters "[a-zA-Z]")
; no "+" here: return:return appends 'only'/'all' ("+") itself
(constant 'money {\$[0-9.]})

(define (return:return) (find-all (string (args 1) (args 0)) (args -1)))
(define (return:letters) (find-all {[a-zA-Z]+} (args -1)))
(define (return:numbers) (find-all {[0-9]+} (args -1)))
; inside [...] a "|" is a literal character, so it is dropped here
(define (return:money) (find-all  {\$[0-9.]+} (args -1)))
; hh:mm with optional :ss; no literal spaces, so no stray spaces in the results
(define (return:time)  (find-all {\d{1,2}:\d{1,2}(?::\d{1,2})?} (args -1)))
; dots escaped so they match only a literal "."
(define (return:ip)    (find-all {\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}} (args -1)))


(return only letters from line)
> ("This" "is" "a" "line" "with" "letters" "digits" "and" "numbers")

(return all money from machine)
> ("$10.00")

(return:letters line)
>("This" "is" "a" "line" "with" "letters" "digits" "and" "numbers")

(return:money   machine)
>("$10.00")

(return:numbers {n0th1ng h3re})
>("0" "1" "3")

(return:time    {yesterday at 10:00 AM or 22:00:23})
>("10:00" "22:00:23")

(return:ip      {Elvis did use 172.172.172.172 But Joe used 1.1.1.1})
>("172.172.172.172" "1.1.1.1")
