
regex

Posted: Tue May 24, 2005 10:14 pm
by Sammo
This

(replace "(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))" "1234567" "," 0)

returns

1,234567

instead of

1,234,567

The expression does the right thing (i.e., returns 1,234,567) at this test site:

http://www.fileformat.info/tool/regex.htm

Looks like (replace ...) is replacing only the first occurrence instead of all occurrences in v.8.5.9.

Thanks,
-- Sam
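
For readers who want to cross-check the expected result without the test site, here is a minimal sketch in Python. It assumes Python's `re` module treats these lookaround assertions the same way PCRE does for this pattern; the helper name `add_thousands_separators` is made up for illustration and is not part of newLISP.

```python
import re

# The pattern from the post above, unchanged: a zero-length position that is
# preceded by a digit and followed by one or more complete groups of three
# digits, with no further digit after the last group.
GROUPING = r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))"


def add_thousands_separators(digits, sep=","):
    """Insert sep at every grouping position (hypothetical helper)."""
    return re.sub(GROUPING, sep, digits)


print(add_thousands_separators("1234567"))  # 1,234,567
```

Every match of this pattern is zero-length, so the example only works when the replace routine keeps going after an empty match.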

Posted: Wed May 25, 2005 12:23 am
by Lutz
This happens when the first character sequence matched has a length of 0 (zero). This is fixed in 8.5.10.

As this is a very rare case, I want to wait until the weekend before posting a fixed version (the problem is already fixed).

If this fix is urgent for you or anybody else, please let me know and I can post it right away.

Lutz

Posted: Wed May 25, 2005 12:31 am
by Sammo
Hi Lutz,

A fix isn't urgently needed. I can wait.

Thanks for looking into it.
-- Sam

Posted: Wed May 25, 2005 12:35 am
by Lutz
Thanks for catching this, Sam. Regular expressions are a feature we have to be able to rely on 100%.

Lutz

Posted: Sat May 28, 2005 4:00 pm
by Sammo
Thank you, Lutz, for fixing the problem with 'replace' in 8.5.10. It does, indeed, work correctly now.
-- Sam

Posted: Sat May 28, 2005 5:25 pm
by Lutz
Yes, this bug affected all zero-length boundary patterns using "", "^", "$" and "\b". It is a wonder that nobody tripped over it before you. I have bookmarked that test site, but I am still looking for a longer test suite, not for pattern searching itself, which is all PCRE, but for repetitive replacement, which is coded inside newLISP. Your number-formatting pattern had all the critical elements in it. I also found the following patterns, which are now included in the 'qa_dot' test suite (dot for decimal point versus comma):

Code:

; must all evaluate to true

(= (replace "" "abc" "x" 0) "xaxbxcx")
(= (replace "$" "abc" "x" 0) "abcx")
(= (replace "^" "abc" "x" 0) "xabc")
(= (replace "\\b" "abc" "x" 0) "xabcx")
Most of these even have some practical use.

Lutz
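
As an illustration of why zero-length matches are the tricky part of repetitive replacement, here is a rough Python sketch of a replace-all loop. It is not newLISP's actual code (which is not shown in this thread); the function name `replace_all` and the stepping strategy are assumptions made for the example. The key point is that after an empty match the loop must copy one character and move the search position forward by one, otherwise it would either stop early or match the same spot forever.

```python
import re


def replace_all(pattern, text, replacement):
    """Replace every match of pattern in text, including zero-length ones.

    Illustrative sketch only, not the newLISP implementation.
    """
    regex = re.compile(pattern)
    pieces = []
    pos = 0
    while pos <= len(text):
        match = regex.search(text, pos)
        if match is None:
            break
        # Copy the unmatched text before the match, then the replacement.
        pieces.append(text[pos:match.start()])
        pieces.append(replacement)
        if match.end() == match.start():
            # Zero-length match: copy one character through and step past it
            # so the next search starts at a new position.
            if match.start() < len(text):
                pieces.append(text[match.start()])
            pos = match.start() + 1
        else:
            pos = match.end()
    # Copy whatever is left after the last match.
    pieces.append(text[pos:])
    return "".join(pieces)


# The four boundary patterns from the post above, plus the number pattern.
print(replace_all("", "abc", "x"))      # xaxbxcx
print(replace_all("$", "abc", "x"))     # abcx
print(replace_all("^", "abc", "x"))     # xabc
print(replace_all(r"\b", "abc", "x"))   # xabcx
print(replace_all(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))", "1234567", ","))  # 1,234,567
```

The search is always run against the full string with a start offset rather than against a slice, so "^", "\b" and the lookbehind still see the characters before the current position.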