regex

Sammo
Posts: 180
Joined: Sat Dec 06, 2003 6:11 pm
Location: Loveland, Colorado USA

Post by Sammo »

This

(replace "(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))" "1234567" "," 0)

returns

1,234567

instead of

1,234,567

The expression does the right thing (i.e., returns 1,234,567) at this test site:

http://www.fileformat.info/tool/regex.htm

Looks like (replace ...) is replacing only the first occurrence instead of all occurrences in v.8.5.9.
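For comparison, the same pattern can be checked outside newLISP. The snippet below is a minimal sketch in Python using the standard `re` module; Python is an assumption here, since the thread itself only uses newLISP and the test site above. The names `PATTERN`, `add_thousands_separators` and `add_first_separator_only` are illustrative only.

```python
import re

# Same pattern as in the post: a position preceded by a digit and followed
# by one or more complete groups of three digits, then no further digit.
# Every match is zero-length (lookbehind plus lookahead, nothing consumed).
PATTERN = re.compile(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))")


def add_thousands_separators(digits, sep=","):
    """Insert sep at every position matched by PATTERN."""
    return PATTERN.sub(sep, digits)


def add_first_separator_only(digits, sep=","):
    """Replace only the first match (count=1)."""
    return PATTERN.sub(sep, digits, count=1)


print(add_thousands_separators("1234567"))
print(add_first_separator_only("1234567"))
```

Limiting the substitution to the first match gives the same string as the v.8.5.9 result quoted above, which is consistent with the guess that only the first occurrence is being replaced.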

Thanks,
-- Sam

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

This happens when the first character sequence matched is of 0 (zero) length. This is fixed in 8.5.10.
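To illustrate why zero-length matches need special care in a replace-all loop, here is a rough sketch in Python. It is not newLISP's actual implementation (which is not shown in this thread); `replace_all` is a hypothetical helper. The key step is that after an empty match the scan position must still advance by one character, otherwise the loop either stalls or has to stop after the first replacement.

```python
import re


def replace_all(pattern, text, replacement):
    """Replace every match of pattern in text, including zero-length ones."""
    regex = re.compile(pattern)
    out = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        # Copy the unmatched text before the match, then the replacement.
        out.append(text[pos:m.start()])
        out.append(replacement)
        if m.end() == m.start():
            # Zero-length match: nothing was consumed, so copy one
            # character through and step past it to keep making progress.
            if m.start() < len(text):
                out.append(text[m.start()])
            pos = m.start() + 1
        else:
            pos = m.end()
    # Copy whatever is left after the last match (empty if pos ran past the end).
    out.append(text[pos:])
    return "".join(out)


print(replace_all(r"(?<=[0-9])(?=(?:[0-9]{3})+(?![0-9]))", "1234567", ","))
```

Searching with a start offset rather than slicing the string keeps "^" anchored to the real beginning of the text and lets lookbehind see the characters before the current position.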

As this is a very rare case, I want to wait until the weekend before posting a fixed version (the problem is already fixed).

If this fix is urgent for you or anybody else, please let me know and I can post it right away.

Lutz
Last edited by Lutz on Wed May 25, 2005 12:32 am, edited 1 time in total.

Post by Sammo »

Hi Lutz,

A fix isn't urgently needed. I can wait.

Thanks for looking into it.
-- Sam

Post by Lutz »

Thanks for catching this, Sam. Regular expressions are a feature we have to be able to rely on 100%.

Lutz

Post by Sammo »

Thank you, Lutz, for fixing the problem with 'replace' in 8.5.10. It does, indeed, work correctly now.
-- Sam

Post by Lutz »

Yes, this bug affected all zero-length boundary patterns using "", "^", "$" and "\b"; it is a wonder that nobody tripped over it before you. I have bookmarked that test site, but I am still looking for a longer test suite, not for pattern searching itself, which is all PCRE, but for repetitive replacement, which is coded inside newLISP. Your number formatting pattern had all the critical elements in it. I also found the following patterns, which are now included in the 'qa_dot' test suite (dot for decimal point versus comma):

; must all evaluate to true

(= (replace "" "abc" "x" 0) "xaxbxcx")
(= (replace "$" "abc" "x" 0) "abcx")
(= (replace "^" "abc" "x" 0) "xabc")
(= (replace "\\b" "abc" "x" 0) "xabcx")

Most of these even have some practical use.
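As a cross-check in a different engine, the four boundary patterns above can be run through Python's `re.sub`. This is only a sketch under the assumption of Python 3.7 or later; it says nothing about how newLISP or PCRE behave internally. The table `CASES` and the helper `check_boundary_patterns` are made up for illustration.

```python
import re

# (pattern, expected result) for the subject "abc" and replacement "x",
# mirroring the four newLISP test expressions above.
CASES = [
    ("", "xaxbxcx"),    # empty pattern: matches between every character
    ("$", "abcx"),      # end of string
    ("^", "xabc"),      # start of string
    (r"\b", "xabcx"),   # word boundaries on both sides of "abc"
]


def check_boundary_patterns(subject="abc", replacement="x"):
    """Return a list of (pattern, actual, expected) for any mismatches."""
    failures = []
    for pattern, expected in CASES:
        actual = re.sub(pattern, replacement, subject)
        if actual != expected:
            failures.append((pattern, actual, expected))
    return failures


print(check_boundary_patterns())
```

An empty list means all four replacements agree with the expected strings from the test suite.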

Lutz
