replace bug?

Post by ralph.ronnquist » Wed Aug 20, 2014 12:37 pm

I ran into what seems like a bug in the regex pattern handling, illustrated by the following example:
> (map char (explode (replace "[‘’]" "‘" "x" 0)))
(120 120 120)
> (map char (explode (replace "‘" "‘" "x" 0)))
(120)
> (map char (explode (replace "[‘’]" "‘" "x" 2048)))
(120)


Thus, when the pattern is within brackets, the replacement of character 8216 (U+2018) gets replicated for each of its source bytes, whereas without the brackets the "proper" replacement occurs. The replacement is also proper with the flags code 2048 rather than 0.

newLISP v.10.6.0 32-bit on Linux IPv4/6 UTF-8 libffi.

Re: replace bug?

Post by Lutz » Wed Aug 20, 2014 9:44 pm

The behavior is correct. When a PCRE character class contains UTF-8 characters and the UTF-8 option is not specified (either 2048 or the letter “u” in version 10.6.1), the class is taken byte-wise: it matches any single byte of the characters listed in it. Each byte of a multibyte UTF-8 character in the source string is therefore matched and replaced separately.

http://www.newlisp.org/downloads/pcrepattern.html#SEC7
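
The byte-wise versus character-wise difference can be reproduced outside newLISP. What follows is only a rough analogue in Python, not newLISP code: Python's `re` module is not PCRE, and the variable names are made up for illustration. A bytes pattern stands in for the default (flags 0) mode, and a str pattern stands in for UTF-8 mode (flags 2048).

```python
import re

# U+2018 (decimal 8216) is three bytes in UTF-8: E2 80 98.
source = "\u2018"
pattern = "[\u2018\u2019]"

# Byte-wise: the class becomes the set of bytes {E2, 80, 98, 99},
# so each of the three source bytes matches on its own.
bytewise = re.sub(pattern.encode("utf-8"), b"x", source.encode("utf-8"))

# Character-wise: the class holds two whole characters,
# so the single source character matches exactly once.
charwise = re.sub(pattern, "x", source)

print(list(bytewise))             # three replacements
print([ord(c) for c in charwise])  # one replacement
```

Without brackets the byte-wise pattern is the literal three-byte sequence, which matches the whole character once. That is why only the bracketed form showed the tripled replacement.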

Re: replace bug?

Post by ralph.ronnquist » Wed Aug 20, 2014 11:45 pm

Ah. Yes, of course!

And it's a bit much to expect newlisp mode in emacs to know and show this difference, so the dumbbell at the keyboard can go on thinking about nothing...

