replace bug?

ralph.ronnquist
Posts: 228
Joined: Mon Jun 02, 2014 1:40 am
Location: Melbourne, Australia

Post by ralph.ronnquist »

I ran into the following, which seems like a bug in the regex pattern handling, illustrated in the following example:

Code:

> (map char (explode (replace "[‘’]" "‘" "x" 0)))
(120 120 120)
> (map char (explode (replace "‘" "‘" "x" 0)))
(120)
> (map char (explode (replace "[‘’]" "‘" "x" 2048)))
(120)
Thus, when the pattern is within brackets, the replacement for char 8216 (U+2018) is repeated for each of the source bytes, whereas without the brackets the "proper" replacement occurs. The replacement is also proper with the flags code 2048 rather than 0.

newLISP v.10.6.0 32-bit on Linux IPv4/6 UTF-8 libffi.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: replace bug?

Post by Lutz »

The behavior is correct. When a PCRE character class contains UTF-8 characters and the UTF-8 option is not specified (either 2048, or the letter “u” in version 10.6.1), the class is taken byte-wise: it matches the individual bytes of those characters, so each byte of a multibyte character in the source that matches the class is replaced separately.

http://www.newlisp.org/downloads/pcrepattern.html#SEC7
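
To make the byte-wise reading concrete, here is a minimal sketch that repeats the calls from the first post next to the raw bytes of the source string. It assumes a UTF-8 enabled newLISP build like the one reported above; the symbol name src is only for illustration.

```newlisp
; Sketch only; assumes a UTF-8 enabled newLISP build, as in this thread.
(set 'src "‘")                  ; U+2018, stored as the three bytes E2 80 98

; Show the raw bytes of the source string as unsigned integers.
(println (unpack "bbb" src))    ; prints (226 128 152)

; Option 0: the class [‘’] is read as a set of single bytes, so each of
; the three source bytes matches on its own and is replaced separately.
; (copy src) keeps the destructive replace from changing src itself.
(println (replace "[‘’]" (copy src) "x" 0))      ; prints xxx

; Option 2048 (PCRE UTF-8 mode): the class holds two whole characters,
; so the single source character is replaced once.
(println (replace "[‘’]" (copy src) "x" 2048))   ; prints x
```

Without the brackets the pattern is a three-byte sequence rather than a set of bytes, which is why it matches once even with option 0.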

ralph.ronnquist
Posts: 228
Joined: Mon Jun 02, 2014 1:40 am
Location: Melbourne, Australia

Re: replace bug?

Post by ralph.ronnquist »

Ah. Yes, of course!

And it is a bit much to expect newlisp mode in emacs to know and show this difference, so the dumbbell at the keyboard can go on thinking about nothing...
