Regular expressions are handled by the PCRE library (http://pcre.org), which newLISP uses. When PCRE is compiled, it is compiled with tables for upper/lower-casing, case flipping and character classification (letters, digits, hex digits, etc.) for a specific locale.
The standard newLISP distribution contains the file pcre-chartables.c, which is generated automatically for a specific locale. For newLISP this is the so-called 'C' locale. It handles casing etc. only for the first page of one-byte characters in the UTF-8 character set, but it guarantees internationally consistent behavior of newLISP, at least for the English language. When newLISP starts up, it puts itself into this locale.
As a workaround for case-insensitive matching, you could do something like this:
(find-all (lower-case search-str) (lower-case text-str))
Of course, this depends on newLISP's upper-case/lower-case functions working correctly with your platform's UTF-8 implementation and the character set used: the C library's towupper() and towlower() functions need tables that map each character to the right case.
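The workaround can be wrapped in a small helper. This is only a sketch: the name find-all-nocase is hypothetical, it assumes a newLISP version whose find-all accepts a regex option after the expression argument (as documented for newLISP 10.x), and it adds the 2048 (UTF-8) flag discussed below. Note that lower-case is applied to the pattern as well, so the helper is safest with literal search strings; an escape such as \W would turn into \w and change its meaning.

```newlisp
;; Hypothetical helper: case-insensitive find-all done by lower-casing
;; both the pattern and the text before matching.
;; $0 holds the whole match; option 2048 makes PCRE treat the strings as UTF-8.
(define (find-all-nocase search-str text-str)
  (find-all (lower-case search-str) (lower-case text-str) $0 2048))

(println (find-all-nocase "ab" "Ab aB AB"))  ; => ("ab" "ab" "ab")
```

Because the text is lower-cased before matching, the returned matches are lower-case too, not the original spelling found in the text.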
Last but not least, when working with UTF-8 text, all regex option flags should be OR'ed with 2048 (see the docs for regex). It makes the following difference:
; wrong because (char 937) should count as only one UTF-8 character
(find (append "." (char 937) ".") (append (char 937) (char 937) (char 937)) 0) => 1
; correct because the first two bytes in (char 937) form one UTF-8 character
(find (append "." (char 937) ".") (append (char 937) (char 937) (char 937)) 2048) => 0
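The offsets differ because (char 937) occupies two bytes in UTF-8, so without the 2048 flag PCRE lets "." match a single byte of it. A quick check, assuming a UTF-8 enabled newLISP build, where length counts bytes and utf8len counts UTF-8 characters:

```newlisp
;; Greek Omega (U+03A9) is encoded as two bytes in UTF-8
(set 'omega (char 937))
(println (length omega))   ; bytes      => 2
(println (utf8len omega))  ; characters => 1
```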
The character used here is the Greek Omega. I have coded it as (char 937) so you can copy and paste the code without problems. This is what I really did:
(find ".Ω." "ΩΩΩ" 2048) => 0 ; correct offset 0
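Since regex options are bit flags, other options are combined with 2048 using the bitwise | operator. A sketch, assuming option 1 is the case-insensitive flag (PCRE_CASELESS) as listed in the newLISP regex documentation. The example sticks to ASCII letters, because case-insensitive matching of non-ASCII characters may depend on the character tables and the PCRE build options described above.

```newlisp
;; Combine case-insensitive matching (1) with UTF-8 mode (2048)
(set 'opts (| 1 2048))
(println opts)                       ; => 2049
(println (find "abc" "xxABC" opts))  ; => 2
```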