Multiline support for regexes not working, 10.0.2 -> 10.0

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC

Post by TedWalther »

I did this:

Code:

(setq x "hee hee\nhi hi\nho ho ho")
(find-all "hee.*hi" (| 2 4 8 512))
=> nil
In short, I tried almost every PCRE option and every combination, but nothing let . match a newline, not even option 4 (PCRE_DOTALL), which is documented to do exactly that.

Perhaps this is related to another thing I stumbled on:

Code:

(trim "\n     foo   ")
=> "\n    foo"
Is it supposed to work like that?

I am running OpenBSD, Ubuntu, and Debian, all on 64-bit AMD or Intel machines with two or more cores. All three show the problem.

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

Didn't you mean:

Code:

(setq x "hee hee\nhi hi\nho ho ho") 
(find-all "hee.*hi" x $0 5)

("hee hee\nhi hi")
Maybe the documentation is not clear:

find-all
syntax: (find-all str-pattern str-text [expr [int-option]])

I think the nested brackets [expr [int-option]] mean that whenever int-option is needed, you have to provide expr as well.
Hans-Peter
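A minimal sketch of the corrected call, assuming newLISP 10.x, where int-option takes the standard PCRE flag bits (1 = PCRE_CASELESS, 4 = PCRE_DOTALL, 512 = PCRE_UNGREEDY). Because these are bit flags, the 5 in the example above is CASELESS plus DOTALL, and combinations can also be built with |:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; expr ($0 = the whole match) must be present so that the option
; lands in the int-option slot; 4 is PCRE_DOTALL, so . matches \n
(find-all "hee.*hi" x $0 4)
; => ("hee hee\nhi hi")

; options are bit flags: (| 1 4) is CASELESS + DOTALL, i.e. 5
(find-all "HEE.*HI" x $0 (| 1 4))
; => ("hee hee\nhi hi")
```

Note that .* is greedy, so the match runs from the first "hee" to the last "hi".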

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm

Post by xytroxon »

trim only removes one kind of character at a time (by default, the space character)...

I use replace with a regex that matches leading and trailing whitespace (\s) and replaces it with "" (the empty string)...

Code:

newLISP v.10.0.4 on Win32 IPv4, execute 'newlisp -h' for more info

> (set 'raw_str "  \n     foo\r\n\tbar    \n ")
"  \n     foo\r\n\tbar    \n "
> (replace "^\s+|\s*$" raw_str "" 0)
"foo\r\n\tbar"
>
-- xytroxon

P.S. Or use this form to select which whitespace characters to strip...

Code:

(replace "^[ \t\r\n]+|[ \t\r\n]+$" raw_str "" 0)
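The same idea can be wrapped in a small function. This is a sketch: strip-ws is a hypothetical name, not a newLISP built-in. The pattern is written between { and }, newLISP's raw string delimiters, so the backslash in \s reaches PCRE without any escape processing:

```newlisp
; hypothetical helper: remove leading and trailing whitespace
; (spaces, tabs, \r, \n) while keeping inner whitespace intact
(define (strip-ws txt)
  ; option 0 = no PCRE flags; replace substitutes every match
  (replace {^\s+|\s+$} txt "" 0))

(strip-ws "\n     foo   ")
; => "foo"
(strip-ws "  \n     foo\r\n\tbar    \n ")
; => "foo\r\n\tbar"
```

Using \s+ rather than \s* on the right-hand side avoids an empty match at the end of the string.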

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W

Post by cormullion »

It's a common pattern in newLISP to have a function described in the documentation as, e.g.:

Code:

(func arg1 [arg2 [arg3]])
This can be called as:

Code:

(func arg1)

(func arg1 arg2)

(func arg1 arg2 arg3)
where arg2 is optional unless arg3 is needed. What you can't do is:

Code:

(func arg1 arg3)
because arg3 will be treated as if it were arg2.
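A small sketch of that positional rule, using a hypothetical user-defined func (not part of newLISP) that simply returns its arguments. Parameters are bound strictly left to right, and any that are not supplied are nil:

```newlisp
; hypothetical function that just reports how its arguments were bound
(define (func arg1 arg2 arg3)
  (list arg1 arg2 arg3))

(func 'a)           ; => (a nil nil)
(func 'a 'b)        ; => (a b nil)
(func 'a 'b 'c)     ; => (a b c)

; trying to skip arg2: 'c is bound to arg2, not arg3
(func 'a 'c)        ; => (a c nil)
```

The same rule is why find-all needs an expr such as $0 before int-option can be given.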

find-all is worth studying in detail - Jeff wrote a good article about it: http://www.artfulcode.net/articles/usin ... -find-all/.

I don't think this nested bracket notation is described in the newLISP manual; it might be a useful addition one day, although I think it's fairly standard...

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC

Post by TedWalther »

cormullion wrote: [...] where arg2 is optional unless arg3 is needed. What you can't do is (func arg1 arg3) because arg3 will be treated as if it was arg2.

Thanks. I did try with and without the argument that precedes the option. Now that I try the example again with that missing argument, it sort of works. Now I'm wondering whether PCRE has a "SUPERUNGREEDY" option, because with just UNGREEDY (512) it matches "hee hee\nhi" instead of "hee\nhi". But the fault is mine; I probably just need a deeper knowledge of regexps.
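A sketch of what is going on, assuming the goal is the shortest hee...hi span. The regex engine reports the leftmost match, so with UNGREEDY the match still begins at the first "hee"; laziness only shortens where it ends. One possible workaround (a pattern change, not an option flag) is to forbid another "hee" inside the match with a negative lookahead, which PCRE supports:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; DOTALL (4) + UNGREEDY (512): .* becomes lazy, but the match still
; begins at the leftmost "hee", so it stops at the first "hi"
(find-all "hee.*hi" x $0 (| 4 512))
; => ("hee hee\nhi")

; (?:(?!hee).)*? consumes characters lazily and refuses any position
; where another "hee" starts, so only the "hee" closest to "hi" matches
(find-all "hee(?:(?!hee).)*?hi" x $0 4)
; => ("hee\nhi")
```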

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

Cormullion:

How about adding this (improvements welcome ;-) ):

"Arguments enclosed in brackets [ and ] are optional. When arguments are separated by a vertical bar | then one of them must be chosen."

in section "2. Data types and names" of the manual:

http://www.newlisp.org/newlisp_manual.html#type_ids

as a third paragraph.

TedWalther:

Many PCRE options can also be expressed inside the regular expression pattern instead of as a number. So this:

Code:

(find "(?i)newlisp" "the newLISP language" 0)
is the same as this:

Code:

(find "newlisp" "the newLISP language" 1)
People accustomed to regexes in other languages may find this easier to read.

Option letters that can be used this way include:

i for PCRE_CASELESS
m for PCRE_MULTILINE
s for PCRE_DOTALL
x for PCRE_EXTENDED
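Applied to the original question, a minimal sketch using inline option letters instead of the numeric argument, so no $0 placeholder is needed:

```newlisp
(setq x "hee hee\nhi hi\nho ho ho")

; (?s) turns on PCRE_DOTALL for this pattern, like option 4
(find-all "(?s)hee.*hi" x)
; => ("hee hee\nhi hi")

; (?m) turns on PCRE_MULTILINE: ^ also matches right after each newline
(find-all {(?m)^h\w+} x)
; => ("hee" "hi" "ho")
```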

See here for a complete reference:

http://www.newlisp.org/downloads/pcrepattern.html
