
Using a regular expression causes newlisp.exe to terminate

Posted: Sat Mar 23, 2013 2:57 pm
by xmftlg
The attached files are test.lsp, b.txt and c.txt.

test.lsp:

Code:

(set 's (read-file "c.txt"))
(println (find-all {(?s)target=_blank>(?:(?!target=_blank>).)*?在线观看_百度视频}  s ) )

(exit)
b.txt and c.txt are both HTML source files.

E:\newlisp>newlisp
newLISP v.10.4.7 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for options.

> (load "test.lsp")

newLISP then terminates abnormally.

After changing the first line of test.lsp to read b.txt instead:

Code:

(set 's (read-file "b.txt"))
E:\newlisp>newlisp
newLISP v.10.4.7 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for options.


> (load "test.lsp")
("target=_blank>銆?em>鍟﹀暒鍟﹀痉鐜涜タ浜?/em>銆嬪姩婕紙2瀛e叏锛夐珮娓呭湪绾胯鐪媉鐧惧害瑙嗛")

The console displays the result garbled. Viewed in a UTF-8 environment, the correct string is:
("target=_blank>《<em>啦啦啦德玛西亚</em>》动漫(2季全)高清在线观看_百度视频")

Testing with v10.4.5 gives the same result:

D:\newlisp>newlisp
newLISP v.10.4.5 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for more info.

> (load "test.lsp") ;;read c.txt

D:\newlisp>newlisp
newLISP v.10.4.5 on Win32 IPv4/6 UTF-8 libffi, execute 'newlisp -h' for more info.

> (load "test.lsp") ;;read b.txt
("target=_blank>銆?em>鍟﹀暒鍟﹀痉鐜涜タ浜?/em>銆嬪姩婕紙2瀛e叏锛夐珮娓呭湪绾胯鐪媉鐧惧害瑙嗛")

Can anyone help?

Re: Using a regular expression causes newlisp.exe to terminate

Posted: Sat Mar 23, 2013 3:14 pm
by xmftlg
I also tried increasing the newLISP stack size:

E:\newlisp>newlisp -s 100000 test.lsp

E:\newlisp>newlisp -s 1000000 test.lsp

but it seems to change nothing.

Re: Using a regular expression causes newlisp.exe to terminate

Posted: Sun Mar 24, 2013 4:32 am
by Lutz
This is a problem in the PCRE library routines. See also here:
http://stackoverflow.com/questions/3613 ... lp-optimis

and here:
http://newlispfanclub.alh.net/forum/vie ... ash#p18722

On OSX this causes a crash, which occurs in pcre_exec(). It seems to have to do with nesting of HTML blocks.
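A plausible reading of the crash, not confirmed in this thread, comes from PCRE's own stack documentation: in older releases pcre_exec() works by recursion on the C stack, and a repeated group such as `(?:(?!target=_blank>).)*?` can add a nested call for each character it consumes. A long stretch of HTML between the two markers could then exhaust the native stack. If that is the cause, newLISP's `-s` option would not help, since it sizes newLISP's own evaluation stack rather than the process stack. One way to sidestep the problem is to drop the per-character lookahead loop and locate the two literal markers with plain substring search. The sketch below is written in Python purely to illustrate the idea (it is not newLISP code), and the helper name `extract_blocks` is hypothetical. It assumes the goal is the same as the pattern in test.lsp: each chunk that starts at `target=_blank>`, ends at `在线观看_百度视频`, and contains no other `target=_blank>`. It has only been cross-checked against the original regex on small samples, not on the real c.txt.

```python
import re

# The pattern from test.lsp, used here only to cross-check the sketch on
# small inputs (Python's re engine is a different implementation from PCRE).
PATTERN = r"(?s)target=_blank>(?:(?!target=_blank>).)*?在线观看_百度视频"


def extract_blocks(text, start="target=_blank>", end="在线观看_百度视频"):
    """Hypothetical helper: return every non-overlapping substring that begins
    with `start`, ends with `end`, and contains no other `start` in between,
    using plain substring search instead of a lookahead loop."""
    results = []
    pos = 0
    while True:
        e = text.find(end, pos)            # next end marker at or after pos
        if e == -1:
            break
        s = text.rfind(start, pos, e)      # last start marker wholly inside text[pos:e]
        if s != -1:
            results.append(text[s:e + len(end)])
        pos = e + len(end)                 # resume scanning after this end marker
    return results


if __name__ == "__main__":
    sample = ("<a target=_blank>x</a> <a target=_blank>《<em>啦啦啦德玛西亚</em>》"
              "\n高清在线观看_百度视频</a> <a target=_blank>y</a>")
    print(extract_blocks(sample))
    # On this small input the literal scan agrees with the regex.
    assert extract_blocks(sample) == re.findall(PATTERN, sample)
```

Each marker is located with a single `find` or `rfind` call, so the cost no longer depends on a recursion depth that grows with the gap between the markers. The same two-step search (find the end marker, then search backward for the nearest start marker after the previous match) should be portable to newLISP's string functions, though that port is not shown here.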

Re: Using a regular expression causes newlisp.exe to terminate

Posted: Tue Mar 26, 2013 2:42 pm
by xmftlg
Thanks, Lutz.