regex again

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

Struggling again with these. I'm getting confused as to how many of the backslashes should be escaped. I'm trying to convert some like these from Perl, inside quotes:

"^(([ ]{0,3}([*+-])[ \t]+)(?s:.+?) (\z|\n{2,}(?=\S)(?![ \t]*[*+-][ \t]+)))"
Is there a rough rule of thumb for converting Perl regexen to newLISP: what should be escaped and what shouldn't?

jrh
Posts: 36
Joined: Mon Nov 14, 2005 9:54 pm
Location: Portland, Oregon

Re: regex again

Post by jrh »

cormullion wrote: Is there a rough rule of thumb for when converting Perl regexen to newLISP
Yes, don't. Use awk. It may be wordy but at least it is comprehensible.

cormullion

Post by cormullion »

What would be the benefit of using regexen via Awk rather than in PCRE via newLISP? Looks very much the same sort of thing to me...

jrh

Post by jrh »

With awk you can break it up into program steps. Better yet is to get rid of all those awful special characters in regular expressions. Here is a discussion of a possible LISPy way out of Perl/regex hell:

http://c2.com/cgi/wiki?AlternativesToRegularExpressions

cormullion

Post by cormullion »

Hmm - some interesting musings there, but nothing practical or immediately useful. I'd rather struggle a bit with regular expressions, imperfect though they may be, than completely fail in an attempt to write a context-free grammar parser or whatever is needed.

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

You might have a look at The Regex Coach:

http://weitz.de/regex-coach/

(Written in LispWorks Common Lisp)
Hans-Peter

m i c h a e l
Posts: 394
Joined: Wed Apr 26, 2006 3:37 am
Location: Oregon, USA

Post by m i c h a e l »

Hi cormullion!

Since I know you're using a Mac and Hans-Peter's recommendation runs on Windows only, I'll mention my favorite regex helper: RegExhibit.

This is the fourth one I've tried so far, and I think this regex helper is head and shoulders above the others.

Also, if you will be doing a lot of regexing (or cssing, htmling, or even javascripting), I highly recommend The VisiBone Browser Book. I used this heavily during the edit of the newLISP manual and while making the neglOOk website. When I think how useful it's been to me, I'm sorry I didn't mention it earlier!

m i c h a e l

Jeff
Posts: 604
Joined: Sat Apr 07, 2007 2:23 pm
Location: Ohio

Post by Jeff »

There are two-inch-thick books on the differences between the different regex flavors. PCRE is something of a standard, and that's the library newLISP uses. newLISP has better regex support than most other languages. The only thing it lacks is named groups, which I think were invented for Python and are not standard; but it does have recursion, I believe, using ?R syntax.

I usually put regular expressions between {} so I don't have to double-escape. Your expression should be able to be used verbatim between [text] tags (since it uses curly braces, curly braces would need to be escaped).
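A minimal sketch of the double-escaping point, written in Python rather than newLISP (Python's raw strings play the part that {} or [text] strings are described as playing above). The pattern and test string are made up for illustration and are not from the thread.

```python
import re

# The regex engine should receive:  \d+\.\d+
# (digits, a literal dot, digits)

# Ordinary quoted string: the string parser consumes one layer of
# backslashes, so each backslash meant for the regex engine is doubled.
escaped = "\\d+\\.\\d+"

# Raw string: no escape processing, so the pattern is typed once,
# exactly as the regex engine should see it.
raw = r"\d+\.\d+"

print(escaped == raw)                      # True
print(re.search(raw, "v 3.14").group(0))   # 3.14
```

The rough rule this suggests: count backslashes for the regex engine first, then double them only if the string syntax you are using does its own escape processing.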
Jeff
=====
Old programmers don't die. They just parse on...

Artful code

cormullion

Post by cormullion »

thanks - that's the sort of help i'd been hoping for!

rickyboy
Posts: 607
Joined: Fri Apr 08, 2005 7:13 pm
Location: Front Royal, Virginia

Re: regex again

Post by rickyboy »

cormullion wrote:Struggling again with these. I'm getting confused as to how many of the backslashes should be escaped. I'm trying to convert some like these from Perl, inside quotes:

"^(([ ]{0,3}([*+-])[ \t]+)(?s:.+?) (\z|\n{2,}(?=\S)(?![ \t]*[*+-][ \t]+)))"
Holy Schneikees! What is *that* supposed to do? :-)

Seriously, whenever I write monstrosities like that, either I put a massive amount of comments explaining what each piece does, or I break up the regex string into smaller strings and assign them to meaningful symbol names (if only for my own benefit when I look at the code more than a week later). But you probably got this from non-cormullion code, right?
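One way the "smaller strings with meaningful names" approach could look, sketched in Python for illustration. The piece names are hypothetical, Python's \Z stands in for Perl's \z, and the literal space before the final group is dropped on the assumption that the original was written for Perl's /x mode, where whitespace in the pattern is ignored.

```python
import re

# Each piece is a raw string, so backslashes are written once.
marker = r"[*+-]"                                        # a bullet character
first_item = r"([ ]{0,3}(" + marker + r")[ \t]+)"        # indent, bullet, spacing
body = r"(?s:.+?)"                                       # shortest run of anything, newlines included
not_another_item = r"(?![ \t]*" + marker + r"[ \t]+)"    # next line is not a bullet
list_end = r"(\Z|\n{2,}(?=\S)" + not_another_item + r")" # end of text, or blank line then non-list text

whole_list = re.compile(r"^(" + first_item + body + list_end + r")", re.MULTILINE)

m = whole_list.search("* one\n* two\n\nA paragraph.\n")
print(repr(m.group(0)))   # '* one\n* two\n\n'
print(m.group(3))         # *
```

Read this way, the pattern appears to grab a whole bulleted list: it starts at the first bullet and stops either at the end of the text or at a blank line that is followed by something other than another bullet.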

Sorry I can't help -- I always have to look it up in a book. I was doing OK until I got to (?s:.+?). :-)
(λx. x x) (λx. x x)

cormullion

Post by cormullion »

Yes, horrible things. I think it was Markdown lists or headers I was trying to match. It's those positive and negative lookaheads that hurt my brain the most.
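For what it's worth, the two lookaheads are easier to see in isolation. A small sketch in Python (the sample text is invented; the (?=...) and (?!...) syntax is the same as in the pattern quoted above):

```python
import re

text = "key: value other: thing"

# (?=:)  positive lookahead: match only if a colon comes next,
#        without consuming the colon.
followed_by_colon = re.findall(r"\w+(?=:)", text)

# (?!:)  negative lookahead: match only if a colon does NOT come next.
#        The \b anchors stop the engine from settling for part of a word.
not_followed_by_colon = re.findall(r"\b\w+\b(?!:)", text)

print(followed_by_colon)       # ['key', 'other']
print(not_followed_by_colon)   # ['value', 'thing']
```

Neither lookahead consumes any text; each only checks what comes next and then lets matching carry on from the same position.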

I had originally hoped to just copy these regexes without looking 'inside them', but they didn't work first time... And then you start tinkering with them, adding backslashes etc. and it all starts to come unstuck ;-(

cormullion

Post by cormullion »

Jeff wrote:Your expression should be able to be used verbatim between [text] tags (since it uses curly braces, curly braces would need to be escaped).
It's possible that curly braces might work - the manual quoth "Balanced nested curly brackets may be used within a string. This aids in writing regular expressions or short sections of HTML."

I might investigate, when I'm feeling more optimistic about regexes...

Thanks Jeff!

HPW

Post by HPW »

Another Windows-only tool for regex:

http://www.regexbuddy.com

Not free, but affordable at 29.95 €.
It makes it very easy to build and analyse a regex.

Also, its big brother PowerGREP is worth a look (but it is not cheap).
Hans-Peter

jrh

Re: regex again

Post by jrh »

rickyboy wrote: Holy Schneikees! What is *that* supposed to do?
Only a nitwit would write something like that or spend time figuring out what it did. I must say that the excessive use of macros in LISP is another example of this syndrome. I don't care how rich he is, Paul Graham is dead wrong.

Clear, concise, and simple is what I strive for. Complexity for its own sake is horse shit.

HPW

Post by HPW »

jrh wrote: Only a nitwit would write something like that or spend time figuring out what it did.
Paste it into RegexBuddy and let it explain:

HTML:
http://hpwsoft.de/anmeldung/html1/newLI ... exPost.htm

Hardcopy:
http://hpwsoft.de/anmeldung/html1/newLI ... exPost.png
Hans-Peter

Jeff

Post by Jeff »

Syntax should be as clear as possible. Some algorithms cannot be expressed simply, because they are not simple algorithms.
Jeff

jrh

Post by jrh »

HPW wrote: Paste it in RegexBuddy and let explain it:
I would have to get a M$ Windoze system in order to do that. That seems like going from bad to worse! ;-)

cormullion

Post by cormullion »

That's neat, HP - almost makes it look sensible...!

There are indeed plenty of nitwits who write stuff like that - I just wanted some shortcuts to avoid having to be one of them!

jrh

Post by jrh »

Jeff snippily wrote:Syntax should be as clear as possible. Some algorithms cannot be expressed simply, because they are not simple algorithms.
They can, however, be broken up into simpler pieces; it's called modular programming. Regexes suck because they 1) don't do that and 2) use such terse symbols that one needs to constantly double-escape stuff in order to accomplish the string search.

Hello... we're not using 115 baud teletype machines any longer. The apparent need for robots to decode regex expressions proves that the Perl and regex coders of complex expressions aren't clever, they are nitwits.

rickyboy

Re: regex again

Post by rickyboy »

jrh wrote:Only a nitwit would write something like that or spend time figuring out what it did.
Ha, ha, ha, ha! Good one.
jrh wrote:Clear, concise, and simple is what I strive for. Complexity for its own sake is horse shit.
Am I the only one who noticed that you emphasized *clear*? I quite agree with you here, and in precisely the order you listed the characteristics. Simplicity in expression is usually gotten to automatically after one strives first for clarity and economy.
(λx. x x) (λx. x x)

cormullion

Post by cormullion »

I love clarity, simplicity, and economy too...

Given a choice between using 'awk' and deciphering regex - um, can I have another choice? :-)

HPW

Post by HPW »

There is a new companion product to RegexBuddy:

http://www.regexmagic.com/

This is not a regex editor; it seems to be a generator.
I will have a closer look.
Hans-Peter

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Post by newdep »

Heeee cormullion.. I completely missed your enhancement on this tool..

Need to download that right away... A nice visual add-on for my "Perl 5 Desktop Reference" pocket guide from 1996 ;-) I just recovered it from the shelf, hidden behind a Tcl-Pedia book the size of a b/w TV (a good place to hide, btw.. being a pocket book containing Perl..)..
-- (define? (Cornflakes))

newdep

Post by newdep »

.. Only 19.95 euro for a regex tool..

Great.. now if they would only remove the regex part from it, which is clearly simple with that tool, I would be pleased to pay 'Only 19.95 euro' for it..

Meanwhile I'm doing, and will be for the next week, some masochistic regex debugging I'm not happy about..
-- (define? (Cornflakes))

cormullion

Post by cormullion »

you mean http://unbalanced-parentheses.nfshost.c ... t.src.html ? I suppose that still works?!
