escaping characters problem

For the Compleat Fan
Locked
cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W


Post by cormullion »

I'm processing some text so that various characters are replaced with alternative formulations. The problem is that the curly braces in the original text need to be replaced with \letteropenbrace{}, but the backslashes need to be replaced with \letterbackslash{}, which itself contains braces (and the brace commands contain backslashes). So I can't do these two operations independently: whichever runs second rewrites the output of the first.

One possible solution that occurred to me was to replace the '{' with some unique text that couldn't possibly occur in newLISP source code, and then to replace that text again once I'd finished with the other characters. But is this the most efficient way of doing a sequence of changes? It looks inefficient, but perhaps it isn't.
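A minimal sketch of why the two operations clash. It uses a hypothetical helper, `replace-in-order` (not part of the original post), which applies a list of plain string replacements one after another with `replace`. Whichever order is chosen, a later step rewrites braces or backslashes that an earlier step inserted, so neither result matches the intended output.

```newlisp
;; hypothetical helper: apply (from to) string replacements in list order
(define (replace-in-order str pairs)
  (dolist (p pairs)
    (replace (p 0) str (p 1)))
  str)

(setq input "x{\\}y")   ; contains the characters {, \ and }
(setq wanted {x\letteropenbrace{}\letterbackslash{}\letterclosebrace{}y})

;; backslash first: the {} inside \letterbackslash{} is rewritten afterwards
(setq backslash-first
  (replace-in-order input
    '(({\} {\letterbackslash{}})
      ("{" {\letteropenbrace{}})
      ("}" {\letterclosebrace{}}))))

;; braces first: the \ and {} inside the brace commands are rewritten afterwards
(setq braces-first
  (replace-in-order input
    '(("{" {\letteropenbrace{}})
      ("}" {\letterclosebrace{}})
      ({\} {\letterbackslash{}}))))

(println (= backslash-first wanted))   ; nil
(println (= braces-first wanted))      ; nil
```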

Code:

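(set 'c "text to be converted") ; the string being processed

; first, hide the braces behind unique placeholders
(set 'uuid1 (uuid))
(set 'uuid2 (uuid))

(replace "{" c uuid1)
(replace "}" c uuid2)

; then do the other replacements; they can no longer touch the braces
(replace {\} c {\letterbackslash{}})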
(replace {$} c {\letterdollar{}} )
(replace {#} c {\letterhash{}})
(replace {!} c {\letterexclamationmark{}} )
(replace {|} c {\letterbar{}})
(replace {@} c {\letterat{}})
(replace {^} c {\letterhat{}})
(replace "%" c {\letterpercent{}} )
(replace "/" c {\letterslash{}} )
(replace "<" c {\letterless{}} )
(replace ">" c {\lettermore{}} )
(replace "~" c {\lettertilde{}} )
(replace "&" c {\letterampersand{}})
(replace "?" c {\letterquestionmark{}})
(replace "_" c {\letterunderscore{}})
(replace "’" c {\lettersinglequote{}})
      
; finally, replace the { and } 
(replace uuid1 c {\letteropenbrace{}})
(replace uuid2 c {\letterclosebrace{}})
Is there a better solution?

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

I had the same problem when writing the Wiki and IDE software, and then again when writing syntax.cgi. I came to the same solution you are using:

(1) translate strings or characters into something else to protect them from change by 'replace'

(2) do the replacements

(3) translate the protected strings and characters back

'replace' is pretty fast, especially when using raw string replacement without regular expressions. The only thing to consider is that using a 'uuid' for each character is perhaps a bit expensive, because each uuid is 36 characters long. It is a 100% safe solution, though, because a uuid is unique :) It all depends on the characteristics of your text.
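The three steps above can be packaged into one function. This is a minimal sketch: the name `escape-with-placeholders` and its argument layout (two lists of pairs) are hypothetical. Like cormullion's code it uses one `uuid` per protected character, so it is only safe as long as no key in the second list can occur inside a UUID (hex digits and hyphens).

```newlisp
;; hypothetical helper: protect, replace, restore
;;   protected: list of (char final-text) pairs, hidden in step 1, restored in step 3
;;   pairs:     list of (char replacement) pairs, applied in order in step 2
(define (escape-with-placeholders str protected pairs)
  ;; one (char placeholder final-text) triple per protected character
  (let (holders (map (fn (p) (list (p 0) (uuid) (p 1))) protected))
    ;; step 1: hide each protected character behind a unique placeholder
    (dolist (h holders) (replace (h 0) str (h 1)))
    ;; step 2: ordinary replacements (keys must not occur inside a UUID)
    (dolist (p pairs) (replace (p 0) str (p 1)))
    ;; step 3: turn each placeholder into its final text
    (dolist (h holders) (replace (h 1) str (h 2)))
    str))

(setq result
  (escape-with-placeholders "x{\\}y"
    '(("{" {\letteropenbrace{}}) ("}" {\letterclosebrace{}}))
    '(({\} {\letterbackslash{}}))))

(println result)
;; x\letteropenbrace{}\letterbackslash{}\letterclosebrace{}y
```

Swapping `(uuid)` for a shorter placeholder would reduce the cost mentioned above, provided that placeholder can never appear in the input.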

In the Wiki I use HTML character entities:

Code:

        (replace "&" str "&amp;")
        (replace "<" str "&lt;")
        (replace ">" str "&gt;")
        (replace " " str "&nbsp;")
In normal text these sequences almost never occur, and at least in the Wiki and IDE programs they have proved to be a safe choice.
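When escaping for HTML this way, the order matters: `&` has to be escaped first, otherwise the `&` at the start of `&lt;` and `&gt;` would itself be escaped, and unescaping has to handle `&amp;` last. A small sketch with the hypothetical function names `html-escape` and `html-unescape` (the `&nbsp;` step is left out):

```newlisp
;; escape: "&" must go first, or "&lt;" would later become "&amp;lt;"
(define (html-escape str)
  (replace "&" str "&amp;")
  (replace "<" str "&lt;")
  (replace ">" str "&gt;")
  str)

;; unescape in reverse order: "&amp;" must go last, or an escaped
;; literal "&lt;" (stored as "&amp;lt;") would come back as "<"
(define (html-unescape str)
  (replace "&lt;" str "<")
  (replace "&gt;" str ">")
  (replace "&amp;" str "&")
  str)

(setq sample "if a<b && b>c: &lt; is an entity")
(println (html-escape sample))
;; if a&lt;b &amp;&amp; b&gt;c: &amp;lt; is an entity
(println (= sample (html-unescape (html-escape sample))))
;; true
```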

Lutz

cormullion

Post by cormullion »

Using HTML entities is a good idea - although as soon as you write that, of course, you realise that it's exactly the sort of text that will break your script.

Or to put it another way, can the script process itself?
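The objection can be made concrete. Suppose '{' is protected with the entity `&#123;` and restored afterwards (a hypothetical choice of placeholder, for illustration only): any `&#123;` already present in the input is turned into '{' as well, so the round trip is only safe for text that never contains the placeholder.

```newlisp
;; hypothetical: protect "{" with the entity "&#123;" and restore it afterwards
(define (protect-restore str)
  (replace "{" str "&#123;")
  ;; ... the other replacements would run here ...
  (replace "&#123;" str "{")
  str)

(println (protect-restore "a {b}"))
;; a {b}
(println (protect-restore "write &#123; for {"))
;; write { for {    (the literal "&#123;" in the input was changed too)
```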

Everything's very quiet on the newLISP front recently? Is everybody switching to Lua? .... :-/

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

cormullion wrote: Everything's very quiet on the newLISP front recently? Is everybody switching to Lua? .... :-/
I would not call it quiet, since 9.1 is on the horizon now.
No reason to switch anywhere, simply using this great language!
As Paul Graham states: 'Try to solve hard problems'.

(hpwNewLISP 2.19 gets more tightly integrated with neobook 5.5.3's new private-variable feature.)

Improving things that are already very good gets more and more difficult!
;-)
Hans-Peter

m i c h a e l
Posts: 394
Joined: Wed Apr 26, 2006 3:37 am
Location: Oregon, USA

Post by m i c h a e l »

Hi Cormullion!
Cormullion wrote:Everything's very quiet on the newLISP front recently? Is everybody switching to Lua?
Perish the thought! Melissa and I have been hard at work on getting a version of neglOOk running on newLISP wiki. It is turning out quite awesome, if I may say so myself ;-) We hope to go live with it very soon.

Working with newLISP wiki has led me to an even greater appreciation of the inspiring work Lutz is doing on both the wiki and the language.

Lua is one of the languages I went some distance to learn. There is much to commend in Lua, but its speed of text processing with regular expressions is not one of those things (in my own experience, of course). Text processing is the number one thing I end up using a language for.

Since I started programming in newLISP, I have not had a desire to try other languages. This has given me a chance to actually become more familiar with one language, while allowing me to see the logic (and the beauty) of the design that has gone into newLISP.

The wall is calling this flower back!

m i c h a e l

P.S. Who's Paul Graham? ;-)

cormullion

Post by cormullion »

Hi m i c h a e l - glad to see you around...! Looking forward to neglooking at your site.

Don't get me wrong - I'm not saying anything negative about newLISP. It's partly that I was reading a bit about Lua recently (not really by choice, more because of luatex) and saw the page at http://www.lua.org/uses.html, which has quite an impressive list of people who've built it into their products or applications. I'm no programmer, so I don't know what it takes to 'get a programming language into' a product, but I do know that I'd sometimes prefer to use my own favourite language rather than have to learn another one, so I wish that more people chose newLISP...

It's also partly because it's been quiet around here anyway.

HPW

Post by HPW »

m i c h a e l wrote: P.S. Who's Paul Graham?
The man who is promising Arc! ;-)
When will that be? We have no idea. We reserve the right to take a very long time. It has been over 40 years since McCarthy first described Lisp. Another 2 or 3 aren't going to kill anyone. So please don't send us mail asking what Arc's status is or when it will be done. (When it's done, we'll tell you.)
I am glad that Lutz doesn't promise things; he brings them to life!
Hans-Peter

William James
Posts: 58
Joined: Sat Jun 10, 2006 5:34 am

Re: escaping characters problem

Post by William James »

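I think the best solution is to pass through the string only once, translating each character as it is found. That way, text that has already been inserted is never scanned again, so the replacements won't themselves be replaced.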

Code:

; return the \letter...{} command for one character, or the character itself
(define (translate ch)
  (case ch
    ({\} {\letterbackslash{}})
    ({$} {\letterdollar{}} )
    ({#} {\letterhash{}})
    ({!} {\letterexclamationmark{}} )
    ({|} {\letterbar{}})
    ({@} {\letterat{}})
    ({^} {\letterhat{}})
    ("%" {\letterpercent{}} )
    ("/" {\letterslash{}} )
    ("<" {\letterless{}} )
    (">" {\lettermore{}} )
    ("~" {\lettertilde{}} )
    ("&" {\letterampersand{}})
    ("?" {\letterquestionmark{}})
    ("_" {\letterunderscore{}})
    ("'" {\lettersinglequote{}})
    ("{" {\letteropenbrace{}})
    ("}" {\letterclosebrace{}})
    (true ch)))

(setq text {A set {2 3 5 e f} of numbers & letters.})

(replace "." text (translate $0) 0) ; regex "." visits every character except newline; $0 is the match
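The same single-pass idea can also be driven by a table instead of a `case` form, which keeps the character names in one place. This is a sketch: `letter-names`, `letter-command` and `escape-letters` are hypothetical names, and it walks the characters with `explode` rather than the `"."` regular expression. Because each character is looked up exactly once, the inserted `\letter...{}` text is never scanned again.

```newlisp
;; hypothetical table: character -> name used in \letter<name>{}
(setq letter-names
  '(({\} "backslash") ("$" "dollar") ("#" "hash") ("!" "exclamationmark")
    ("|" "bar") ("@" "at") ("^" "hat") ("%" "percent") ("/" "slash")
    ("<" "less") (">" "more") ("~" "tilde") ("&" "ampersand")
    ("?" "questionmark") ("_" "underscore") ("'" "singlequote")
    ("{" "openbrace") ("}" "closebrace")))

;; look the character up; characters not in the table pass through unchanged
(define (letter-command ch)
  (let (name (lookup ch letter-names))
    (if name (string {\letter} name "{}") ch)))

;; explode splits the text into characters, each translated exactly once
(define (escape-letters str)
  (join (map letter-command (explode str))))

(println (escape-letters {A set {2 3 5 e f} of numbers & letters.}))
;; A set \letteropenbrace{}2 3 5 e f\letterclosebrace{} of numbers \letterampersand{} letters.
```

Adding a character then means adding one table entry rather than another `case` clause.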

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC

Re: escaping characters problem

Post by TedWalther »

Yes, that is the solution I came to as well.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence. Nine months later, they left with a baby named newLISP. The women of the ivory towers wept and wailed. "Abomination!" they cried.

schilling.klaus
Posts: 7
Joined: Thu Jul 05, 2012 5:24 am

Re: escaping characters problem

Post by schilling.klaus »

Does this approach also work for replacing Unicode characters outside the ASCII range with numeric HTML character references, or the other way round?

Lutz

Re: escaping characters problem

Post by Lutz »

Yes, you can use Unicode characters in regular expressions when using a UTF-8-enabled version of newLISP:

Code:

(replace "死" "abc死def死ghi" "生" 0) ;-> "abc生def生ghi"

; the same, written with \u Unicode escapes

(replace "\u6b7b" "abc\u6b7bdef\u6b7b" "\u751F" 0) ;-> "abc生def生"
P.S. You need a modern UTF-8-enabled web browser and Chinese fonts to see this post correctly.
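For the conversion asked about above, a sketch that turns every character above code point 127 into a decimal numeric character reference and back. It assumes a UTF-8-enabled newLISP, where `explode` splits a string into UTF-8 characters and `char` converts between a one-character string and its code point; hexadecimal references such as `&#x6b7b;` are not handled. The function names are hypothetical.

```newlisp
;; non-ASCII characters -> "&#NNN;" (assumes a UTF-8 enabled newLISP)
(define (to-char-refs str)
  (join (map (fn (c) (if (> (char c) 127) (format "&#%d;" (char c)) c))
             (explode str))))

;; "&#NNN;" -> character; (int ... 0 10) forces base 10,
;; so a leading zero is not read as octal
(define (from-char-refs str)
  (replace "&#([0-9]+);" str (char (int $1 0 10)) 0)
  str)

(setq s "abc\u6b7bdef\u751Fghi")
(println (to-char-refs s))
;; abc&#27515;def&#29983;ghi
(println (= s (from-char-refs (to-char-refs s))))
;; true
```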
