escaping characters problem
Posted: Sat Jan 20, 2007 12:56 pm
by cormullion
I'm processing some text so that various characters are replaced with alternative formulations. The problem is that the curly braces in the original text need to be replaced with \letteropenbrace{} but the backslashes need to be replaced with \letterbackslash{}, and I can't do both these operations independently.
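To make the clash concrete, here is a minimal sketch of what happens when the two replacements are simply run one after the other. The variable name s is only for illustration.

```newlisp
; Illustration only: run the brace replacement first, then the backslash one.
(set 's "{")
(replace "{" s {\letteropenbrace{}})   ; s is now \letteropenbrace{}
(replace {\} s {\letterbackslash{}})   ; the backslash that was just inserted gets replaced too
(println s)                            ; \letterbackslash{}letteropenbrace{}
```

Reversing the order does not help, because then the braces inside \letterbackslash{} are picked up by the brace replacement.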
One possible solution that occurred to me was to replace the '{' with some unique text that couldn't possibly occur in newLISP source code, then to replace it again once I'd finished the other characters. But is this the most efficient way of doing a sequence of changes? It looks inefficient as well, but perhaps it isn't.
Code:
(set 'c "some string") ; the text to be escaped
(set 'uuid1 (uuid))
(set 'uuid2 (uuid))
(replace "{" c uuid1)
(replace "}" c uuid2)
(replace {\} c {\letterbackslash{}})
(replace {$} c {\letterdollar{}} )
(replace {#} c {\letterhash{}})
(replace {!} c {\letterexclamationmark{}} )
(replace {|} c {\letterbar{}})
(replace {@} c {\letterat{}})
(replace {^} c {\letterhat{}})
(replace "%" c {\letterpercent{}} )
(replace "/" c {\letterslash{}} )
(replace "<" c {\letterless{}} )
(replace ">" c {\lettermore{}} )
(replace "~" c {\lettertilde{}} )
(replace "&" c {\letterampersand{}})
(replace "?" c {\letterquestionmark{}})
(replace "_" c {\letterunderscore{}})
(replace "’" c {\lettersinglequote{}})
; finally, replace the { and }
(replace uuid1 c {\letteropenbrace{}})
(replace uuid2 c {\letterclosebrace{}})
Is there a better solution?
Posted: Sat Jan 20, 2007 11:42 pm
by Lutz
I had the same problem when writing the Wiki and IDE software and then again when writing syntax.cgi. I came to the same solution you are using:
(1) translate strings or characters into something else to protect them from change by 'replace'
(2) do the replacements
(3) translate the protected strings and characters back
'replace' is pretty fast, especially when using raw string replacement without regular expressions. The only thing to consider is that using 'uuid' for each character may be a bit expensive, because each uuid is 36 characters long. It is a 100% safe solution, though, because a uuid is unique :) It all depends on the characteristics of your text.
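The three steps above, with placeholders shorter than a uuid, might be sketched like this. The function name escape-braces-and-backslash is made up for the example, and the control characters "\001" and "\002" are an assumption: they only work as placeholders if they never occur in the input text.

```newlisp
; Sketch only: "\001" and "\002" are assumed never to appear in the input.
(define (escape-braces-and-backslash str)
  ; (1) protect the braces
  (replace "{" str "\001")
  (replace "}" str "\002")
  ; (2) do the other replacements (only the backslash is shown here)
  (replace {\} str {\letterbackslash{}})
  ; (3) translate the protected characters into their final form
  (replace "\001" str {\letteropenbrace{}})
  (replace "\002" str {\letterclosebrace{}})
  str)

(println (escape-braces-and-backslash {a\b{c}}))
```

Single-character placeholders keep the intermediate string short, at the price of having to know that the input never contains them. A uuid makes no such demand.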
What I do in the Wiki is use HTML-coded characters:
Code:
(replace "&" str "&amp;")
(replace "<" str "&lt;")
(replace ">" str "&gt;")
(replace " " str "&nbsp;")
In normal text these sequences almost never occur, and at least in the Wiki and IDE programs they proved to be a safe choice.
Lutz
Posted: Sun Jan 21, 2007 12:56 pm
by cormullion
Using HTML is a good idea - although as soon as you write it, of course, you realise that it's exactly the sort of example that will break your script.
Or to put it another way, can the script process itself?
Everything's very quiet on the newLISP front recently? Is everybody switching to Lua? .... :-/
Posted: Sun Jan 21, 2007 1:45 pm
by HPW
cormullion wrote:Everything's very quiet on the newLISP front recently? Is everybody switching to Lua? .... :-/
I would not call it quiet, since 9.1 is on the horizon now.
No reason to switch anywhere, simply using this great language!
As Paul Graham states: 'Try to solve hard problems'.
(hpwNewLISP 2.19 gets more tightly integrated with the new private variable feature in neobook 5.5.3.)
Improving things that are already very good gets more and more difficult!
;-)
Posted: Sun Jan 21, 2007 4:17 pm
by m i c h a e l
Hi Cormullion!
Cormullion wrote:Everything's very quiet on the newLISP front recently? Is everybody switching to Lua?
Perish the thought! Melissa and I have been hard at work on getting a version of neglOOk running on newLISP wiki. It is turning out quite awesome, if I may say so myself ;-) We hope to go live with it very soon.
Working with newLISP wiki has led me to an even greater appreciation of the inspiring work Lutz is doing on both the wiki and the language.
Lua is one of the languages I went some distance to learn. There are many things to commend in Lua, but one of them is not her speed of text processing using regular expressions (in my own experience, of course). This is the number one thing I end up using a language for.
Since I started programming in newLISP, I have not had a desire to try other languages. This has given me a chance to actually become more familiar with one language, while allowing me to see the logic (and the beauty) of the design that has gone into newLISP.
The wall is calling this flower back!
m i c h a e l
P.S. Who's Paul Graham? ;-)
Posted: Sun Jan 21, 2007 5:13 pm
by cormullion
Hi m i c h a e l - glad to see you around...! looking forward to neglooking at your site.
Don't get me wrong - I'm not saying anything negative about newLISP. It's partly that I was reading a bit about Lua recently (not really by choice, more because of luatex) and saw the page at
http://www.lua.org/uses.html - quite an impressive list of people who've built it into their products or applications. I'm no programmer, so I don't know what it takes to 'get a programming language into' a product, but I do know that sometimes I'd prefer to use my own favourite language rather than have to learn another one, so I wish that more people chose newLISP...
Also partly because it's been quiet anyway round here.
Posted: Mon Jan 22, 2007 8:51 am
by HPW
m i c h a e l wrote:P.S. Who's Paul Graham?
The man who is promising ARC! ;-)
When will that be? We have no idea. We reserve the right to take a very long time. It has been over 40 years since McCarthy first described Lisp. Another 2 or 3 aren't going to kill anyone. So please don't send us mail asking what Arc's status is or when it will be done. (When it's done, we'll tell you.)
I am glad that Lutz does not promise things, he makes them come alive!
Posted: Mon Apr 09, 2012 10:52 pm
by William James
cormullion wrote:I'm processing some text so that various characters are replaced with alternative formulations. The problem is that the curly braces in the original text need to be replaced with \letteropenbrace{} but the backslashes need to be replaced with \letterbackslash{}, and I can't do both these operations independently.
One possible solution that occurred to me was to replace the '{' with some unique text that couldn't possibly occur in newLISP source code, then to replace it again once I'd finished the other characters. But is this the most efficient way of doing a sequence of changes? It looks inefficient as well, but perhaps it isn't.
I think that the best solution is to pass through the string only once. That way, the replacements won't be replaced.
Code:
; Map a single character to its replacement; anything else is returned unchanged.
(define (translate ch)
  (case ch
    ({\} {\letterbackslash{}})
    ({$} {\letterdollar{}})
    ({#} {\letterhash{}})
    ({!} {\letterexclamationmark{}})
    ({|} {\letterbar{}})
    ({@} {\letterat{}})
    ({^} {\letterhat{}})
    ("%" {\letterpercent{}})
    ("/" {\letterslash{}})
    ("<" {\letterless{}})
    (">" {\lettermore{}})
    ("~" {\lettertilde{}})
    ("&" {\letterampersand{}})
    ("?" {\letterquestionmark{}})
    ("_" {\letterunderscore{}})
    ("'" {\lettersinglequote{}})
    ("{" {\letteropenbrace{}})
    ("}" {\letterclosebrace{}})
    (true ch)))
(setq text {A set {2 3 5 e f} of numbers & letters.})
; "." matches each character exactly once, so inserted replacements are never rescanned.
(replace "." text (translate $0) 0)
; text is now: A set \letteropenbrace{}2 3 5 e f\letterclosebrace{} of numbers \letterampersand{} letters.
Posted: Tue Apr 10, 2012 9:13 pm
by TedWalther
Yes, that is the solution I came to as well.
Posted: Fri Jul 06, 2012 5:26 am
by schilling.klaus
Does this approach also work for replacing unicode characters outside the ascii range with numerical html character references or the other way round?
Posted: Fri Jul 06, 2012 6:51 pm
by Lutz
Yes, you can use unicode characters in regular expressions when using a UTF8 enabled version of newLISP:
Code:
(replace "死" "abc死def死ghi" "生" 0) ;-> "abc生def生ghi"
; utf8 same as using unicode
(replace "\u6b7b" "abc\u6b7bdef\u6b7b" "\u751F" 0) ;-> "abc生def生"
ps: you need a modern UTF8 enabled web browser and Chinese fonts to see this post correctly
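For the conversion asked about above (characters outside the ASCII range to numeric HTML character references, and back), a rough sketch could look like the following. It assumes a UTF8 enabled newLISP, where 'char' converts between a character and its Unicode number. The function names to-char-refs and from-char-refs are invented for this example, and only decimal references of the form &#NNN; are handled.

```newlisp
; Sketch only: requires a UTF8 enabled newLISP; both function names are made up.
(define (to-char-refs str)
  ; 2048 is the PCRE UTF8 option, so the class matches whole characters, not bytes
  (replace {[^\x00-\x7F]} str (string "&#" (char $0) ";") 2048))

(define (from-char-refs str)
  ; base 10 is forced so that a reference with leading zeros is not read as octal
  (replace {&#(\d+);} str (char (int $1 0 10)) 0))

(println (to-char-refs "abc死def"))
(println (from-char-refs (to-char-refs "abc死def")))
```

Both functions are a single regular-expression pass, in the same spirit as the 'translate' solution earlier in the thread, so text that has just been inserted is never scanned again.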