After doing some other newLISP-related stuff, back to this interesting discussion.
Lutz wrote: In Hartrock’s example the \000 characters are still in sb but text in [text],[/text] tags does not escape characters, so \000 is not shown, although part of sb.
This is OK for using these strings inside newLISP, but it causes problems when transferring their [text]...[/text] representation, because that omits part of the string (information loss).
There is such a use case: I'm in the process of writing a newLISP 'Inspector' server app for inspecting all newLISP symbols in a browser (I plan to publish it in a while). For this, all strings are transferred from server to browser (via JSON) using their :source representation.
Currently there is (using newlisp-10.6.4.tgz 2015-08-31):
- OK, no information loss -, and
Code:
[unprintable chars] [text]this s a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is a testthis is ...
- not optimal, because some parts of the string are missing -
as part of the info shown in the browser window.
Lutz wrote: Right now in 10.6.4 (in progress) base64 encoding is used for strings > 2047 characters and containing the [/text] tag when using save. I could drop the condition for [/text] and always use base64 for strings longer than 2047 in the save function. That would make Hartrock’s sb string usable for the save function.
TedWalther wrote: The base64 solution bothers me, because it breaks the human readability of the (save) output.
Inspecting a newLISP system for development/debugging purposes needs two things:
1. accuracy, and
2. readability.
To serve 1., the internal representation has to be shown, which even for short strings is not optimal regarding 2.; e.g.
Code:
>
[text]
One line...
second line...
third line...
[/text]
"\nOne line...\nsecond line...\nthird line...\n"
Here the [text]...[/text] representation is more readable than the internal "..." representation (for the 'Inspector' app I prefer accuracy when in doubt).
Lutz wrote:
But I don’t want to give up the [text],[/text] tags to use completely unescaped text (not binary content). This is frequently used in web programming.
They are also very nice for showing loaded text files as symbol evaluations (*.html, *.txt, *.js, etc.).
Lutz wrote:
I also don’t want to eliminate the 2047 limit for "-quoted strings, for speed in processing.
Lutz wrote:
The need to display code > 2047 and containing non-displayable binary info is very rare.
For debugging purposes there is such a need.
Lutz wrote: The save now will work with binary contents too if it always uses base64 transformation on strings > 2047 characters.
This is bad for the readability of the longer text files mentioned above. I just tried encoding with https://www.base64encode.org/:
; resulting in Base64 format (UTF-8):
Note: the result is longer than the source.
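The size growth can also be checked inside newLISP itself. Base64 maps every 3 input bytes to 4 output characters, so the encoded form is roughly one third longer. A minimal sketch, assuming a made-up sample string (not the text encoded above) and using the built-in base64-enc and base64-dec:

```newlisp
;; The sample text is an assumption, chosen to resemble the test string above.
(set 'src (dup "this is a test" 200))
(set 'enc (base64-enc src))

(println "source bytes: " (length src))
(println "base64 bytes: " (length enc))

;; Decoding should give back the original string.
(println "round trip ok: " (= src (base64-dec enc)))
```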
Idea:
What about switching to base64 transformation (or another accurate variant) only if there are unprintable chars?
This would give readability in most cases, and accuracy in the rarer ones, too.
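A minimal sketch of this idea, not how save actually works: unprintable? and save-repr are hypothetical names, and "unprintable" is assumed here to mean any byte outside printable ASCII except tab, newline and carriage return. The [/text] case and the short-string case are left out to keep it small.

```newlisp
;; Hypothetical helper: returns a non-nil value if str contains a byte
;; outside printable ASCII (tab, LF and CR count as printable here).
(define (unprintable? str)
  (exists (fn (b) (and (or (< b 32) (> b 126))
                       (not (find b '(9 10 13)))))
          (unpack (dup "b" (length str)) str)))

;; Hypothetical helper: choose the textual form for a long string.
;; Assumption: str does not itself contain the [/text] tag.
(define (save-repr str)
  (if (unprintable? str)
      ;; accurate, but not readable
      (append "(base64-dec [text]" (base64-enc str) "[/text])")
      ;; readable, and accurate because nothing needs escaping
      (append "[text]" str "[/text]")))

(println (save-repr "One line...\nsecond line...\n"))
(println (save-repr "binary\000content"))
```

Only the second call should fall back to base64, since only that string contains a \000 byte.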
TedWalther wrote:
Lutz, how about this; when doing "save", if the string is longer than 2048 bytes, instead of converting it to [text], convert it to (list num num num) where each num is a byte, value in the range 0..255. Or else (string str1 str2...) where str1 is a 2048 byte string in "" representation, as is str2, up until all the content is represented.
I like the simplicity and readability (compared with base64 encoding) of this approach.
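The second variant could look roughly like the sketch below. All function names are hypothetical. Two assumptions differ from the quoted proposal: the chunk size is 500 bytes instead of 2048, because a byte escaped as \nnn takes 4 characters and 500 of them still fit under the 2047 limit for quoted strings; and the result uses (append ...) rather than (string ...), because as far as I remember from the manual, string stops copying at the first zero byte (worth checking).

```newlisp
;; Hypothetical helper: escape one byte value for use inside "..." quotes.
(define (escape-byte b)
  (cond
    ((= b 34) "\\\"")                       ; double quote
    ((= b 92) "\\\\")                       ; backslash
    ((= b 10) "\\n")                        ; newline
    ((= b 13) "\\r")                        ; carriage return
    ((= b 9)  "\\t")                        ; tab
    ((and (>= b 32) (<= b 126)) (char b))   ; printable ASCII as is
    (true (format "\\%03d" b))))            ; decimal \nnn escape

;; Hypothetical helper: one chunk as a quoted, escaped string literal.
(define (quote-chunk chunk)
  (append "\""
          (join (map escape-byte (unpack (dup "b" (length chunk)) chunk)))
          "\""))

;; Hypothetical helper: whole string as (append "..." "..." ...),
;; with n bytes per chunk (n = 500 is an assumption, see above).
(define (chunked-repr str (n 500))
  (let (parts '() i 0)
    (while (< i (length str))
      (push (quote-chunk (slice str i (min n (- (length str) i)))) parts -1)
      (++ i n))
    (append "(append " (join parts " ") ")")))

(set 'sb (append (dup "this is a test" 200) "\000" "tail"))
(println (chunked-repr sb))

;; Evaluating the representation should give back the original string.
(println (= sb (eval-string (chunked-repr sb))))
```

Single byte values stay checkable in this form, which is the advantage over base64.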
Idea:
To combine the best of all variants:
1. "..." as now for short strings.
2. [text]...[/text] for longer strings not containing unprintable chars.
3. (string str1 str2 ...) encoding for longer strings containing a small number of unprintable chars.
4. base64 encoding for longer strings containing a significant number of unprintable chars (binary data in strings would not be very readable in other representations, either).
But 4. only if it gives real improvements in memory, speed, or something else (possibly I'm missing an important point here) compared with 3., because of its unreadability (checking the values of single bytes is not possible). When would that be the case?
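The decision between the four variants can be sketched as below. The names count-unprintable and choose-repr are hypothetical, and the 10% threshold for "a significant number" is an arbitrary assumption. A long printable string that contains the [/text] tag is routed to the chunked variant here, which is my choice for the sketch and not what 10.6.4 does.

```newlisp
;; Hypothetical helper: number of bytes outside printable ASCII
;; (tab, LF and CR count as printable here).
(define (count-unprintable str)
  (length (filter (fn (b) (and (or (< b 32) (> b 126))
                               (not (find b '(9 10 13)))))
                  (unpack (dup "b" (length str)) str))))

;; Hypothetical helper: returns a symbol naming the representation.
;; limit = 2047 is taken from the discussion; ratio = 0.1 is an assumption.
(define (choose-repr str (limit 2047) (ratio 0.1))
  (let (len (length str) bad (count-unprintable str))
    (cond
      ((<= len limit) 'quoted)                                ; variant 1
      ((and (= bad 0) (not (find "[/text]" str))) 'text)      ; variant 2
      ((<= (div bad len) ratio) 'chunked)                     ; variant 3
      (true 'base64))))                                       ; variant 4

(println (choose-repr "short"))
(println (choose-repr (dup "this is a test\n" 200)))
(println (choose-repr (append (dup "this is a test" 200) "\000")))
(println (choose-repr (dup "\000\001" 2000)))
```

The four calls are meant to exercise variants 1 to 4 in that order.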