Why is the behavior of "trim" function so strange?

psilwen
Posts: 21
Joined: Thu Jul 03, 2014 5:25 am

Post by psilwen »

newLISP v.10.7.1 32-bit on Windows IPv4/6 UTF-8 libffi, options: newlisp -h

> (setq str (dup "\000" 10))
"\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 31))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 32))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
(reverse "newlisp")

varbanov
Posts: 6
Joined: Mon Jul 01, 2013 1:33 pm
Location: Sofia, Bulgaria

Post by varbanov »

Hi,

I'm still on a previous version and there's no problem ...

newLISP v.10.6.2 32-bit on Win32 IPv4/6 UTF-8 libffi, options: newlisp -h

> (setq str (dup "\000" 32))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 52))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> 
Yours,
s.v.

CaveGuy
Posts: 112
Joined: Sun Oct 13, 2002 3:00 pm
Location: Columbus Ohio

Post by CaveGuy »

I am back from the future :) No problem there:

newLISP v.10.7.3 64-bit on Windows IPv4/6 libffi, options: newlisp -h

> (setq str (dup "\000" 52))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
>
Bob the Caveguy aka Lord High Fixer.

psilwen
Posts: 21
Joined: Thu Jul 03, 2014 5:25 am

Post by psilwen »

newLISP v.10.7.3 32-bit on Windows IPv4/6 libffi, options: newlisp -h

> (setq str (dup "\000" 8))
"\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 1))
"\000"
> (trim str)
""
> (setq str (dup "\000" 2))
"\000\000"
> (trim str)
""
> (setq str (dup "\000" 3))
"\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 4))
"\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 5))
"\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 6))
"\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 7))
"\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 8))
"\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 9))
"\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 10))
"\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 11))
"\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 12))
"\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 13))
"\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 14))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 15))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 16))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 17))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 18))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 19))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 20))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 21))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 22))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 23))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 24))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 25))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 26))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 27))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 28))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 29))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 30))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 31))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 32))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 33))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim

newLISP v.10.7.3 64-bit on Windows IPv4/6 libffi, options: newlisp -h

> (setq str (dup "\000" 8))
"\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 1))
"\000"
> (trim str)
""
> (setq str (dup "\000" 2))
"\000\000"
> (trim str)
""
> (setq str (dup "\000" 3))
"\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 4))
"\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 5))
"\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 6))
"\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 7))
"\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 8))
"\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 9))
"\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 10))
"\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 11))
"\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 12))
"\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 13))
"\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 14))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 15))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 16))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 17))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 18))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 19))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 20))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 21))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 22))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 23))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 24))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 25))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 26))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 27))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 28))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 29))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 30))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 31))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 32))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 33))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 34))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 35))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 36))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 37))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 38))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 39))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)

ERR: not enough memory in function trim
> (setq str (dup "\000" 40))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 41))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
> (setq str (dup "\000" 42))
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
> (trim str)
""
(reverse "newlisp")

psilwen
Posts: 21
Joined: Thu Jul 03, 2014 5:25 am

Post by psilwen »

newLISP v.10.7.3
newLISP v.10.7.1
newLISP v.10.7.0
All have this issue.

The failures usually start at a length of 8, 16, 24 or 32, depending on the version.

But with newLISP v.10.6.2, I tested lengths from 1 to 65536 and everything was fine.
(reverse "newlisp")

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

Hello,

trim does not seem to like the null character.
So is it a valid whitespace character?

newLISP v.10.7.2 32-bit on Windows IPv4/6 libffi, options: newlisp -h
> (setq str "\t\000   \t")
"\t\000   \t"
> (trim str)

ERR: not enough memory in function trim

> (setq str "\tabc\000   abc\t")
"\tabc\000   abc\t"
> (trim str)
"abc\000   abc"
>
Regards
Hans-Peter

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

'trim' was overrunning memory when all characters in the string were below 32 (the space character).

Fixed here:
http://newlisp.nfshost.com/downloads/de ... nprogress/
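
The kind of overrun described above can happen when a trimming scan has no stop condition other than "found a character to keep". The following is only an illustrative sketch in Python, not newLISP's C source: the name `trim_bounded` and the cutoff value of 32 (so that the space itself is also trimmed) are assumptions made for the example. It shows the two bounds checks that keep both scans inside the string even when every byte is trimmable.

```python
def trim_bounded(data: bytes, cutoff: int = 32) -> bytes:
    """Strip bytes <= cutoff from both ends of data (hypothetical helper)."""
    left, right = 0, len(data)
    # Forward scan: the `left < right` test stops at the end of the buffer
    # even when every byte is trimmable (e.g. a string made only of NULs).
    while left < right and data[left] <= cutoff:
        left += 1
    # Backward scan: never moves past `left`, so it cannot run off the front.
    while right > left and data[right - 1] <= cutoff:
        right -= 1
    return data[left:right]


print(trim_bounded(b"\x00" * 32))           # every byte is trimmable
print(trim_bounded(b"\tabc\x00   abc\t"))   # interior NUL and spaces are kept
```

Without the `left < right` guard, an equivalent loop in C would keep reading past the end of an all-NUL buffer, which is consistent with the length-dependent failures reported earlier in the thread.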

HPW
Posts: 1390
Joined: Thu Sep 26, 2002 9:15 am
Location: Germany

Post by HPW »

Hello Lutz,

Thanks for the fix.
Will characters < 32 now be handled as whitespace and get trimmed?
Or do they stay?

Regards
Hans-Peter

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

See last paragraph here:
http://www.newlisp.org/downloads/develo ... 10.7.3.txt

So 10.7.3 is now more forgiving when a string contains zero characters, which it shouldn't in perfect ASCII or UTF-8, where the zero character marks the end of the string.
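
The terminator convention mentioned here comes from C-style strings. As a rough illustration (Python with the standard `ctypes` module, not newLISP code; the sample bytes are made up), the same buffer looks different depending on whether it is read as a length-counted byte string or as a zero-terminated one:

```python
import ctypes

raw = b"abc\x00def"                      # sample data with an embedded zero byte
buf = ctypes.create_string_buffer(raw)   # C-style buffer with one extra trailing NUL

print(len(raw))     # length-counted view: all 7 bytes are part of the string
print(buf.value)    # zero-terminated view: stops at the first zero byte
print(buf.raw)      # the whole buffer, including embedded and trailing NULs
```

A function that mixes the two views, taking the length from one and scanning with the other, is the sort of place where embedded zero bytes tend to cause trouble.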

psilwen
Posts: 21
Joined: Thu Jul 03, 2014 5:25 am

Post by psilwen »

trim still has a bug.

newLISP v.10.7.3 32-bit on Windows IPv4/6 UTF-8 libffi, options: newlisp -h

> (setq str "\xbb\xe1")
"会"
> (length (trim str))
2
> (length (trim str "\000"))
5
> (setq str "\xce\xaa\xca\xb2\xc3\xb4\xbb\xe1")
"为什么会"
> (length (trim str))
8
> (length (trim str "\000"))
11
(reverse "newlisp")

TedWalther
Posts: 608
Joined: Mon Feb 05, 2007 1:04 am
Location: Abbotsford, BC

Post by TedWalther »

It occurs to me that if I were trimming a string and it had \000 characters on either the left or the right side, I'd want them GONE. If it is a binary string where the \000 character belongs, then I wouldn't be running trim on it, since trim is a textual function.
Cavemen in bearskins invaded the ivory towers of Artificial Intelligence. Nine months later, they left with a baby named newLISP. The women of the ivory towers wept and wailed. "Abomination!" they cried.

ralph.ronnquist
Posts: 228
Joined: Mon Jun 02, 2014 1:40 am
Location: Melbourne, Australia

Post by ralph.ronnquist »

Technically, since neither "\xbb\xe1" nor "\xce\xaa\xca\xb2\xc3\xb4\xbb\xe1" is a valid UTF-8 string, the trim behaviour is conveniently undefined. It appears that the trim function expands "\xbb" into a 2-byte code and "\xe1" into a 3-byte code, making its output valid UTF-8. ("\xce\xaa", "\xca\xb2" and "\xc3\xb4" are valid UTF-8 characters.)
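
The validity claims above can be checked independently of newLISP. The following Python sketch (the helper name `is_valid_utf8` is made up for the example) decodes each byte sequence from psilwen's post strictly as UTF-8. The last step assumes, judging only from the Chinese text the console displayed, that the bytes were typed in a GBK code page.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


long_sample = b"\xce\xaa\xca\xb2\xc3\xb4\xbb\xe1"
samples = [
    b"\xbb\xe1",                             # first test string
    long_sample,                             # second test string
    b"\xce\xaa", b"\xca\xb2", b"\xc3\xb4",   # the pairs described as valid
]
for s in samples:
    print(s, is_valid_utf8(s))

# Assumption: the input was GBK, where each byte pair is one character.
gbk_text = long_sample.decode("gbk")
print(len(gbk_text), "characters when read as GBK")
```

Both full test strings fail because "\xbb" is a UTF-8 continuation byte with no lead byte before it and "\xe1" is a lead byte with no continuation bytes after it, so a UTF-8 build of newLISP has no well-defined character boundaries to work with at the end of either string.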
