newLISP Development Release v.10.5.5

Lutz · Post by **Lutz** » Wed Nov 20, 2013 2:39 pm

This development release has small improvements and bug fixes in several areas.

File and release notes: http://www.newlisp.org/downloads/development/

hartrock · Post by **hartrock** » Thu Nov 21, 2013 12:34 am

There seems to be a memory leak in your apply implementation for arrays without using array-list: apply after removing array-list leads to continuously increasing mem usage in some code of mine.

From diffing the code against 10.5.4 I think, the culprit may be not popping additional list, if there is an array arg (see ';;->' markers):

Code: Select all

CELL * p_apply(CELL * params)
{
CELL * expr;
CELL * args;
CELL * cell;
CELL * result;
CELL * func;
ssize_t count, cnt;
UINT * resultIdxSave;

func = evaluateExpression(params);

cell = copyCell(func);
expr = makeCell(CELL_EXPRESSION, (UINT)cell);
params = getEvalDefault(params->next, &args);

if(args->type == CELL_ARRAY)
        pushResult(args = arrayList(args, FALSE));
;;-> here a list will be generated, but ..
if(args->type != CELL_EXPRESSION)
    {
    if(isNil(args))
        {
        pushResult(expr);
        return(copyCell(evaluateExpression(expr)));
        }
    else
        return(errorProcExt(ERR_LIST_EXPECTED, args));
    }

if(params != nilCell)
    getInteger(params, (UINT *)&count);
else count = -1;
if(count < 2) count = MAX_LONG;

resultIdxSave = resultStackIdx + 2;
;;-> .. is this taken into account here?
...

Just an idea seeing this code the first time...

Lutz · Post by **Lutz** » Thu Nov 21, 2013 1:11 am

There is no cell leak or memory leak :)

array-list creates a new list but it gets popped off from the result stack in evaluateExpression() after the function returns. The cleanupResults() happens only for the results from the individual apply operations when it had the reduce parameter during the execution of apply. This is necessary to avoid result stack overflow on long lists or arrays. All other minor result stack cleanup is done after function return.

Here is the proof that there is no cell leak:

Code: Select all

> (set 'A (array 5 '(1 2 3 4 5)))
(1 2 3 4 5)
> (apply + A) (sys-info)
15
(441 576460752303423488 408 1 0 2048 0 16648 10505 1411)
> (apply + A) (sys-info)
15
(441 576460752303423488 408 1 0 2048 0 16648 10505 1411)
> (apply + A) (sys-info)
15

and you do a:

Code: Select all

(time (apply + A) 1000000)

and check activity monitor on OSX or taskmgr on Windows for memory consumption.

hartrock · Post by **hartrock** » Thu Nov 21, 2013 1:52 am

I can confirm, that running your example gives no mem leak.

On the other side I trust the 'top' command, if running two variants -- one with, one without array-list (no other changes) -- of code from me:

Code: Select all

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND           
21814 sr        20   0 22260 3308 1144 R  99.4  0.2   2:46.76 newlisp           
21815 sr        20   0 26612 7796 1144 R  99.1  0.4   2:45.19 newlisp

The code is too long for posting it as an example: I try to reproduce the effect with minimal code (stay tuned).

Lutz · Post by **Lutz** » Thu Nov 21, 2013 2:19 am

Things are a bit more complicated. On longer lists or arrays you may see an initial rise in memory consumption, but then it stays stable even if the same operation is repeated millions of time. The reason being that both, newLISP an the OS retain / reserve memory although freed and not needed anymore.

newLISP will retain some memory allocated for cells for speed reasons. Although cell count is going down again, it returns cells to a free-cell queue to make future cell allocations faster. All other memory, which is not cells, gets freed immediately but may be reserved by the OS for that process to improve performance. E.g.: on Unix 'top' you may see that an initial memory increase gradually disappears after a while although newLISP is sitting idle.

The effects described are not specific to the apply function, but can be observed on many other operations.

hartrock · Post by **hartrock** » Thu Nov 21, 2013 2:20 am

Try this:

Code: Select all

(set 'a (array 10000 (sequence 0 10000)))
;;
(define (til-stats cont divisor)
  (let ((tilSize (/ (length cont) divisor))
        (tils (array divisor)))
    (dotimes
     (chunkIx divisor)
     (let (off (* chunkIx tilSize))
       (++ (tils chunkIx) (apply + (off tilSize cont)))))
    tils))
(define (centil-stats cont)
  (til-stats cont 10))
;;
(time (centil-stats a) 10000)

This triggers a mem leak increasing very fast.

hartrock · Post by **hartrock** » Thu Nov 21, 2013 2:41 am

More condensed (triggering the same problem):

Code: Select all

(set 'a (array 10000 (sequence 0 10000)))
(time (apply + (0 1000 a)) 10000)

Note: 10000 times may be over too fast.

hartrock · Post by **hartrock** » Thu Nov 21, 2013 2:55 am

Lutz wrote:There is no cell leak or memory leak :)

array-list creates a new list but it gets popped off from the result stack in evaluateExpression() after the function returns. The cleanupResults() happens only for the results from the individual apply operations when it had the reduce parameter during the execution of apply. This is necessary to avoid result stack overflow on long lists or arrays. All other minor result stack cleanup is done after function return.

Thanks for the explanation: this is good for understanding interpreter implementation.

But there is a leak somewhere...

Lutz · Post by **Lutz** » Thu Nov 21, 2013 3:15 am

Yes, I can see it too on your last example, will investigate.

Lutz · Post by **Lutz** » Thu Nov 21, 2013 3:50 am

turned out to be the resultIdxSave as you suspected - should be +3.

Code: Select all

resultIdxSave = resultStackIdx + 2 + isArray;

fixed here: http://www.newlisp.org/downloads/develo ... nprogress/

Lutz · Post by **Lutz** » Thu Nov 21, 2013 5:31 pm

Turns out this bug depends on the compiler used. On Mac OS X 10.9 Maverick it doesn't matter if using 10.5.5 without the fix or with the fix. In both cases all examples will not produce the leaks. OS X 10.9 Maverick seems to use the Clang compiler and not the original gcc.

When compiled for OS X 10.6 Leopard, Ubuntu 13.04 or Windows XP, the leak can be shown and cannot be fixed.

For now I will take out apply for arrays - map on arrays seems to be ok on all platforms.

hartrock · Post by **hartrock** » Fri Nov 22, 2013 12:49 am

After some trial and error I've found a solution for apply, which works for me.

Try this at the begin of p_apply():

Code: Select all

CELL * p_apply(CELL * params)
{
CELL * expr;
CELL * args;
CELL * cell;
CELL * result;
CELL * func;
ssize_t count, cnt;
UINT * resultIdxSave;
CELL * tmp;

func = evaluateExpression(params);

cell = copyCell(func);
expr = makeCell(CELL_EXPRESSION, (UINT)cell);
params = getEvalDefault(params->next, &args);

 if(args->type == CELL_ARRAY) {
   /* pushResult(args = arrayList(args, FALSE)); */
   /* pushResult((args = (arrayList(args, FALSE)))); */ /* does not work, too */
   /* but this is OK */
   tmp = arrayList(args, FALSE);
   pushResult(args = tmp);
 }
;; ...

After finding the macro:

Code: Select all

newlisp.h:#define pushResult(A) (*(++resultStackIdx) = (UINT)(A))

, which will be expanded to:

Code: Select all

(*(++resultStackIdx) = (unsigned long)(args = arrayList(args, 0)));

and reading some discussions about C sequence points related to assignments, I nevertheless think, that this should work. But possibly the compilers are pushing the old value of args onto the stack.

Another topic:
What remains unclear to me: when/how will the args pointer, which will be overwritten by the arrayList generated list, be free'd?

Lutz · Post by **Lutz** » Fri Nov 22, 2013 3:17 pm

Thanks, yes it is definitely a sequence point issue. The following works on all platforms and is the way it is coded now:

Code: Select all

if(args->type == CELL_ARRAY)
    {
    args = arrayList(args, FALSE);
    pushResult(args);
    }

This is the way I had it coded first and tested everywhere. Then changed it to the short form which still worked on my OSX 10.9 Clang based development system. But would break on gcc based distribution compiles and was not tested again :(

To your question about memory management of the stuff pointed to by args pointer:

The original address in *args points to the result of evaluating the parameter given to apply. This is a copy of the the first 1000 elements in array a. This piece of memory already has its address pushed on the result stack by the previous (0 1000 a) evaluation process. When apply returns, that portion of the result stack will be processed and memory freed.

In newLISP each function creating new memory objects must decide right away what happens to that memory in the future. It either has to be deleted before exiting the function, or a pointer to it has to be pushed on the result stack or it has to be assigned to a symbol (deleting its old contents) or inserted into an existing structure.

Many times when making copies of cells, they are already on the result stack and then popped off and taken. Many cell copies you see in the code, never really happen.

newLISP's synchronous memory management is the reason that newLISP is fast, small and uses little resources compared to other scripting languages. There are no unexpected garbage collection cycles taking away processing time at unexpected moments.

hartrock · Post by **hartrock** » Fri Nov 22, 2013 7:40 pm

Lutz wrote:Thanks, yes it is definitely a sequence point issue. The following works on all platforms and is the way it is coded now:
Code: Select all
if(args->type == CELL_ARRAY)
    {
    args = arrayList(args, FALSE);
    pushResult(args);
    }
This is the way I had it coded first and tested everywhere. Then changed it to the short form which still worked on my OSX 10.9 Clang based development system. But would break on gcc based distribution compiles and was not tested again :(

Where do you see the undefined behavior regarding sequence points? I've tried to find it in

hartrock wrote:
Code: Select all
(*(++resultStackIdx) = (unsigned long)(args = arrayList(args, 0)));

with no success (so far).
From what I've read, it seems to be, that (at least near) return from a function is a sequence point: this is the reason why I think, the assignments should work from right to left after calling the function(and they must not use the old value of args). My understanding may be wrong here, though (it's not the easiest topic).
This topic is of practical interest for C programmers in sense of 'how to code and what to avoid' rules.
If there is no undefined behavior, then gcc has a bug here, which would be of interest, too.

Lutz wrote: To your question about memory management of the stuff pointed to by args pointer:
...

Thanks for your explanations: this helps in understanding the workings of the interpreter.

Lutz · Post by **Lutz** » Fri Nov 22, 2013 9:14 pm

Looking at the pointer values passed and assigned and the resultStackIdx, I see that the sequence of assignments is correct. First: args = arrayList(args, FALSE) second: *(++resultStackIdx) = args

But incrementing resultStackIdx only works the first time, the second time entering the apply statement, resultStackIdx doesn’t increment and is stuck at its old value forever, no matter how often I enter the apply statement. Now the old pointer is overwritten again and again and has never a chance to get processed by popResult() higher up in evaluateExpression() for freeing cells and memory.

So it seems not to be a sequence point problem, but a problem with the ++ operator in gcc but only under certain circumstances.

Googling for: problem with ++ operator in gcc , I get a few links about this.

Ps: my name appearing in red color is a bug in the forum configuration and Ryon is trying to chase it down.

hartrock · Post by **hartrock** » Sat Nov 23, 2013 5:55 am

[update2] gcc is not guilty; see reply below. Wrong code example has led me to the problem...
[update] Wrong code example removed.

Note:
It seems to be difficult to create an example with minimal code to trigger this bug (have tried a while); so I don't know how to bug report this to the gcc maintainers (any idea?).

[update2] No wonder this is difficult, if there is no bug...

johu · Post by **johu** » Sun Nov 24, 2013 12:03 am

Hello, Lutz.

I finished the translartion of v.10.5.5's manual.

newlisp_manual-10505

And in 10.5.5's manual,

line 2796

to the name-space which owns the that symbol.</p>

line 9160

but probabilities are compounded using the Chi² method.

Are these OK?

Regards,

Lutz · Post by **Lutz** » Sun Nov 24, 2013 12:45 am

Thanks Johu, there are more changes coming in the 10.5.6 manual.

http://www.newlisp.org/downloads/develo ... nprogress/

hartrock · Post by **hartrock** » Sun Nov 24, 2013 10:47 am

Lutz wrote:Looking at the pointer values passed and assigned and the resultStackIdx, I see that the sequence of assignments is correct. First: args = arrayList(args, FALSE) second: *(++resultStackIdx) = args

But incrementing resultStackIdx only works the first time, the second time entering the apply statement, resultStackIdx doesn’t increment and is stuck at its old value forever, no matter how often I enter the apply statement. Now the old pointer is overwritten again and again and has never a chance to get processed by popResult() higher up in evaluateExpression() for freeing cells and memory.

So it seems not to be a sequence point problem, but a problem with the ++ operator in gcc but only under certain circumstances.

It's not a gcc problem.
The macro:

Code: Select all

newlisp.h:#define pushResult(A) (*(++resultStackIdx) = (UINT)(A))

, called as:

Code: Select all

pushResult(args = arrayList(args, FALSE));

, which will be expanded to:

Code: Select all

(*(++resultStackIdx) = (unsigned long)(args = arrayList(args, 0)));

, has side effects! As far as I know the compiler need not assume this possibility, so it can freely ++resultStackIdx before calling arrayList().
Problem is, that resultStackIdx will be changed by both ++resultStackIdx and calling arrayList(). Call chain is arrayList() -> copyCell() -> popResult() (last one changes resultStackIdx).
So it depends on the sequence of changing resultStackIdx what happens: moving arrayList() call into an extra statement before ++resultStackIdx, enforces the correct one (like already done).

As a rule it could be stated: safest is to pushResult() only a pointer already computed (it's not visible which call could have side effects to global resultStackIdx, and there may be a change in called function later). A single developer may break this rule, if he knows, that there are and will be no side effects from a call ;-)

newlispfanclub.alh.net

newLISP Development Release v.10.5.5

newLISP Development Release v.10.5.5

mem leak [Re: newLISP Development Release v.10.5.5]

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

[fix] apply mem leak

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

not a gcc bug

Re: newLISP Development Release v.10.5.5

Re: newLISP Development Release v.10.5.5

gcc is not guilty [Re: newLISP Development Release v.10.5.5]