Lots of memory
Posted: Fri Feb 17, 2006 5:44 pm
by alex
I have an archive:
Code:
E:\dict\enrufull>7za l enrufull.dic.dos.7z
7-Zip (A) 4.20 Copyright (c) 1999-2005 Igor Pavlov 2005-05-30
Listing archive: enrufull.dic.dos.7z
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------
2006-02-15 19:15:18 ....A     37193160     10484024  enrufull.dic.dos
------------------- ----- ------------ ------------  ------------
                              37193160     10484024  1 files
I try to unpack it into memory:
Code:
(setq my-dict (exec "7za e -so E:\\dict\\enrufull\\enrufull.dic.dos.7z"))
(read-line) # "pause"
and at the "pause" point my newLISP program takes more than 70 MB of RAM.
That is a lot of memory (IMHO).
How can I avoid the problem?
Posted: Sat Feb 18, 2006 3:34 am
by Lutz
Can you show us the return list of the 'exec' statement? What is in 'my-dict' and what does '7za e -so' do?
The original file is already about 35 MB. If '7za' is some kind of extraction utility, then 70 MB, with the extra overhead of putting every line into a list, does not seem like too much. What is '7za e -so' for?
Lutz
Posted: Sat Feb 18, 2006 10:52 am
by alex
enrufull.dic.dos is a big text file.
The command "7za e -so enrufull.dic.dos.7z" extracts it to standard output.
I think the problem can show up with any "external" command.
Example 2:
I have a text file eng1.txt (10119386 bytes), and after the command
(setq my-dict (exec "type eng1.txt"))
but before
(read-line)
my newLISP program takes nearly 20 MB of RAM.
That is twice the size of eng1.txt :(
Posted: Sat Feb 18, 2006 2:32 pm
by Dmi
I'm not Lutz ;-) But it seems to be normal:
The first memory allocation happens when 'exec' collects the program's output.
The second happens when the result is copied into the new symbol my-dict.
After that, the first piece of memory is freed internally and will be reused next time.
It just can't be released back to Windows.
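One way to check this is to repeat the exec/setq cycle and watch the process size in the Task Manager - if the freed memory really is reused, the footprint should level off instead of growing by another ~20 MB on every repetition. A minimal sketch, reusing alex's eng1.txt example:
Code:
; note the process size at each pause, then press Enter to continue
(setq my-dict (exec "type eng1.txt"))  ; first run: exec buffer + copy into my-dict
(read-line)                            ; pause 1
(setq my-dict (exec "type eng1.txt"))  ; second run: freed memory should be reused
(read-line)                            ; pause 2 - should be close to pause 1
(exit)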
Posted: Mon Feb 20, 2006 1:51 pm
by alex
That is not good, if it really works that way... :-(
Posted: Mon Feb 20, 2006 1:59 pm
by alex
Do we have the same problems under Linux, or not?
Posted: Mon Feb 20, 2006 6:48 pm
by Dmi
Hmm... As far as I can see, under Linux newLISP doesn't free large memory allocations for symbols - it just reuses them.
Also, as far as I can see, the memory allocated by 'exec' is completely freed.
So I may be wrong in my previous explanation...
But, by chance, I just found another issue:
I loaded a very large file consisting of small (10-character) lines. After exec's output was parsed into a list (as the documentation describes), I got a large memory overhead, much more than two times, and I suspect it is caused by the internal structure of lists. Your dictionary file may have a similar issue.
Try with an exec statement that returns a small number of very long lines and compare memory results.
Posted: Mon Feb 20, 2006 7:10 pm
by Dmi
...oh, my English isn't very clear :-)
I think the list's cell storage overhead is not really an "issue" - just logically expected behavior.
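The overhead can also be counted from inside newLISP instead of the Task Manager. A minimal sketch, assuming (sys-info 0) reports the number of Lisp cells currently in use (check the documentation of your newLISP version), with short.txt/long.txt standing for any two test files like the ones alex builds below:
Code:
; a list with many short lines needs many cells
(setq before (sys-info 0))
(setq dict (parse (read-file "short.txt") "\n"))
(println "cells for short lines: " (- (sys-info 0) before))
; a list with a few long lines needs only a few cells
(setq dict nil)
(setq before (sys-info 0))
(setq dict (parse (read-file "long.txt") "\n"))
(println "cells for long lines: " (- (sys-info 0) before))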
Posted: Tue Feb 21, 2006 10:05 am
by alex
I created two files, short.txt and long.txt:
Code:
(write-file "short.txt" (dup "aaaaaaaaa\n" 1000000))
(write-file "long.txt" (dup (append (dup "b" 1000000) "\n") 10) )
and ran these tests (uncommenting one at a time):
Code:
(setq dict (exec "type long.txt")) #test1
#(setq dict (exec "type short.txt")) #test2
#(setq dict (parse (read-file "long.txt") "\n")) #test3
#(setq dict (parse (read-file "short.txt") "\n")) #test4
(read-line) # "pause"
(exit)
Amount of memory:
in test1 near 11Mb
in test2 near 60Mb
in test3 near 11Mb
in test4 near 60Mb
Conclusion: long list = big memory
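If the goal is just to look entries up, one way around the big list is not to build it at all: unpack the archive to disk once, then scan the file line by line. This is only a rough sketch - "someword" and the output directory are placeholders, and it assumes the usual (device ...) / (read-line) idiom for reading from a file handle:
Code:
(exec "7za e -oE:\\dict\\enrufull E:\\dict\\enrufull\\enrufull.dic.dos.7z")
(setq fh (open "E:\\dict\\enrufull\\enrufull.dic.dos" "read"))
(device fh)                         ; read-line now reads from the file
(setq matches '())
(while (setq line (read-line))      ; only one line in memory at a time
    (if (find "someword" line) (push line matches -1)))
(device 0)                          ; back to the console
(close fh)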