Lots of memory
Posted: Fri Feb 17, 2006 5:44 pm
by alex
I have an archive:
Code:
E:\dict\enrufull>7za l enrufull.dic.dos.7z
7-Zip (A) 4.20 Copyright (c) 1999-2005 Igor Pavlov 2005-05-30
Listing archive: enrufull.dic.dos.7z
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------
2006-02-15 19:15:18 ....A     37193160     10484024  enrufull.dic.dos
------------------- ----- ------------ ------------  ------------
                              37193160     10484024  1 files
I try to unpack it into memory:
Code:
(setq my-dict (exec "7za e -so E:\\dict\\enrufull\\enrufull.dic.dos.7z"))
(read-line) # "pause"
and at the "pause" point my newLISP program takes more than 70 MB of RAM.
That is a lot of memory (IMHO).
How can I avoid the problem?
Posted: Sat Feb 18, 2006 3:34 am
by Lutz
Can you show us the return list of the 'exec' statement? What is in 'my-dict' and what does '7za e -so' do?
The original file is already about 35 MB. If '7za' is some kind of extraction utility, then 70 MB, with the extra overhead of putting every line into a list, does not seem like too much. What is '7za e -so' for?
Lutz
Posted: Sat Feb 18, 2006 10:52 am
by alex
enrufull.dic.dos is a big text file.
The command "7za e -so enrufull.dic.dos.7z" extracts it to standard output.
I think the problem can show up with any "external" command.
Example 2:
I have a text file eng1.txt (10119386 bytes), and after the command
(setq my-dict (exec "type eng1.txt"))
but before
(read-line)
my newLISP program takes nearly 20 MB of RAM.
That is twice the size of eng1.txt :(
Posted: Sat Feb 18, 2006 2:32 pm
by Dmi
I'm not Lutz ;-) But it seems to be normal:
The first memory allocation happens when 'exec' collects the program's output.
The second happens when the result is copied into the new symbol my-dict.
After that, the first piece of memory is freed internally and will be reused next time.
It just can't be released back to Windows.
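One way to check this is to repeat the exec/setq cycle and watch the process size in the Task Manager - if the freed memory really is reused, the footprint should level off instead of growing by another ~20 MB on every repetition. A minimal sketch, reusing alex's eng1.txt example:
Code:
; note the process size at each pause, then press Enter to continue
(setq my-dict (exec "type eng1.txt"))  ; first run: exec buffer + copy into my-dict
(read-line)                            ; pause 1
(setq my-dict (exec "type eng1.txt"))  ; second run: freed memory should be reused
(read-line)                            ; pause 2 - should be close to pause 1
(exit)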
Posted: Mon Feb 20, 2006 1:51 pm
by alex
That is not good, if it really works that way... :-(
Posted: Mon Feb 20, 2006 1:59 pm
by alex
Do we have the same problems under Linux, or not?
Posted: Mon Feb 20, 2006 6:48 pm
by Dmi
Hmm... As far as I can see, under Linux newLISP doesn't free large memory allocations for symbols - it just reuses them.
Also, as far as I can see, the memory allocated by 'exec' is completely freed.
So I may be wrong in my previous explanation...
But, by chance, I just found another issue:
I loaded a very large file consisting of small (10-character) lines. After exec's output was parsed into a list (as the documentation describes), I got a large memory overhead, much more than two times, and I suspect it is caused by the internal structure of lists. Your dictionary file may have a similar issue.
Try with an exec statement that returns a small number of very long lines and compare memory results.
Posted: Mon Feb 20, 2006 7:10 pm
by Dmi
...oh, my English isn't very clear :-)
I think the list's cell storage overhead is not really an "issue" - just logically expected behavior.
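The overhead can also be counted from inside newLISP instead of the Task Manager. A minimal sketch, assuming (sys-info 0) reports the number of Lisp cells currently in use (check the documentation of your newLISP version), with short.txt/long.txt standing for any two test files like the ones alex builds below:
Code:
; a list with many short lines needs many cells
(setq before (sys-info 0))
(setq dict (parse (read-file "short.txt") "\n"))
(println "cells for short lines: " (- (sys-info 0) before))
; a list with a few long lines needs only a few cells
(setq dict nil)
(setq before (sys-info 0))
(setq dict (parse (read-file "long.txt") "\n"))
(println "cells for long lines: " (- (sys-info 0) before))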
Posted: Tue Feb 21, 2006 10:05 am
by alex
I created two files, short.txt and long.txt:
Code:
(write-file "short.txt" (dup "aaaaaaaaa\n" 1000000))
(write-file "long.txt" (dup (append (dup "b" 1000000) "\n") 10) )
and ran these tests (uncommenting one at a time):
Code:
(setq dict (exec "type long.txt")) #test1
#(setq dict (exec "type short.txt")) #test2
#(setq dict (parse (read-file "long.txt") "\n")) #test3
#(setq dict (parse (read-file "short.txt") "\n")) #test4
(read-line) # "pause"
(exit)
Amount of memory:
in test1 near 11Mb
in test2 near 60Mb
in test3 near 11Mb
in test4 near 60Mb
Conclusion: long list = big memory
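If the goal is just to look entries up, one way around the big list is not to build it at all: unpack the archive to disk once, then scan the file line by line. This is only a rough sketch - "someword" and the output directory are placeholders, and it assumes the usual (device ...) / (read-line) idiom for reading from a file handle:
Code:
(exec "7za e -oE:\\dict\\enrufull E:\\dict\\enrufull\\enrufull.dic.dos.7z")
(setq fh (open "E:\\dict\\enrufull\\enrufull.dic.dos" "read"))
(device fh)                         ; read-line now reads from the file
(setq matches '())
(while (setq line (read-line))      ; only one line in memory at a time
    (if (find "someword" line) (push line matches -1)))
(device 0)                          ; back to the console
(close fh)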