Parallelism and newLISP

Excalibor · Post by **Excalibor** » Fri Jun 05, 2009 9:12 pm

Greetings!

I'm sure many of you have read Guy Steele's recent article about multicore, parallel programming...

If you have not, here's the link to the PDF file: http://groups.csail.mit.edu/mac/users/g ... Steele.pdf

The thing is, he splicitely mentions linked lists as being conceptually sequential and thus hardly parallelizable (this is, traditional LISP lists) and exposes some interesting tree proposals as parallelization strategies (not that I have been able to understand everything at a first glance, and I haven't been able to really read the article stopping where things get hairy).

Anyway... I was wondering how's newLISP ready for the multicore future (coincidentally this netbook of mine presents itself as a dual-core CPU, I think it's because Linux is treating hyperthreading as multicore... anyway, maybe Intel's Atom is dual core, I dunno, and I don't really care right now :-P ---)

This is just for the sake of discussion and planning... I did some useful scripts in newLISP at work and it may become my second tool for some more, after Perl, and some of those will do some really heavy string comparison magic, so I was wondering how to parallelize them to be able to take HP Blade's dual-core with hyperthreading presenting themselves as 4-core machines... :-)

I am thinking that I may divide the search space so I can launch 4 parallel processes at once, and thus saving, surely, lots of time designing the algorithms, but I thought that asking you guys may be very wise nevertheless... :-)

Anyway, thanks for your thoughts!

itistoday · Post by **itistoday** » Tue Jun 09, 2009 6:14 pm

I honestly don't think it's ready, at least not right now. Clojure seems to have definitively solved this problem though, being designed to address parallelism from the ground up.

Lutz · Post by **Lutz** » Tue Jun 09, 2009 6:54 pm

newLISP is very well suited for parallelism, it has support for processes and a Cilk like API (since v 10.0) for writing applications supporting parallelism and also has support for distributed applications with 'net-eval'.

See here:

http://www.newlisp.org/newlisp_manual.h ... processing

and here for an example:

http://www.newlisp.org/newlisp_manual.html#spawn

You could write something similar in Clojure using available Java libraries, but its not part of the language as in newLISP, where its already built-in. The same is true for distributed applications.

Excalibor · Post by **Excalibor** » Tue Jun 09, 2009 7:21 pm

Lutz,

Thanks for your reply. Yeah, I see the Cilk abilities as a huge step forward, and it's something I'd really like to esplore with calm and joy (which means I haven't been able to do anything less that a quick glance at them).

My question was more in the interpreter line. I mean, when I launch newlisp on my "dual"-core netbook, it's one of the processor that gets fired up, not both of them. I'd like to see how it behaves using spawn or Cilk, but I was wondering if there's anything (or the possibility) for the interpreter itself to see that a certain code can be launched in several cores at once and do so.

I'm afraid I cannot show any examples out of my head right now, but the same way an optimizer can unroll loops, or you can do tail call optimization if feasible (I mean, it can be done, not that newlisp does or should do) maybe there was someway to autoparallize some parts of a certain code automagically.

I don't really know how difficult this may be (I mean, I know it is a very hard problem, but not how hard it can be, either in general nor in newlisp in particular), but Steele's paper and what he said about a linked list being essentially sequential got me to thought if there's anything inherently sequential in newlisp, and if there's maybe anything (hashes? dunno....) parallelizable as well (as it uses red-black trees under the hood, and so on...)

It's not that it can't be done with the tools we already have, but if there's anything on the language and its implementation that's already (or about) to be ready for multicore... I'll leave multinode aside, as I understand distributed computing a different (although not necessarily by a large amount) a slightly different problem.

Of course, if the interfaces are high enough level, multicore and multinode can be just seeen as the same problem, but I was wondering if I have nodes A and B doing things, and each node has a multicore, how can messages from node A to B be fired up and use the available cores in B without previous knowledge of those multicores as new nodes.

Maybe I'm doing a simple question in a twisted way...

Anyway, you have reminded me of Cilk, and it's a good exercise for me to learn, so doubly thanks!

laters,

itistoday · Post by **itistoday** » Tue Jun 09, 2009 7:35 pm

Lutz wrote:You could write something similar in Clojure using available Java libraries, but its not part of the language as in newLISP, where its already built-in. The same is true for distributed applications.

Clojure takes a radically different approach to multithreaded programming, and it's built-into the language, no Java libraries needed or recommended. The language itself enforces programs that are multi-threaded safe.

With newLISP's Cilk api you still have the same issues with shared data as you do in most languages, Clojure is designed to efficiently and safely solve this problem with its transactional sync command and agents. It uses lock-less data structures to do it too (no mutex locks or semaphores).

See this video for more info. In it, it shows a short program written in Clojure that simulates an ant colony where each ant is its own thread, operating on a shared world that is drawn to the screen (the drawing operation is also in its own thread). No synchronization primitives such as locks or semaphores are used.

I'm fairly certain that something like that would be very difficult to do in newLISP, as it is in all other languages.

Lutz · Post by **Lutz** » Tue Jun 09, 2009 9:12 pm

You are right, I didn't know they avoid shared memory updating by completely avoiding mutable data when doing multi-threading. But so does newISP:

Reading quickly through the slides of Rich Hickey's presentation, it seems to me that his Agent model and the Cilk API are in fact very similar. These are some of the similarities.

- Concurrency:

Clojure: Actions are dispatched using send or send-off, which return immediately.

newLISP: Actions are dispatched using Cilk 'spawn', which also return immediately and don't mutate data in the caller either unless you want to do so at the safe sync point.

Clojure: Can coordinate with actions using await.

newLISP: Coordinates using Cilk 'sync' (with or without callbacks). No mutex locks or semaphores are involved. The original data are not changed.

newLISP is "thread safe" because it uses processes (not threads) which start with their own copy of the data. Because of newLISP's small memory footprint and start-up time, this is still pretty efficient, specially on Linux.

Clojures thread model perhaps supports a finer granularity than newLISP's model based on processes protected from each other. But newLISP's module is also more robust, in Clojure all threads run in the same JVM.

- Persistent data Sructures:

Clojure: Immutable, + old version of the collection is still available after 'changes'

newLISP: same here, after the sync, we have the old version of the data in the sender (spawn caller) and the new 'changed' versions from the returning agent(s).

itistoday · Post by **itistoday** » Tue Jun 09, 2009 9:32 pm

Lutz wrote:You are right, I didn't know they avoid shared memory updating by completely avoiding mutable data when doing multi-threading. But so does newISP:

Actually, Clojure *always* avoids mutable data, it is a pure functional programming language, whereas newLISP is not, and this means the two are fundamentally different.

Reading quickly through the slides of Rich Hickey's presentation, it seems to me that his Agent model and the Cilk API are in fact very similar. These are some of the similarities.

They are different; Agents are similar to Actors. Because they are Agents, you can send them multiple messages, you cannot do this with Cilk.

Clojure goes beyond Actors (hence the difference in naming) and even allows different agents to have *direct* access to their internal variables. Again, there is no counterpart in newLISP.

newLISP is "thread safe" because it uses processes (not threads) which start with their own copy of the data. Because of newLISP's small memory footprint and start-up time, this is still pretty efficient, specially on Linux.

You could do the same thing in a C program with threads. What makes Clojure unique is its treatment of shared data. With regards to accessing shared data, newLISP is not guaranteed to be thread-safe, whereas Clojure can guarantee that--at compile time.

Clojure has language support for a database like transactional model for accessing shared data (no SQL involved ;-}). This allows multiple threads to access and modify shared data safely and efficiently. Again, there is no counterpart in newLISP, or any other language (to my knowledge).