Request for Closures - Contexts insufficient

Pondering the philosophy behind the language
itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Request for Closures - Contexts insufficient

Post by itistoday »

This thread has drifted toward the slightly different topic of "reference lists"; see the posts below.

Closures are a very useful thing, and unfortunately newLISP doesn't have support for real closures.

I'm aware of the existence of contexts, but they are, in my opinion, a poor replacement for closures. They solve some problems, but are not a proper solution for the sorts of problems that closures solve.

The main reason is that they must be named, and they cannot be created on-the-fly around code as closures can. This has a huge impact on the readability of code, and on what newLISP can and cannot do.

The namespace problems Kazimir has described are real, and the solution to them is not contexts - it's closures.

An example of a problem that closures would solve much better, and with faster performance, is memoizing functions, as shown in this thread on streams. The current solution is poor: it involves a lookup in a red-black tree, and it makes it inefficient to memoize functions whose arguments can be swapped and still give the same result (e.g. (+ 1 2) vs. (+ 2 1)).
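For reference, this is roughly the shape of the context-based approach being criticized (my own sketch, not the code from the streams thread; the context name fib-cache is made up, and the lookup goes through the red-black tree that backs every context):

Code: Select all

(context 'fib-cache)   ; create the cache namespace
(context MAIN)

(define (fib n)
    (or (context 'fib-cache (string n))          ; tree lookup of a cached result
        (context 'fib-cache (string n)           ; otherwise compute, store and return
            (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))))

(fib 30) ;=> 1346269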

Contexts have to be named. If a context with that name already exists somewhere, you get a conflict. You can solve this by numbering them, but that's just an ugly hack that pollutes the namespace and hurts the readability of code. Contexts also cannot contain other contexts. Closures would fix all of these problems and make newLISP far more appealing to many people.

Is there any possibility of this happening? Perhaps something similar, like "anonymous contexts"? Granted, I may be wrong on some things here (it's been a while since I looked into the stream stuff), but I think most of these reasons still stand. Thoughts?
Last edited by itistoday on Thu Apr 02, 2009 3:06 am, edited 4 times in total.
Get your Objective newLISP groove on.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

We have mutable functions. I think mutable functions can do everything closures can, but not vice versa, because mutable functions can be changed from the outside, not only "from the inside."

The problem is that functions do not know their own names, so they cannot change themselves - except if we define them this way:

(set 'mn1 (lambda ((my-name 'mn1)) (println "my name is " my-name)))
(mn1)


This is a horrible weapon! This function can change itself, it can change its name, and it can change other functions and their names.
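To make the "changed from outside" point concrete, here is a small sketch of my own (it relies on the assumption that newLISP lambda expressions can be edited as list data; verify on your version):

Code: Select all

(define (greet) (println "hello"))
(greet)                                ; prints "hello"
(setf (last greet) '(println "bye"))   ; rewrite the function body from outside
(greet)                                ; now prints "bye"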

We just need some practice to learn how to use it, and eventually to develop some interface that resembles closures.

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

That does not provide the functionality of closures, and it actually illustrates another problem with newLISP: dynamic scope, which means that in newLISP you're pretty much always dealing with the global scope.

It's almost as if you were to write C++ code and declare global variables all over the place and never use classes. People would look at you like you were crazy. This makes writing large projects in newLISP difficult, and that in turn puts a lot of people off the language.

Really this is a problem of namespaces and newLISP's One Reference Only (ORO) memory management. If newLISP supported lexical scope, that would be a huge step toward more adoption, and it would increase the utility and power of the language.
Get your Objective newLISP groove on.

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Would it not be possible, for example, to define newLISP's "set" family of functions such that whenever one of them is invoked, it checks whether the symbol was previously referencing a closure and, if so, decrements the closure's ref-count?

Admittedly, I haven't thought this through very much, so you may treat me as if I'm speaking out of my ass, but could you do something like this:

Code: Select all

(set 'a (closure (b) (setf b 5)))
(a) ; => 5
(println b) ; => nil
(set 'a 5)
On the fourth line, the closure would be released from memory - not for the reasons it is now, but because its refcount would drop to 0. You could also check for things such as closures referencing themselves, to prevent situations where they'd stick around in memory forever. The basic idea, though, is that you could make anonymous closures, and whenever they were passed around it would always be by reference (no copying).
Last edited by itistoday on Fri Mar 27, 2009 8:39 pm, edited 1 time in total.
Get your Objective newLISP groove on.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

itistoday, the function I described really can do everything closures can; one just needs to write some interface around it. That is why all these newLISP metaprogramming features are there.

For lexical scope, perhaps you can use my genlet, genlocal, gensym, ... These give you the functionality of lexical scope, and they still work with macros. Real lexical scope cannot work with newLISP macros.

Code: Select all

(load "http://www.instprog.com/Instprog.default-library.lsp") 

(set 'outer
     (lambda-macro ()
       (set 'i 1)))

(genlet ((i 8))
      (outer i)
      (println i)) ;=> 8

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Perhaps you can help me out, I'm trying to wrap my head around all of this but I'm always pressed for time... :-(

Can you define this in newLISP without using contexts:

Code: Select all

; Scheme code, written using newLISP names like 'setf' and 'fn':

(setf f (let ((var 0)) (fn () (setf var (+ 1 var)) var)))
(f) ; => 1
(f) ; => 2
Last edited by itistoday on Fri Mar 27, 2009 11:21 pm, edited 1 time in total.
Get your Objective newLISP groove on.

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Thought about this whole thing some more, and I think what I'm trying to get at really doesn't have so much to do with closures as it does with the concept of an anonymous object that can be passed around without copying it.

In other words, a list that doesn't get copied. Is this possible to do in newLISP using the referencing scheme I described above? I guess this question really goes to Lutz, since he knows the internals of the language.

You could then do "FOOP", except now it wouldn't be "purely functional", you could have real objects in newLISP.

Closures would be nice too, though, as they'd get rid of the need to use the letex function and to generate crazy symbol names.
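For readers wondering what I mean, this is roughly the letex-plus-generated-symbol workaround (a sketch; the fixed name counter-state-001 just stands in for a gensym'd symbol):

Code: Select all

(set 'state-sym (sym "counter-state-001"))   ; stands in for a generated name
(set state-sym 0)
(set 'f (letex (s state-sym) (lambda () (inc s))))
(f) ;=> 1
(f) ;=> 2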
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

- Nested contexts: I have used nested namespaces in an earlier language project. RuleTalk (a kind of Prolog) could nest contexts to any level you wished.

When starting newLISP I deliberately cut out nesting of namespaces. They introduced unnecessary complexity.

As a side note: that language also had tail recursion elimination, because recursion was its main flow-control mechanism. Limiting flow control to recursion and continuations would have been an unnatural limitation for newLISP - perhaps beautiful from a mathematician's point of view, but a pain to use and understand for people from other disciplines.

- Functions with state in newLISP, and state serialization:

Code: Select all

> (define (foo:foo) (inc foo:cnt))
(lambda () (inc foo:cnt))
> (foo)
1
> (foo)
2
> (foo)
3
> foo:cnt
3 
> (save "foo.lsp" 'foo)
true
> 
start a new newLISP session later:

Code: Select all

> (load "foo.lsp")
MAIN
> (foo)
4
> 
How would you do serialization (to a human readable file) in Scheme or Common Lisp? Without extra code this would not be possible.

see also here: http://www.newlisp.org/index.cgi?Closures
and here about RuleTalk: http://www.donlucio.net/index.cgi?page=Projects further down on the page.

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

In the post I made above I'm not really referring to nested contexts, but to the concept of a list that can be passed by reference.

Don't you think that'd be a neat and useful addition to the language?

By using some low-level implementation of the 'set' function, could you use automatic reference counting to keep track of it?

Code: Select all

(set 'x (list-ref '(0 1 2))) ; reference count of one, because of the symbol 'x'
(define (modify-list L) (setf (L 1) 2))
(modify-list x)
x ; => (0 2 2)
(setf x nil) ; => deallocates the list
You'd be able to do real OOP in newLISP then, with objects maintaining state, passing them around, keeping arrays of them, etc., and it would allow quickly passing around large lists without having to come up with an entire context for them. There are probably some obvious memory-related issues here that I'm forgetting, but perhaps it could be done?
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

I realize this thread is now more about closures, OOP and reference passing, but nested contexts were mentioned in the first post and have been mentioned before by other posters, so I wanted to talk about them briefly.

Also some words about dynamic scoping and globals: if you mostly avoid globals (as most newLISP programmers do, except when writing short, trivial scripts), then the rules of dynamic scoping in newLISP offer the same isolation as lexical scoping, because a function's variable environment is defined only by its local parameters and by the parameters of let/letn/letex/local, the do-loops (dotimes, dolist, ...) and for. Variable capture is avoided by never using quoted symbols for reference passing and by using contexts instead.

But now to OOP and reference counting:

Reference-counting memory management algorithms take significant computing resources, both in memory and in speed, and it is difficult to weave the work they perform smoothly into the main language VM process. There is no modern garbage collection algorithm that doesn't display unexpected pauses in code execution.

ORO (One Reference Only) is based on the observation that for most memory objects created, you can already determine their future lifetime and point of deletion at the moment of creation. ORO's performance is synchronous and predictable.

Lexical scoping in combination with garbage collection makes reference passing easier in a programming language. But there is reference passing and maintenance of state in newLISP too. It just works differently and demands a different programming style, but it is independent of traditional garbage collection and its drawbacks.

By default newLISP does reference passing only in and out of nested built-in functions; for user-defined functions, passing copies is the default. To pass data by reference to user-defined functions, either newLISP macros or namespaces must be used.

There is no such thing as "real OOP"; there are just different ways to encapsulate data and behavior ;-) ... different flavors of OOP. What you call "an entire context" is just the small overhead of a symbol holding a symbol-tree pointer. A namespace doesn't require more resources than a lexical function closure does. The following variation of your example adds minimal overhead for wrapping the data in a context, and the modifying function can stay the same.

Code: Select all

> (set 'x:x '(0 1 2))
(0 1 2)
> (define (modify-list l) (setf (l 1) 2))
(lambda (l) (setf (l 1) 2))
> (modify-list x)
2
> x:x
(0 2 2)
> 
Everything you are looking for (anonymous objects maintaining state, collected in arrays and lists, automatically memory-managed) can be accomplished with FOOP.

Since 10.0.1, FOOP objects can also be held in default functors for reference passing. If you look into real-world applications, there are few data objects big enough to really need reference passing in the first place.

FOOP is closer to many other OOP flavors than it looks at first. A FOOP object is a data object with a special pointer to the collection of methods which can be used for it. The same is true for many other OOP system implementations.

The "F" in FOOP illustrates the fact, that the data object is at the same time the constructor functional expression to create the object:

The first member of the data expression can be interpreted either as a pointer to the context holding the class methods applicable to the object, or as the default functor of that class, which holds the constructor function for objects of that class. Evaluating the data expression means evaluating the constructor that makes this data object.

This whole idea is captured in the following object constructor definition (built into newLISP since 10.0.0):

Code: Select all

(define (Class:Class) (cons (context) (args)))
Michael Michaels came up with this beautiful definition about a year ago, and it basically defines all of FOOP! The only other thing necessary was the colon ":" operator implementing polymorphism, which at that time existed as a newLISP macro and is now built in natively. The colon operator uses the first element of the data object to select a method from a specific class.
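For readers new to FOOP, a minimal sketch of how this plays out (the class Point and the method move are made-up illustrations, not from the manual):

Code: Select all

(new Class 'Point)                        ; derive a Point class from Class

(define (Point:move p dx dy)              ; methods live in the Point context
    (list Point (+ (p 1) dx) (+ (p 2) dy)))

(set 'p (Point 1 2))                      ; the constructor builds (Point 1 2)
(:move p 10 20)                           ;=> (Point 11 22)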

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

On FOOP: I'm aware of it and I do like it, but it doesn't allow certain OOP paradigms that are common in most other languages.

For example, if you have a "large object" (many methods and capabilities) that's used throughout your code by many entities, then quite often it will have methods that change it. However, you want these changes to persist with that object *itself*, so that when other parts of your code talk to that *specific* object, they get the latest changes.

With FOOP you would need to replace the entire object with a modified version of itself. And actually you wouldn't be able to have a "centralized object" that other objects talk to/reference. You can't do this sort of basic OOP in newLISP. The reason is that you can't have a reference to this object without devoting an entire context to it; without that, you'd have different copies of the object all over the place, all with different values.

And I say "entire context" not because of any performance issues, but because there is only one place where contexts live, and that is the global scope, and as such you're limited on available names. If you're project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts. They are unique, and that's a huge problem with newLISP.

Your code example with the 'x:x variable/context is a perfect example of this issue. Who knows what could have been in 'x:x already? Who knows what sort of havoc you could have wreaked on some third-party library just by doing that? Multiply this by several thousand, and that's the situation you have with a large project in newLISP (the kind I'm potentially considering embarking upon).

FOOP is neat; I'm not bashing it, I like it. But it's not a solution to all OOP problems, and it restricts you to a minimal set of capabilities. Since I love the language you've created, I'm trying to argue for something that would improve it and also result in increased adoption (I think).

As for garbage collection, I'm honestly not sure that what I was proposing with "reference lists" requires it. If you implemented it the way I described, there would be no garbage collector. If you could guarantee that variables are set by only one or two functions, then all of the memory-management code would live in those one or two low-level "set" functions.

With one version of "set", it would increment/decrement the reference count (potentially resulting in the deallocation of the list), and another version of the "set" function would not affect the reference count at all, thereby allowing weak-references to these "reference lists".

I know there may be other memory-related issues here, and it's true that you could then write newLISP code with memory leaks, but I think the trade-off is worth it because of how much it would add to the language.

These three things would greatly improve the capabilities of the language:

- Reference lists (would allow more OOP, faster execution, avoid need for contexts in many situations)
- Nested contexts (would make it easier to avoid name conflicts)
- Lexical "let"-type function (would allow lexical scope in newLISP, make it easier to write code that needs it, gets rid of need for confusing letex, makes solutions to certain types of problems faster)

The first two are probably more significant. Please let me know if I've gone wrong somewhere here, but so far I don't think that what I'm suggesting is crazy...
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

itistoday wrote:
If your project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts.
Not at all: except for the names of other contexts, you are free to choose any new symbol names inside a namespace. You can even overwrite the global built-in primitives by prefixing them with the context name (but will not be able to serialize the context).
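A quick sketch of what such an override looks like (the context name Mylib is my own example; the prefixed symbol is separate from the protected built-in):

Code: Select all

(context 'Mylib)
(define (Mylib:print x) (MAIN:print ">> " x "\n"))   ; Mylib:print is its own symbol
(context MAIN)

(Mylib:print "hello")   ; the namespaced override
(print "hello")         ; still the global built-in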

The context system got a thorough workout in a programming team/startup three years ago, and some subtle improvements to it were made during that time.

The group was a mixture of experienced and inexperienced programmers building a big, distributed sentence search engine. We never felt in danger of stepping on each other's toes. Code was organized into namespace modules. Module names followed a convention, there was one module per file, and the modules were pre-declared in a central location. Each programmer was responsible for his own modules, but a few modules had multiple authors. Development was managed with Subversion (SVN), customized with newLISP syntax highlighting. We had a small QA team and people working from remote locations too.

newLISP doesn't want to be, and never will be, a full-blown OO language with all the paradigms you find in C++ or Java. There is no inheritance to share or override methods higher up a class hierarchy, but you can do simple mix-ins using 'new'. Its main paradigms are functional programming and namespaces. Over the years both paradigms have been fused together quite well (e.g. in FOOP).

To your last three points (in a different order):

I talked about my bad experience with nested contexts earlier in this thread (-> RuleTalk).

About reference counting: it's not as trivial as it looks at first. Here are a few of the problems you have to deal with when implementing such a system:

Reference-counting GC is memory-intensive and leaves a lot of referenced but never-used cells in memory; there are several experimental studies about this. There is the overhead of a reference counter in each cell, and if the counter is not wide enough it may overflow. In long-running systems (like our sentence search machine mentioned above) there is always that danger. Reference garbage (non-zero counter, but never used) can only be avoided by knowing the semantics of the program or analyzing it. Processing time has been reported to be high: each time a symbol is reassigned, the counters in the old and new target cells have to be updated and checked for zero or overflow.

Reference counting is one of the older garbage collection systems, and many optimizations have been developed to deal with its disadvantages. All of these have a negative impact on performance and memory usage. The main optimization idea is to exclude cells which can be reclaimed early, and that is a big part of what ORO does. So ORO does leverage some of the techniques developed as optimizations for other garbage collection algorithms. The only advantage of ref-counting is easy interleaving with the main VM process.

The average, real-world impact of value-copy passing versus reference passing is greatly exaggerated by most proponents of classic garbage collection. After doing measurements in newLISP, not only on constructed examples but also on real application code, I realized that the issue could be neglected. The memory and performance savings of ORO more than compensate for the negatives of value-copy passing.

Although 10.0 introduced reference passing in and out of all built-in functions by default, the overall impact on speed (measured with the Debian language shootout suite) was only about 5%. Of course in isolated examples the difference can be thousands of percent on bigger objects, but in a typical code mixture it's little. When dealing with user-defined functions, which still do value-copy by default, use context wrappers for reference passing. Typically these cases are rare (in my own code, anyway) and I have not seen naming confusion because of this.

Lexical 'let' functions don't have any advantage over dynamically scoped 'let' functions, unless they contain free variables and you want closures based on them. 'letex' is a whole different animal; its main use is merging macro-like expansion and evaluation into one built-in function.

I know that in principle you like newLISP and that you are always among the first to defend it :-). Perhaps, if you change your newLISP programming style and adapt to the way newLISP does certain things, you will see that it very well can be a programming-team language and that newLISP can deal with bigger systems too. Perhaps you will see that with the functional and namespace paradigms, problems can be solved in an elegant way: problems to which OO would traditionally have been applied.

The method of using dynamic scoping inside namespaces opens up a whole new world of little-explored programming paradigms. Things like Aspect-Oriented Programming and Programming for Separation of Concerns come to mind. I was invited to ILC 2009 and wanted to talk about these topics in relation to newLISP, but had to decline because of time and scheduling conflicts. There will be an article about this on newlisp.org one day. Namespaces and dynamic scoping gain new value when discussed from these angles.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Post by cormullion »

Hey Lutz - I said you should start blogging! These would make excellent blog posts. :)

Also, I remember you saying somewhere that the old way of writing large monolithic programs wasn't newLISP's way; instead we should be thinking about small agile agents running in their dozens and hundreds on multiple computer nodes. That would be too much of a challenge for some of us (I'm a scripter, not a programmer :-), but perhaps another way of approaching the design of a large project...

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

Is there a chance for a system variable, say $me, that returns a *reference* to the function or macro? (By reference I mean "any expression that evaluates to the given function.")

Code: Select all

(set 'x (lambda () $me))
(x)  => x ; i.e. the symbol x
Or

Code: Select all

(setf (L 1) (lambda () $me))
(L (- 2 1)) => (L 1) ; i.e. the list (L 1)
Maybe I asked that already but I forgot ...

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Lutz wrote:
If your project is huge, and has multiple such objects, then it puts a huge burden on the developer to make sure that any code that he uses, and all of the libraries that he includes, don't result in name conflicts between the contexts.
Not at all: except for the names of other contexts
That's exactly my point: it's the names of the contexts that you have to worry about, not the names of symbols. As I mentioned above, a simple and seemingly harmless line like this:

Code: Select all

> (set 'x:x '(0 1 2))
can become a real liability if some third-party library uses 'x:x to do important work. It could lead to hard-to-find bugs, and it puts a burden on the developer and team to be very careful when using contexts. It means that each time you want to use a context to pass data by reference (and in a large project you'll do this all the time), you've got to make sure you're not using a context you shouldn't be.
Lutz wrote:
newLISP doesn't want to be, and never will be, a full-blown OO language with all the paradigms you find in C++ or Java. There is no inheritance to share or override methods higher up a class hierarchy, but you can do simple mix-ins using 'new'. Its main paradigms are functional programming and namespaces. Over the years both paradigms have been fused together quite well (e.g. in FOOP).
I'm not calling for full-blown OOP, just reference lists. And I'm not even saying that all lists should be reference lists; there would be special functions to create them, and perhaps a special syntax as well. For example, instead of the quote you could use a backtick or something like that:

Code: Select all

(set 'a-ref-list `(0 1 2))
Lutz wrote:
I talked about my bad experience with nested contexts earlier in this thread (-> RuleTalk).
Not in much detail... They're really not that important, though, if reference lists are implemented.
Lutz wrote:
About reference counting: it's not as trivial as it looks at first. Here are a few of the problems you have to deal with when implementing such a system:

Reference-counting GC is memory-intensive and leaves a lot of referenced but never-used cells in memory; there are several experimental studies about this. There is the overhead of a reference counter in each cell, and if the counter is not wide enough it may overflow. In long-running systems (like our sentence search machine mentioned above) there is always that danger. Reference garbage (non-zero counter, but never used) can only be avoided by knowing the semantics of the program or analyzing it. Processing time has been reported to be high: each time a symbol is reassigned, the counters in the old and new target cells have to be updated and checked for zero or overflow.

Reference counting is one of the older garbage collection systems, and many optimizations have been developed to deal with its disadvantages. All of these have a negative impact on performance and memory usage. The main optimization idea is to exclude cells which can be reclaimed early, and that is a big part of what ORO does. So ORO does leverage some of the techniques developed as optimizations for other garbage collection algorithms. The only advantage of ref-counting is easy interleaving with the main VM process.
I have to admit that I don't know enough about the subject to really put up a strong defense. However, based only on what you've said above, I'm still not convinced that it would impact performance or memory, or have any of the side effects you're describing.

First, there wouldn't be a GC, since the reference counting would be done in the 'set' function; therefore I would imagine you'd still have consistent execution times.

Second, I'm really skeptical when you say that it would impact performance. I bet those studies were done on languages where everything was reference counted, which is quite different from what I'm proposing: having only a single special type that is reference counted. Your 'set' function would look (very roughly) like this:

Code: Select all

ptr_t set(ptr_t symbol, ptr_t value)
{
	// if the symbol's current contents are a reference list, let go of them
	// (contents() here is shorthand for fetching the symbol's old value)
	if ( typeof(contents(symbol)) == TYPE_REFLIST ) {
		releaseRefList(contents(symbol));
	}
	// if the new value is a reference list, claim a reference to it
	if ( typeof(value) == TYPE_REFLIST ) {
		retainRefList(value);
	}
	return _set(symbol, value);
}

// set_weakRef would be defined here - it neither releases nor retains
Seeing as this code would only do extra work for reference lists, I doubt it would impact performance at all; you'd still be able to write the same kinds of newLISP programs you can now, without ever using reference lists. That functionality would simply be there when you need it.
Get your Objective newLISP groove on.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

You can try this

Code: Select all

(set 'reflist74 '(0 1 2))
(set 'a-ref-list 'reflist74)
So reflist74 is the reference. The code can be a bit more complicated, because some "dereference" is always needed; for example,

Code: Select all

(push 3 (eval a-ref-list))
instead of

Code: Select all

(push 3 a-ref-list)
but it doesn't look like a big deal. The advantage is that the semantics of the language stays simple. With time, functions like the following can be defined:

Code: Select all

(set 'PUSH (lambda (x L) (push x (eval L))))
(PUSH 10 a-ref-list) => (10 0 1 2)
Adding a new type, especially one similar to an existing type, makes the language more complicated and less comprehensible. Every general-purpose function would then have to take care of both cases, "reference" and "non-reference" lists.

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Kazimir, here is a really simple, common example of why that's inferior to reference lists:

Say you have a system that keeps track of some sort of a "node" (a chunk of various information), for example, a list of users, or a list of nodes representing various locations on a map (cities, restaurants, etc.).

Now, you're developing a very complicated system that uses these nodes all over the place, and has hundreds of functions for manipulating them. But, you only have a single list of nodes in the system.

For example, let's stick with the "users" example and say our system is this forum. You do not want to have multiple copies of users all over the place. All functions are given a "global user" object, and when they modify that user, that change is global.

Sure, you could do this in newLISP by storing them in a "UsersContext", perhaps either as a huge list of lists, or maybe as a red-black tree that uses a userID to refer to each user, but then you are not able to temporarily decouple the individual users from that list. In your example, you are forced to associate a list with a specific symbol because you can't use reference lists.

In this forum example, any time a user needed to be modified, you would have to pass the entire context around to functions that would normally just take the user itself, and then these functions would have to do a lookup on the context/list to modify the user. This could result in a significant performance hit, because you've got 10,000 users and each lookup is *at best* O(log(n)) (red-black tree, using the userID as a key).
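To make that concrete, here is a rough sketch of the lookup-and-write-back pattern I mean (the Users context, the field layout, and the function name are all made up for illustration):

Code: Select all

(context 'Users)   ; create the store; contexts are backed by red-black trees
(context MAIN)

(define (modify-username user-id new-name)
    (let (user (context 'Users (string user-id)))   ; O(log n) lookup, returns a copy
        (setf (user 1) new-name)                     ; modify the local copy
        (context 'Users (string user-id) user)))     ; write the copy back

(context 'Users "1001" '(1001 "alice" "alice@example.com"))
(modify-username 1001 "alicia")
(context 'Users "1001") ;=> (1001 "alicia" "alice@example.com")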

Plus, if you don't want all of your user-modifying functions to look like this:

Code: Select all

(define (modify-username user-list user-id new-name) ... )
then you end up having to make all of your functions refer directly to the UsersContext context, a *very* odd thing to do, because it means that all of your functions are handicapped: they can only be used to modify users stored in that particular context.

Thus, the lack of reference lists means that you're forced to write poor code that's unnecessarily slow too.

This is just a trivial example; I could give you thousands of others, because the concept of a "node" is so common in large projects.
Kazimir Majorinc wrote:
Adding a new type, especially one similar to an existing type, makes the language more complicated and less comprehensible.
I do not think so. Reference lists would behave exactly as normal lists do; it's just up to the programmer to make sure not to do something silly that could cause a memory leak, as many are already used to doing in languages like C, C++, and Objective-C. You could also add functions like 'ref-list?' so that, if for some reason you needed to distinguish between a list and a reference list, you could.
Kazimir Majorinc wrote:
Every general-purpose function would then have to take care of both cases, "reference" and "non-reference" lists.
They already distinguish among many other types, so this wouldn't be difficult, and the two would be represented internally very similarly. I think the advantages of reference lists would far outweigh (as far as I can tell) any disadvantages.

Edit: One thing you could do is generate 10,000 symbols to represent your users, store them in the list, and pass them around using the silly eval trick on them. This is not only poor programming style, but it would also slow down the system greatly due to the constant eval calls, not to mention that it would fill up the symbol tree with 10,000 additional symbols (what if you had a million nodes??), slowing down every symbol lookup. And then you'd still have to make sure that nobody else accidentally uses your symbols for some different purpose (again, part of the poor programming practice).
Get your Objective newLISP groove on.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

itistoday wrote:
Edit: One thing you could do is generate 10,000 symbols to represent your users, store them in the list, and pass them around using the silly eval trick on them. This is not only poor programming style, but it would also slow down the system greatly due to the constant eval calls, not to mention that it would fill up the symbol tree with 10,000 additional symbols (what if you had a million nodes??), slowing down every symbol lookup. And then you'd still have to make sure that nobody else accidentally uses your symbols for some different purpose (again, part of the poor programming practice).
Well, the difference is probably that I think it is not a silly trick but an adequate approach for a Lisp. O(log(n)) is theoretically satisfactory, and in practice lookup in the symbol table hardly depends on the size of the table:

symbols: 1
expression: (begin (setf j0 1) (setf j0 1) (setf j0 1))
times: 1 000 000
total time: 316

symbols: 10
expression: (begin (setf j8 1) (setf j5 1) (setf j4 1))
times: 1 000 000
total time: 289

symbols: 100
expression: (begin (setf j35 1) (setf j89 1) (setf j82 1))
times: 1 000 000
total time: 288

symbols: 1000
expression: (begin (setf j746 1) (setf j174 1) (setf j858 1))
times: 1 000 000
total time: 319

symbols: 10000
expression: (begin (setf j7105 1) (setf j5135 1) (setf j3039 1))
times: 1 000 000
total time: 295

symbols: 100000
expression: (begin (setf j1498 1) (setf j9140 1) (setf j36445 1))
times: 1 000 000
total time: 326

symbols: 1000000
expression: (begin (setf j147312 1) (setf j165898 1) (setf j988525 1))
times: 1 000 000
total time: 321

symbols: 10000000
expression: (begin (setf j4456923 1) (setf j1190832 1) (setf j46693 1))
times: 1 000 000
total time: 297
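(For anyone who wants to reproduce this kind of measurement, something along these lines should do it - my own sketch, not the exact script used, and it times three fixed symbols instead of randomly chosen ones:)

Code: Select all

(define (bench n)
    (dotimes (i n) (sym (string "j" i)))   ; fill the symbol table with n symbols
    (time (begin (setf j0 1) (setf j1 1) (setf j2 1)) 1000000))

(println (bench 100000))   ; milliseconds for 1,000,000 iterations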

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

Those are impressive benchmarks. I'm going to write some of my own to verify these results, as I suspect they may not reflect real-world conditions, but if there's really virtually no performance hit, then this would be an ugly (IMO) but viable way of doing things.

Right now I'm refreshing my newLISP skills (haven't programmed in it in a bit) by studying your genlet function (very impressive stuff), and then I'll be on to the benchmarks, so hopefully you'll hear from me soon. :-)
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

Although Kazimir's deliberations about gensym etc. are enlightening and interesting because they reveal more about the inner workings and performance of newLISP, there is too much worry about variable capture in newLISP.

For the less initiated, here are some rules and explanations regarding variable capture, and about dangers perceived where there are none.

- there is no danger of variable capture when reusing variable names in nested functions, 'let' expressions and nested loops. All parameter variables are internally saved on an environment stack and restored after use. You can even do the following without danger:

Code: Select all

(dotimes (i 3) (println "->" i) 
    (dotimes (i 3) (println i)))
Although the inner loop uses the same variable name as the outer loop, this is totally safe in newLISP, while it is unsafe in many other languages.

The same is true for functions made with 'define'. Even the following somewhat crazy code will work flawlessly, with 'let' reusing the same variable names:

Code: Select all

(define (foo x y)
	(println x ":" y)
	(let ( (x (div x 2)) (y (div y 2)) )
		(println x "::" y))
	(println x ":" y))

> (foo 3 4)
3:4
1.5::2
3:4
4  ;  return value of foo
> 
The 'let' expression knows that the left x is not the same as the right x: the left x belongs to the inner 'let' and the right x to the 'foo' parameter. After leaving 'let', everything is just as it was before. You could also call a function 'bar' with x and y parameter names from inside 'foo' or 'let', and nothing bad would ever happen.

When using normal functions made with 'define', variable capture can only occur when passing quoted symbols, which is something that rarely happens in newLISP anyway. If you have to do it, then either isolate that function in a context/namespace or use 'args' to retrieve the parameters. Both methods are safe against variable capture.

- the only real danger of variable capture occurs in fexprs (define-macro in newLISP). Using something like 'gensym' and 'eval' to work around this will work, but it affects performance, and with repeated calls more and more symbols are created by 'gensym', wasting memory. They could be deleted after use, but that would take even more CPU cycles. It is better to use 'args' and have no parameter variables at all, or to enclose the function-macro and its parameters in their own context:

Code: Select all

(define-macro (foo:foo foo:x foo:y)
	....
)

; or as an alternative

(context 'foo)
(define-macro (foo x y)
    ...
)
(context MAIN)

; and in either case you can call with the simple name

(foo ...)
'foo' is now a global macro (actually an fexpr) and completely safe against variable capture, even when passing quoted symbols. There is no performance loss in doing it this way, even when writing millions of macros.

- also, any other function or macro (fexpr) in a module, such as 'foo:this' or 'foo:that', is completely safe against variable capture/confusion, even when passing quoted symbols.

- last but not least: why does newLISP call fexprs macros? Because from a usage point of view they accomplish the same thing: writing functions which do not follow the usual rule of evaluating all parameters first, but instead control parameter evaluation themselves using 'eval'. I find the application-oriented definition more useful.

Kazimir Majorinc
Posts: 388
Joined: Thu May 08, 2008 1:24 am
Location: Croatia
Contact:

Post by Kazimir Majorinc »

I cannot but agree with Lutz ...

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

I disagree on several points:
Lutz wrote:
- there is no danger of variable capture when reusing variable names in nested functions, 'let' expressions and nested loops. All parameter variables are internally saved on an environment stack and restored after use. You can even do the following without danger:

Code: Select all

(dotimes (i 3) (println "->" i) 
    (dotimes (i 3) (println i)))
Although the inner loop uses the same variable name as the outer loop, this is totally safe in newLISP, while it is unsafe in many other languages.
While I agree with you, I don't think anyone claimed that there was a danger of variable capture with normal functions. I also think that this is not only true for newLISP but for other LISPs as well, and other languages too (even C can do this). So there is nothing spectacular about the nested dotimes loop...

As you say, the real danger is with macros/fexprs, and in my studies I've discovered that this is doubly so for newLISP (I'll provide further details below).
Lutz wrote:
It is better to use 'args' and have no parameter variables at all, or to enclose the function-macro and its parameters in their own context:

Code: Select all

(define-macro (foo:foo foo:x foo:y)
	....
)

; or as an alternative

(context 'foo)
(define-macro (foo x y)
    ...
)
(context MAIN)

; and in either case you can call with the simple name

(foo ...)
'foo' is now a global macro (actually an fexpr) and completely safe against variable capture, even when passing quoted symbols. There is no performance loss in doing it this way, even when writing millions of macros.
Here I have to strongly disagree. The context solution you present is not a real solution; it may be something to use when writing short, single-file programs in newLISP, but because newLISP doesn't support nested contexts, you're begging for trouble.

If a language has aspirations of expansion and wider adoption (as newLISP does), then it had better assure the legions of developers out there that their code will work as expected, and that it won't be broken due to name conflicts with any of the hundreds of third-party libraries available to it.

C has this problem too, but even it is equipped to handle it through header files and the 'static' keyword. The only solutions available for this problem in newLISP are naming conventions and contexts. You can't afford to burn through the available context names by the millions, yet that's exactly what you're proposing.

In a semi-large project, or even a smaller one that links to many libraries, there are thousands of functions/macros. If you dedicated an entire namespace to every macro, you would almost be guaranteed to run into a conflict. Run your code and it works fine; load this library and suddenly everything is broken. Even if by some fluke you managed to avoid any name conflicts, you would be under tremendous pressure for the rest of your development cycle to watch which names you pick for your macros/contexts. Personally, I do not accept the above as a real solution to the variable capture problem.
Lutz wrote:
- also, any other function or macro (fexpr) in a module, such as 'foo:this' or 'foo:that', is completely safe against variable capture/confusion, even when passing quoted symbols.
I'm fairly certain that is not true when it's called by another function/macro in the same context...

From what I can tell, when writing safe macros (as most macros should be), you should never name your arguments and always access them through the args function. That immediately avoids one obvious form of variable capture:

Code: Select all

> (define-macro (foo x) (println (eval x)))
(lambda-macro (x) (println (eval x)))
> (setq x 5)
5
> (foo x)
x
However, there are many different kinds of variable capture, and I recently ran into a great example of one while studying a Scheme tutorial:

Code: Select all

; Scheme
(define-macro my-or
	(lambda (x y)
		`(if ,x ,x ,y)))

; newLISP equivalent
(define-macro (my-or)
	(let (temp (eval (args 0)))
		(if temp temp (eval (args 1)))))
'my-or' is a form that should evaluate its first argument only once; if that evaluates to true, its value is returned, otherwise the second argument is evaluated and returned.

my-or doesn't use any named arguments, but it creates a symbol called 'temp', and that leads to trouble:

Code: Select all

> (setq temp 5)
5
> (my-or nil temp)
nil ; Ack! Should be 5!
I spent a while studying Kazimir's genlet function (I wrote my own version of it), but then I discovered that even genlet won't save you here, because of the use of (args). And because newLISP macros are fexprs, you cannot use the Scheme solution either:

Code: Select all

(define-macro my-or
  (lambda (x y)
    (let ((temp (gensym)))
      `(let ((,temp ,x))
         (if ,temp ,temp ,y)))))
I eventually did figure out how to write my-or in newLISP, but that's a subject for another thread.
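One capture-safe variant, following the context-wrapping approach Lutz describes above (a sketch, and not necessarily the solution alluded to here):

Code: Select all

(context 'my-or)
(define-macro (my-or:my-or)
    (let (tmp (eval (args 0)))
        (if tmp tmp (eval (args 1)))))
(context MAIN)

(setq temp 5)
(my-or nil temp) ;=> 5, because the macro's local is my-or:tmp, not MAIN:temp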

While I do love many things about newLISP, I think there are valid points to be raised about some of its weaknesses. To me they are potential pitfalls of the language. At the very least, I think these are issues that people looking to learn the language should be aware of.
Get your Objective newLISP groove on.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California
Contact:

Post by Lutz »

itistoday wrote:
I also think that this is not only true for newLISP but for other LISPs as well, and other languages too (even C can do this). So there is nothing spectacular about the nested dotimes loop
Other languages allow this only if you create a new scope using a new variable declaration, e.g. in JavaScript: for(var i; ...) or Java: for(int i; ...). In C you would have to create a new scope using braces: {int i; for(i; ...) {}}.

The special thing about newLISP is that it creates the new scope for loop variables automatically. This is basically a "must" for a dynamically scoped language: if you didn't have it, you would have to enclose each 'dotimes', 'dolist', etc. in an extra 'let' to create a local scope for the loop variable.
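A quick way to see this at the REPL (my own small check, assuming - as described above - that the loop variable is saved and restored around the loop):

Code: Select all

> (set 'i 99)
99
> (dotimes (i 3) (println i))
0
1
2
2
> i
99
> 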

itistoday
Posts: 429
Joined: Sun Dec 02, 2007 5:10 pm
Contact:

Post by itistoday »

I still fail to see how there's anything spectacular about dotimes creating a new scope "automatically"... With almost the same number of characters, most other languages do that too, whether through the use of braces or what-not.

Regardless, there's still the unaddressed issue of name conflicts. It would be nice if there were something in the manual or the CodePatterns document that addressed this issue and perhaps advocated the use of organizational prefixes, if default functors are to be the de facto way of writing macros.

Edit: I think I'm now more with you on the macro situation, see this post.
Get your Objective newLISP groove on.
