Feature request

Jeff · Post by **Jeff** » Mon Mar 16, 2009 5:46 pm

Lutz,

Would it be possible to make swap work with arrays? Would it also be possible to have push/pop ops for arrays that increases/decreases an array's size destructively?

Lutz · Post by **Lutz** » Tue Mar 17, 2009 11:05 am

'swap' yes, 'push/pop' no.

The current 'swap' syntax with three arguments will be deprecated and the current syntax with two arguments will be expanded to take places/references similar to 'setf/setq' and 'inc/dec'.

Currently you can do:

Code: Select all

(swap x y)

to swap the contents of x and y. In the future you additionally will be able to:

Code: Select all

; swap elements in the same list or array

(swap (list-a 3) (list-a 5))
(swap (first a-array) (last a-array))

; or swap elements between different objects

(swap (first the-list) (last other-list))
(swap x (last the-array))

so basically any two places can be swapped. This will simplify the syntax of 'swap', limiting it to one pattern: (swap left-place right-place), which currently already works on symbols.

Jeff · Post by **Jeff** » Tue Mar 17, 2009 12:04 pm

That would be perfect. Thanks!

Concerning pushing/popping an array, I was thinking along the lines of simply extending them without having to rebuild them entirely (e.g. (append arr-1 (array 1 new-value))), as in the case of a vector. Popping would shift values in one direction (both ways). Obviously, these would not be the same as the list pop/push methods. They would need to be new ones, since they perform a much different function. But not having the ability to allocate more memory to or truncate an array means that we cannot build efficient data structures using newlisp.

For example, I need to build an extremely large priority queue that may be grown or shrunk arbitrarily (memory efficiency is a goal). Doing so using a list-based tree is incredibly inefficient. Implementing a heap with a linked list, even a dequeue, does not grow well. Above a few thousand items, it becomes ponderous to keep the thing sorted.

I'm currently using a sort of hack that tracks changes to a list on push ops, and only sorting on pops if it has been altered. This grows exponentially with each push/pop boundary, which is common on an active queue, but it is the only way I can think of that has any sort of reasonable performance with any number of entries (and it still is not enough).

I'm close to writing an extension in C, but I would like to avoid that if at all possible; I would like to make this as easily transportable as possible.

Kazimir Majorinc · Post by **Kazimir Majorinc** » Tue Mar 17, 2009 2:30 pm

Did you tried symbols?

a1 .... a1000 instead of (a 1) ... (a 1000)?

Generally, it is slower than built in arrays - but you can push and pop on the both sides - for the beginning. It might look as hack, but I think it is vice versa, this is the real thing, and buit in arrays are optimization - justified in many cases, but break if one need generalization. Exactly your case.

Code: Select all

(println (time (for (i 0 99999)
            (set (sym (string "a" i)) i))))

I tried it as well with array, list and empty loop. Results on my computer are:

symbols: 226-334 -- it should be O(n*log n)
array: 16 -- it should be O(n)
list: 13809 -- it should be O(n*n)
empty loop: 4 -- it should be O(n)

It looks promising to me. Furthermore, practically all extra time of symbols - compared with array is spent in evaluating (sym (string "a" i)). Expression (setf a100 1) is nearly twice faster than (setf (a 100) 1) if a is array.

Jeff · Post by **Jeff** » Tue Mar 17, 2009 2:35 pm

That is certainly an idea. I will have to look at that. I would need a counter to track the size so I don't overrun it, but yeah - that might work, and would give me tree access speed. Sorting values into symbol-hashed positions might take a long time, though.

Kazimir Majorinc · Post by **Kazimir Majorinc** » Tue Mar 17, 2009 5:31 pm

Closer inspection demonstrated that almost all time here

Code: Select all

(println (time (for (i 0 99999)
            (set (sym (string "a" i)) i))))

is spent in the function string:

Code: Select all

(println (time (setf (b 10000) 0) 99999))               => 13 (b is array)
(println (time (setf a50000 0) 99999))                  => 9 
(println (time (set (sym "a50000") 0) 99999))           => 38
(println (time (set (sym (string "a50000")) 0) 99999))  => 147
(println (time (set (sym (string "a" 50000)) 0) 99999)) => 251

Time for accessing the symbol in the symbol-table is very OK - even if there are millions of symbols defined - and it is the most important. Sym itself seems to be faster than string, which maybe has some capacity for improvement since (string "a50000") is slower than (sym "a50000").

Lutz · Post by **Lutz** » Tue Mar 17, 2009 6:01 pm

Yes, symbol trees in newLISP are pretty fast and already sorted during creation. Instead of using 'sym' you can use newLISP's hash mechanism built on the same. 'Tree' is a bullt-in 'Tree:Tree'. You also could do: (define MyHash:MyHash).

Code: Select all

>  (new Tree 'MyHash)
MyHash
> (time (MyHash (format "%d" (rand 10000000)) (rand 100)) 1000000)
4145
> (length (MyHash)) ; some doubles
951484
> (time (MyHash))
627
> (0 5 (MyHash))
(("1000009" 87) ("1000012" 55) ("1000020" 2) ("1000024" 98) ("1000030" 68))
>

About 4 seconds to create million symbols (Mac Mini). Less than a second to create the sorted list of key-value pairs.

Kazimir Majorinc · Post by **Kazimir Majorinc** » Tue Mar 17, 2009 6:16 pm

For the single most demanding array, numbers can be used as symbols - avoiding string bottleneck.

Code: Select all

(println (time (set (sym 50000) 0) 99999))

Time - 74. (now we are on 6 times slower than arrays)

Jeff · Post by **Jeff** » Tue Mar 17, 2009 6:35 pm

Lutz wrote:Yes, symbol trees in newLISP are pretty fast and already sorted during creation. Instead of using 'sym' you can use newLISP's hash mechanism built on the same. 'Tree' is a bullt-in 'Tree:Tree'. You also could do: (define MyHash:MyHash).

Does dotree iterate through the tree with the keys sorted? Is there a way to set the sort function?

Lutz · Post by **Lutz** » Tue Mar 17, 2009 7:14 pm

Does dotree iterate through the tree with the keys sorted?

Yes, basically 'dotree' is a recursive binary balanced tree walker. Also note than when the symbol tree is created using the hash-API, all symbols have an underscore prepended. This is necessary for safe saving/loading to/from disk and when symbols resemble built-in primitives. The hash-API doesn't show them, but you see them when doing (symbols MyHash) or when using dotree. Basically you are allowed to mix both approaches, as long as you think about your underscore-prefixes.

So when you use (dotree (sym MyHash) ... ) for traversal versus (dolist (item (MyHash)) ...) don't forget to prepend the underscore. When using (dolist (item (MyHash)) ...) only the key-value pairs in the item are shown, where the key-string has the prepended underscore. This is useful when hosting other stuff (e.g. functions) besides the key-value pairs in the MyHash namespace.

The sort function for symbol-trees is part of the tree algorithm and cannot be changed, its the normal sort-order for strings on computers.

Jeff · Post by **Jeff** » Tue Mar 17, 2009 7:37 pm

Well, it doesn't really matter what the symbols are, so long as they are sorted. I am going to be using this in place of a heap.

Kazimir Majorinc · Post by **Kazimir Majorinc** » Thu Mar 19, 2009 5:04 pm

Using symbols as keys might look strange (and even repulsive to some), but in my opinion, it is consistent with code=data, in this case symbols and keys respectively.

I'd like to note that swap seems to be performed by value internally. If x and y are lists, (swap x y) linearly depends on the size of x and y.

In cases similar to Jeff's, it might require one extra level of indirection, and not only in the case lists are large: swapping symbols is some 5% faster than swapping empty lists.

Code: Select all

(set 'x (sym "a"))
(set 'y (sym "b"))
(set 'u '(1))
(set 'v '(2))

(println  "symbols : " (time (swap x y) 1000000) ) 
(println  "one-element lists : " (time (swap u v) 1000000) )

Perhaps swap can be implemented to work "by reference" in the case of symbols, I think that with ORO, users cannot see the difference.

Another low hanging fruit seems to be function "rename-symbol". If symbols start to be used as keys, it might be really useful, but right semantics might be not so simple to figure out.

Lutz, how is that if symbol is deleted, then all occurrences of that symbol disappear? Why new symbol is not as good as old? I do not say it is wrong, I'm just interested. Here is the code that confuses me.

Code: Select all

> (set 'f '(x 3))
(x 3)
> (delete 'x)
true
> (set 'x 4)
4
> f
(nil 3)
>

Lutz · Post by **Lutz** » Thu Mar 19, 2009 5:50 pm

Perhaps swap can be implemented to work "by reference" in the case of symbol

Already done for 10.0.3. See my post from Match 17th in this thread with examples. This version goes up either as a maintenance release beginning of May or as a development release by the end of March.

Lutz, how is that if symbol is deleted, then all occurrences of that symbol disappear?

If a symbol gets deleted all references to it have to get 'nil', as you have shown in your post. If not, you would end up with pointers pointing into nowhere. There is an extra boolean flag for the 'delete' statement to tell it, only to delete if the symbols is not referenced.

To delete a symbol using the hash syntax use: (MyHash "theKey" nil). To check is its there: (MyHash "theKey") returns 'nil' is no such key.

Deleting a symbol means, taking it out of the symbol tree and checking (changing to 'nil') all references to it in memory.

To delete a whole symbol space you have to delete the context symbol twice. The first time it will remove all its symbols hosted and demote the context symbol to a normal symbols. The second time, the context symbol will be deleted too.

Another low hanging fruit seems to be function "rename-symbol"

Possible but not low hanging ;-). The symbol would have to be deleted then created again and all references to the old one changed to the new one. Symbols are maintained in a so called "Red Black Binary Tree" (see: http://en.wikipedia.org/wiki/Red-black_tree ). This means you cannot just rename the symbol entry, but the symbol gets a new place in the tree depending on its name.

The implementation used in newLISP is by Thomas Niemann: http://epaperpress.com/sortsearch/downl ... search.pdf

More recently a newer implementation has popped up on the web and has been widely discussed and got mixed reviews. I have looked into it and decided to stay with the present one for the time beeing. Symbol handling is very central to newLISP itself as namespaces and hashing are handled by it.

Jeff · Post by **Jeff** » Thu Mar 19, 2009 7:03 pm

Lutz - couldn't you just point another symbol to the root of the context and then delete the original?