Faster than find?

kanen · Post by **kanen** » Wed Apr 13, 2011 12:13 am

I have the following issue:

> (dolist (s (sequence 1 10000)) (push (rand 300 4) y -1))
> (set 'z (y -2))   
; this is the second to last {number} :)
> (time (find z y))
; 8.x ms

Why is this a problem? I need a faster search through these numbers. 8+ milliseconds is way too long.

The numbers must be in this format, or something very close. Any extra processing to convert the input (which is z in this case) could take too long.

I can happily convert the y to a better format, if it improves the search speed.

Is there a faster find I should be considering?

cormullion · Post by **cormullion** » Wed Apr 13, 2011 7:18 pm

Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

kanen · Post by **kanen** » Wed Apr 13, 2011 7:28 pm

The large list can be pre-built.

The thing-to-find can be dynamically assembled in the same format as the large list.

Elaborate on your dictionary idea. I would love to hear your thoughts.

cormullion wrote:Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

cormullion · Post by **cormullion** » Wed Apr 13, 2011 9:01 pm

Well, imagine that you stored the list of four numbers with a string in a dictionary:

Code: Select all

(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

Now H is:

Code: Select all

("003263145157" (3 263 145 157))
("005053174019" (5 53 174 19))
..

I would expect lookups to be faster:

Code: Select all

(set 'z ((H) -2))
(H (format "%03d%03d%03d%03d" (last z)))

since dictionaries are faster than lists for looking up values. At least, I think they should be. But of course there is some additional overhead...

The question of uniqueness of keys is still pertinent, though... :)

kanen · Post by **kanen** » Fri Apr 15, 2011 4:52 am

Genius.

I should have considered it, but I didn't think it would be that much faster.

Turns out, I was totally wrong. It is way faster.

Code: Select all

(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

(164 274 162 184)

> (time (H "164274162184"))
0.011

Pretty fast! Let's try another one...

Code: Select all

> ((H) -2)
("299285172263" (299 285 172 263))

> (time (H "299285172263"))

Now that you've shared... remind me to write something about my contexts as multiple timers hack... :)

xytroxon · Post by **xytroxon** » Fri Apr 15, 2011 4:33 pm

If these are IPv4 addresses, the numbers are always 8 bit bytes (000 - 255 decimal ie: 00-FF hexadecimal) and can be turned into shorter (faster to search) hex char strings (8 chars for hexadecimal vs 12 chars for decimal)...

Code: Select all

    (set 'n (rand 256 4))
    (H (format "%02X%02X%02X%02X" n) 0)

n = ( 000 000 000 000) -> 00000000
n = ( 255 128 005 000) -> FF080500
n = ( 255 255 255 255) -> FFFFFFFF

-- xytroxon

newlispfanclub.alh.net

Faster than find?

Faster than find?

Re: Faster than find?

Re: Faster than find?

Re: Faster than find?

Re: Faster than find?

Re: Faster than find?