Faster than find?

Q&A's, tips, howto's
Locked
kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm
Contact:

Faster than find?

Post by kanen »

I have the following issue:

Code: Select all

> (dolist (s (sequence 1 10000)) (push (rand 300 4) y -1))
> (set 'z (y -2))   
; this is the second to last {number} :)
> (time (find z y))
; 8.x ms
Why is this a problem? I need a faster search through these numbers. 8+ milliseconds is way too long.

The numbers must be in this format, or something very close. Any extra processing to convert the input (which is z in this case) could take too long.

I can happily convert the y to a better format, if it improves the search speed.

Is there a faster find I should be considering?
. Kanen Flowers http://kanen.me .

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Re: Faster than find?

Post by cormullion »

Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm
Contact:

Re: Faster than find?

Post by kanen »

The large list can be pre-built.

The thing-to-find can be dynamically assembled in the same format as the large list.

Elaborate on your dictionary idea. I would love to hear your thoughts.
cormullion wrote:Hmm. Tricky to know where to make the trade-offs. Are the numbers/lists unique? If so, you could use a dictionary, which is probably much faster for recall, although the building of said dictionary could be more time-consuming, although perhaps it could be done during the more contemplative parts of your code.
. Kanen Flowers http://kanen.me .

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latiitude 50N longitude 3W
Contact:

Re: Faster than find?

Post by cormullion »

Well, imagine that you stored the list of four numbers with a string in a dictionary:

Code: Select all

(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))
Now H is:

Code: Select all

("003263145157" (3 263 145 157))
("005053174019" (5 53 174 19))
..
I would expect lookups to be faster:

Code: Select all

(set 'z ((H) -2))
(H (format "%03d%03d%03d%03d" (last z)))
since dictionaries are faster than lists for looking up values. At least, I think they should be. But of course there is some additional overhead...

The question of uniqueness of keys is still pertinent, though... :)

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm
Contact:

Re: Faster than find?

Post by kanen »

Genius.

I should have considered it, but I didn't think it would be that much faster.

Turns out, I was totally wrong. It is way faster.

Code: Select all

(new Tree 'H)
(dolist (s (sequence 1 10000)) 
    (set 'n (rand 300 4))
    (H (format "%03d%03d%03d%03d" n) n))

(164 274 162 184)

> (time (H "164274162184"))
0.011
Pretty fast! Let's try another one...

Code: Select all

> ((H) -2)
("299285172263" (299 285 172 263))

> (time (H "299285172263"))
Now that you've shared... remind me to write something about my contexts as multiple timers hack... :)
. Kanen Flowers http://kanen.me .

xytroxon
Posts: 296
Joined: Tue Nov 06, 2007 3:59 pm
Contact:

Re: Faster than find?

Post by xytroxon »

If these are IPv4 addresses, the numbers are always 8 bit bytes (000 - 255 decimal ie: 00-FF hexadecimal) and can be turned into shorter (faster to search) hex char strings (8 chars for hexadecimal vs 12 chars for decimal)...

Code: Select all

    (set 'n (rand 256 4))
    (H (format "%02X%02X%02X%02X" n) 0)
n = ( 000 000 000 000) -> 00000000
n = ( 255 128 005 000) -> FF080500
n = ( 255 255 255 255) -> FFFFFFFF

-- xytroxon
"Many computers can print only capital letters, so we shall not use lowercase letters."
-- Let's Talk Lisp (c) 1976

Locked