Lists within lists and finding things...

kanen · Post by **kanen** » Tue Aug 20, 2013 3:33 am

I'm trying to figure out if any of the items in a bigger list are in a smaller list. I've tried a bunch of options, but none of them are fast enough.

Code: Select all

(setf big-list '( ((5 0 0 0 3 14 0 1 0 0 1 0 0 0 0 0 0) 1006) ((8 97 108 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((8 97 49 45 106 105 110 97 110 3 110 101 116 0 0 1 0 1) 1001) ((6 97 114 100 100 114 97 4 104 111 115 116 2 115 107 0 0 1 0 1) 1001) ((3 119 119 119 5 106 111 45 117 102 3 110 101 116 0 0 1 0 1) 1001) ((3 119 119 119 12 106 111 102 112 109 117 121 116 114 118 99 102 3 99 111 109 0 0 1 0 1) 1001) ((71 69 84 32) 1001) ((46 46 47 46 46 47 46 46 47) 232) ((205 128 232 215 255 255 255) 232) ((7 10 0 0 0 75 0 208 0) 286) ((7 0 0 0 0 75 0 208 0) 286) ((0 0 0 0 3 97 112 105 7) 1010) ))
;
(setf small-list '(59 126 1 0 0 1 0 0 0 0 0 0 3 97 112 105 7 116 119 105 116 116 101 114 3 99 111 109 0 0 1 0 1) )

Is there a REALLY FAST way to see if an item in the big-list is contained within the small-list and return the value associated with the big-list content? For example;

Code: Select all

> (last big-list)
((0 0 0 0 3 97 112 105 7) 1010)

I can now do something like:

Code: Select all

(intersect ((last big-list) 0) small-list)
; (0 3 97 112 105 7)

If the above was present in the small-list (which it is in this case), I'd want to return 1010, which is the "value" of that line in the big-list.

The only problem with this way of doing it is the "0" gets squashed into one value, so I don't get a "this is an exact match". Also, intersect seems to be slower than I'd like.

Thanks!

rickyboy · Post by **rickyboy** » Tue Aug 20, 2013 5:02 am

Kanen,

If you want to find an exact sublist in a longer list, then this might do:

Code: Select all

(define (sublist? SUBLIST SUPERLIST)
  "Check if SUBLIST occurs in SUPERLIST, for instance
   (sublist? '(1 2 3) '(0 1 2 3 4)) => true."
  (let (len-superlist (length SUPERLIST)
        len-sublist (length SUBLIST))
    (for (i 0 (- len-superlist len-sublist) 1
            (= SUBLIST (slice SUPERLIST i len-sublist)))
      nil)))

(define (second xs) (xs 1))

Then the following expression answers part of your question, namely "... see if an item in the big-list is contained within the small-list and return the value associated with the big-list content?" This will check all big-list items against small-list and return only the values.

Code: Select all

> (map second (filter (fn (b) (sublist? (b 0) small-list)) big-list))
(1010)

Looks like there is only one "record" in big-list that contains a sublist of small-list: the one associated to the value 1010.

sublist? is not as fast as intersect, but it's correct, whereas the intersect usage is not (if I understand what you are trying to do). Maybe you can make sublist? faster. Cheers!

rickyboy · Post by **rickyboy** » Tue Aug 20, 2013 6:16 pm

Hello Kanen,

When I wrote my last message, I had to guess (a little) as to what exactly you needed. Then today, I thought of another meaning regarding what you said.

When I coded sublist?, I assumed that you wanted to see if a big-list item (which was a list) was a sublist of small-list. Since order matters in lists, sublist? obeys the ordering of the elements of the lists involved. It also assumes that adjacent elements in the corresponding lists are preserved. For instance:

Code: Select all

> ;; Order matters
> (sublist? '(3 4) '(0 1 2 3 4))
true
> (sublist? '(4 3) '(0 1 2 3 4))
nil
> ;; Adjacency matters
> (sublist? '(3 4) '(0 1 2 3 3.5 4))
nil

But what if you had really meant for the "containment" to not necessarily respect element order or adjacency? Then, a natural way to do that would be to treat the lists as multisets. In this case, the test then changes from sublist? to subset? (where "set" in this context means multiset, a generalization of proper sets).

If that is what you were really after then please consider the following definitions.

Code: Select all

(define (subset? A B)
  "Is A a subset of B? (A and B are lists that act as multisets)."
  (if (empty? A) true
      (empty? B) nil
      (letn (sizeA0 (length A)        ; size of original A
             sizeB0 (length B)        ; size of original B
             dsize0 (- sizeB0 sizeA0) ; original delta of sizes
             sizeA sizeA0 sizeB sizeB0 dsize dsize0)
        (while (and (>= dsize 0) (= dsize0 dsize)
                    (not (empty? A)))
          (letn (elt (first A)
                 cnt (length (find-all elt A)))
            (setq A (remove cnt elt A))
            (setq B (remove cnt elt B))
            (setq sizeA (length A))
            (setq sizeB (length B))
            (setq dsize (- sizeB sizeA))))
        (empty? A))))

for which you will need the following definition.

Code: Select all

(define (remove HOW-MANY ELT LST)
  "Remove as many as HOW-MANY occurences of ELT in LST."
  (let (elt-position true)
    (while (and elt-position (> HOW-MANY 0))
      (when (setq elt-position (find ELT LST))
        (pop LST elt-position)
        (dec HOW-MANY)))
    LST))

Now, neither order nor adjacency matter.

Code: Select all

> ;; Order doesn't matter now.
> (subset? '(3 4) '(0 1 2 3 4))
true
> (subset? '(4 3) '(0 1 2 3 4))
true
> ;; Adjacency doesn't matter either.
> (subset? '(3 4) '(0 1 2 3 3.5 4))
true

It just so happens that, with the given definitions of big-list and small-list, using subset? as the test yields the same answer as using sublist? as the test.

Code: Select all

> (map second (filter (fn (b) (subset? (b 0) small-list)) big-list))
(1010)

And as before, the functions I defined might not meet your performance standards, so please bum the code as appropriate. :)

newlispfanclub.alh.net

Lists within lists and finding things...

Lists within lists and finding things...

Re: Lists within lists and finding things...

Re: Lists within lists and finding things...