Getting higher frequency elements out of a list

Post by **joejoe** » Tue May 29, 2012 6:59 am

I have almost gotten this on my own, but am stumped.

I am after a list of the "good" words that occur two or more times in 'title-words, sorted by frequency (high to low).

Here is my attempt:

Code:

#!/usr/local/bin/newlisp

; my list of words:

(set 'title-words '("one" "two" "two" "three" "three" "three" "four" "four" "four" "four" "five" "five" "five" "five" "five" "six" "six"))


; words to remove from my list:

(set 'bad-title-words '("two" "four"))


; the "good" words i want:

(set 'good (difference title-words bad-title-words))


; an index count of the good words frequencies:

(set 'title-words-index (count good title-words))


; good word frequencies that occur more than once in my list

(set 'big-title-words-index (ref-all '1 title-words-index < true))


(println title-words) ; initial list of words
;-> ("one" "two" "two" "three" "three" "three" "four" "four" "four" "four" "five" "five" "five" "five" "five" "six" "six")

(println bad-title-words) ; words to remove
;-> ("two" "four")

(println good) ; words i want to keep
;-> ("one" "three" "five" "six")

(println title-words-index) ; a count of good word frequency
;-> (1 3 5 2)

(println big-title-words-index) ; somehow related to the words i want to get and sort by frequency
;-> (3 5 2)

(exit)

Now I don't know how to get the words back again. I think I got lost and may even have gone down the wrong road.

I would appreciate any directions back to my path! :0)

Post by **Patrick** » Tue May 29, 2012 8:54 am

I'm not 100% sure what you want, since your description and your example seem to contradict each other, so let me check: you want all the duplicates in a list, sorted by how often they occur?

Code:

;; keep only the word from each (word count) pair
(println (map
  (fn (x) (first x))
  ;; drop the words that occur only once
  (filter
    (fn (x) (not (= (last x) 1)))
    (sort
      ;; build a list of (word count) pairs
      (map
        (fn (c x) (list x c))
        (count (unique title-words) title-words)
        (unique title-words))
      ;; comparison function: higher count first
      (fn (a b) (> (last a) (last b)))))))

Here is what I came up with. First I take the distinct words in the list and build a list of two-element lists: the word (e.g. "one" or "two"), followed by how often it occurs. Then I sort that list with the custom comparison function (fn (a b) (> (last a) (last b))), i.e. by how often each word occurs. Next I filter out the words that occur only once. Last but not least, I map over the list so that it no longer contains the (word, count) pairs, only the words, like you wanted.

Seems this slightly shorter version also works:

Code:

;; keep only the word from each (count word) pair
(println (map
  (fn (x) (last x))
  ;; drop the words that occur only once
  (filter
    (fn (x) (!= (first x) 1))
    (sort
      ;; build a list of (count word) pairs
      (map
        list
        (count (unique title-words) title-words)
        (unique title-words))
      >))))
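
Neither version above drops the words in bad-title-words. Below is a minimal sketch of one way to add that step, reusing the `good` and `title-words-index` lists from the first post. The names `pairs` and `frequent` are made up for illustration, and the result in the last comment assumes the sample data shown:

```newlisp
(set 'title-words '("one" "two" "two" "three" "three" "three" "four" "four" "four" "four" "five" "five" "five" "five" "five" "six" "six"))
(set 'bad-title-words '("two" "four"))

;; unique words that are not in bad-title-words
(set 'good (difference title-words bad-title-words))

;; how often each good word occurs in title-words
(set 'title-words-index (count good title-words))

;; pair each count with its word (pairs is an illustrative name)
(set 'pairs (map list title-words-index good))

;; keep the pairs whose count is 2 or more, then sort high to low;
;; lists compare element by element, so the count decides first
(set 'frequent
  (sort (filter (fn (p) (>= (first p) 2)) pairs) >))

;; drop the counts again
(println (map last frequent))
;-> ("five" "three" "six")
```

Putting the count first in each pair is what lets the plain `>` do the sorting; with the word first, a custom comparison function would be needed.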

Post by **cormullion** » Tue May 29, 2012 12:07 pm

Another approach is to use contexts...

Code:

(set 'title-words '("one" "two" "two" "three" "three" "three" "four" "four" "four" "four" "five" "five" "five" "five" "five" "six" "six"))
(set 'bad-title-words '("two" "four"))

(define C:C)
(dolist (word title-words)
   (if (set 'tally (C word))
       (C word (inc tally))
       (C word 1)))

and a function to check words:

Code:

(define (good? word)
   (not (find word bad-title-words)))

A list of all words sorted by their frequency:

Code:

(sort (C) (fn (x y) (> (last x) (last y))))
;-> (("five" 5) ("four" 4) ("three" 3) ("two" 2) ("six" 2) ("one" 1))

Just the good words, and their frequency:

Code:

(filter (fn (x) (good? (first x))) (C))
;-> (("five" 5) ("one" 1) ("six" 2) ("three" 3))

So, all the good words and their frequencies, sorted by frequency:

Code:

(filter (fn (x) (good? (first x)))
    (sort (C)
        (fn (x y) (> (last x) (last y)))))

;-> (("five" 5) ("three" 3) ("six" 2) ("one" 1))
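
To get from there to what joejoe asked for (only the good words that occur two or more times, without the counts), one possible sketch adds a second condition to the filter and maps `first` over the result. It repeats the setup so that it runs on its own; `frequent-good` is an illustrative name, not something from the posts above:

```newlisp
(set 'title-words '("one" "two" "two" "three" "three" "three" "four" "four" "four" "four" "five" "five" "five" "five" "five" "six" "six"))
(set 'bad-title-words '("two" "four"))

;; context C used as a dictionary: word -> tally
(define C:C)
(dolist (word title-words)
   (if (set 'tally (C word))
       (C word (inc tally))
       (C word 1)))

(define (good? word)
   (not (find word bad-title-words)))

;; keep the (word count) pairs that are good and occur at least twice,
;; then sort what is left by count, high to low
(set 'frequent-good
   (sort
      (filter (fn (x) (and (good? (first x)) (>= (last x) 2))) (C))
      (fn (x y) (> (last x) (last y)))))

;; keep only the words
(println (map first frequent-good))
;-> ("five" "three" "six")
```

Filtering before sorting keeps the list being sorted small. With this sample data the surviving counts are all different, so the order does not depend on how ties are handled.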

Post by **joejoe** » Wed May 30, 2012 11:40 pm

Kickin!

Thanks Patrick and cormullion! I'm on the move again!

Both are superb examples and really teach a lot! Much appreciated again! :0)

newlispfanclub.alh.net
