New word stemmer, now with multiple language support!

methodic
Posts: 58
Joined: Tue May 10, 2005 5:04 am

Post by methodic »

Hi, a while back I made a newLISP word stemmer that used a stemming library:
http://newlispfanclub.alh.net/forum/vie ... mer#p14985

I had to revisit this and decided to switch to a different library. The one I'm using now is located here:
http://snowball.tartarus.org/dist/libstemmer_c.tgz

Here is my code, with comments:

;; stemmer.lsp by tony lambiris <tony@libpcap.net>
;;
;; supported languages:
;; danish dutch english finnish french german hungarian italian norwegian
;; porter portuguese romanian russian spanish swedish turkish
;;
;; HOW TO USE:
;;
;; 1) download http://snowball.tartarus.org/dist/libstemmer_c.tgz
;; 2) extract the archive and run 'make CFLAGS=-fPIC' inside libstemmer_c
;; 3) build the shared library:
;;    gcc -shared -o libstemmer.so libstemmer/*.o src_c/*.o runtime/*.o
;; 4) copy libstemmer.so to the path of your choice (defined by STEMLIB)
;; 5) load this file and call STEM:stemmer as in the examples below
;;
;; > (load "stemmer.lsp")
;; MAIN
;; > (STEM:stemmer "english" '("who" "directed" "taxi" "driver"))
;; ("who" "direct" "taxi" "driver")
;; > (STEM:stemmer "french" (parse "Je vous en prie" " "))
;; ("Je" "vous" "en" "pri")

(context 'STEM)

;; change this to the path where you installed the shared library
(constant 'STEMLIB "/home/tlambiris/Code/libstemmer.so")

;; imported function names
(import STEMLIB "sb_stemmer_new")
(import STEMLIB "sb_stemmer_stem")
(import STEMLIB "sb_stemmer_delete")

;; takes 2 parameters (respectively):
;; 1) the language to use for word stemming
;; 2) a list of words
;;
;; if word stemming is successful, a list of stemmed words will be returned
;; otherwise the original list of words will be returned
(define (stemmer lang words)
  (set 'new_words '())

  ;; create the stemmer once and reuse it for every word
  ;; (passing 0 as the encoding selects the library default, UTF-8)
  (set 's (sb_stemmer_new lang 0))
  (if (!= s 0)
    (begin
      ;; we were able to initialize the stemmer. let us stem.
      (dolist (w words)
        ;; copy the stemmed word and push it onto our new list
        (push (get-string (sb_stemmer_stem s w (length w))) new_words -1)
      )
      (sb_stemmer_delete s)
    )
  )

  ;; if new_words is still an empty list, something went wrong
  ;; return the original list of words instead
  (if (empty? new_words)
    words
    new_words
  )
)

(context MAIN)

Here is a direct link:
http://libpcap.net/newlisp/stemmer.lsp
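
One thing to keep in mind: as far as I know the Snowball stemmers expect lowercased input, so mixed-case text is best folded before stemming. Here is a rough sketch of a wrapper for that. The names tokenize and stem-text are made up for this example and are not part of stemmer.lsp, and it assumes stemmer.lsp has already been loaded. Note that lower-case may only handle non-ASCII letters correctly on a UTF-8 build of newLISP.

```newlisp
;; hypothetical helpers, not part of stemmer.lsp
;; assumes (load "stemmer.lsp") has already been run

;; lowercase the text, trim it, and split it on runs of whitespace
(define (tokenize text)
  (parse (trim (lower-case text)) "\\s+" 0))

;; stem every word of a free-form string in the given language
(define (stem-text lang text)
  (STEM:stemmer lang (tokenize text)))

;; > (stem-text "english" "Who Directed  Taxi Driver")
```

After lowercasing, the call in the last comment passes the same words as the english example in the header comments, so it should give the same result.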

Questions/comments welcome! :) Again, thanks to Lutz for creating such a powerful language!

joejoe
Posts: 173
Joined: Thu Jun 25, 2009 5:09 pm
Location: Denver, USA

Re: New word stemmer, now with multiple language support!

Post by joejoe »

hi methodic,

can't do much yet but to say thanks! it looks like a very useful script.

thanks big for throwing it out there for us. :D
