SentenceBoundary.lsp?

methodic
Posts: 58
Joined: Tue May 10, 2005 5:04 am

Post by methodic »

Doesn't appear to exist anymore:
http://www.newlisp.org/code/SentenceBoundary.lsp.txt

Anyone have a copy of this? Thanks in advance.

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

Yes, I have a copy here. It's Rev: 2630, dated 2006.

Which means it doesn't work with current versions of newLISP.

If you don't hear from the authors directly, I'll post a copy. I'd rather they had the opportunity to decide what to do with it first.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

I have put it back here:

http://www.newlisp.org/syntax.cgi?code/ ... ry.lsp.txt

If you change (inc 'i) to (inc i) and use 'setf' on the lines where 'nth-set' is used, it should work with version 10.
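A minimal sketch of those two changes as they look under newLISP 10. The variable names echo the ones in SentenceBoundary.lsp, but the list contents are invented for illustration and are not taken from the file:

```newlisp
;; Illustrative data only; not from SentenceBoundary.lsp.
(set 'i 0)
(inc i)            ; v10: inc takes the variable itself; v9 code wrote (inc 'i)

(set 'word_list '(("The" "brown" "fox") ("jumped" "high")))

;; v10 replacement for an nth-set call: setf on an indexed place.
;; (word_list 0 -1) is the last word of the first sentence.
(setf (word_list 0 -1) (append (word_list 0 -1) "."))

(println i)          ; prints 1
(println word_list)  ; prints (("The" "brown" "fox.") ("jumped" "high"))
```

In v10, `inc` operates on a place rather than a quoted symbol, and `setf` on an implicitly indexed list updates that element in place, which covers what `nth-set` was used for in the v9 code.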

Lutz

Post by Lutz »

It has now been updated for newLISP v.10.0:

http://www.newlisp.org/syntax.cgi?code/ ... ry.lsp.txt

Lutz

Post by Lutz »

Thanks to methodic for correcting a typo; it is now at rev 2697:

http://www.newlisp.org/syntax.cgi?code/ ... ry.lsp.txt

cormullion

Post by cormullion »

Are these lines correct?

(setf (word_list (- i 1) -1) (append (word_list (- i 1) -1) "."))
(setf (word_list (- i 1)) word_list (append (word_list (- i 1)) tmp))

Lutz

Post by Lutz »

They are a direct translation of the 'nth-set' statements, which used the flat version 9.0 syntax.

It does a good job on Wikipedia pages (corrected link):

(url-to-sentences "http://en.wikipedia.org/wiki/Grady_(band)" 15000)


But it bombs on your home page :-(. Perhaps methodic can put some more work into it.

cormullion

Post by cormullion »

I think my version works:

(println (slice (url-to-sentences "http://unbalanced-parentheses.nfshost.com/" 10000) 0 20))

("blog: unbalanced-parentheses" "about this blog" "atom newsfeed" "contact me" "twitters @ 12:25" 
 "#opensources newLISP 10.0.6: newLISP is a Lisp-like, general-purpose scripting language." 
 "It has al." "http://chilp.it/?76b031" "from OpenSources (Jacob Peacock)" "1 day and 20 hours ago" 
 "newLISP 10.0.6 - newLISP is a Lisp-like, general-purpose scripting language." "It has all the magic of traditional L. http://bit.ly/ZCc1L" 
 "from fmannounce (freshmeat announces)" "1 day and 22 hours ago" "Development release 10.0.6 features minor enhancements and bug fixes: http://www.newlisp.org/downloads/development/" 
 "from newlisp (newLISP)" "2 days and 17 hours ago" "@ghfischer Hey do you have a copy of the Sentence Boundary code? I can't find it on newlisp.org anymore." 
 "from m3thodic (tony)" "5 days and 12 hours ago")

I think your conversion to v10 has more errors than mine... :)

Lutz

Post by Lutz »

Can you post a link to the corrected version or post the critical correction?

cormullion

Post by cormullion »

The lines in my version that differ from yours are (according to BBEdit):

line  91:    (if (not break_token) (set 'break_token ""))
line 176:   (set 'word_list (map (fn(x) (parse x " ")) sentence_list))
line 212:             (setf (word_list (- i 1))    (append (word_list (- i 1)) tmp))
line 215: (begin
            (inc i) 
          )

I think line 212 is the culprit.

Lutz

Post by Lutz »

Thanks, cormullion, now I see the superfluous 'word_list'. A new version is posted:

http://www.newlisp.org/syntax.cgi?code/ ... ry.lsp.txt

cormullion

Post by cormullion »

I think that version has a stray "syntax highlighting with newLISP and syntax.cgi" line near the end...

This file is definitely jinxed...!

Lutz

Post by Lutz »

I had to look that up:

jinxed
adj : (usually used colloquially) causing or accompanied by misfortune [syn: hexed]

definitely hexed :)

methodic

Post by methodic »

Just a quick post about a bug I found.

On line 180, GetSentences fails if the sentence passed in as its last argument has trailing whitespace, e.g. "The brown fox jumped high. "

Adding a trim call on any line before the parse fixes it. I'm sure it was written on the assumption that you pass the string through clean-html first, but I don't need that function for what I'm using it for.

Thanks again to everyone who helped make this 10.0 compliant! :)

methodic

Post by methodic »

Trim wasn't the issue after all... I _think_ I found the bug, and I've sent Lutz a new file, which he should be posting shortly. :)

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: SentenceBoundary.lsp?

Post by kanen »

You should probably change the contact information in SentenceBoundary.lsp, as neither of these e-mail addresses works any longer. :)

;; Creative Commons Attribution (by) License v2.5
;; Full text - http://creativecommons.org/licenses/by/2.5/
;; Contact - fischer@kozoru.com, desanto@kozoru.com
;; Copyright (c) 2006, kozoru, Inc.

Kanen Flowers http://kanen.me
