(parse) oddness

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

(parse) oddness

Post by kanen »

I have a problem where parse is returning an extra item.

I have attached code for this problem, with comments.

Code: Select all

;; sites-sm.txt
[text] 
;; copy below to sites-sm.txt
1  google.com
2  facebook.com
3  yahoo.com
4  youtube.com
5  live.com
6  wikipedia.org
7  blogger.com
8  baidu.com
9  msn.com
10 yahoo.co.jp
[/text]

(set 'sites (parse (read-file "sites-sm.txt") "\n"))
;; ("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com"
;; "5\tlive.com" "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com" 
;; "9\tmsn.com" "10\tyahoo.co.jp" "")

(println "Sites has " (length sites) " entries") ; -> 
;; Sites has 11 entries

(println (slice (sites 0) (+ 1 (find "\t" (sites 0))) ) ) 
;; google.com

(dolist (x sites)
   (println (slice x (+ 1 (find "\t" x)) ) ) )
;; prints: google.com ... yahoo.co.jp
;;
;; ERR: value expected : (find "\t" x)

(exit)
As you can see, parse returns everything in the file, plus an extra "" item in the list.

This causes everything from the parse to start failing, for various reasons.

* 11 items, when there are only 10 in the list (or, should only be 10)
* The dolist fails on the "" item, because there is no "\t" character to find

Obviously, I can correct this with something like:

Code: Select all

(dolist (x sites)
   ;; keep only non-empty lines; the -1 index appends, so file order is preserved
   (if (> (length x) 0) (push x newsites -1))
)
(set 'sites newsites)
I feel like I am missing something fundamental here, but to my mind this extra list item, an empty string, looks like a bug or a failure. Can someone help me understand this issue clearly?
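A shorter form of the same cleanup, as a sketch only: `clean` with the `empty?` predicate drops the empty strings in one pass and keeps the original order. The in-memory list below is a hypothetical stand-in for what `parse` returned, not the real file.

```newlisp
;; Stand-in for the parsed list, including the trailing "" (assumed data).
(set 'sites '("1\tgoogle.com" "2\tfacebook.com" ""))

;; clean removes every element for which the predicate is true;
;; empty? is true for the empty string.
(set 'sites (clean empty? sites))

;; Every remaining entry contains a tab, so find no longer returns nil.
(dolist (x sites)
   (println (slice x (+ 1 (find "\t" x)))))
```

This leaves the question of where the empty string comes from open; it only makes the loop safe.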
. Kanen Flowers http://kanen.me .

Sammo
Posts: 180
Joined: Sat Dec 06, 2003 6:11 pm
Location: Loveland, Colorado USA

Re: (parse) oddness

Post by Sammo »

A trailing "\n" in "sites-sm.txt" is the culprit. Try removing the trailing newlines before parsing:

Code: Select all

(set 'sites (parse (trim (read-file "sites-sm.txt") "" "\n") "\n"))
Edit: Corrected typo
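To see the effect without touching the file, here is a small sketch that uses an in-memory string (made-up data standing in for the file contents) ending in a newline. The names `text`, `raw` and `trimmed` are only for illustration.

```newlisp
;; Stand-in for (read-file "sites-sm.txt"); note the final "\n".
(set 'text "1\tgoogle.com\n2\tfacebook.com\n")

;; parse treats "\n" as a separator, so the final "\n" separates
;; the last real line from an empty trailing field.
(set 'raw (parse text "\n"))
(println (length raw) " items: " raw)

;; Trimming newlines from the right-hand side first removes that field.
(set 'trimmed (parse (trim text "" "\n") "\n"))
(println (length trimmed) " items: " trimmed)
```

If the first count is one higher than the second for the real file too, the file contents end with a line feed.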

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: (parse) oddness

Post by kanen »

I thought the same thing, but... unless I completely misunderstand how (parse) works, I can tell you there are no trailing newlines in the file.

Opening it in vim or another editor shows only a newline after each entry, but not an extra, trailing newline.

Yes, your solution fixes the problem, but the problem does not actually exist because there's no blank (or trailing) newline at the end of the file.

Am I still missing something?

Sammo
Posts: 180
Joined: Sat Dec 06, 2003 6:11 pm
Location: Loveland, Colorado USA

Re: (parse) oddness

Post by Sammo »

If there isn't a trailing "\n" in "sites-sm.txt", then is one being appended in (read-file "sites-sm.txt")? It seems that it would have to be in order for trim to correct the problem.
-- Sam

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: (parse) oddness

Post by kanen »

Makes sense as to what is happening - (read-file) adding a trailing newline.

I consider this to be a bug, though, unless someone can correct my thinking.

Perhaps I should be reading the file differently?

Sammo
Posts: 180
Joined: Sat Dec 06, 2003 6:11 pm
Location: Loveland, Colorado USA

Re: (parse) oddness

Post by Sammo »

On Win XP running newLISP 10.2.1, I dup'd your file (complete with \n instead of \r\n) but could not duplicate your symptom.

Code: Select all

(parse (read-file {C:\MyDirectory\sites-sm.txt}) "\n")
returns

Code: Select all

("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com" "5\tlive.com" 
 "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com" "9\tmsn.com" "10\tyahoo.co.jp")
The problem is either in a newLISP version later than 10.2.1 or in a non-Windows version.

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: (parse) oddness

Post by Lutz »

As on Win XP with 10.2.1, it also runs correctly on UNIX, with all versions of newLISP.

Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.

But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':

Code: Select all

> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")
While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double-escape the line-feed character.

'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.
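As a sketch of that extra expression argument: the string below is made-up data standing in for the file contents, and the second pattern with its two groups is only an illustration, not part of the original example.

```newlisp
;; Stand-in for (read-file "example.txt"); note the trailing "\n".
(set 'text "1\tgoogle.com\n2\tfacebook.com\n3\tyahoo.com\n")

;; An item is "one or more non-newline characters", so the trailing
;; "\n" cannot produce an empty item.
(set 'lines (find-all {[^\n]+} text))
(println lines)

;; With an expression argument, each match is processed before it is
;; collected. $2 holds the second parenthesized group (the host name);
;; the final 0 means no special regex options.
(set 'hosts (find-all {(\d+)\s+(\S+)} text $2 0))
(println hosts)
```

The second call does the splitting and the column extraction in one step, so no separate loop with 'find' and 'slice' is needed.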

Regarding a raw-packet option, from another post:
Kanen wrote:Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P
I tried to create a 'net-packet' a few years back and used example code from here:

http://mixter.void.ru/rawip.html

but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x368), even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: (parse) oddness

Post by kanen »

Lutz wrote:As on Win XP with 10.2.1, it also runs correctly on UNIX, with all versions of newLISP.

Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.
Yes it does, which is why vim has a ":set binary" setting and a "-b" command-line option. This might be enough to migrate me to vile, just to not deal with it. Frustrating!
Lutz wrote:But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':

Code: Select all

> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")
Fantastic. Thanks for the easy (and appreciated) example. Makes my life less complicated.
Lutz wrote:While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double-escape the line-feed character.

'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.

Regarding a raw-packet option, from another post:
Kanen wrote:Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P
I tried to create a 'net-packet' a few years back and used example code from here:

http://mixter.void.ru/rawip.html

but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x368), even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.
Yeah, the stuff from mixter is total crap. He misses about 1/2 the headers you need, includes headers you don't need and there are many mistakes in the code. I have several working examples. I will try to put something together that makes sense. It won't be pretty, but it will at least be functional (i.e. it will compile and run properly, without a hundred hours of tweaking and research).
