(parse) oddness

kanen · Post by **kanen** » Tue Apr 20, 2010 8:39 pm

I have a problem where parse returns an extra item.

I have attached code for this problem, with comments.

;; sites-sm.txt
[text] 
;; copy below to sites-sm.txt (number and name are separated by a tab character)
1  google.com
2  facebook.com
3  yahoo.com
4  youtube.com
5  live.com
6  wikipedia.org
7  blogger.com
8  baidu.com
9  msn.com
10 yahoo.co.jp
[/text]

(set 'sites (parse (read-file "sites-sm.txt") "\n"))
;; ("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com"
;; "5\tlive.com" "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com" 
;; "9\tmsn.com" "10\tyahoo.co.jp" "")

(println "Sites has " (length sites) " entries") ; -> 
;; Sites has 11 entries

(println (slice (sites 0) (+ 1 (find "\t" (sites 0))) ) ) 
;; google.com

(dolist (x sites)
   (println (slice x (+ 1 (find "\t" x)) ) ) )
;; prints: google.com ... yahoo.co.jp
;;
;; ERR: value expected : (find "\t" x)

(exit)

As you can see, parse returns everything in the file, plus an extra "" item in the list.

This causes everything from the parse to start failing, for various reasons.

* 11 items, when there are only 10 in the list (or, should only be 10)
* The dolist fails on the "" item, because there is no "\t" character to find

Obviously, I can correct this with something like:

Code:

(dolist (x sites)
   (if (> (length x) 0) (push x newsites -1)) ; -1 appends, so the original order is kept
)
(set 'sites newsites)
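
The same filtering can probably be done in one step with the built-in `clean` and `empty?`. This is only a minimal sketch: the inline list stands in for the parsed file and is an assumption for illustration.

```newlisp
;; Inline sample list standing in for the parsed file (an assumption for illustration).
(set 'sites '("1\tgoogle.com" "2\tfacebook.com" ""))

;; clean drops every element for which the predicate is true;
;; empty? is true for the zero-length string.
(set 'sites (clean empty? sites))
(println sites)
```

Unlike a plain `push`, this keeps the items in their original order.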

I feel like I am missing something fundamental here, but in my mind, adding this extra list item containing an empty string seems like a bug or a failure. Can someone help me understand this issue clearly?

Sammo · Post by **Sammo** » Tue Apr 20, 2010 10:27 pm

A trailing "\n" in "sites-sm.txt" is the culprit. Try removing the trailing newlines before parsing:

Code:

(set 'sites (parse (trim (read-file "sites-sm.txt") "" "\n") "\n"))

Edit: Corrected typo
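
A minimal sketch of the effect, using an inline string instead of the file (the string and the variable names are assumptions for illustration). Since parse treats "\n" as a separator, whatever follows the last "\n" becomes an item of its own, even when it is empty.

```newlisp
;; Inline sample standing in for the file contents (an assumption for illustration).
(set 'raw "1\tgoogle.com\n2\tfacebook.com\n")

;; "\n" is a separator, so the empty text after the last "\n" becomes a final item.
(set 'untrimmed (parse raw "\n"))
(println (length untrimmed) " items: " untrimmed)

;; Trim trailing "\n" characters first, then parse.
(set 'trimmed (parse (trim raw "" "\n") "\n"))
(println (length trimmed) " items: " trimmed)
```

With two lines of input, the untrimmed parse should report three items and the trimmed one two.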

kanen · Post by **kanen** » Tue Apr 20, 2010 10:42 pm

I thought the same thing, but... unless I completely misunderstand how (parse) works, I can tell you there are no trailing newlines in the file.

Opening it in vim or another editor shows only a newline after each entry, but not an extra, trailing newline.

Yes, your solution fixes the problem, but the problem does not actually exist because there's no blank (or trailing) newline at the end of the file.

Am I still missing something?

Sammo · Post by **Sammo** » Tue Apr 20, 2010 10:48 pm

If there isn't a trailing "\n" in "sites-sm.txt", then is one being appended in (read-file "sites-sm.txt")? It seems that it would have to be in order for trim to correct the problem.
-- Sam

kanen · Post by **kanen** » Tue Apr 20, 2010 11:06 pm

Makes sense as to what is happening - (read-file) adding a trailing newline.

I consider this to be a bug, though, unless someone can correct my thinking.

Perhaps I should be reading the file differently?

Sammo · Post by **Sammo** » Wed Apr 21, 2010 3:15 am

On Win XP running newLISP 10.2.1, I dup'd your file (complete with \n instead of \r\n) but could not duplicate your symptom.

Code:

(parse (read-file {C:\MyDirectory\sites-sm.txt}) "\n")

returns

Code:

("1\tgoogle.com" "2\tfacebook.com" "3\tyahoo.com" "4\tyoutube.com" "5\tlive.com" 
 "6\twikipedia.org" "7\tblogger.com" "8\tbaidu.com" "9\tmsn.com" "10\tyahoo.co.jp")

The problem is either in a newLISP version later than 10.2.1 or in a non-Windows version.

Lutz · Post by **Lutz** » Wed Apr 21, 2010 7:07 am

Like on Win XP with 10.2.1 it runs correctly on UNIX too with all versions of newLISP.

Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.
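
One way to check what is actually on disk, independent of what the editor shows, is to look at the bytes read-file returns. The following is only a sketch: the scratch file name `sites-check.txt` and its contents are assumptions, written by the snippet itself so it can be run as is.

```newlisp
;; Hypothetical scratch file, written here so the sketch is self-contained.
(set 'fname "sites-check.txt")
(write-file fname "1\tgoogle.com\n2\tfacebook.com\n")

(set 'buf (read-file fname))

;; true when the data ends in a line feed
(println "ends with line feed: " (ends-with buf "\n"))

;; if read-file returns the file unchanged, both numbers should match
(println "bytes read: " (length buf) ", size on disk: " (file-info fname 0))

(delete-file fname)
```

Pointing `fname` at the real file (and dropping the write-file and delete-file lines) would show whether the editor saved a final line feed.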

But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':

Code:

> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")

While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double-escape the line-feed character.

'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.
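
As a rough sketch of that extra expression argument, the call below extracts only the domain names. The inline string, the capture-group pattern and the name `domains` are assumptions for illustration, not part of the original example.

```newlisp
;; Inline sample standing in for the file contents (an assumption for illustration).
(set 'raw "1\tgoogle.com\n2\tfacebook.com\n")

;; (\d+) captures the rank and (\S+) the domain. The third argument is
;; evaluated once per match, with $1 and $2 holding the captured groups.
;; The final 0 is the regex option (no special flags).
(set 'domains (find-all {(\d+)\s+(\S+)} raw $2 0))
(println domains)
```

Because the pattern describes the item content, a trailing line feed never produces an empty item, and the separate slice/find step from the dolist loop is no longer needed.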

Regarding a raw-packet option, from another post:

Kanen wrote:Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P

I tried to create a 'net-packet' a few years back and used example code from here:

http://mixter.void.ru/rawip.html

but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x86), even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.

kanen · Post by **kanen** » Wed Apr 21, 2010 8:27 am

Lutz wrote:Like on Win XP with 10.2.1 it runs correctly on UNIX too with all versions of newLISP.

Vi adds extra line feed character(s) on Windows and on UNIX, even if not typed in and not visible in vi.

Yes it does, which is why vi has a ":set binary" setting and a "-b" command-line option. This might be enough to migrate me to vile, just to not deal with it. Frustrating!

Lutz wrote:But instead of stripping the extra trailing line-feed, here is another way to parse using 'find-all':
Code:
> (find-all {[^\n]+} (read-file "example.txt"))
("1  google.com" "2  facebook.com" "3  yahoo.com" "4  youtube.com" "5  live.com"
 "6  wikipedia.org" "7  blogger.com" "8  baidu.com" "9  msn.com" "10 yahoo.co.jp")

Fantastic. Thanks for the easy (and appreciated) example. Makes my life less complicated.

Lutz wrote: While 'parse' defines the border between items, the regex in 'find-all' defines the item content. I used curly braces instead of quotes in the above example, so I don't have to double-escape the line-feed character.

'find-all' also takes extra options, which let you process each item found, and by default always uses regular expressions, which 'parse' does not.

Regarding a raw-packet option, from another post:

Kanen wrote:Now, if I could just convince you to create a (raw-packet) option in newLISP, my problems would be solved. ;P
I tried to create a 'net-packet' a few years back and used example code from here:

http://mixter.void.ru/rawip.html

but could not get it to work (that was with OSX on PPC G4 and FreeBSD on Intel x86), even the example code. Perhaps it was some simple thing I missed. What I need is a working C example.

Yeah, the stuff from mixter is total crap. He misses about 1/2 the headers you need, includes headers you don't need and there are many mistakes in the code. I have several working examples. I will try to put something together that makes sense. It won't be pretty, but it will at least be functional (i.e. it will compile and run properly, without a hundred hours of tweaking and research).

newlispfanclub.alh.net
