baby steps...

Q&A's, tips, howto's
tom
Posts: 168
Joined: Wed Jul 14, 2004 10:32 pm

Post by tom »

May I see some trivial examples of newlisp in action?

w3m is a text-mode browser that handles tables well. You can
effectively strip the tags from an html file, while preserving (pretty
much) all the whitespace caused by the tags.

$ w3m file.html > file.txt

How can I loop through a directory full of .html files, converting
each one to a .txt file with w3m,
1. in the same directory
2. in a different, new directory?

Thanks, off to read the manual!

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Post by Lutz »

you can do the following:

(dolist (fle (directory)) (exec (append "w3m " fle " > " fle ".txt")))


The 'directory' function can also take a directory path as an argument.
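
Building on the loop above, here is a small sketch that skips non-HTML entries (including "." and "..") and writes name.txt rather than name.html.txt. html-commands is a hypothetical helper name, not something from this thread, and it only builds the command strings so they can be inspected before anything is run:

```newlisp
;; Sketch only: html-commands is a made-up helper name.
;; It returns one w3m command string per entry ending in ".html".
(define (html-commands files outdir)
  (let (cmds '())
    (dolist (fle files)
      (if (ends-with fle ".html")
          ;; (chop fle 5) drops the 5 characters of ".html"
          (push (append "w3m " fle " > " outdir (chop fle 5) ".txt") cmds -1)))
    cmds))

(html-commands '("." ".." "a.html" "notes.txt") "out/")
;; => ("w3m a.html > out/a.txt")
```

For the "different, new directory" case, something like (make-dir "out") followed by (dolist (c (html-commands (directory) "out/")) (exec c)) should do it, assuming w3m is on the path. An empty string for outdir keeps the output in the current directory.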

Lutz

nigelbrown
Posts: 429
Joined: Tue Nov 11, 2003 2:11 am
Location: Brisbane, Australia

Post by nigelbrown »

I've not fully tested it but it worked once:

(define (htm2txt indir outdir)
  (map (fn (x) (exec (append "w3m " indir (first x) " > " outdir (nth 3 x) ".txt")))
       (filter 'list? (map (fn (x) (regex "(.*)\.html*$" x 1)) (directory indir)))))

indir and outdir should both end in /
e.g.
(htm2txt "./" "/tmp/")


Nigel
Last edited by nigelbrown on Mon Jul 19, 2004 4:50 am, edited 1 time in total.

nigelbrown

Post by nigelbrown »

Seeing the thread was 'baby steps' I'll step through the code:

'directory' returns a list of the entries in the directory:
> (directory "./")
("." ".." "3.htm" "bab-ubhelp.htm" "exlfus.htm" "gtk-server.log"
"images" "License.txt" "models" "quit.blend" "Readme.txt" "stdout.txt"
"test.html")
>

Next, map applies a regex to each name in the directory list. The regex matches names ending in .htm or .html and, because of the brackets in the pattern, also captures the filename up to that extension. regex returns nil when the extension is not found, so the overall result is a list of nils and sublists:
> (map (fn (x) (regex "(.*)\.html*$" x 1)) (directory "./"))
(nil nil ("3.htm" 0 5 "3" 0 1) ("bab-ubhelp.htm" 0 14 "bab-ubhelp" 0 10)
("exlfus.htm" 0 10 "exlfus" 0 6) nil nil nil nil nil nil nil
("test.html" 0 9 "test" 0 4))
>

filtering by whether it is a list removes the nils:
> (filter 'list? (map (fn (x) (regex "(.*)\.html*$" x 1)) (directory "./")))
(("3.htm" 0 5 "3" 0 1) ("bab-ubhelp.htm" 0 14 "bab-ubhelp" 0 10)
("exlfus.htm" 0 10 "exlfus" 0 6)
("test.html" 0 9 "test" 0 4))
>
The next step maps the anonymous function:
(fn (x) (exec (append "w3m " indir (first x) " > " outdir (nth 3 x) ".txt")))
which looks at one sublist, e.g. ("test.html" 0 9 "test" 0 4), and uses
first to get the full htm(l) file name, e.g.
> (first '("test.html" 0 9 "test" 0 4))
"test.html"
>
and nth to get the fourth element, which is the extracted name without
the extension (nth indexes start at zero, so that is index 3), e.g.
> (nth 3 '("test.html" 0 9 "test" 0 4))
"test"
>
Then the function uses append to build the final command string, including the directories, which is passed to exec for execution.
map applies this function to every sublist that survived the filter.

The define, of course, bundles this all up into a callable function.

Hope this makes it clear. dolist is another approach but I'm more used to map.
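
For comparison, a rough sketch of the dolist approach. pair-up is a hypothetical helper name, not from this thread. It only computes the input and output names, so the matching can be checked without running w3m. The pattern uses html? (one optional "l") in a brace-delimited string, a slightly stricter variant of the regex used above:

```newlisp
;; Sketch only: pair-up is a made-up name.
;; Returns (input output) name pairs for entries ending in .htm or .html.
(define (pair-up names indir outdir)
  (let (pairs '())
    (dolist (name names)
      ;; regex option 1 = case-insensitive; m is nil when there is no match
      (let (m (regex {(.*)\.html?$} name 1))
        (if m
            ;; (m 0) is the full file name, (m 3) the name without extension
            (push (list (append indir (m 0)) (append outdir (m 3) ".txt"))
                  pairs -1))))
    pairs))

(pair-up '("." "test.html" "3.htm" "Readme.txt") "./" "/tmp/")
;; => (("./test.html" "/tmp/test.txt") ("./3.htm" "/tmp/3.txt"))

;; Running the conversion, assuming w3m is installed:
;; (dolist (p (pair-up (directory "./") "./" "/tmp/"))
;;   (exec (append "w3m " (p 0) " > " (p 1))))
```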

Regards
Nigel

PS I was not aware of w3m before this, but it is a useful addition to my Linux setup (it was easier to install from a binary rpm, as the gc6 dependencies were a bit tricky when trying to build from source).

nigelbrown

Post by nigelbrown »

As a further refinement, taking the length of the final list returned by map gives the number of files processed, viz.
(define (htm2txt indir outdir)(length (map (fn (x) (exec (append "w3m " indir (first x) " > " outdir (nth 3 x) ".txt")))(filter 'list? (map (fn (x) (regex "(.*)\.html*$" x 1)) (directory indir))))))
then
> (htm2txt "./" "./")
4


If you don't have w3m, you can test the workings of the function by substituting something like type (say, on Windows) for w3m, viz.
(define (htm2txt indir outdir)(length (map (fn (x) (exec (append "type " indir (first x) " > " outdir (nth 3 x) ".txt")))(filter 'list? (map (fn (x) (regex "(.*)\.html*$" x 1)) (directory indir))))))

to generalise the function:

> (define (any2any indir outdir progname fileregex outext)
(length (map (fn (x) (exec (append progname " " indir (first x) " > " outdir (nth 3 x) outext)))(filter 'list? (map (fn (x) (regex fileregex x 1)) (directory indir))))))

thus
> (any2any "./" "./" "type" "(.*)\.html*$" ".txt")
4
>

then htm2txt can be defined as
(define (htm2txt indir outdir) (any2any indir outdir "w3m" "(.*)\.html*$" ".txt"))
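
As a possible shortcut, newer newLISP releases document an optional regex pattern as a second argument to 'directory'. When only the matching names are needed, it can stand in for the map/regex/filter step. A minimal sketch, assuming such a release (the version used in this thread may not support it); html-files is a made-up name:

```newlisp
;; Assumption: this newLISP build accepts (directory path pattern option).
;; html-files is a hypothetical helper, not part of the thread's code.
(define (html-files dir)
  ;; option 1 makes the match case-insensitive;
  ;; (or ... '()) guards against nil when dir cannot be read
  (sort (or (directory dir {\.html?$} 1) '())))

;; (html-files "./") returns only the names ending in .htm or .html
```

Unlike the regex sublists above, this returns plain file names, so the extension still has to be stripped separately before building the output name.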


Regards
Nigel

tom

Post by tom »

thanks guys, I'm studying your solutions. More baby steps to follow!

tom

Post by tom »

I'm putting my question here because I'm still at an
infant level in my newlisp understanding. Anyway,
I have some questions.

I would like to process the output of a command. I
think I need to put the output into a list to start
with (correct me if I'm wrong).

here's some example output. The first question is
obvious, how do I get it into a list?

(sorry, very basic stuff)

~ >> pacman -Qi frozen-bubble
Name           : frozen-bubble
Version        : 1.0.0-6
Groups         : None
Packager       : Arch Linux (http://www.archlinux.org)
URL            : http://www.frozen-bubble.org
License        : None
Architecture   : i686
Size           : 12116129
Build Date     : Mon Jun  6 17:17:58 2005 UTC
Install Date   : Wed Dec 21 07:43:32 2005 UTC
Install Script : No
Reason:        : explicitly installed
Provides       : None
Depends On     : sdl_mixer sdl_perl 
Required By    : None
Conflicts With : None
Description    : A game in which you throw colorful bubbles and build
                 groups to destroy the bubbles

Lutz

Post by Lutz »

Put the command into an 'exec' statement:

(exec "pacman -Qi frozen-bubble")

All standard output from the command comes back as a list of strings, one per line.
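
Once the lines are in a list, one way to work with them is to split each "Key : Value" line into a pair. This is only a sketch: parse-fields is a hypothetical helper name, and it assumes every field line contains " : " as the separator, as in the pacman output shown above. Continuation lines without a separator are simply skipped.

```newlisp
;; Sketch only: parse-fields is a made-up name.
;; Turns lines like "Name   : value" into an association list.
(define (parse-fields lines)
  (let (fields '())
    (dolist (ln lines)
      (let (idx (find " : " ln))          ; nil when the line has no separator
        (if idx
            (push (list (trim (trim (slice ln 0 idx)) ":")   ; key, stray ":" removed
                        (trim (slice ln (+ idx 3))))         ; value
                  fields -1))))
    fields))

(set 'info (parse-fields '("Name           : frozen-bubble"
                           "Architecture   : i686"
                           "Reason:        : explicitly installed"
                           "                 groups to destroy the bubbles")))
(lookup "Architecture" info)   ; => "i686"
(lookup "Reason" info)         ; => "explicitly installed"
```

With the real command this would be called as (parse-fields (exec "pacman -Qi frozen-bubble")).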

Lutz

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Post by cormullion »

If you want to find a particular piece of information somewhere in the list of strings returned by, say, exec, you can use the 'replace' command.

I recently discovered that this command can do something clever - instead of just replacing the text matched by a regular expression, you can 'replace' it with another newLISP expression altogether.

(dolist (_line _list)
(replace "(Architecture.*: )(.*)" _line (println $0 "\n" $1 "\n" $2) 0))

This loops through _list, a list of strings, and when a string contains the matched regular expression it prints out the stored matches. For example, this looks for the Architecture line in your example, so that you could get the "i686" string.
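
The same trick can store the match instead of printing it. A small sketch under the same assumptions about the line format; arch-of is a hypothetical name, and the replacement expression is only evaluated for lines where the pattern matches:

```newlisp
;; Sketch only: arch-of is a made-up name.
;; Uses replace's expression argument to capture $1 into a variable.
(define (arch-of lines)
  (let (arch nil)
    (dolist (ln lines)
      ;; (set 'arch $1) runs only when the regex matches this line
      (replace {Architecture\s*: (.*)} ln (set 'arch $1) 0))
    arch))

(arch-of '("Name           : frozen-bubble" "Architecture   : i686"))
;; => "i686"
```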

There are probably many alternatives!

hope this helps - I'm an infant too!
