Dmi wrote:
In fact, the current algorithm can't tell whether the closing angle bracket
">" it has just read closes "<input>", "<textarea>" or some other tag.
i thought so. the problem with your current approach lies in the puny
little ">" which closes tags like "<input ...>", because a single ">"
closes any tag anywhere. i was thinking about making your code more
selective. for every tag recognized there is a list of tags it closes,
and i could add specific tags after "tag-open" to tell it which tags
will definitely close it. anyway, your code
defines a tag stack, which can be useful for keeping track of the structure.
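here's roughly how i picture the selective closing, sketched in Python rather than newLISP. the rule table, the token strings and `build_tree` are made up for illustration, and the input is assumed to be split into tokens already. each close rule lists the tags it may close, and the stack is unwound only down to the innermost one of those, so a close never pops the wrong tag:

```python
# hypothetical rule table: token -> (name, action, tags this token may close)
RULES = {
    "<input>":     ("input", "open", ()),
    "</input>":    ("input/", "close", ("input",)),
    "<textarea>":  ("textarea", "open", ()),
    "</textarea>": ("textarea/", "close", ("textarea",)),
}

def build_tree(tokens):
    """Nest a flat token list into nested lists, using a tag stack.

    An open token pushes a new node. A close token unwinds the stack down to
    the innermost open tag named in its close list (closing anything still
    open inside it). A close token with no matching open tag is ignored.
    """
    root = ["root"]
    stack = [root]
    for tok in tokens:
        rule = RULES.get(tok)
        if rule is None:                # plain text goes into the current node
            stack[-1].append(tok)
            continue
        name, action, closes = rule
        if action == "open":
            node = [name]
            stack[-1].append(node)
            stack.append(node)
        else:
            # search from the top of the stack for a tag this rule may close
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] in closes:
                    del stack[i:]
                    break
    return root
```

with `["<textarea>", "hi", "<input>", "x", "</textarea>", "after"]` the `</textarea>` also closes the still-open input, so an `<input>` that never gets its own close tag can't swallow the rest of the page.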
Dmi wrote:
You have several options here:
1. Preparse the HTML content to transform every "<input attrs>" into
"<input> attrs </input>" and so on ("replace" will help you),
and use rules like
(input "<input> " tag-open ())
(input/ "</input> " tag-close (input))
etc.
that's a good idea! i had already thought of preprocessing the input to
make it more legible.
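a rough sketch of that preprocessing step, in Python since i can't vouch for the exact newLISP regex syntax. `wrap_inputs` is a made-up name, and it assumes attribute values never contain ">". it only handles `<input>`, because wrapping `<textarea ...>` the same way would leave it with two close tags:

```python
import re

# "<input", then everything up to ">", dropping an optional self-closing "/".
# Assumes attribute values never contain ">".
INPUT_TAG = re.compile(r"<input\b([^>]*?)\s*/?>", re.IGNORECASE)

def wrap_inputs(html):
    """Rewrite '<input attrs>' as '<input> attrs </input>' so every input gets a close tag."""
    def repl(m):
        attrs = m.group(1).strip()
        return "<input> %s </input>" % attrs if attrs else "<input> </input>"
    return INPUT_TAG.sub(repl, html)
```

it also lower-cases the tag name, so `<INPUT ...>` comes out as `<input> ... </input>`, which keeps the rule table small.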
Dmi wrote:
2. Preparse the HTML content to transform every "<([^ >]*) ([^>]*)>" into
"<\1> <attrs> \2 </attrs>"
and use one universal rule:
(attrs "<attrs> " tag-open ())
(attrs/ "</attrs> " tag-close (attrs))
And you'll have _all_ tag attributes wrapped as (attrs attribute-string).
another good one, which i could combine with the previous suggestion.
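the same idea as a Python `re.sub`, with capturing groups so that `\1` and `\2` actually refer to something. the tag-name pattern is my own, slightly stricter reading of Dmi's regex, and it assumes no ">" inside attribute values (self-closing tags would keep their "/" inside the attrs):

```python
import re

# tag name, at least one space, then the attribute string up to ">"
TAG_WITH_ATTRS = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s+([^>]*)>")

def split_attrs(html):
    """Rewrite '<tag attrs>' as '<tag> <attrs> attrs </attrs>' for every tag that has attributes."""
    return TAG_WITH_ATTRS.sub(r"<\1> <attrs> \2 </attrs>", html)
```

tags without attributes ("<br>", close tags) don't match and are left alone, so one universal attrs rule is enough afterwards.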
Dmi wrote:
Btw, about awk: you may redefine RS (the record separator) to something
unusual so that regexps work across "\n". You may also remember the rest
of "$0" from the previous iteration and concatenate it with the current one...
this one i don't understand; you'd have to explain from "remember the
rest of $0 ..." onwards.
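my best guess at what the carry-over trick means, sketched in Python rather than awk: read the input line by line, and whatever is left after the last complete tag (an unfinished "<..." cut off by a newline) is carried into the next line before matching again. `tags_across_lines` is a hypothetical helper, not something from Dmi's code:

```python
import re

# a complete tag: "<", anything but ">" (newlines included), then ">"
TAG = re.compile(r"<[^>]*>")

def tags_across_lines(lines):
    """Yield complete tags from line-oriented input, even when a tag spans lines.

    After each line, the text following the last complete tag is inspected;
    if it holds an unfinished "<...", that tail is prepended to the next
    line, much like keeping the rest of $0 from the previous awk iteration.
    """
    rest = ""
    for line in lines:
        buf = rest + line
        end = 0
        for m in TAG.finditer(buf):
            yield m.group(0)
            end = m.end()
        tail = buf[end:]
        start = tail.find("<")          # start of an unfinished tag, if any
        rest = tail[start:] if start >= 0 else ""
```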
currently, i'm playing with the following idea:
since only a few tags are needed for the task, i can preprocess
the input to delete all the tags that don't have to be interpreted. in
that same step i'd normalize the input for "(parse...)" by inserting
spaces or other markers. then only the "<input...>" and "<textarea>" tags
will be left. they vary in the number and type of attributes used. then
i'd run a table like the one from your code against them, but use
"(match...)" to find and extract the information i need.
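a sketch of that plan in Python, under a few assumptions: only `input` and `textarea` matter, the fields of interest carry a `name` attribute (and, for inputs, `value`), and no attribute value contains ">". the function names are made up, and the last step uses a regex where i'd use "(match...)" in newLISP:

```python
import re

# tags that survive preprocessing; every other tag is deleted (assumed choice for this task)
KEEP = re.compile(r"</?(?:input|textarea)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]*>")
# one name=value pair; the value may be double-quoted, single-quoted or bare
ATTR = re.compile(r'''([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))''')
# an <input ...> or a whole <textarea ...>body</textarea>
FIELD = re.compile(r"<input\b([^>]*)>|<textarea\b([^>]*)>(.*?)</textarea>",
                   re.IGNORECASE | re.DOTALL)

def strip_other_tags(html):
    """Delete every tag except <input ...>, <textarea ...> and </textarea>."""
    return ANY_TAG.sub(lambda m: m.group(0) if KEEP.fullmatch(m.group(0)) else "", html)

def parse_attrs(s):
    """Turn 'a="1" b=2' into {'a': '1', 'b': '2'}; valueless attributes are skipped."""
    out = {}
    for m in ATTR.finditer(s):
        name, dq, sq, bare = m.groups()
        out[name.lower()] = next(v for v in (dq, sq, bare) if v is not None)
    return out

def form_fields(html):
    """Map each named input/textarea to its value (for a textarea, its body text)."""
    fields = {}
    for m in FIELD.finditer(strip_other_tags(html)):
        input_attrs, textarea_attrs, textarea_body = m.groups()
        if input_attrs is not None:
            attrs = parse_attrs(input_attrs)
            value = attrs.get("value", "")
        else:
            attrs = parse_attrs(textarea_attrs)
            value = textarea_body
        if "name" in attrs:
            fields[attrs["name"]] = value
    return fields
```

because the attribute parsing doesn't care about order or count, a change in the page's markup only breaks it if the field names themselves change.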
since i don't control the markup on the page and it can change anytime,
i need to be flexible, but only on a small number of items.
i've found a number of leads on the 'net. once i'm finished
with the parsing, i need to cook up the "multipart/form-data"
content-type for the POST method.
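for the request body itself, the format is plain text (RFC 7578): each field is a part that starts with "--" plus a boundary string, followed by a Content-Disposition header, a blank line and the value, and the body ends with "--boundary--"; all line breaks are CRLF. a minimal sketch for text fields only (no file uploads, no escaping of quotes in field names), with a random boundary from `uuid`:

```python
import uuid

def multipart_form_data(fields, boundary=None):
    """Encode {name: value} text fields as a multipart/form-data body.

    Returns (content_type, body_bytes). The boundary must not occur in any
    value; a random uuid4 hex string makes that very unlikely.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")                  # blank line ends the part headers
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")                      # trailing CRLF
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body
```

the returned content type goes into the request's Content-Type header; with Python's `urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method=...)` that's all the wiring needed.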
thanks for your suggestions, --clemens