date + parse problems

kanen · Post by **kanen** » Fri Mar 25, 2011 8:03 pm

I use parse all over the place in my code. I recently ran into a very strange problem with parse.

> (set 'x 1301073325)
1301073325
> (date x)
"Fri Mar 25 10:15:25 2011"
> (parse (date x))
("Fri" "Mar" "25" "10" ":" "15" ":" "25" "2011")

Looks good. Parses correctly, but when the time changes...

Code: Select all

> (set 'x 1301071976)
1301071976
> (date x)
"Fri Mar 25 09:52:56 2011"
> (parse (date x))
("Fri" "Mar" "25" "0" "9" ":" "52" ":" "56" "2011")

Notice "09:52" is being parsed as "0" "9" ":" "52" instead of "09" ":" "52".

I am running newLISP 10.3.0.

Sammo · Post by **Sammo** » Fri Mar 25, 2011 8:38 pm

You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 8 am, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".

newdep · Post by **newdep** » Fri Mar 25, 2011 9:35 pm

Sammo wrote: You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 8 am, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".

Does this help?

(parse "Fri Mar 30 20:08:01 2011" {:\s*|\s+} 0 )

("Fri" "Mar" "30" "20" "08" "01" "2011")

newdep · Post by **newdep** » Fri Mar 25, 2011 10:14 pm

I think you found an issue there ;-)
It's indeed odd. I tried several options, but this one is an odd one indeed.

("Fri" "Mar" "30" "0" "9" ":" "52" ":" "56" "2011")

Nice spotting!

newdep · Post by **newdep** » Fri Mar 25, 2011 10:24 pm

I think it's a hex issue...

It only happens from 08 to 0F:

> (parse "0F:02:56" )
("0" "F" "02" ":" "56")

> (parse "0F:08:56" )
("0" "F" "0" "8" ":" "56")
> (parse "0F:A0:56" )
("0" "F" "A0" "56")
> (parse "0F:0F:56" )
("0" "F" "0" "F" "56")

And this is even odder:

> (parse ":0F:0F:56" )
(":" "0" "F" "0" "F" "56")

Lutz · Post by **Lutz** » Sat Mar 26, 2011 2:30 am

If you use 'parse' without the second parameter, 'parse' uses the algorithm to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:

Code: Select all

> (parse "0F:02:56" ":" )
("0F" "02" "56")
>

Or use regular expressions, as newdep suggested above for Kane's example.

Also look into this:

Code: Select all

(date-list (date-parse "2010.10.18 7:00" "%Y.%m.%d %H:%M"))
→ (2010 10 18 7 0 0 290 1)

to split specific date formats into components.
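For the original question the value is already a timestamp, so one option is to skip the string round trip entirely. The following is a minimal sketch, assuming `date-list` reports its fields in UTC (as the manual describes), so the hour may differ from the local-time string that `date` prints. The names `x` and `parts` are only illustrative.

```newlisp
; Timestamp taken from the first post in this thread.
(set 'x 1301071976)

; date-list works on the number itself, so no text is tokenized and
; leading zeros such as "09" never reach the parser.
(set 'parts (date-list x))

; Keep year, month, day, hour, minute, second; drop day-of-year and weekday.
(set 'ymdhms (slice parts 0 6))
(println ymdhms)
```

If the local-time fields are needed, the offset has to be applied to the timestamp before calling `date-list`; that step is left out of this sketch.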

kanen · Post by **kanen** » Sat Mar 26, 2011 4:38 am

Regular expressions are the answer, however...

I consider this a bug. I do not think (parse) should treat anything from "08" to "0F" as a hex string to be split, unless you specifically ask for the string to be split as hex. This also breaks all over the place in my system, because I'm actually parsing hex in a lot of places (for TrustPipe).

Your example below doesn't actually solve the problem I'm having because I have both spaces and colons in my example.

It just seems that, if I have the following example strings (below), the results are unexpected.

Code: Select all

> (parse "0F000801")
("0" "F000801")
> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")

Lutz wrote:If you use 'parse' without the second parameter, 'parse' uses the algorithm to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:
Code: Select all
> (parse "0F:02:56" ":" )
("0F" "02" "56")
> 
or use regular expressions like Sammo is suggesting for Kane(n)'s example.
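For the two hex strings above, a sketch along the lines already suggested in this thread: pass an explicit break string to `parse` for the space-separated form, and describe the tokens with `find-all` for the packed form. The variable names are hypothetical, and the two-hex-digits-per-token layout is an assumption about the data.

```newlisp
; Space-separated bytes: an explicit break string turns off the
; newLISP-source tokenizing that splits "0F" and "08".
(set 'spaced (parse "0F 00 08 01" " "))

; Packed bytes: describe each token as exactly two hex digits.
(set 'packed (find-all {[0-9A-Fa-f][0-9A-Fa-f]} "0F000801"))

; Convert with an explicit base of 16 so nothing is guessed from a leading "0".
(set 'bytes (map (fn (h) (int h 0 16)) packed))

(println spaced)
(println packed)
(println bytes)
```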

cormullion · Post by **cormullion** » Sat Mar 26, 2011 9:23 am

Ha, I've done this too. It's a well-known pitfall - I even wrote about it, since I spent such a long time looking for a bug that appeared after the code ran perfectly for 10 months ... http://newlisper.wordpress.com/2006/09/18/my-mistake-2/

It's sensible to make the default action for parse use newLISP syntax - what would be a better choice? Spaces, with or without tabs and/or returns and/or linefeeds? What about hyphens and quotes? So it makes sense to leave the precise specification of the parsing to the programmer. Be warned that using the default parse also eliminates semicolon-headed strings as well...

BTW - I once tried to write an 'intelligent' date-parser:

Code: Select all

> (ParseTime "Fri Mar 25 10:15:25 2011")
((2011 3 25 10 15 25))
> (ParseTime "Fri Mar 25 09:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri Mar 25 9:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri March 08 09:08:56 2011")
((2011 3 8 9 8 56))
> (ParseTime "Tuesday, March 08, 2011 3:51 PM")
((2011 3 8 15 51 0))
> (ParseTime "Tuesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 9:08:56 PM")
((2011 3 8 21 8 56))

but as you see, it's a hard problem... too hard for me, anyway :)

newdep · Post by **newdep** » Sat Mar 26, 2011 10:02 am

I do think it's inconsistent when using a regular parse. It's the 08 to 0F range that makes it odd.
I do understand the logic of the octal here, but if you don't know about it, the result is not what you expect.
And the parse description in the manual does not say anything about this either.

Lutz · Post by **Lutz** » Sat Mar 26, 2011 4:10 pm

Under the premise that 'parse' without the second parameter behaves like the newLISP source parser, there is nothing unexpected in the behavior of 'parse'. The confusion arises when octal numbers are discovered. See the same examples changing to octal:

Code: Select all

> (parse "0F000801")
("0" "F000801")

> (parse "06000801") ; change F to a valid octal
("06000" "801")

> (parse "06000701") ; change 8 to valid octal
("06000701")
> 

> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")

> (parse "06 00 07 01") ; all string are valid octal
("06" "00" "07" "01")
>

This octal confusion is a well-known phenomenon in programming languages, because virtually all of them follow the same rules when parsing numbers: numbers have certain valid start characters, and the parser ends a number and restarts when it finds a character that is invalid for that specific number format. Octal numbers start with a '0'.
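The rule can be modeled with a short regular expression. This is only an illustrative sketch, not newLISP's actual tokenizer: `octal-split` is a hypothetical helper that finds the leading octal token and returns it together with the unconsumed rest, without tokenizing the rest any further.

```newlisp
; Hypothetical helper (not part of newLISP): mimic how a number scanner
; ends an octal token at the first character that is not 0-7.
(define (octal-split str)
  (let (m (regex {^0[0-7]*} str))          ; m is (matched offset length) or nil
    (if (and m (< (nth 2 m) (length str)))
        ; The token stopped early: return it plus the remaining text.
        (list (first m) (slice str (nth 2 m)))
        ; No leading "0", or the whole string is valid octal.
        (list str))))

(println (octal-split "0F000801"))
(println (octal-split "06000801"))
(println (octal-split "06000701"))
```

For these three inputs the model gives the same splits as the `parse` results shown above.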

As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something to the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added, and the description of 'parse' would link to it.

Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:

Code: Select all

> (find-all  "[^:]+" "0F:02:56")
("0F" "02" "56")
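Applied to the date string from the first post, which has both spaces and colons, the same idea might look like the sketch below. The pattern and the variable names are illustrative; the explicit base in `int` is there so that "09" is not read as octal.

```newlisp
(set 'stamp "Fri Mar 25 09:52:56 2011")

; A token is any run of characters that is neither a space nor a colon.
(set 'fields (find-all {[^ :]+} stamp))

; Convert the hour with an explicit base of 10.
(set 'hour (int (nth 3 fields) 0 10))

(println fields)
(println hour)
```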

kanen · Post by **kanen** » Sat Mar 26, 2011 5:42 pm

A better choice would be to parse as a string, including all the normal items (space, colon, etc.) and not do the hex/octal parsing by default.

I do like the find-all example below and believe you are absolutely right and I should change my habits to use find-all with regex.

Lutz wrote: As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something to the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added, and the description of 'parse' would link to it.

Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:
Code: Select all
> (find-all  "[^:]+" "0F:02:56")
("0F" "02" "56")

newlispfanclub.alh.net
