date + parse problems

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Post by kanen »

I use parse all over the place in my code. I recently ran into a very strange problem with parse.

Code: Select all

> (set 'x 1301073325)
1301073325
> (date x)
"Fri Mar 25 10:15:25 2011"
> (parse (date x))
("Fri" "Mar" "25" "10" ":" "15" ":" "25" "2011")
Looks good. Parses correctly, but when the time changes...

Code: Select all

> (set 'x 1301071976)
1301071976
> (date x)
"Fri Mar 25 09:52:56 2011"
> (parse (date x))
("Fri" "Mar" "25" "0" "9" ":" "52" ":" "56" "2011")
Notice that "09:52" is parsed as "0" "9" ":" "52" instead of "09" ":" "52".

I am running newLISP 10.3.0.
. Kanen Flowers http://kanen.me .

Sammo
Posts: 180
Joined: Sat Dec 06, 2003 6:11 pm
Location: Loveland, Colorado USA

Re: date + parse problems

Post by Sammo »

You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 08, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Re: date + parse problems

Post by newdep »

Sammo wrote: You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 08, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".

Does this help?

(parse "Fri Mar 30 20:08:01 2011" {:\s*|\s+} 0 )

("Fri" "Mar" "30" "20" "08" "01" "2011")
-- (define? (Cornflakes))

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Re: date + parse problems

Post by newdep »

I think you found an issue there ;-)
It's indeed odd. I tried several options, but this one is an odd one indeed.

("Fri" "Mar" "30" "0" "9" ":" "52" ":" "56" "2011")

Nice spotting!
-- (define? (Cornflakes))

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Re: date + parse problems

Post by newdep »

I think it's a HEX issue...

It only happens from 08 to 0F.

> (parse "0F:02:56" )
("0" "F" "02" ":" "56")

> (parse "0F:08:56" )
("0" "F" "0" "8" ":" "56")
> (parse "0F:A0:56" )
("0" "F" "A0" "56")
> (parse "0F:0F:56" )
("0" "F" "0" "F" "56")

And this is even odder:

> (parse ":0F:0F:56" )
(":" "0" "F" "0" "F" "56")
-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: date + parse problems

Post by Lutz »

If you use 'parse' without the second parameter, 'parse' uses the same algorithm that is used to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:

Code: Select all

> (parse "0F:02:56" ":" )
("0F" "02" "56")
> 
or use regular expressions, as newdep is suggesting for Kane's example.

Also look into this:

Code: Select all

(date-list (date-parse "2010.10.18 7:00" "%Y.%m.%d %H:%M"))
→ (2010 10 18 7 0 0 290 1)
to split specific date formats into components.
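
As a rough sketch of how the same approach might apply to the string that 'date' returns: the format string below is an assumption based on the usual strptime directives, 'date-parse' is not available on every platform, and month and weekday names are matched according to the current locale. The variable names are only illustrative.

```newlisp
; Assumed format for strings such as "Fri Mar 25 09:52:56 2011".
(set 'stamp "Fri Mar 25 09:52:56 2011")
(set 'secs (date-parse stamp "%a %b %d %H:%M:%S %Y"))

; Keep only year, month, day, hour, minute, second.
(set 'parts (0 6 (date-list secs)))
```

Because the fields come back as integers, a leading zero in the hour never reaches the number parser.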

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: date + parse problems

Post by kanen »

Regular expressions are the answer, however...

I consider this a bug. I do not think (parse) should treat anything from "08" to "0F" as a HEX string to be split, unless you specifically ask for the string to be split as hex. This also breaks all over the place in my system, because I'm actually parsing HEX in a lot of places (for TrustPipe).

Your example below doesn't actually solve the problem I'm having because I have both spaces and colons in my example.

It just seems that, if I have the following example strings (below), the results are unexpected.

Code: Select all

> (parse "0F000801")
("0" "F000801")
> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")
Lutz wrote: If you use 'parse' without the second parameter, 'parse' uses the same algorithm that is used to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:

Code: Select all

> (parse "0F:02:56" ":" )
("0F" "02" "56")
> 
or use regular expressions like Sammo is suggesting for Kane(n)'s example.
. Kanen Flowers http://kanen.me .
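
For input that mixes spaces and colons, one option is to pass a regular expression as the separator, so that the source-code tokenizer is never involved. The following is a minimal sketch of that idea, not a fix to 'parse' itself; the variable names are only illustrative.

```newlisp
; Split on any run of whitespace or colons; the trailing 0 selects regex mode.
(set 'fields (parse "Fri Mar 25 09:52:56 2011" {[\s:]+} 0))
; fields → ("Fri" "Mar" "25" "09" "52" "56" "2011")

; Hex bytes separated by spaces: split on the space, then convert with base 16.
(set 'bytes (map (fn (s) (int s 0 16)) (parse "0F 00 08 01" " ")))
; bytes → (15 0 8 1)

; Unseparated hex: cut the string into two-character chunks.
(set 'pairs (explode "0F000801" 2))
; pairs → ("0F" "00" "08" "01")
```

Giving 'int' an explicit base keeps the conversion from guessing octal or hex from a leading "0".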

cormullion
Posts: 2038
Joined: Tue Nov 29, 2005 8:28 pm
Location: latitude 50N longitude 3W

Re: date + parse problems

Post by cormullion »

Ha, I've done this too. It's a well-known pitfall - I even wrote about it, since I spent such a long time looking for a bug that appeared after the code ran perfectly for 10 months ... http://newlisper.wordpress.com/2006/09/18/my-mistake-2/

It's sensible to make the default action for parse use newLISP syntax - what would be a better choice? Spaces, with or without tabs and/or returns and/or linefeeds? What about hyphens and quotes? So it makes sense to leave the precise specification of the parsing to the programmer. Be warned that the default parse also eliminates semicolon-headed strings...

BTW - I once tried to write an 'intelligent' date-parser:

Code: Select all

> (ParseTime "Fri Mar 25 10:15:25 2011")
((2011 3 25 10 15 25))
> (ParseTime "Fri Mar 25 09:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri Mar 25 9:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri March 08 09:08:56 2011")
((2011 3 8 9 8 56))
> (ParseTime "Tuesday, March 08, 2011 3:51 PM")
((2011 3 8 15 51 0))
> (ParseTime "Tuesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 9:08:56 PM")
((2011 3 8 21 8 56))
but as you see, it's a hard problem... too hard for me, anyway :)

newdep
Posts: 2038
Joined: Mon Feb 23, 2004 7:40 pm
Location: Netherlands

Re: date + parse problems

Post by newdep »

I do think it's inconsistent when using a regular parse. It's the 08 to 0F range that makes it odd.
I do understand the logic of the octal here, but if you don't know this, then the result is not as expected.
And the parse description in the manual does not say anything about this either.
-- (define? (Cornflakes))

Lutz
Posts: 5289
Joined: Thu Sep 26, 2002 4:45 pm
Location: Pasadena, California

Re: date + parse problems

Post by Lutz »

Under the premise that 'parse' without the second parameter behaves like the newLISP source parser, there is nothing unexpected in the behavior of 'parse'. The confusion arises when octal numbers are discovered. See the same examples changing to octal:

Code: Select all

> (parse "0F000801")
("0" "F000801")

> (parse "06000801") ; change F to a valid octal
("06000" "801")

> (parse "06000701") ; change 8 to valid octal
("06000701")
> 

> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")

> (parse "06 00 07 01") ; all strings are valid octal
("06" "00" "07" "01")
> 
This octal confusion is a well-known phenomenon in programming languages, because virtually all of them follow the same rules when parsing numbers: numbers have certain valid start characters, and the parser ends a number and restarts when it finds a character that is invalid for that specific number format. Octal numbers start with a '0'.
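
The rule can be modeled in a few lines. This is only an illustrative sketch for strings that begin with "0", not the actual newLISP tokenizer, and 'split-at-non-octal' is a hypothetical helper name.

```newlisp
; Model: a token starting with "0" is read as octal and ends at the
; first character that is not an octal digit; the rest is a new token.
(define (split-at-non-octal str)
  (let (pos (find {[^0-7]} str 0))   ; offset of first non-octal character, or nil
    (if pos
        (list (0 pos str) (pos str)) ; split into the octal part and the remainder
        (list str))))                ; every character is a valid octal digit

(split-at-non-octal "06000801") ; → ("06000" "801")
(split-at-non-octal "06000701") ; → ("06000701")
(split-at-non-octal "0F000801") ; → ("0" "F000801")
```

The three results agree with the 'parse' output shown above for the same strings.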

As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something in the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added and the description of 'parse' would link to it.

Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:

Code: Select all

> (find-all  "[^:]+" "0F:02:56")
("0F" "02" "56")
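
The same idea might be extended to the original date string, which contains both spaces and colons. A minimal sketch, assuming the fixed field order that 'date' produced earlier in the thread; the variable names are only illustrative.

```newlisp
(set 'stamp "Fri Mar 25 09:52:56 2011")

; A token is any run of characters that is neither whitespace nor a colon.
(set 'fields (find-all {[^\s:]+} stamp))
; fields → ("Fri" "Mar" "25" "09" "52" "56" "2011")

; Hour, minute and second are elements 3 to 5. Base 10 is forced so that
; "08" and "09" are not read as malformed octal numbers.
(set 'hms (map (fn (s) (int s 0 10)) (3 3 fields)))
; hms → (9 52 56)
```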

kanen
Posts: 145
Joined: Thu Mar 25, 2010 6:24 pm

Re: date + parse problems

Post by kanen »

A better choice would be to parse as a string, including all the normal items (space, colon, etc.) and not do the hex/octal parsing by default.

I do like the find-all example below. I believe you are absolutely right, and I should change my habits to use find-all with regex.
Lutz wrote: As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something in the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added and the description of 'parse' would link to it.

Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:

Code: Select all

> (find-all  "[^:]+" "0F:02:56")
("0F" "02" "56")
. Kanen Flowers http://kanen.me .
