date + parse problems
Posted: Fri Mar 25, 2011 8:03 pm
by kanen
I use parse all over the place in my code. I recently ran into a very strange problem with parse.
Code: Select all
> (set 'x 1301073325)
1301073325
> (date x)
"Fri Mar 25 10:15:25 2011"
> (parse (date x))
("Fri" "Mar" "25" "10" ":" "15" ":" "25" "2011")
Looks good. Parses correctly, but when the time changes...
Code: Select all
> (set 'x 1301071976)
1301071976
> (date x)
"Fri Mar 25 09:52:56 2011"
> (parse (date x))
("Fri" "Mar" "25" "0" "9" ":" "52" ":" "56" "2011")
Notice that "09:52" is being parsed as "0" "9" ":" "52" instead of "09" ":" "52".
I am running newLISP 10.3.0.
Re: date + parse problems
Posted: Fri Mar 25, 2011 8:38 pm
by Sammo
You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 8 am, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".
Re: date + parse problems
Posted: Fri Mar 25, 2011 9:35 pm
by newdep
Sammo wrote: You'll find that parse correctly handles "Fri Mar 25 07:52:56 2011" and other times with hours earlier than 8 am, but not "Fri Mar 25 08:52:56 2011". For an extreme case, try "Fri Mar 25 08:08:08 2011". I'll bet this is caused by newLISP's number parser, which treats digit strings beginning with "0" as octal and breaks the parse on non-octal digits such as "8" and "9".
Does this help?
(parse "Fri Mar 30 20:08:01 2011" {:\s*|\s+} 0 )
("Fri" "Mar" "30" "20" "08" "01" "2011")
Re: date + parse problems
Posted: Fri Mar 25, 2011 10:14 pm
by newdep
I think you found an issue there ;-)
It's indeed odd. I tried several options, but this one is an odd one indeed:
("Fri" "Mar" "30" "0" "9" ":" "52" ":" "56" "2011")
Nice spotting!
Re: date + parse problems
Posted: Fri Mar 25, 2011 10:24 pm
by newdep
I think it's a hex issue...
It only happens from 08 to 0F:
> (parse "0F:02:56" )
("0" "F" "02" ":" "56")
> (parse "0F:08:56" )
("0" "F" "0" "8" ":" "56")
> (parse "0F:A0:56" )
("0" "F" "A0" "56")
> (parse "0F:0F:56" )
("0" "F" "0" "F" "56")
And this is even odder:
> (parse ":0F:0F:56" )
(":" "0" "F" "0" "F" "56")
Re: date + parse problems
Posted: Sat Mar 26, 2011 2:30 am
by Lutz
If you use 'parse' without the second parameter, 'parse' uses the algorithm to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:
Code: Select all
> (parse "0F:02:56" ":" )
("0F" "02" "56")
>
Or use regular expressions, as newdep suggests for kanen's example.
Also look into this:
Code: Select all
(date-list (date-parse "2010.10.18 7:00" "%Y.%m.%d %H:%M"))
→ (2010 10 18 7 0 0 290 1)
to split specific date formats into components.
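For the ctime-style strings that (date x) returns earlier in this thread, a format string along these lines might work. This is only a sketch: the format string and the variable names stamp and fmt are assumptions made for illustration, %a and %b depend on the locale, and date-parse is not available on Windows builds of newLISP.

```newlisp
; Sketch only: stamp and fmt are illustrative names, not from the thread.
(set 'stamp "Fri Mar 25 09:52:56 2011")

; strptime-style format: weekday, month name, day, H:M:S, year
(set 'fmt "%a %b %d %H:%M:%S %Y")

; date-parse returns seconds since the epoch; date-list splits them
; into fields. The implicit slice (0 6 ...) keeps the first six:
; year month day hour minute second
(0 6 (date-list (date-parse stamp fmt)))
```

Because the hour comes back as an integer, the leading zero in "09" never reaches the default tokenizer.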
Re: date + parse problems
Posted: Sat Mar 26, 2011 4:38 am
by kanen
Regular expressions are the answer, however...
I consider this a bug. I do not think (parse) should treat anything from "08" to "0F" as a hex string to be split, unless you specifically ask for the string to be split as hex. This also breaks all over the place in my system because I'm actually parsing hex in a lot of places (for TrustPipe).
Your example below doesn't actually solve the problem I'm having because I have both spaces and colons in my example.
With the following example strings, the results are unexpected:
Code: Select all
> (parse "0F000801")
("0" "F000801")
> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")
Lutz wrote:If you use 'parse' without the second parameter, 'parse' uses the algorithm to parse newLISP source. In your example, I would simply add the ":" as separator string and you get the expected results:
Code: Select all
> (parse "0F:02:56" ":" )
("0F" "02" "56")
>
Or use regular expressions, as newdep suggests for kanen's example.
Re: date + parse problems
Posted: Sat Mar 26, 2011 9:23 am
by cormullion
Ha, I've done this too. It's a well-known pitfall - I even wrote about it, since I spent such a long time looking for a bug that appeared after the code ran perfectly for 10 months ...
http://newlisper.wordpress.com/2006/09/18/my-mistake-2/
It's sensible to make the default action for parse use newLISP syntax - what would be a better choice? Spaces, with or without tabs and/or returns and/or linefeeds? What about hyphens and quotes? So it makes sense to leave the precise specification of the parsing to the programmer. Be warned that using the default parse also eliminates semicolon-headed strings as well...
BTW - I once tried to write an 'intelligent' date-parser:
Code: Select all
> (ParseTime "Fri Mar 25 10:15:25 2011")
((2011 3 25 10 15 25))
> (ParseTime "Fri Mar 25 09:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri Mar 25 9:52:56 2011")
((2011 3 25 9 52 56))
> (ParseTime "Fri March 08 09:08:56 2011")
((2011 3 8 9 8 56))
> (ParseTime "Tuesday, March 08, 2011 3:51 PM")
((2011 3 8 15 51 0))
> (ParseTime "Tuesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 09:08:56")
((2011 3 8 21 8 56))
> (ParseTime "Wednesday, March 08, 2011 9:08:56 PM")
((2011 3 8 21 8 56))
but as you see, it's a hard problem... too hard for me, anyway :)
Re: date + parse problems
Posted: Sat Mar 26, 2011 10:02 am
by newdep
I do think it's inconsistent when using a regular parse. It's the 08 - 0F range that makes it odd.
I do understand the logic of the octal here, but if you don't know this then the result is not as expected.
And the parse description in the manual does not say anything about this either.
Re: date + parse problems
Posted: Sat Mar 26, 2011 4:10 pm
by Lutz
Under the premise that 'parse' without the second parameter behaves like the newLISP source parser, there is nothing unexpected in the behavior of 'parse'. The confusion arises when octal numbers are discovered. See the same examples changing to octal:
Code: Select all
> (parse "0F000801")
("0" "F000801")
> (parse "06000801") ; change F to a valid octal
("06000" "801")
> (parse "06000701") ; change 8 to valid octal
("06000701")
>
> (parse "0F 00 08 01")
("0" "F" "00" "0" "8" "01")
> (parse "06 00 07 01") ; all strings are valid octal
("06" "00" "07" "01")
>
This octal confusion is a well-known phenomenon in programming languages, because virtually all of them follow the same rules when parsing numbers: numbers have certain valid start characters, and the parser ends a number and restarts when it finds a character that is invalid for that specific number format. Octal numbers start with a '0'.
As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something in the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added and the description of 'parse' would link to it.
Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:
Code: Select all
> (find-all "[^:]+" "0F:02:56")
("0F" "02" "56")
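The same idea can be stretched to strings that mix spaces and colons, and to hex bytes. A minimal sketch, assuming the only separators are whitespace and colons and that hex bytes are always two digits (the variable name stamp is made up for illustration):

```newlisp
; Tokens are runs of characters that are neither whitespace nor colons.
(set 'stamp "Fri Mar 25 09:52:56 2011")
(find-all {[^\s:]+} stamp)
; expected: ("Fri" "Mar" "25" "09" "52" "56" "2011")

; Unseparated hex: take two hex digits at a time.
(find-all "[0-9A-Fa-f][0-9A-Fa-f]" "0F000801")
; expected: ("0F" "00" "08" "01")

; Convert each token with an explicit base so no octal guessing happens.
(map (fn (tok) (int tok 0 16))
     (find-all "[0-9A-Fa-f][0-9A-Fa-f]" "0F 00 08 01"))
; expected: (15 0 8 1)
```

Passing the base to int explicitly keeps "08" and "0F" from being read with the leading-zero octal rule.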
Re: date + parse problems
Posted: Sat Mar 26, 2011 5:42 pm
by kanen
A better choice would be to parse as a string, including all the normal items (space, colon, etc.) and not do the hex/octal parsing by default.
I do like the find-all example below and believe you are absolutely right and I should change my habits to use find-all with regex.
Lutz wrote: As Cormullion mentions: "what would be a better choice?" (for the default parse behavior). There are just too many possibilities, therefore I think that newLISP-source parsing as a default is a sensible choice. Yes, perhaps adding something in the manual will alleviate the confusion. Currently 'parse' mentions "newLISP parsing rules" in the description. Perhaps a chapter about "newLISP parsing rules" has to be added and the description of 'parse' would link to it.
Last but not least: in many cases where parse is used, 'find-all' would be the better choice. While 'parse' takes break strings or regular expressions in the optional second parameter, 'find-all' describes the tokens themselves:
Code: Select all
> (find-all "[^:]+" "0F:02:56")
("0F" "02" "56")