Java SimpleDateFormat parsing issue in WEKA
I swear I'm using the correct date format, but I keep getting a parse error when loading into WEKA.
Date string:
"MonFeb2116:00:00+0000"
Pattern:
"EEEMMMddHH:mm:ssZ"
Here is an example dataset:
@RELATION example
@ATTRIBUTE tweetid STRING
@ATTRIBUTE timestamp DATE "EEEMMMddhh:mm:ssZ"
@ATTRIBUTE I NUMERIC
@ATTRIBUTE a NUMERIC
@ATTRIBUTE cool NUMERIC
@ATTRIBUTE foo NUMERIC
@ATTRIBUTE bar NUMERIC
@ATTRIBUTE temp NUMERIC
@ATTRIBUTE class {POS,NEG}
@DATA
39715973388828673,"MonFeb2116:00:00+0000",0,0,0,0,2,2,?
39716148329197568,"MonFeb2116:00:42+0000",0,1,0,0,0,1,?
39715973388828673,"MonFeb2116:00:51+0000",1,0,0,0,0,0,?
39723030380941312,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?
39723030531944448,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?
39723031433707520,"MonFeb2116:28:03+0000",0,0,0,0,0,0,?
WEKA Error:
unparseable date "MonFeb2116:00:00+0000, read Token[MonFeb2116:00:00+0000], line 21
I have double-checked against the API documentation. Am I missing something?
http://download.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html
EDIT -----------
@RELATION example
@ATTRIBUTE tweetid STRING
@ATTRIBUTE timestamp DATE "EEE MMM dd hh:mm:ss Z"
@ATTRIBUTE I NUMERIC
@ATTRIBUTE a NUMERIC
@ATTRIBUTE cool NUMERIC
@ATTRIBUTE foo NUMERIC
@ATTRIBUTE love NUMERIC
@ATTRIBUTE temp NUMERIC
@ATTRIBUTE class {POS,NEG}
@DATA
39715973388828673,"Mon Feb 21 16:00:00 +0000",0,0,0,0,2,2,?
39716148329197568,"Mon Feb 21 16:00:42 +0000",0,1,0,0,0,1,?
39715973388828673,"Mon Feb 21 16:00:51 +0000",1,0,0,0,0,0,?
39723030380941312,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?
39723030531944448,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?
39723031433707520,"Mon Feb 21 16:28:03 +0000",0,0,0,0,0,0,?
I reformatted the dates so the tokens are separated by spaces. Still not playing ball in WEKA...
Comments (2)

Which default locale are you using? With an English locale, the string
"MonFeb2116:00:00+0000"
should be parseable with the pattern
"EEEMMMddHH:mm:ssZ"
Note, however, that the year will default to 1970 if it is not present in the pattern or the parsed string. That is probably not what you really want.

Well, I don't know whether it'll sort everything out or not, but try changing hh (12-hour format) to HH (24-hour format). I'm not sure whether it'll be able to read a "day of the week / month name" without any spaces even so... Do you have to get the value in that format? If you could put a space after the 3rd and 6th characters, it would help...
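
To check the two suggestions outside WEKA, something like the following sketch might help. It parses both the unspaced and the spaced sample strings with plain java.text.SimpleDateFormat, passing Locale.ENGLISH explicitly so the result does not depend on the JVM default locale, and then reads back the year and hour in UTC. The class name DateParseSketch and its helper methods are made up for illustration, and this only exercises the JDK parser; it is an assumption that WEKA's ARFF loader behaves the same way for a given pattern and locale.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class, not part of WEKA.
final class DateParseSketch {

    // Parse with an explicit English locale so "Mon" and "Feb" are matched
    // against English names regardless of the JVM default locale.
    static Date parseEnglish(String pattern, String text) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        return format.parse(text);
    }

    // Read a calendar field (for example Calendar.YEAR) from a Date, in UTC.
    static int fieldUtc(Date date, int field) {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ENGLISH);
        calendar.setTime(date);
        return calendar.get(field);
    }

    public static void main(String[] args) throws ParseException {
        // Unspaced form from the question, with HH (24-hour) rather than hh.
        Date unspaced = parseEnglish("EEEMMMddHH:mm:ssZ", "MonFeb2116:00:00+0000");
        // Spaced form from the EDIT, again with HH.
        Date spaced = parseEnglish("EEE MMM dd HH:mm:ss Z", "Mon Feb 21 16:00:00 +0000");

        // The pattern has no year field, so the parsed year falls back to the default.
        System.out.println("unspaced year: " + fieldUtc(unspaced, Calendar.YEAR));
        System.out.println("unspaced hour: " + fieldUtc(unspaced, Calendar.HOUR_OF_DAY));
        System.out.println("spaced year:   " + fieldUtc(spaced, Calendar.YEAR));
        System.out.println("spaced hour:   " + fieldUtc(spaced, Calendar.HOUR_OF_DAY));
    }
}
```

If this parses cleanly but WEKA still rejects the same pattern and string, the default locale of the JVM running WEKA is the next thing worth checking. If the year matters, it would need to be added to both the data and the pattern (for example a yyyy field).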