How to remove unconverted data from a Python datetime object
I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015
Without the invalid year, this was working for me:
end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date, "%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))
But once I hit an object with an invalid year I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.
Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this will give you the correct result you intend.
For example, you can catch the ValueError, slice, and try again.
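A minimal sketch of the catch, slice, and retry idea, not the answerer's own code. It assumes the ValueError message carries the leftover text after "unconverted data remains: ", as in the error quoted in the question; that wording is an implementation detail rather than a documented API. The name parse_prefix is hypothetical, and datetime.strptime is used in place of time.strptime to keep the example short.

```python
from datetime import datetime

# Prefix placed in front of the leftover text in the error message.
# Relying on this wording is an assumption, not a documented guarantee.
_PREFIX = "unconverted data remains: "


def parse_prefix(text, fmt):
    """Parse text with fmt, retrying once without the unconverted tail."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        message = str(exc)
        if not message.startswith(_PREFIX):
            raise  # a different parse failure; do not guess
        extra = message[len(_PREFIX):]
        if not extra or not text.endswith(extra):
            raise
        # Slice off exactly the characters strptime could not convert.
        return datetime.strptime(text[:-len(extra)], fmt)
```

Any other parse error is re-raised unchanged, so strings that are broken in some other way still fail loudly. Whether a zone name such as PST matches %Z can depend on the local time zone settings, so it is worth testing the format on a known-good row first.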
Yeah, I'd just chop off the extra numbers; that works as long as they are always appended to the date string.
I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
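The lop-and-reparse loop mentioned above could look roughly like this. It is a sketch under assumptions: parse_trimming and max_trim are hypothetical names, and the default of 6 comes from the question's statement that there are 2 to 6 extra characters.

```python
from datetime import datetime


def parse_trimming(text, fmt, max_trim=6):
    """Drop trailing characters one at a time until text parses with fmt."""
    last_error = None
    # Never try to cut more characters than the string contains.
    for cut in range(min(max_trim, len(text)) + 1):
        candidate = text[:len(text) - cut]
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError as exc:
            last_error = exc
    # Nothing parsed within the allowed number of cuts.
    raise last_error
```

This does not depend on the wording of the exception message, at the cost of a few extra parse attempts per bad row.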
You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
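For comparison, a rough sketch of the regex route. The pattern and the name parse_with_regex are hypothetical and only cover strings shaped like the example in the question. As a simplifying choice, the zone name is matched but then discarded, so the result is a naive datetime with no time zone handling.

```python
import re
from datetime import datetime

# Weekday, month, day, time, zone name, then a 4-digit year.
# Any stray digits after the year are matched but not captured.
DATE_RE = re.compile(
    r"^(\w{3}) (\w{3}) (\d{1,2}) (\d{2}:\d{2}:\d{2}) (\w+) (\d{4})\d*$"
)


def parse_with_regex(text):
    """Pull the valid fields out of text and parse them, ignoring the zone."""
    match = DATE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized date string: %r" % (text,))
    weekday, month, day, clock, _zone, year = match.groups()
    cleaned = " ".join([weekday, month, day, clock, year])
    return datetime.strptime(cleaned, "%a %b %d %H:%M:%S %Y")
```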
Here's an even simpler one-liner I use:
end_date = end_date[:-4]
Improving (I hope) on Adam Rosenfield's code:
strptime() really expects to see a correctly formatted date, so you probably need to do some munging on the end_date string before you call it.
One way is to chop the last item in end_date down to 4 characters.
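A small sketch of that munging step, assuming the year is always the last space-separated field of the string. The helper name trim_year is hypothetical.

```python
def trim_year(text):
    """Cut the last space-separated field of text down to 4 characters."""
    head, sep, last = text.rpartition(" ")
    return head + sep + last[:4]
```

The cleaned string can then be passed to time.strptime with the same format as in the question. Strings whose year is already four digits come back unchanged.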