How to remove unconverted data from a Python datetime object

Posted on 2024-10-18 05:51:44 · 511 characters · 9 views · 0 comments


I have a database of mostly correct datetimes, but a few are broken like so: Sat Dec 22 12:34:08 PST 20102015

Without the invalid year, this was working for me:

end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date,"%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))

But once I hit an object with an invalid year I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.

Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.


没有伤那来痛 2024-10-25 05:51:44


Unless you want to rewrite strptime (a very bad idea), the only real option you have is to slice end_date and chop off the extra characters at the end, assuming that this will give you the correct result you intend.

For example, you can catch the ValueError, slice, and try again:

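import time

# Text that strptime puts in front of the leftover input in its error message
MARKER = 'unconverted data remains: '

def parse_prefix(line, fmt):
    """Parse the leading part of `line` that matches `fmt`, ignoring any trailing text."""
    try:
        t = time.strptime(line, fmt)
    except ValueError as v:
        if len(v.args) > 0 and v.args[0].startswith(MARKER):
            # Cut off exactly as many trailing characters as were left unconverted
            line = line[:-(len(v.args[0]) - len(MARKER))]
            t = time.strptime(line, fmt)
        else:
            raise
    return t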

For example:

parse_prefix(
    '2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.',
    '%Y-%m-%d %H:%M:%S'
) # -> time.struct_time(tm_year=2015, tm_mon=10, tm_mday=15, tm_hour=11, tm_min=33, ...
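
Since the question ultimately wants a datetime rather than a struct_time, the same catch-and-retry idea can be sketched with datetime.strptime directly. The name parse_prefix_dt is made up for illustration, and the sketch relies on the wording of the error message, which is an implementation detail of CPython rather than a documented guarantee:

```python
from datetime import datetime

# Prefix of the error message raised when input is left over after parsing.
# Assumption: this wording is what CPython emits; it is not a documented API.
MARKER = "unconverted data remains: "


def parse_prefix_dt(line, fmt):
    """Hypothetical helper: parse the leading part of `line` matching `fmt`."""
    try:
        return datetime.strptime(line, fmt)
    except ValueError as err:
        message = str(err)
        if not message.startswith(MARKER):
            # Some other parsing problem; let the caller see it
            raise
        leftover = message[len(MARKER):]
        # Drop exactly the characters that strptime reported as unconverted
        return datetime.strptime(line[:len(line) - len(leftover)], fmt)


result = parse_prefix_dt(
    "2015-10-15 11:33:20.738 45162 INFO core.api.wsgi yadda yadda.",
    "%Y-%m-%d %H:%M:%S",
)
print(result)
```

Any error other than leftover input is re-raised unchanged, so genuinely malformed dates still fail loudly.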


难得心□动 2024-10-25 05:51:44


Yeah, I'd just chop off the extra numbers. Assuming they are always appended to the datestring, then something like this would work:

end_date = end_date.split(" ")
end_date[-1] = end_date[-1][:4]
end_date = " ".join(end_date)

I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
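
A minimal sketch of that lop-and-retry loop, for anyone who wants it. The helper name parse_trimming_tail and the max_trim cap are invented for this example (the cap of 6 comes from the question's "2 to 6" extra characters), and the format leaves out %Z so the example does not depend on the local time zone:

```python
from datetime import datetime


def parse_trimming_tail(text, fmt, max_trim=6):
    """Hypothetical helper: drop trailing characters one at a time until
    `text` parses with `fmt`, giving up after `max_trim` characters."""
    for trimmed in range(max_trim + 1):
        candidate = text[:len(text) - trimmed]
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            # Still does not match the format; trim one more character
            continue
    raise ValueError(
        "could not parse %r after trimming up to %d characters" % (text, max_trim)
    )


# Assumed sample: the question's corrupted value without the zone name
parsed = parse_trimming_tail("Sat Dec 22 12:34:08 20102015", "%a %b %d %H:%M:%S %Y")
print(parsed)
```

This does not look at the exception text at all, so it should behave the same whatever message a given Python version produces, at the cost of a few extra parse attempts.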

You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
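
If you do want to go the regex route, it might look roughly like this. The pattern and the name parse_end_date are assumptions based only on the one sample string in the question: a single-token zone name and a four-digit year followed by optional stray digits. The zone is matched but not interpreted, so the result is a naive datetime:

```python
import re
from datetime import datetime

# Assumed layout: "Sat Dec 22 12:34:08 PST 2010", optionally followed by extra digits
DATE_RE = re.compile(
    r"^(?P<head>[A-Za-z]{3} [A-Za-z]{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<zone>\S+) "
    r"(?P<year>\d{4})\d*$"
)


def parse_end_date(text):
    """Hypothetical helper: keep the first four year digits, ignore the rest."""
    match = DATE_RE.match(text)
    if match is None:
        raise ValueError("unrecognized date string: %r" % text)
    # Rebuild the string without the zone name and without the stray digits
    cleaned = match.group("head") + " " + match.group("year")
    return datetime.strptime(cleaned, "%a %b %d %H:%M:%S %Y")


print(parse_end_date("Sat Dec 22 12:34:08 PST 20102015"))
```

Strings that do not fit the assumed layout raise ValueError instead of being silently truncated.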


琉璃繁缕 2024-10-25 05:51:44

Here's a simpler one-liner that I use:

end_date = end_date[:-4]  # assumes exactly four extra characters at the end


一桥轻雨一伞开 2024-10-25 05:51:44


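Improving (I hope) on Adam Rosenfield's code:

import time

# Note: in CPython, %Z only matches the local machine's own time zone names
# (plus UTC and GMT), so 'Paris, Madrid' parses only where that is the local zone name.
for end_date in ('Fri Feb 18 20:41:47 Paris, Madrid 2011',
                 'Fri Feb 18 20:41:47 Paris, Madrid 20112015'):

    print(end_date)

    fmt = "%a %b %d %H:%M:%S %Z %Y"
    try:
        end_date = time.strptime(end_date, fmt)
    except ValueError as v:
        # Length of whatever follows the marker; 0 if the marker is absent
        ulr = len(v.args[0].partition('unconverted data remains: ')[2])
        if ulr:
            end_date = time.strptime(end_date[:-ulr], fmt)
        else:
            raise

    print(end_date, '\n')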


浅黛梨妆こ 2024-10-25 05:51:44


strptime() really expects to see a correctly formatted date, so you probably need to do some munging on the end_date string before you call it.

This is one way to chop the last item in the end_date to 4 chars:

chop = len(end_date.split()[-1]) - 4
if chop > 0:  # skip valid years: end_date[:-0] would return an empty string
    end_date = end_date[:-chop]


ㄖ落Θ余辉 2024-10-25 05:51:44

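from datetime import datetime

# `struct` is the poster's own record; [1:-1] drops the first and last character
ReportingDate = struct[7][1:-1]  # 6/21/2022 5:00
# Keep only the date part, whatever the length of the time that follows
dt = ReportingDate.split(" ")[0]  # 6/21/2022
ReportingDate1 = datetime.strptime(dt, "%m/%d/%Y").strftime("%Y-%m-%d")  # 2022-06-21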
