How do I parse an ISO 8601-formatted date and time?
I need to parse RFC 3339 strings like "2008-09-03T20:56:35.450686Z" into Python's datetime type.
I have found strptime in the Python standard library, but it is not very convenient.
What is the best way to do this?
Initially I tried with:
But that didn't work on negative timezones. This however I got working fine, in Python 3.7.3:
Some tests; note that the output differs only in the precision of the microseconds. I got 6 digits of precision on my machine, but YMMV:
Thanks to Mark Amery's great answer, I devised a function to account for all possible ISO formats of datetime:
For something that works with the 2.X standard library try:
calendar.timegm is the missing gm version of time.mktime.
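A minimal sketch of that idea, assuming the input is always UTC with a trailing Z and that sub-second precision can be dropped. The function name to_utc_timestamp is only illustrative. time.strptime produces a struct_time, and calendar.timegm converts it to a UNIX timestamp without applying the local timezone the way time.mktime would.

```python
import calendar
import time


def to_utc_timestamp(text):
    """Convert 'YYYY-MM-DDTHH:MM:SS[.ffffff]Z' to a UNIX timestamp (whole seconds)."""
    # Drop the trailing 'Z' and any fractional seconds; struct_time cannot hold them.
    seconds_part = text.rstrip("Z").split(".")[0]
    parsed = time.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    # timegm interprets the struct_time as UTC, unlike time.mktime (local time).
    return calendar.timegm(parsed)


print(to_utc_timestamp("2008-09-03T20:56:35.450686Z"))  # 1220475395
```

Strings that carry a UTC offset instead of Z are outside what this sketch handles.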
python-dateutil will throw an exception if it is given an invalid date string, so you may want to catch the exception.
datetime.fromisoformat() is improved in Python 3.11 to parse most ISO 8601 formats:

datetime.fromisoformat() can now be used to parse most ISO 8601 formats, barring only those that support fractional hours and minutes. Previously, this method only supported formats that could be emitted by datetime.isoformat().
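To see the difference on whatever interpreter is at hand, a small probe like the following may help. The helper name try_fromisoformat is an assumption made for illustration; it simply reports None when the running Python rejects a string.

```python
import sys
from datetime import datetime, timezone


def try_fromisoformat(text):
    """Return a datetime, or None if this Python's fromisoformat rejects the string."""
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None


EXPECTED = datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=timezone.utc)

# isoformat()-style output is accepted on every version that has fromisoformat (3.7+).
print(try_fromisoformat("2008-09-03T20:56:35.450686+00:00") == EXPECTED)

# The 'Z' suffix and the basic (no-delimiter) format should only parse on 3.11+;
# older versions are expected to print None here.
for sample in ("2008-09-03T20:56:35.450686Z", "20080903T205635Z"):
    print(sys.version_info[:2], sample, "->", try_fromisoformat(sample))
```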
Nowadays there's Maya: Datetimes for Humans™, from the author of the popular Requests: HTTP for Humans™ package:
Because ISO 8601 allows many variations in which the colons and dashes are optional, basically CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm], you need to strip out those variations first if you want to use strptime.

The goal is to generate a UTC datetime object.

If you just want a basic case that works for UTC with the Z suffix, like 2016-06-29T19:36:29.3453Z:

If you want to handle timezone offsets like 2016-06-29T19:36:29.3453-0400 or 2008-09-03T20:56:35.450686+05:00, use the following. These will convert all variations into something without variable delimiters, like 20080903T205635.450686+0500, making it more consistent and easier to parse.

If your system does not support the %z strptime directive (you see something like ValueError: 'z' is a bad directive in format '%Y%m%dT%H%M%S.%f%z'), then you need to manually offset the time from Z (UTC). Note that %z may not work on your system in Python versions < 3, as it depended on C library support, which varies across systems and Python build types (e.g. Jython, Cython, etc.).
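The normalization described above could be sketched roughly as follows. This is not the answer's original code: the function name parse_iso8601_to_utc is hypothetical, and the sketch assumes Python 3 with %z support and an input that always carries either Z or a numeric offset.

```python
from datetime import datetime, timezone


def parse_iso8601_to_utc(text):
    """Strip optional delimiters, then parse with strptime and convert to UTC."""
    date_part, time_part = text.split("T")
    # Dashes are removed from the date only, so a '-' offset sign survives.
    date_part = date_part.replace("-", "")
    time_part = time_part.replace(":", "")
    if time_part.endswith("Z"):
        time_part = time_part[:-1] + "+0000"
    compact = date_part + "T" + time_part  # e.g. 20080903T205635.450686+0500
    fmt = "%Y%m%dT%H%M%S.%f%z" if "." in compact else "%Y%m%dT%H%M%S%z"
    return datetime.strptime(compact, fmt).astimezone(timezone.utc)


print(parse_iso8601_to_utc("2016-06-29T19:36:29.3453-0400"))
# 2016-06-29 23:36:29.345300+00:00
print(parse_iso8601_to_utc("2008-09-03T20:56:35.450686+05:00"))
# 2008-09-03 15:56:35.450686+00:00
```

Strings without any offset would fail against the %z directive here, which is a deliberate simplification.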
If pandas is used anyway, I can recommend Timestamp from pandas. There you can

Rant: It is just unbelievable that we still need to worry about things like date string parsing in 2021.
Django's parse_datetime() function supports dates with UTC offsets:

So it could be used for parsing ISO 8601 dates in fields throughout an entire project:
One straightforward way to convert an ISO 8601-like date string to a UNIX timestamp or datetime.datetime object in all supported Python versions without installing third-party modules is to use the date parser of SQLite.
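A hedged sketch of the SQLite route, using only the standard sqlite3 module. The function name sqlite_parse_to_timestamp is illustrative and not taken from the answer. SQLite's strftime('%s', ...) returns the seconds since the epoch as text, or NULL when it cannot parse the input.

```python
import sqlite3
from datetime import datetime, timezone


def sqlite_parse_to_timestamp(text):
    """Let SQLite's date functions turn an ISO 8601-like string into UNIX seconds."""
    connection = sqlite3.connect(":memory:")
    try:
        row = connection.execute("SELECT strftime('%s', ?)", (text,)).fetchone()
    finally:
        connection.close()
    if row[0] is None:
        # SQLite signals an unparseable time string with NULL rather than an error.
        raise ValueError("SQLite could not parse %r" % (text,))
    return int(row[0])


timestamp = sqlite_parse_to_timestamp("2008-09-03T20:56:35Z")
print(timestamp)  # 1220475395
print(datetime.fromtimestamp(timestamp, tz=timezone.utc))  # 2008-09-03 20:56:35+00:00
```

Opening a new in-memory database per call keeps the sketch short; reusing one connection would be the obvious refinement when parsing many strings.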
Another way is to use a specialized parser for ISO 8601: the isoparse function of the dateutil parser:
Output:
This function is also mentioned in the documentation for the standard Python function datetime.fromisoformat:
I'm the author of iso8601utils. It can be found on GitHub or on PyPI. Here's how you can parse your example:
I've coded up a parser for the ISO 8601 standard and put it on GitHub: https://github.com/boxed/iso8601. This implementation supports everything in the specification except for durations, intervals, periodic intervals, and dates outside the supported date range of Python's datetime module.
Tests are included! :P
If you don't want to use dateutil, you can try this function:
Test:
Result:
This works for stdlib on Python 3.2 onwards (assuming all the timestamps are UTC):
For example,
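A minimal sketch of what this could look like, assuming every input has fractional seconds and ends in a literal Z. The function name parse_utc_timestamp is hypothetical. The Z is matched as a plain character in the format string, and the UTC tzinfo is attached afterwards.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS.ffffffZ', treating the literal 'Z' as UTC."""
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    # strptime returns a naive datetime here, so mark it explicitly as UTC.
    return naive.replace(tzinfo=timezone.utc)


print(parse_utc_timestamp("2008-09-03T20:56:35.450686Z"))
# 2008-09-03 20:56:35.450686+00:00
```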
If you are working with Django, it provides the dateparse module that accepts a bunch of formats similar to ISO format, including the time zone.
If you are not using Django and you don't want to use one of the other libraries mentioned here, you could probably adapt the Django source code for dateparse to your project.
Just use the python-dateutil module:

Documentation
I have found ciso8601 to be the fastest way to parse ISO 8601 timestamps.
It also has full support for RFC 3339, and a dedicated function for strictly parsing RFC 3339 timestamps.
Example usage:
The GitHub Repo README shows their speedup versus all of the other libraries listed in the other answers.
My personal project involved a lot of ISO 8601 parsing. It was nice to be able to just switch the call and go faster. :)
Edit: I have since become a maintainer of ciso8601. It's now faster than ever!
These days, Arrow can also be used as a third-party solution:
Starting from Python 3.7, strptime supports colon delimiters in UTC offsets (source). So you can then use:

EDIT:
As pointed out by Martijn, if you created the datetime object using isoformat(), you can simply use datetime.fromisoformat().

EDIT 2:
As pointed out by Mark Amery, I added a try..except block to account for missing fractional seconds.
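The try..except fallback mentioned in the second edit might look roughly like this on Python 3.7 or later. The function name parse_with_offset is an assumption; the answer's own code is not reproduced in this page.

```python
from datetime import datetime


def parse_with_offset(text):
    """Parse a timestamp with a UTC offset, with or without fractional seconds."""
    try:
        return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")
    except ValueError:
        # No fractional seconds present: retry with the shorter format.
        return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")


print(parse_with_offset("2008-09-03T20:56:35.450686-05:00"))
# 2008-09-03 20:56:35.450686-05:00
print(parse_with_offset("2008-09-03T20:56:35Z"))
# 2008-09-03 20:56:35+00:00
```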
What is the exact error you get? Is it like the following?
If yes, you can split your input string on ".", and then add the microseconds to the datetime you got.
Try this:
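One possible shape for the split-on-"." idea, assuming a UTC string with an optional fraction and a trailing Z. The function name parse_with_manual_microseconds is illustrative, and the result is a naive datetime.

```python
from datetime import datetime


def parse_with_manual_microseconds(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]Z' without relying on the %f directive."""
    whole, _, fraction = text.rstrip("Z").partition(".")
    parsed = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    if fraction:
        # Pad or truncate to six digits so that '.45' means 450000 microseconds.
        parsed = parsed.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return parsed


print(parse_with_manual_microseconds("2008-09-03T20:56:35.450686Z"))
# 2008-09-03 20:56:35.450686
```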
Python >= 3.11

fromisoformat now parses Z directly:

Python 3.7 to 3.10

A simple option from one of the comments: replace 'Z' with '+00:00' and use fromisoformat:

Why prefer fromisoformat?

Although strptime's %z can parse the 'Z' character to UTC, fromisoformat is faster by ~ x40 (or even ~x60 for Python 3.11):

(Python 3.11.3 x64 on GNU/Linux)

See also: A faster strptime
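A small sketch of the replace-then-parse option together with a rough timing comparison. The helper name parse_z_suffix is hypothetical, and the timings will differ by machine and Python version, so no particular ratio is asserted here. On Python 3.7 to 3.10 the fractional part must have exactly 3 or 6 digits for fromisoformat to accept it.

```python
import timeit
from datetime import datetime


def parse_z_suffix(text):
    """Parse an isoformat()-style string, mapping a trailing 'Z' to '+00:00'."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


sample = "2008-09-03T20:56:35.450686Z"
print(parse_z_suffix(sample))  # 2008-09-03 20:56:35.450686+00:00

# Rough comparison only; absolute numbers depend on the environment.
fast = timeit.timeit(lambda: parse_z_suffix(sample), number=20000)
slow = timeit.timeit(
    lambda: datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S.%f%z"), number=20000
)
print("fromisoformat: %.3fs, strptime: %.3fs" % (fast, slow))
```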
Try the iso8601 module; it does exactly this.
There are several other options mentioned on the WorkingWithTime page on the python.org wiki.
Note that in Python 2.6+ and Py3K, the %f directive catches microseconds.
See issue here
As of Python 3.7, you can basically (caveats below) get away with using datetime.datetime.strptime to parse RFC 3339 datetimes, like this:

It's a little awkward, since we need to try two different format strings in order to support both datetimes with a fractional number of seconds (like 2022-01-01T12:12:12.123Z) and those without (like 2022-01-01T12:12:12Z), both of which are valid under RFC 3339. But as long as we do that single fiddly bit of logic, this works.

Some caveats to note about this approach:

- RFC 3339 lets you use a space instead of T to separate the date from the time, even though RFC 3339 purports to be a profile of ISO 8601 and ISO 8601 does not allow this. If you want to support this silly quirk of RFC 3339, you could add datetime_str = datetime_str.replace(' ', 'T') to the start of the function.
- It accepts timezone offsets like +0500 without a colon, which RFC 3339 does not support. If you don't merely want to parse known-to-be-RFC-3339 datetimes but also want to rigorously validate that the datetime you're getting is RFC 3339, use another approach or add in your own logic to validate the timezone offset format.
- It does not support all of ISO 8601. (For example, 2009-W01-1 is a valid ISO 8601 date.)
- In Python 3.6 and earlier, the %z specifier only matches timezone offsets like +0500 or -0430 or +0000, not RFC 3339 timezone offsets like +05:00 or -04:30 or Z.
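For the first two caveats, a pre-check along the following lines could be placed in front of strptime. It is only a sketch: the names RFC3339_OFFSET and normalize_rfc3339 are made up for illustration, and the regular expression checks nothing beyond the shape of the offset (uppercase Z or a colon-separated +hh:mm / -hh:mm).

```python
import re
from datetime import datetime

# Illustrative check: the string must end in 'Z' or an offset such as '+05:00'.
RFC3339_OFFSET = re.compile(r"(?:Z|[+-]\d{2}:\d{2})$")


def normalize_rfc3339(datetime_str):
    """Accept a space as the date/time separator and reject colon-less offsets."""
    datetime_str = datetime_str.replace(" ", "T")
    if not RFC3339_OFFSET.search(datetime_str):
        raise ValueError("offset is not 'Z' or +hh:mm / -hh:mm: %r" % (datetime_str,))
    return datetime_str


parsed = datetime.strptime(
    normalize_rfc3339("2022-01-01 12:12:12.123+05:00"), "%Y-%m-%dT%H:%M:%S.%f%z"
)
print(parsed.isoformat())  # 2022-01-01T12:12:12.123000+05:00
```

Keeping the check separate from the parsing step means it can be combined with whichever format-string logic is already in use.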
Since Python 3.11, the standard library's datetime.fromisoformat supports most valid ISO 8601 input. In earlier versions it only parses a specific subset; see the cautionary note in the docs. If you are using Python 3.10 or earlier on strings that don't fall into that subset (like in the question), see other answers for functions from outside the standard library. The docs:
isoparse function from python-dateutil

The python-dateutil package has dateutil.parser.isoparse to parse not only RFC 3339 datetime strings like the one in the question, but also other ISO 8601 date and time strings that don't comply with RFC 3339 (such as ones with no UTC offset, or ones that represent only a date).

The python-dateutil package also has dateutil.parser.parse. Compared with isoparse, it is presumably less strict, but both of them are quite forgiving and will attempt to interpret the string that you pass in. If you want to eliminate the possibility of any misreads, you need to use something stricter than either of these functions.

Comparison with Python 3.7+'s built-in datetime.datetime.fromisoformat

dateutil.parser.isoparse is a full ISO-8601 format parser, but in Python ≤ 3.10 fromisoformat is deliberately not. In Python 3.11, fromisoformat supports almost all strings in valid ISO 8601. See fromisoformat's docs for this cautionary caveat. (See this answer).