Parsing a date in Python without using a default
I'm using Python's dateutil.parser tool to parse some dates I'm getting from a third-party feed. It allows specifying a default date, which itself defaults to today, for filling in missing elements of the parsed date. While this is in general helpful, there is no sane default for my use case, and I would prefer to treat partial dates as if I had not gotten a date at all (since it almost always means I got garbled data). I've written the following workaround:
    from dateutil import parser
    import datetime

    def parse_no_default(dt_str):
        dt = parser.parse(dt_str, default=datetime.datetime(1900, 1, 1)).date()
        dt2 = parser.parse(dt_str, default=datetime.datetime(1901, 2, 2)).date()
        if dt == dt2:
            return dt
        else:
            return None
(This snippet only looks at the date, as that's all I care about for my application, but similar logic could be extended to include the time component.)
I'm wondering (hoping) there's a better way of doing this. Parsing the same string twice just to see if it fills in different defaults seems like a gross waste of resources, to say the least.
Here's the set of tests (using nosetest generators) for the expected behavior:
    import datetime  # needed for the datetime.date(...) expectations below

    import nose.tools
    import lib.tools.date

    def check_parse_no_default(sample, expected):
        actual = lib.tools.date.parse_no_default(sample)
        nose.tools.eq_(actual, expected)

    def test_parse_no_default():
        cases = (
            ('2011-10-12', datetime.date(2011, 10, 12)),
            ('2011-10', None),
            ('2011', None),
            ('10-12', None),
            ('2011-10-12T11:45:30', datetime.date(2011, 10, 12)),
            ('10-12 11:45', None),
            ('', None),
        )
        for sample, expected in cases:
            yield check_parse_no_default, sample, expected
Comments (4)
Depending on your domain, the following solution might work:
Another approach would be to monkey-patch the parser class (this is very hackish, so I wouldn't recommend it if you have other options):
You can use it as follows:
By checking which members are available in the result (ddd), you can determine when to return None.
When all fields are available, you can convert ddd into a datetime object:
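To make the field-checking idea concrete, here is a minimal sketch. Rather than monkey-patching, it calls dateutil's private `_parse` method directly and checks which fields it filled in. This is an assumption-heavy illustration, not the answer's original snippet: `_parse` is undocumented, its return shape has differed between dateutil releases (a bare result object in some, a `(result, skipped_tokens)` tuple in others), and the helper name `parse_no_default` is simply borrowed from the question.

```python
import datetime

from dateutil import parser

# Fields that must all be present in the input for the date to be accepted.
_REQUIRED_FIELDS = ("year", "month", "day")


def parse_no_default(dt_str):
    """Return a date only if dt_str itself supplied year, month and day."""
    # ASSUMPTION: _parse is a private dateutil method and may change.
    res = parser.parser()._parse(dt_str)
    # Some releases return (result, skipped_tokens); others return the result.
    if isinstance(res, tuple):
        res = res[0]
    if res is None:
        return None
    # Fields the string did not supply are left as None on the result.
    if any(getattr(res, name, None) is None for name in _REQUIRED_FIELDS):
        return None
    try:
        return datetime.date(res.year, res.month, res.day)
    except ValueError:
        # For example, a day that does not exist in the given month.
        return None
```

Because no default is involved at all, the string is only parsed once. The cost is the dependency on a private method, so pinning the dateutil version would be prudent if you go this way.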
This is probably a "hack", but it looks like dateutil uses very few attributes of the default you pass in. You could provide a "fake" datetime that explodes in the desired way.
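One way such an exploding default could look is sketched below. It assumes that dateutil builds its result by calling `default.replace(...)` with only the fields it actually parsed, which is an internal detail rather than documented behaviour. The names `_RequireFullDate` and `_STRICT_DEFAULT` are hypothetical, and the class subclasses `datetime.datetime` so that any attribute reads dateutil performs on the default still work.

```python
import datetime

from dateutil import parser


class _RequireFullDate(datetime.datetime):
    """Hypothetical default that refuses to fill in missing date fields."""

    def replace(self, **fields):
        # ASSUMPTION: dateutil passes only the parsed fields to replace().
        missing = [n for n in ("year", "month", "day") if n not in fields]
        if missing:
            raise ValueError("incomplete date, missing: " + ", ".join(missing))
        # Hand back a plain datetime so later replace() calls (for example
        # attaching a tzinfo) are not rejected by this override.
        plain = datetime.datetime(self.year, self.month, self.day)
        return plain.replace(**fields)


_STRICT_DEFAULT = _RequireFullDate(1900, 1, 1)


def parse_no_default(dt_str):
    """Return a date, or None when the string lacks year, month or day."""
    try:
        return parser.parse(dt_str, default=_STRICT_DEFAULT).date()
    except (ValueError, OverflowError):
        # Covers both unparseable input and the deliberate ValueError above.
        return None
```

Unlike the two-pass workaround in the question, this parses each string once and stays on the public `parser.parse` entry point, but it will silently stop working if a future dateutil release builds its result differently.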
I ran into the exact same problem with dateutil. I wrote this function and figured I would post it for posterity's sake. It basically uses the underlying _parse method, like @ILYA Khlopotov suggests:
The returned object isn't a datetime instance, but it has the .year, .month, and .day attributes, which was good enough for my needs. I suppose you could easily convert it to a datetime instance.
simple-date does this for you (it does try multiple formats internally, but not as many as you might think, because the patterns it uses extend Python's date patterns with optional parts, like regexps).
See https://github.com/andrewcooke/simple-date - but only Python 3.2 and up (sorry).
It's more lenient than what you want by default:
But you could specify your own format. For example:
PS: invert() just switches the presence of %, which otherwise becomes a real mess when specifying complex date patterns. So here only the literal T character needs a % prefix (in standard Python date formatting it would be the only alphanumeric character without a prefix).
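The "specify your own format" idea can also be approximated with the standard library alone. The sketch below does not use simple-date's API; it swaps in `datetime.datetime.strptime` with an explicit whitelist of complete formats. `ACCEPTED_FORMATS` and `parse_strict` are hypothetical names, and the two formats listed are only the ones that appear in the question's test cases.

```python
import datetime

# Hypothetical whitelist: extend it with the formats the feed really uses.
ACCEPTED_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
)


def parse_strict(dt_str, formats=ACCEPTED_FORMATS):
    """Return a date if dt_str fully matches one of the formats, else None."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(dt_str, fmt).date()
        except ValueError:
            # Partial or garbled input fails the full-format match.
            continue
    return None
```

Since every accepted format names year, month and day explicitly, a partial date can never be completed from a default. The trade-off is that any format missing from the whitelist is rejected, which is far stricter than dateutil's free-form parsing.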