dateparser with month and year but no day
I want to parse dates in different formats, and dateparser seems to be the best option for handling most of the weird cases. However, I'm having a problem with dates that have no day, e.g. "04/2022". I'd like such a string to be extracted as month=4, year=2022, and day=None or day=1. Unfortunately, parsing "04/2022" gives me month=5, year=2022. Is there a way to force dateparser to treat one of the two detected numbers as the month? The dateutil parser seems to work fine in this case, but it fails on strings such as "polish_month_name Year". Is there a way to make the following function work the way I want?
import dateparser
from dateparser.search import search_dates

def extract_dates(line: str):
    """
    Extracts the list of dates detected in line.
    :param line: string to look for the dates in
    :return: list of (month, year) tuples
    Desired output:
    >>> line = "09/01/2019 oraz 4/2020, 09/2018, luty 2020"
    >>> extract_dates(line)
    [(1, 2019), (4, 2020), (9, 2018), (2, 2020)]
    """
    extracted_dates = []
    dates = search_dates(line, languages=['pl'], settings={'DATE_ORDER': 'DMY'})
    if dates is not None:
        for d in dates:
            # d is a (matched_text, datetime) pair; re-parse the matched text
            parse_res = dateparser.parse(d[0], languages=['pl'])
            if parse_res is not None:
                extracted_dates.append((parse_res.month, parse_res.year))
    else:
        # nothing that looks like a date was found in the line
        extracted_dates.append('None')
    return extracted_dates
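One possible workaround, sketched here with only the standard library rather than dateparser, is to pull out the month and year yourself with a regular expression, so a day-less string like "04/2022" can never be read as a day. This is a swapped-in technique, not a dateparser feature. The names `POLISH_MONTHS`, `DATE_PATTERN` and `extract_month_year` are made up for this illustration, and the sketch assumes numeric dates are written as `DD/MM/YYYY` or `MM/YYYY` and month names are followed directly by a four-digit year.

```python
import re

# Polish month names in nominative and genitive forms, mapped to month numbers.
POLISH_MONTHS = {
    "styczeń": 1, "stycznia": 1,
    "luty": 2, "lutego": 2,
    "marzec": 3, "marca": 3,
    "kwiecień": 4, "kwietnia": 4,
    "maj": 5, "maja": 5,
    "czerwiec": 6, "czerwca": 6,
    "lipiec": 7, "lipca": 7,
    "sierpień": 8, "sierpnia": 8,
    "wrzesień": 9, "września": 9,
    "październik": 10, "października": 10,
    "listopad": 11, "listopada": 11,
    "grudzień": 12, "grudnia": 12,
}

# Longest names first so that e.g. "lutego" is tried before "luty".
MONTH_NAMES = "|".join(sorted(POLISH_MONTHS, key=len, reverse=True))

# Either [DD/]MM/YYYY (the day part is optional and ignored)
# or <polish month name> YYYY.
DATE_PATTERN = re.compile(
    r"(?<![\d/])(?:\d{1,2}/)?(?P<num_month>\d{1,2})/(?P<num_year>\d{4})(?![\d/])"
    r"|\b(?P<name_month>" + MONTH_NAMES + r")\s+(?P<name_year>\d{4})\b",
    re.IGNORECASE,
)


def extract_month_year(line: str):
    """Return a list of (month, year) tuples found in line, in order of appearance."""
    results = []
    for match in DATE_PATTERN.finditer(line):
        if match.group("num_month") is not None:
            month = int(match.group("num_month"))
            year = int(match.group("num_year"))
        else:
            month = POLISH_MONTHS[match.group("name_month").lower()]
            year = int(match.group("name_year"))
        # Skip numeric matches whose month is out of range, e.g. "13/2022".
        if 1 <= month <= 12:
            results.append((month, year))
    return results


if __name__ == "__main__":
    print(extract_month_year("09/01/2019 oraz 4/2020, 09/2018, luty 2020"))
```

The trade-off is that this only covers the formats listed above. Anything else (relative dates, other separators, other languages) would still need dateparser, so it may make sense to try the regular expression first and fall back to dateparser only for the text it does not match.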