dateparser with month and year, but no day

Posted on 2025-01-28 17:17:52

I want to parse dates in different formats, and dateparser seems to be the best option for handling most of the weird cases. However, I'm having a problem with dates that lack a day, e.g. "04/2022". I'd like such a string to be extracted as month=4, year=2022, and day=None or day=1. Unfortunately, parsing "04/2022" results in month=5, year=2022. Is there a way to force dateparser to treat one of the two detected numbers as the month? dateutil's parser seems to work fine in this case, but it fails on strings such as "polish_month_name Year". Is there a way to make the following function work the way I want?

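import dateparser
from dateparser.search import search_dates


def extract_dates(line: str):
    """
    Extracts list of dates detected in line
    :param line: string to look for the dates
    :return: list of (month, year) tuples
    >>> line = "09/01/2019 oraz 4/2020, 09/2018, luty 2020"
    >>> extract_dates(line)
    [(1, 2019), (4, 2020), (9, 2018), (2, 2020)]

    """
    extracted_dates = []
    # search_dates returns a list of (matched_text, datetime) tuples, or None
    dates = search_dates(line, languages=['pl'], settings={'DATE_ORDER': 'DMY'})
    if dates is not None:
        for d in dates:
            # Re-parse the matched substring on its own
            parse_res = dateparser.parse(d[0], languages=['pl'])
            # dateparser.parse returns None when it cannot parse the text
            if parse_res is not None:
                extracted_dates.append((parse_res.month, parse_res.year))
    else:
        # Placeholder when no date is found in the line
        extracted_dates.append('None')
    return extracted_dates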
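
One possible workaround, sketched here with only the standard library rather than dateparser itself, is to pull out the month/year patterns explicitly so that no missing field has to be guessed. This is a rough sketch under some assumptions: three-part numeric dates are day/month/year (matching the DATE_ORDER setting above), years have four digits, "/" is the only separator, and the names `POLISH_MONTHS`, `DATE_RE` and `extract_month_year` are made up for illustration. It does not cover the other formats dateparser understands.

```python
import re

# Polish month names (nominative and genitive forms) mapped to month numbers.
POLISH_MONTHS = {
    "styczeń": 1, "stycznia": 1,
    "luty": 2, "lutego": 2,
    "marzec": 3, "marca": 3,
    "kwiecień": 4, "kwietnia": 4,
    "maj": 5, "maja": 5,
    "czerwiec": 6, "czerwca": 6,
    "lipiec": 7, "lipca": 7,
    "sierpień": 8, "sierpnia": 8,
    "wrzesień": 9, "września": 9,
    "październik": 10, "października": 10,
    "listopad": 11, "listopada": 11,
    "grudzień": 12, "grudnia": 12,
}

# Longest names first so that e.g. "maja" is tried before "maj".
_MONTH_NAMES = "|".join(sorted(POLISH_MONTHS, key=len, reverse=True))

# Either [day/]month/year in digits, or "<month name> <year>".
DATE_RE = re.compile(
    r"(?<![\d/])(?:(?P<day>\d{1,2})/)?(?P<month>\d{1,2})/(?P<year>\d{4})(?![\d/])"
    r"|\b(?P<name>" + _MONTH_NAMES + r")\s+(?P<nyear>\d{4})\b",
    re.IGNORECASE,
)


def extract_month_year(line: str):
    """Return (month, year) tuples for every date-like match, left to right."""
    results = []
    for m in DATE_RE.finditer(line):
        if m.group("name"):
            month = POLISH_MONTHS[m.group("name").lower()]
            year = int(m.group("nyear"))
        else:
            month = int(m.group("month"))
            year = int(m.group("year"))
        # Skip numeric matches whose month is out of range.
        if 1 <= month <= 12:
            results.append((month, year))
    return results


print(extract_month_year("09/01/2019 oraz 4/2020, 09/2018, luty 2020"))
```

For the sample line from the docstring this prints [(1, 2019), (4, 2020), (9, 2018), (2, 2020)]. Anything the regex does not match could still be handed to dateparser as a fallback.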
