Using regex to find dates in text

Posted on 2025-01-20 14:12:40


I want to find all dates in a text that are not preceded by the word Effective.
For example, I have the following line:

FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022

My regex should return ['January 7, 2022', 'January 5, 2022']

How can I do this in Python?

My attempt:

>>> import re
>>> rule = r'((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]

But it doesn't work: the date after Effective is still matched (as 'anuary 1, 2022').
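Running the attempt as a script makes two separate effects visible. The two capturing groups make re.findall return a tuple per match instead of a string, and the negative lookbehind only blocks a match that starts at the J of January: the engine simply starts one letter later, where the ten preceding characters are 'ffective J' rather than 'Effective ', so the lookbehind passes. The variable names below are illustrative only.

```python
import re

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

# The pattern from the question, written as a raw string.
attempt = r'((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'

# Two capturing groups, so findall returns one 2-tuple per match.
print(re.findall(attempt, text)[0])  # ('anuary 1, 2022', 'anuary 1, 2022')

# Same pattern without groups: findall now returns plain strings,
# but the first date is still matched, starting one letter late.
no_groups = r'(?<!Effective\ )[A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}'
print(re.findall(no_groups, text))
# ['anuary 1, 2022', 'January 7, 2022', 'January 5, 2022']
```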


Answer by 定格我的天空 (2025-01-27 14:12:40)


You can use

\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)

See the regex demo. Details:

  • \b - a word boundary, so a match cannot start in the middle of a word (for example, at the "anuary" of "January")
  • (?<!Effective\s) - a negative lookbehind that fails the match if Effective followed by a whitespace character appears immediately to the left of the current location
  • [A-Za-z]{3,9} - three to nine ASCII letters
  • \s* - zero or more whitespace characters
  • \d{1,2} - one or two digits
  • \s*,\s* - a comma surrounded by zero or more whitespace characters
  • \d{4} - four digits
  • (?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right.
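The answer's pattern can be checked in Python with re.findall; since it has no capturing groups, the results are plain strings. The helper name find_dates is hypothetical, and the optional re.IGNORECASE flag is an assumption beyond the answer: it lets the lookbehind also skip dates after a lowercase "effective".

```python
import re

# Pattern from the answer: a date not immediately preceded by "Effective" + one whitespace char.
DATE_NOT_EFFECTIVE = r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'


def find_dates(text: str, ignore_case: bool = False) -> list[str]:
    """Return every date in text that is not directly preceded by 'Effective '.

    Hypothetical helper; ignore_case=True also excludes dates after 'effective '.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(DATE_NOT_EFFECTIVE, text, flags)


text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')
print(find_dates(text))  # ['January 7, 2022', 'January 5, 2022']
```

Python's re module only accepts fixed-width lookbehind, which is why the pattern uses a single \s rather than \s+; Effective followed by two or more spaces would therefore not be excluded.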