Using regex to find dates in text

Posted on 2025-01-20 14:12:40

I want to find all dates in a text, except those immediately preceded by the word Effective.
For example, I have the following line:

FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022

My regex should return ['January 7, 2022', 'January 5, 2022']

How can I do this in Python?

My attempt:

>>> import re
>>> rule = r'((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]

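But it doesn't work: the first date is still matched (as 'anuary 1, 2022'), and every result comes back as a tuple of two identical strings.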
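One way to see what goes wrong is to print where each match starts and which characters the lookbehind actually checked. The following is a diagnostic sketch only, assuming Python 3 and the standard re module. match_positions is a hypothetical helper name. It suggests that the lookbehind rejects the match starting at 'J' but not the one the engine tries one character later, where the preceding text is 'ffective J'. The nested capturing groups are also why re.findall returns tuples.

```python
import re

# The pattern from the question, written as a raw string (same regex as before).
rule = r'((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

def match_positions(pattern, s):
    """Hypothetical helper: (start index, full match) for each non-overlapping match."""
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, s)]

for start, found in match_positions(rule, text):
    # The 10 characters before the match are what the 10-character
    # lookbehind "Effective " was compared against.
    print(start, repr(text[max(0, start - 10):start]), repr(found))
```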

定格我的天空 2025-01-27 14:12:40

You can use

\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)

See the regex demo. Details:

\b - a word boundary
(?<!Effective\s) - a negative lookbehind that fails the match if Effective followed by a whitespace character appears immediately to the left of the current location
[A-Za-z]{3,9} - three to nine ASCII letters
\s* - zero or more whitespace characters
\d{1,2} - one or two digits
\s*,\s* - a comma surrounded by zero or more whitespace characters
\d{4} - four digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right.
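The question asks how to do this in Python. The following is a minimal sketch, assuming Python 3 and only the standard re module. The names DATE_NOT_AFTER_EFFECTIVE and dates_not_after_effective are hypothetical. It compiles the pattern above as a raw string. Because the pattern has no capturing groups, re.findall returns plain strings instead of tuples. The leading \b stops the engine from restarting the match inside a word (at 'anuary'), which is what let the question's attempt slip past the lookbehind.

```python
import re

# The answer's pattern as a raw string, so backslashes reach the regex engine unchanged.
DATE_NOT_AFTER_EFFECTIVE = re.compile(
    r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'
)

def dates_not_after_effective(text):
    """Hypothetical helper: 'Month D, YYYY'-like dates not directly after 'Effective' + one whitespace char."""
    # No capturing groups in the pattern, so findall yields whole-match strings.
    return DATE_NOT_AFTER_EFFECTIVE.findall(text)

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')
print(dates_not_after_effective(text))
```

The lookbehind is fixed-width, so it only checks for Effective followed by exactly one whitespace character. Text such as "Effective  January 1, 2022" (two spaces) would still be matched. If the word may appear in lower case, compiling with re.IGNORECASE also makes the lookbehind case-insensitive.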