使用REGEX在文本中查找日期
如果日期之前没有“有效”一词,我想查找文本中的所有日期。 例如,我有以下行:
FEE SCHEDULE effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022
我的正则表达式应返回 ['January , 2022', 'January 5, 2022']
我如何在 Python 中执行此操作?
我的尝试:
>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]
但是不起作用。
I want to find all dates in a text if there is no word Effective before the date.
For example, I have the following line:
FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022
My regex should return ['January , 2022', 'January 5, 2022']
How can I do this in Python?
My attempt:
>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]
But it doesn't work.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您可以使用
查看正则表达式演示。 详细信息:
\b
- 字边界(? - 负向后查找,如果存在 <,则匹配失败code>Effective
+ 紧邻当前位置左侧的空白字符[A-Za-z]{3,9}
- 三到九个 ASCII 字母\s*
- 零个或多个空格\d{1,2}
- 一位或两位数字\s*,\s*
- 包含零个或多个空格的逗号\d{4 }
- 四位数字(?!\d)
- 如果右侧紧邻一个数字,则否定前瞻会导致匹配失败。You can use
See the regex demo. Details:
\b
- a word boundary(?<!Effective\s)
- a negative lookbehind that fails the match if there isEffective
+ a whitespace char immediately to the left of the current location[A-Za-z]{3,9}
- three to nine ASCII letters\s*
- zero or more whitespaces\d{1,2}
- one or two digits\s*,\s*
- a comma enclosed with zero or more whitespaces\d{4}
- four digits(?!\d)
- a negative lookahead that fails the match if there is a digit immediately on the right.