Extract data between substrings if present, otherwise the full string

Posted on 2025-02-08 10:06:49

I have strings like these:

Beginning through June 18, 2022 at Noon standard time\n
Jan 20, 2022
Beginning through April 26, 2022 at 12:01 a.m. standard time

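I want to use a Python regex to extract the date part that appears after the word "through" and before the word "at". When those words are missing, the whole string should be returned. Expected output: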

June 18, 2022
Jan 20, 2022
April 26, 2022

For the long strings, I can extract it with regex groups:

import re
s = "Beginning through June 18, 2022 at Noon standard time"
re.search(r'(.*through)(.*) (at.*)', s).group(2)

However, it does not work for a string that contains neither "through" nor "at", such as

s = "June 18, 2022"
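To see why the second case fails: re.search returns None when the pattern cannot match, and calling .group(2) on None raises AttributeError. The sketch below guards against that (the helper name extract_with_groups is hypothetical) and also shows that group 2 of this pattern keeps the space that follows "through":

```python
import re

def extract_with_groups(s):
    """Hypothetical helper around the pattern from the question."""
    m = re.search(r'(.*through)(.*) (at.*)', s)
    # re.search returns None when "through" or " at" is missing,
    # so calling .group(2) directly would raise AttributeError.
    return m.group(2) if m else None

print(repr(extract_with_groups("Beginning through June 18, 2022 at Noon standard time")))  # ' June 18, 2022'
print(repr(extract_with_groups("June 18, 2022")))  # None
```

The guard only avoids the crash; it still does not return the full string for the short case, which is what the answers below address.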

Can anyone help me with that?

Comments (2)

输什么也不输骨气 2025-02-15 10:06:49

How about playing with optional groups and backtracking?

^(?:.*?through )?(.*?)(?: at.*)?$

See this demo at regex101 or a Python demo at tio.run

Note that if only one of the substrings is present, group 1 captures either from after "through " to the end of the string, or from the start of the string up to " at". If neither is present, it captures the full string.
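A quick way to check the pattern against all four cases. The last two strings are made-up examples (not from the question) covering "only through" and "only at":

```python
import re

# Optional "...through " prefix, lazy capture, optional " at..." suffix
PATTERN = re.compile(r'^(?:.*?through )?(.*?)(?: at.*)?$')

samples = [
    "Beginning through June 18, 2022 at Noon standard time",  # both present
    "Jan 20, 2022",                                            # neither present
    "Beginning through April 26, 2022",                        # only "through" (made up)
    "April 26, 2022 at 12:01 a.m. standard time",              # only " at" (made up)
]

# Every part of the pattern is optional, so match() succeeds on each sample
results = [PATTERN.match(s).group(1) for s in samples]
for r in results:
    print(r)
```

Because the capture is lazy, it stops at the first " at" it can hand over to the optional suffix, while the anchors force the match to span the whole string.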


Another idea is to use the PyPI regex module, which supports branch reset groups.

^(?|.*?through (.+?) at|(.+))

This one extracts the part in between if both substrings are present, and the full string otherwise. AFAIK the regex module is largely compatible with Python's re module, so you can just use import regex as re instead.
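If installing the third-party regex module is not an option, roughly the same behavior can be had with the standard re module: give each alternative its own group and take whichever one matched. This is a substitute for the branch reset group, not the answer's own code, and the helper name extract is hypothetical:

```python
import re

# Stdlib re has no branch reset group (?|...), so each alternative keeps
# its own capture group: group 1 for "through ... at", group 2 for the fallback.
PATTERN = re.compile(r'^(?:.*?through (.+?) at|(.+))')

def extract(s):
    """Hypothetical helper: text between "through " and " at", else the full string."""
    m = PATTERN.match(s)
    if m is None:  # no match at all, e.g. for an empty string
        return s
    # Exactly one alternative matched; the other group is None.
    return m.group(1) if m.group(1) is not None else m.group(2)

for s in ["Beginning through June 18, 2022 at Noon standard time",
          "Jan 20, 2022",
          "Beginning through April 26, 2022"]:
    print(extract(s))
```

As with the branch reset version, a string with "through" but no " at" falls through to the second alternative and comes back unchanged.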

Demo at regex101 or Python demo at tio.run

Oo萌小芽oO 2025-02-15 10:06:49

You may use this regex with a capture group:

(?:.* through |^)(.+?)(?: at |$)

RegEx Demo

RegEx Details:

  • (?:.* through |^): Match anything followed by " through ", or the start position
  • (.+?): Lazily match 1+ of any character and capture it in group #1
  • (?: at |$): Match " at " or the end of the string
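The same pattern can be written with re.VERBOSE so each piece carries its explanation inline. In verbose mode unescaped whitespace is ignored, so the literal spaces are written as "\ ". This sketch uses search, which returns one match object instead of a list:

```python
import re

DATE_RE = re.compile(r"""
    (?:.*\ through\ |^)   # anything followed by " through ", or the start position
    (.+?)                 # lazily capture 1+ characters in group 1
    (?:\ at\ |$)          # stop at " at " or at the end of the string
""", re.VERBOSE)

samples = ['Beginning through June 18, 2022 at Noon standard time',
           'Jan 20, 2022',
           'Beginning through April 26, 2022 at 12:01 a.m. standard time']

dates = [DATE_RE.search(s).group(1) for s in samples]
print(dates)
```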

Code:

import re
arr = ['Beginning through June 18, 2022 at Noon standard time',
'Jan 20, 2022',
'Beginning through April 26, 2022 at 12:01 a.m. standard time']

for s in arr:
    print(re.findall(r'(?:.* through |^)(.+?)(?: at |$)', s))

Output:

['June 18, 2022']
['Jan 20, 2022']
['April 26, 2022']