Dynamic named groups in Python regular expressions

Posted on 2024-08-17 18:23:33

Is there a way to dynamically update the name of regex groups in Python?

For example, if the text is:

person 1: name1
person 2: name2
person 3: name3
...
person N: nameN

How would you name groups 'person1', 'person2', 'person3', ..., and 'personN' without knowing beforehand how many people there are?

所谓喜欢 2024-08-24 18:23:33

No, but you can do something like this:

>>> import re
>>> p = re.compile('(?m)^(.*?)\\s*:\\s*(.*)$')
>>> text = '''person 1: name1
person 2: name2
person 3: name3
...
person N: nameN'''
>>> p.findall(text)

Output:

[('person 1', 'name1'), ('person 2', 'name2'), ('person 3', 'name3'), ('person N', 'nameN')]

The literal "..." line contains no colon, so the pattern skips it.

A quick explanation:

(?m)     # enable multi-line mode
^        # match the start of a new line
(.*?)    # un-greedily match zero or more chars and store it in match group 1
\s*:\s*  # match a colon possibly surrounded by space chars
(.*)     # match the rest of the line and store it in match group 2
$        # match the end of the line
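If names such as 'person1' are still wanted as lookup keys, one option is to build them in ordinary Python after matching rather than inside the pattern. The following is a minimal sketch, not part of the original answer: the stricter pattern (which assumes every line starts with the word "person") and the `people` dictionary are illustrative assumptions.

```python
import re

text = """person 1: name1
person 2: name2
person 3: name3
person N: nameN"""

# Assumed line format: "person", an identifier, a colon, then the name.
pattern = re.compile(r"^person\s+(\w+)\s*:\s*(.*)$", re.MULTILINE)

# Build the dynamic keys ("person1", "person2", ...) outside the regex.
people = {f"person{m.group(1)}": m.group(2) for m in pattern.finditer(text)}

print(people)
# {'person1': 'name1', 'person2': 'name2', 'person3': 'name3', 'personN': 'nameN'}
```

The pattern itself still has a fixed number of groups; only the dictionary keys vary with the input.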

蹲墙角沉默 2024-08-24 18:23:33

Named capture groups and numbered groups (\1, \2, etc.) cannot be dynamic, but you can achieve the same thing with findall:

re.findall(pattern, string[, flags])

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
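The shape of the list returned by findall depends on how many groups the pattern has, as the quoted documentation describes. A small sketch to illustrate this (the sample text and patterns are made up for the example):

```python
import re

text = "person 1: name1\nperson 2: name2"

# No groups: a list of whole-match strings.
no_groups = re.findall(r"name\d", text)

# One group: a list of that group's strings.
one_group = re.findall(r"person (\d)", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"person (\d): (\w+)", text)

print(no_groups)   # ['name1', 'name2']
print(one_group)   # ['1', '2']
print(two_groups)  # [('1', 'name1'), ('2', 'name2')]
```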

白云不回头 2024-08-24 18:23:33

Judging from your accepted answer, there's no need for regex:

p = """
person 1: name1
person 2: name2
person 3: name3
person N: nameN
"""

# Split each non-empty line at the colon; no regex involved.
arr = []
for item in p.split("\n"):
    if item:
        s = item.split(":")
        arr.append(s)
print(arr)

Output:

$ ./python.py
[['person 1', ' name1'], ['person 2', ' name2'], ['person 3', ' name3'], ['person N', ' nameN']]
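Note that the names in the output above keep the space that follows the colon. A possible variant, sketched here as a suggestion and not taken from the original answer, uses str.partition and str.strip to drop that whitespace and to skip lines without a colon:

```python
text = """
person 1: name1
person 2: name2
person 3: name3
person N: nameN
"""

pairs = []
for line in text.splitlines():
    # partition splits at the first colon only; sep is '' if there is none.
    key, sep, value = line.partition(":")
    if sep:
        pairs.append((key.strip(), value.strip()))

print(pairs)
# [('person 1', 'name1'), ('person 2', 'name2'), ('person 3', 'name3'), ('person N', 'nameN')]
```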

前事休说 2024-08-24 18:23:33

Regexes in Python (and I'm fairly sure this holds for regexes in general) don't allow a single pattern to capture an arbitrary number of matches. You can either capture a repeated match in its entirety (by placing capturing parentheses around a repeated group) or capture only the last match in a series (by repeating a capturing group). This is independent of whether the capturing groups are named or numbered.

You need to do this programmatically by iterating over all matches in the string, for example:

for match in re.findall(pattern, string):
    do_something(match)
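The two capture behaviours described in this answer, and the iteration workaround, can be seen in a short sketch. The sample string "a,b,c" and the patterns are illustrative assumptions, and finditer is used here as one way to iterate:

```python
import re

sample = "a,b,c"

# Repeating a capturing group keeps only the last repetition.
last_only = re.match(r"(?:(\w+),?)+", sample).group(1)

# Wrapping the whole repetition in one group captures all of it as one string.
whole = re.match(r"((?:\w+,?)+)", sample).group(1)

# Iterating over the matches yields every item separately.
items = [m.group(0) for m in re.finditer(r"\w+", sample)]

print(last_only)  # c
print(whole)      # a,b,c
print(items)      # ['a', 'b', 'c']
```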