Alternative to the `match = re.match(); if match: ...` idiom?

Posted on 2024-07-26 23:49:46

If you want to check whether something matches a regex and, if so, print the first group, you do:

import re
match = re.match(r"(\d+)g", "123g")
if match is not None:
    print(match.group(1))

This is completely pedantic, but the intermediate match variable is a bit annoying.

Languages like Perl avoid this by creating new $1..$9 variables for match groups, like:

if ($blah =~ /(\d+)g/) {
    print $1;
}

From this reddit comment,

with re_context.match('^blah', s) as match:
    if match:
        ...
    else:
        ...

..which I thought was an interesting idea, so I wrote a simple implementation of it:

#!/usr/bin/env python3
import re

class SRE_Match_Wrapper:
    def __init__(self, match):
        self.match = match

    def __exit__(self, type, value, tb):
        # Nothing to clean up; returning None lets exceptions propagate.
        pass

    def __enter__(self):
        # The with block receives the match object, or None if nothing matched.
        return self.match

    def __getattr__(self, name):
        # Delegate everything else (group, span, ...) to the match object.
        return getattr(self.match, name)

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    return SRE_Match_Wrapper(matcher.match(inp))

if __name__ == '__main__':
    # Example:
    with rematch(r"(\d+)g", "123g") as m:
        if m:
            print(m.group(1))

    with rematch(r"(\d+)g", "123") as m:
        if m:
            print(m.group(1))

(This functionality could theoretically be patched into the _sre.SRE_Match object)

It would be nice if you could skip the execution of the with statement's code block when there was no match, which would simplify this to:

with rematch(r"(\d+)g", "123") as m:
    print(m.group(1)) # only executed if the match occurred

..but this seems impossible based on what I can deduce from PEP 343.

Any ideas? As I said, this is a really trivial annoyance, almost to the point of being code golf.

Comments (10)

∝单色的世界 2024-08-02 23:49:47

I don't think using with is the solution in this case. You'd have to raise an exception in the BLOCK part (which is specified by the user) and have the __exit__ method return True to "swallow" the exception. So it would never look good.
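To make that concrete, here is a minimal sketch of the exception-swallowing approach. The skip_if_no_match class is a hypothetical helper made up for illustration, not something the re module provides. It shows that the block is not actually skipped: the body starts running and only stops once it trips over the None match.

```python
import re

class skip_if_no_match:
    """Hypothetical helper: swallows the AttributeError caused by using None as a match."""
    def __init__(self, pattern, text):
        self.match = re.match(pattern, text)

    def __enter__(self):
        return self.match

    def __exit__(self, exc_type, exc_value, tb):
        # Returning True suppresses the exception. This only happens after the
        # body has already started running and failed on the None match.
        return self.match is None and exc_type is AttributeError

results = []
with skip_if_no_match(r"(\d+)g", "123g") as m:
    results.append(m.group(1))

with skip_if_no_match(r"(\d+)g", "123") as m:
    results.append("body started")  # still runs: the block is not skipped
    results.append(m.group(1))      # AttributeError here, swallowed by __exit__

print(results)
```

Note that this also hides any unrelated AttributeError raised in the body when there was no match, which is part of why it never looks good.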

I'd suggest going for a syntax similar to the Perl syntax. Make your own extended re module (I'll call it rex) and have it set variables in its module namespace:

if rex.match(r'(\d+)g', '123g'):
    print(rex._1)

As you can see in the comments below, this method is neither scope- nor thread-safe. You would only use this if you were completely certain that your application wouldn't become multi-threaded in the future and that any functions called from the scope that you're using this in will also use the same method.
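A rough sketch of how such a rex namespace could behave, written as a small class instead of a real module so that it fits in one snippet. The Rex class and the _1, _2, ... attribute names are assumptions for illustration only, and the shared state has exactly the scope and thread-safety problems described above.

```python
import re

class Rex:
    """Hypothetical stand-in for the `rex` module: remembers the groups of the last match."""
    def match(self, pattern, text):
        m = re.match(pattern, text)
        # Drop groups left over from a previous call before storing new ones.
        for name in [n for n in vars(self) if n.startswith("_")]:
            delattr(self, name)
        if m:
            for i, value in enumerate(m.groups(), start=1):
                setattr(self, "_%d" % i, value)
        return m is not None

rex = Rex()
if rex.match(r"(\d+)g", "123g"):
    print(rex._1)
```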

花心好男孩 2024-08-02 23:49:47

This is not really pretty, but you can take advantage of the getattr(object, name[, default]) built-in function like this:

>>> getattr(re.match(r"(\d+)g", "123g"), 'group', lambda n: '')(1)
'123'
>>> getattr(re.match(r"(\d+)g", "X23g"), 'group', lambda n: '')(1)
''

To mimic the if match print group flow, you can (ab)use the for statement this way:

>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "123g"), 'group', None)]):
...     print(group(1))
...
123
>>> for group in filter(None, [getattr(re.match(r"(\d+)g", "X23g"), 'group', None)]):
...     print(group(1))
...
>>>

Of course you can define a little function to do the dirty work:

>>> matchgroup = lambda p, s: filter(None, [getattr(re.match(p, s), 'group', None)])
>>> for group in matchgroup(r"(\d+)g", "123g"):
...     print(group(1))
...
123
>>> for group in matchgroup(r"(\d+)g", "X23g"):
...     print(group(1))
...
>>>

腻橙味 2024-08-02 23:49:47

Not a perfect solution, but it does allow you to chain several match options for the same string:

import re

class MatchWrapper(object):
  def __init__(self):
    self._matcher = None

  def wrap(self, matcher):
    self._matcher = matcher

  def __getattr__(self, attr):
    # Forward group(), span(), etc. to the most recent match object.
    return getattr(self._matcher, attr)

def _match(pattern, s, matcher):
  m = re.match(pattern, s)
  if m:
    matcher.wrap(m)
    return True
  else:
    return False

matcher = MatchWrapper()
line = "123g"
if _match(r"(\d+)g", line, matcher):
  print(matcher.group(1))
elif _match(r"(\w+)g", line, matcher):
  print(matcher.group(1))
else:
  print("no match")

2024-08-02 23:49:47

Here's my solution:

import re

s = 'hello world'

match = []
if match.append(re.match(r'w\w+', s)) or any(match):
    print('W:', match.pop().group(0))
elif match.append(re.match(r'h\w+', s)) or any(match):
    print('H:', match.pop().group(0))
else:
    print('No match found')

You can use as many elif clauses as needed.

Even better:

import re

s = 'hello world'

if vars().update(match=re.match(r'w\w+', s)) or match:
    print('W:', match.group(0))
elif vars().update(match=re.match(r'h\w+', s)) or match:
    print('H:', match.group(0))
else:
    print('No match found')

Both append and update return None. So you have to actually check the result of your expression by using the or part in every case.

Unfortunately, this only works as long as the code resides top-level, i.e. not in a function.
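A small sketch of why it breaks inside a function; the function name find_in_function and the variable found are made up for illustration. Inside a function, vars() is the same as locals(), and writing to that mapping does not create a real local variable, so the later name lookup falls through to the globals and fails.

```python
import re

def find_in_function(s):
    # This updates a dictionary only; it does not bind a local name `found`.
    vars().update(found=re.match(r'h\w+', s))
    try:
        return found.group(0)  # `found` is looked up as a global here
    except NameError:
        return "NameError"

print(find_in_function('hello world'))
```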

眼波传意 2024-08-02 23:49:47

This is what I do:

def re_match_cond(match_ref, regex, text):
    match = regex.match(text)
    # Replace the list's contents in place so the caller sees the new match.
    del match_ref[:]
    match_ref.append(match)
    return match

if __name__ == '__main__':
    # regex_1..regex_3 (compiled patterns) and text are defined elsewhere.
    match_ref = []
    if re_match_cond(match_ref, regex_1, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond(match_ref, regex_2, text):
        match = match_ref[0]
        ### ...
    elif re_match_cond(match_ref, regex_3, text):
        match = match_ref[0]
        ### ...
    else:
        ### no match
        ### ...
        pass

That is, I pass a list to the function to emulate pass-by-reference.

梦里兽 2024-08-02 23:49:46

I don't think it's trivial. I don't want to have to sprinkle a redundant conditional around my code if I'm writing code like that often.

This is slightly odd, but you can do this with an iterator:

import re

def rematch(pattern, inp):
    matcher = re.compile(pattern)
    matches = matcher.match(inp)
    if matches:
        yield matches

if __name__ == '__main__':
    for m in rematch(r"(\d+)g", "123g"):
        print(m.group(1))

The odd thing is that it's using an iterator for something that isn't iterating--it's closer to a conditional, and at first glance it might look like it's going to yield multiple results for each match.

It does seem odd that a context manager can't cause its managed block to be skipped entirely; while that's not explicitly one of the use cases of "with", it seems like a natural extension.

硬不硬你别怂 2024-08-02 23:49:46

Starting with Python 3.8 and its introduction of assignment expressions (PEP 572, the := operator), we can capture the condition value re.match(r'(\d+)g', '123g') in a variable match, both to check that it is not None and to reuse it within the body of the condition:

>>> if match := re.match(r'(\d+)g', '123g'):
...   print(match.group(1))
... 
123
>>> if match := re.match(r'(\d+)g', 'dddf'):
...   print(match.group(1))
...
>>>

北城挽邺 2024-08-02 23:49:46

Another nice syntax would be something like this (hypothetical, not valid Python):

header = re.compile('(.*?) = (.*?)$')
footer = re.compile('(.*?): (.*?)$')

if header.match(line) as m:
    key, value = m.group(1,2)
elif footer.match(line) as m:
    key, value = m.group(1,2)
else:
    key, value = None, None
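For comparison, assignment expressions in Python 3.8+ (PEP 572) come close to this wished-for syntax. The sketch below is only an illustration: the parse function and the sample lines are made up, and the trailing `$` anchors in the patterns are an assumption.

```python
import re

# Patterns modeled on the ones above; the `$` anchors are assumed.
header = re.compile(r'(.*?) = (.*?)$')
footer = re.compile(r'(.*?): (.*?)$')

def parse(line):
    # Each branch binds m and tests it in a single expression.
    if m := header.match(line):
        key, value = m.group(1, 2)
    elif m := footer.match(line):
        key, value = m.group(1, 2)
    else:
        key, value = None, None
    return key, value

print(parse("name = value"))
print(parse("name: value"))
print(parse("neither"))
```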

二智少女 2024-08-02 23:49:46

I have another way of doing this, based on Glen Maynard's solution:

for match in [m for m in [re.match(pattern, key)] if m]:
    print("It matched: %s" % match)

Similar to Glen's solution, this iterates either 0 (if no match) or 1 (if a match) times.

No helper function needed, but less tidy as a result.

一杆小烟枪 2024-08-02 23:49:46

If you're doing a lot of these in one place, here's an alternative answer:

import re
class Matcher(object):
    def __init__(self):
        self.matches = None
    def set(self, matches):
        self.matches = matches
    def __getattr__(self, name):
        return getattr(self.matches, name)

class re2(object):
    def __init__(self, expr):
        self.re = re.compile(expr)

    def match(self, matcher, s):
        matches = self.re.match(s)
        matcher.set(matches)
        return matches

pattern = re2(r"(\d+)g")
m = Matcher()
if pattern.match(m, "123g"):
    print(m.group(1))
if not pattern.match(m, "x123g"):
    print("no match")

You can compile the regex once with the same thread safety as re, create a single reusable Matcher object for the whole function, and then you can use it very concisely. This also has the benefit that you can reverse it in the obvious way--to do that with an iterator, you'd need to pass a flag to tell it to invert its result.

It's not much help if you're only doing a single match per function, though; you don't want to keep Matcher objects in a broader context than that; it'd cause the same issues as Blixt's solution.
