What is the difference between re.search and re.match?

Posted on 2024-07-07 02:10:22

What is the difference between the search() and match() functions in the Python re module?

I've read the Python 2 documentation (Python 3 documentation), but I never seem to remember it.

Comments (10)

夜巴黎 2024-07-14 02:11:50

re.match is anchored at the beginning of a string, while re.search scans through the entire string. So in the following example, x and y match the same thing.

x = re.match('pat', s)       # <--- already anchored at the beginning of string
y = re.search(r'\Apat', s)   # <--- match at the beginning

If a string doesn't contain line breaks, \A and ^ are essentially the same; the difference shows up in multiline strings. In the following example, re.match will never match the second line, while re.search can with the correct regex (and flag).

s = "1\n2"
re.match('2', s, re.M)       # no match
re.search('^2', s, re.M)     # match
re.search(r'\A2', s, re.M)   # no match  <--- mimics `re.match`
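
The claim that \A and ^ only differ once line breaks are involved can be checked directly. This is a minimal sketch using only the standard re module; the variable names are made up for illustration.

```python
import re

single = "2"
multi = "1\n2"

# No line breaks: ^ and \A find the same match.
caret_single = re.search(r'^2', single, re.M)
a_single = re.search(r'\A2', single, re.M)
print(caret_single.span(), a_single.span())  # (0, 1) (0, 1)

# With a line break: ^ (under re.M) also matches right after the newline,
# while \A only ever matches at the very start of the string.
caret_multi = re.search(r'^2', multi, re.M)
a_multi = re.search(r'\A2', multi, re.M)
print(caret_multi.span(), a_multi)  # (2, 3) None
```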

There's another function in re, re.fullmatch(), which requires the entire string to match, so it is anchored at both the beginning and the end of the string. So in the following example, x, y and z match the same thing.

x = re.match(r'pat\Z', s)     # <--- already anchored at the beginning; must match end
y = re.search(r'\Apat\Z', s)  # <--- match at the beginning and end of string
z = re.fullmatch('pat', s)   # <--- already anchored at the beginning and end
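
To see that the three forms agree, one can run them over a few strings. This is a rough sketch; the list of candidate strings is an arbitrary choice, not part of the original answer.

```python
import re

candidates = ["pat", "pattern", "a pat", "pat\n"]

results = []
for s in candidates:
    x = re.match(r'pat\Z', s)
    y = re.search(r'\Apat\Z', s)
    z = re.fullmatch(r'pat', s)
    # Record whether each form accepted the string
    results.append((bool(x), bool(y), bool(z)))
    print(repr(s), bool(x), bool(y), bool(z))
```

Only "pat" is accepted, and by all three forms. Note that \Z is used here rather than $: $ also matches just before a trailing newline, so re.match(r'pat$', "pat\n") would succeed while re.fullmatch(r'pat', "pat\n") would not.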

Based on Jeyekomon's answer (and using their setup), using the perfplot library, I plotted the results of timeit tests that look into:

  • how do they compare if re.search "mimics" re.match? (first plot)
  • how do they compare if re.match "mimics" re.search? (second plot)
  • how do they compare if the same pattern is passed to them? (last plot)

Note that the last pattern doesn't produce the same output (because re.match is anchored at the beginning of a string.)

[Figure: performance plot]

The first plot shows match is faster if search is used like match. The second plot supports @Jeyekomon's answer and shows search is faster if match is used like search. The last plot shows there's very little difference between the two if they scan for the same pattern.


Code used to produce the performance plot.

import re
from random import choices
from string import ascii_lowercase
import matplotlib.pyplot as plt
import perfplot

patterns = [
    [re.compile(r'\Aword'), re.compile(r'word')],
    [re.compile(r'word'), re.compile(r'(.*?)word')],
    [re.compile(r'word')]*2
]

fig, axs = plt.subplots(1, 3, figsize=(20,6), facecolor='white')
for i, (pat1, pat2) in enumerate(patterns):
    plt.sca(axs[i])
    perfplot.plot(
        setup=lambda n: [''.join(choices(ascii_lowercase, k=10)) for _ in range(n)],
        kernels=[lambda lst: [*map(pat1.search, lst)], lambda lst: [*map(pat2.match, lst)]],
        labels= [f"re.search(r'{pat1.pattern}', w)", f"re.match(r'{pat2.pattern}', w)"],
        n_range=[2**k for k in range(24)],
        xlabel='Length of list',
        equality_check=None
    )
fig.suptitle('re.match vs re.search')
fig.tight_layout()
花落人断肠 2024-07-14 02:11:48

Quick answer

re.search('test', ' test')      # returns a truthy match object (the search can start at any index)

re.match('test', ' test')       # returns None (matching is only attempted at index 0)
re.match('test', 'test')        # returns a truthy match object (match at index 0)
暮年 2024-07-14 02:11:45

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.
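
A small sketch of that behavior, assuming a made-up sample string: re.match gives up immediately, while re.search stops at the first place the pattern fits. re.findall is included only as a contrast, to show that later matches exist but are not reported by re.search.

```python
import re

text = "abc 123 def 456"

at_start = re.match(r'\d+', text)
print(at_start)                       # None: the string does not start with a digit

first = re.search(r'\d+', text)
print(first.group(), first.span())    # 123 (4, 7)

every = re.findall(r'\d+', text)
print(every)                          # ['123', '456']
```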

肩上的翅膀 2024-07-14 02:11:43

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

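The following example shows it:

>>> import re
>>> a = "123abc"
>>> re.match("[a-z]+", a)    # returns None, so the REPL prints nothing
>>> re.search("[a-z]+", a)
<re.Match object; span=(3, 6), match='abc'>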
美胚控场 2024-07-14 02:11:40

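You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t_match = re.match("[a-z]+", a)    # None: the string starts with digits
t_search = re.search("[a-z]+", a)  # match object for 'abc'

re.match returns None, but re.search returns a match object whose matched text is abc.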

行雁书 2024-07-14 02:11:37

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern', s) is equivalent to re.search('^pattern', s) (as long as re.MULTILINE is not set). So it anchors a pattern's left side. But it doesn't anchor the pattern's right side: that still requires a terminating $.
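
The left-only anchoring is easy to check. This is a minimal sketch with an arbitrary sample string; re.fullmatch is shown only for comparison.

```python
import re

s = "abcdef"

left_only = bool(re.match(r'abc', s))
with_dollar = bool(re.match(r'abc$', s))
both_ends = bool(re.fullmatch(r'abc', s))
via_search = bool(re.search(r'^abc', s))

print(left_only)    # True: only the left side is anchored
print(with_dollar)  # False: $ requires the end of the string
print(both_ends)    # False: both ends are anchored
print(via_search)   # True: same result as re.match(r'abc', s)
```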

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.

如日中天 2024-07-14 02:11:35

re.search searches for the pattern throughout the string, whereas re.match does not search for the pattern; it only tries to match it at the start of the string, and fails if the pattern does not match there.

花辞树 2024-07-14 02:11:32

"match is much faster than search, so instead of doing regex.search('word') you can do regex.match('(.*?)word(.*?)') and gain tons of performance if you are working with millions of samples."

This comment from @ivan_bilan under the accepted answer above got me thinking about whether such a hack actually speeds anything up, so let's find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

[Figure: match vs. search regex speedtest line plot]

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.
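
For anyone who wants to repeat the comparison quickly, here is a smaller sketch using timeit and precompiled patterns. It is not the setup used for the plot: the list size, the fixed seed, and the planted 'xxpythonxx' words are assumptions made so that the run is short and both approaches have something to find. Timings will vary by machine, so none are shown.

```python
import random
import re
import string
import timeit

random.seed(0)  # fixed seed so repeated runs use the same words (arbitrary choice)

# Much smaller list than in the answer, so the sketch finishes quickly
words = [''.join(random.choices(string.ascii_lowercase, k=10)) for _ in range(10_000)]
words[::1000] = ['xxpythonxx'] * 10  # plant a few guaranteed hits

search_pat = re.compile(r'python')
match_pat = re.compile(r'(.*?)python(.*?)')

# Both approaches should select exactly the same words
search_hits = [w for w in words if search_pat.search(w)]
match_hits = [w for w in words if match_pat.match(w)]

t_search = timeit.timeit(lambda: [search_pat.search(w) for w in words], number=5)
t_match = timeit.timeit(lambda: [match_pat.match(w) for w in words], number=5)
print(f'search: {t_search:.3f}s  match: {t_match:.3f}s')
```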

一紙繁鸢 2024-07-14 02:11:26

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.
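
A short sketch of working with the returned match object, using a made-up sample string. Both functions return None when nothing matches, so the result should be checked before it is used.

```python
import re

s = "hello world"

found = re.search(r'world', s)
print(found.group(), found.start(), found.end())  # world 6 11

missing = re.match(r'world', s)  # 'world' is not at the beginning
if missing is None:
    print("no match at the start")
```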

清醇 2024-07-14 02:11:23

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the
beginning of string match the regular expression pattern, return a
corresponding MatchObject instance.
Return None if the string does not
match the pattern; note that this is
different from a zero-length match.

Note: If you want to locate a match
anywhere in string, use search()
instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a
location where the regular expression
pattern produces a match, and return a
corresponding MatchObject instance.
Return None if no position in the
string matches the pattern; note that
this is different from finding a
zero-length match at some point in the
string.

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive
operations based on regular
expressions: match checks for a match
only at the beginning of the string,
while search checks for a match
anywhere in the string (this is what
Perl does by default).

Note that match may differ from search
even when using a regular expression
beginning with '^': '^' matches only
at the start of the string, or in
MULTILINE mode also immediately
following a newline. The "match"
operation succeeds only if the pattern
matches at the start of the string
regardless of mode, or at the starting
position given by the optional pos
argument regardless of whether a
newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
print(m.search(string_with_newlines))  # also matches (flags are already compiled into m)