What is the difference between re.search and re.match?

Posted 2024-07-07 02:10:22, 360 words, 11 views, 0 comments

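What is the difference between the search() and match() functions in the Python re module?

I have read the Python 2 documentation (and the Python 3 documentation), but I never seem to remember it.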

夜巴黎 2024-07-14 02:11:50

re.match is anchored at the beginning of a string, while re.search scans through the entire string. So in the following example, x and y match the same thing.

x = re.match('pat', s)       # <--- already anchored at the beginning of string
y = re.search(r'\Apat', s)   # <--- match at the beginning (raw string keeps \A intact)

If a string doesn't contain line breaks, \A and ^ are essentially the same; the difference shows up in multiline strings. In the following example, re.match will never match the second line, while re.search can with the correct regex (and flag).

s = "1\n2"
re.match('2', s, re.M)       # no match
re.search('^2', s, re.M)     # match
re.search(r'\A2', s, re.M)   # no match  <--- mimics `re.match`
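The anchoring behaviour described above can be checked directly. This is a minimal sketch using the same two-line string; the helper name `matched_text` and the variable names are made up for illustration.

```python
import re

s = "1\n2"

def matched_text(m):
    """Return the matched text, or None when there was no match."""
    return m.group() if m else None

# re.match only ever tries position 0, even with re.M.
match_plain = matched_text(re.match(r"2", s, re.M))
# With re.M, ^ also matches right after a newline, so search finds line 2.
search_caret = matched_text(re.search(r"^2", s, re.M))
# \A always means "start of the whole string", regardless of flags.
search_slash_a = matched_text(re.search(r"\A2", s, re.M))
# Without re.M, ^ behaves like \A.
search_caret_no_flag = matched_text(re.search(r"^2", s))

print(match_plain, search_caret, search_slash_a, search_caret_no_flag)
```

Only the `^2` search with re.M finds the second line; the other three calls return no match.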

The re module also provides re.fullmatch(), which succeeds only if the whole string matches the pattern, so it is anchored at both the beginning and the end of the string. So in the following example, x, y and z match the same thing.

x = re.match(r'pat\Z', s)     # <--- already anchored at the beginning; must match end
y = re.search(r'\Apat\Z', s)  # <--- match at the beginning and end of string
z = re.fullmatch('pat', s)    # <--- already anchored at the beginning and end
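A short sketch that runs the three calls over a few sample strings and confirms they agree. The function name `three_ways` and the sample strings are assumptions for illustration; the pattern is the literal `pat`, so simply appending `\Z` is safe here (a pattern with top-level alternation would need grouping first).

```python
import re

def three_ways(s):
    """Return whether each of the three equivalent calls matched s."""
    x = re.match(r"pat\Z", s)       # anchored at start by match, at end by \Z
    y = re.search(r"\Apat\Z", s)    # anchored at both ends by the pattern
    z = re.fullmatch(r"pat", s)     # anchored at both ends by the function
    return [m is not None for m in (x, y, z)]

samples = ["pat", "pattern", "a pat", "pat\n"]
outcomes = {s: three_ways(s) for s in samples}
for s, flags in outcomes.items():
    print(repr(s), flags)
```

Only the exact string "pat" matches; note that "pat\n" fails in all three cases because `\Z` and fullmatch do not tolerate a trailing newline.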

Based on Jeyekomon's answer (and using their setup), I used the perfplot library to plot the results of timeit tests that look into:

how do they compare if re.search "mimics" re.match? (first plot)
how do they compare if re.match "mimics" re.search? (second plot)
how do they compare if the same pattern is passed to them? (last plot)

Note that the last pattern doesn't produce the same output (because re.match is anchored at the beginning of a string.)

The first plot shows match is faster if search is used like match. The second plot supports @Jeyekomon's answer and shows search is faster if match is used like search. The last plot shows there's very little difference between the two if they scan for the same pattern.

Code used to produce the performance plot.

import re
from random import choices
from string import ascii_lowercase
import matplotlib.pyplot as plt
import perfplot  # the code below calls perfplot.plot, so import the module itself

# Each pair is (pattern passed to search, pattern passed to match).
patterns = [
    [re.compile(r'\Aword'), re.compile(r'word')],      # search mimics match
    [re.compile(r'word'), re.compile(r'(.*?)word')],   # match mimics search
    [re.compile(r'word')]*2                            # same pattern for both
]

fig, axs = plt.subplots(1, 3, figsize=(20,6), facecolor='white')
for i, (pat1, pat2) in enumerate(patterns):
    plt.sca(axs[i])
    perfplot.plot(
        setup=lambda n: [''.join(choices(ascii_lowercase, k=10)) for _ in range(n)],
        kernels=[lambda lst: [*map(pat1.search, lst)], lambda lst: [*map(pat2.match, lst)]],
        labels= [f"re.search(r'{pat1.pattern}', w)", f"re.match(r'{pat2.pattern}', w)"],
        n_range=[2**k for k in range(24)],
        xlabel='Length of list',
        equality_check=None
    )
fig.suptitle('re.match vs re.search')
fig.tight_layout()

花落人断肠 2024-07-14 02:11:48

Quick answer

re.search('test', ' test')      # returns a truthy match object (the search may start at any index)

re.match('test', ' test')       # returns None (matching must start at index 0)
re.match('test', 'test')        # returns a truthy match object (match at index 0)

暮年 2024-07-14 02:11:45

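re.match attempts to match the pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.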

肩上的翅膀 2024-07-14 02:11:43

Much shorter:

search scans through the whole string.
match scans only the beginning of the string.

The following example shows it:

>>> import re
>>> a = "123abc"
>>> print(re.match("[a-z]+", a))
None
>>> re.search("[a-z]+", a).group()
'abc'

美胚控场 2024-07-14 02:11:40

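You can refer to the example below to understand how re.match and re.search work:

import re

a = "123abc"
t = re.match("[a-z]+", a)   # None: the string starts with digits, not letters
t = re.search("[a-z]+", a)  # match object for 'abc', found at index 3

re.match returns None, whereas re.search returns a match object whose matched text is abc.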

行雁书 2024-07-14 02:11:37

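The difference is that re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, as John D. Cook remarks, re.match() "behaves as if every pattern has ^ prepended." In other words, re.match('pattern') equals re.search('^pattern'). So it anchors the left side of a pattern. But it does not anchor the right side: that still requires a terminating $.

Frankly, given the above, I think re.match() should be deprecated. I would be interested to know the reasons it should be retained.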
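The point about the right-hand side is easy to overlook, so here is a minimal sketch of it. The sample string and variable names are illustrative; re.fullmatch is included as the standard-library alternative to adding `$` by hand.

```python
import re

text = "123abc"

# match anchors only the start: trailing "abc" is silently ignored.
left_only = re.match(r"\d+", text)
# Adding $ makes the pattern responsible for reaching the end of the string.
with_dollar = re.match(r"\d+$", text)
# fullmatch anchors both ends without touching the pattern.
full = re.fullmatch(r"\d+", text)
full_ok = re.fullmatch(r"\d+", "123")

print(left_only.group())   # the digits that matched at the start
print(with_dollar, full)   # neither matches the whole of "123abc"
print(full_ok.group())
```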

如日中天 2024-07-14 02:11:35

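re.search looks for the pattern throughout the string, whereas re.match does not search at all: it can only match the pattern at the start of the string, and returns None if the pattern is not there.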

花辞树 2024-07-14 02:11:32

“match is much faster than search, so instead of doing regex.search("word") you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.”

This comment from @ivan_bilan under the accepted answer above got me wondering whether such a hack actually speeds anything up, so let's find out how many tons of performance you really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, ..., 10M words) which gave me the following plot:

As you can see, searching for the pattern 'python' is faster than matching the pattern '(.*?)python(.*?)'.

Python is smart. Avoid trying to be smarter.
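For anyone who wants to repeat the comparison quickly, here is a smaller sketch built on timeit with precompiled patterns. The list size, repeat counts, and function names are arbitrary choices made for illustration, not the setup used for the measurements above, and absolute timings will vary by machine and Python version.

```python
import random
import re
import string
import timeit

random.seed(0)  # fixed seed so the word list is reproducible
words = ["".join(random.choices(string.ascii_lowercase, k=10))
         for _ in range(10_000)]

search_pat = re.compile(r"python")
match_pat = re.compile(r"(.*?)python(.*?)")

def run_search():
    return [search_pat.search(w) for w in words]

def run_match():
    return [match_pat.match(w) for w in words]

# Take the best of several repeats to reduce noise from other processes.
t_search = min(timeit.repeat(run_search, number=5, repeat=3))
t_match = min(timeit.repeat(run_match, number=5, repeat=3))

print(f"search: {t_search:.4f} s")
print(f"match:  {t_match:.4f} s")
```

Both functions detect the same words (the two patterns agree on whether 'python' occurs), so any timing difference comes from how the regex engine has to work, not from different results.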

一紙繁鸢 2024-07-14 02:11:26

search ⇒ finds something anywhere in the string and returns a match object.

match ⇒ finds something at the beginning of the string and returns a match object.
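A small sketch of what those match objects look like in practice. The sample text and variable names are assumptions for illustration; `span()` reports where the match was found.

```python
import re

text = "hello world"

found_anywhere = re.search(r"world", text)   # found in the middle of the string
not_at_start = re.match(r"world", text)      # "world" is not at index 0
found_at_start = re.match(r"hello", text)    # "hello" is at index 0

print(found_anywhere.span())
print(not_at_start)
print(found_at_start.span())
```

Both functions return None when nothing is found, so the result should be checked before calling methods such as group() or span().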

清醇 2024-07-14 02:11:23

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string, use match. It is faster. Otherwise, use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code (Python 3):
import re

string_with_newlines = """something
someotherthing"""

print(re.match('some', string_with_newlines))  # matches
print(re.match('someother',
               string_with_newlines))  # won't match
print(re.match('^someother', string_with_newlines,
               re.MULTILINE))  # also won't match
print(re.search('someother',
                string_with_newlines))  # finds something
print(re.search('^someother', string_with_newlines,
                re.MULTILINE))  # also finds something

m = re.compile('thing$', re.MULTILINE)

print(m.match(string_with_newlines))  # no match
print(m.match(string_with_newlines, pos=4))  # matches
# Flags belong to re.compile; the second argument of a compiled
# pattern's search() is pos, so re.MULTILINE is not passed here.
print(m.search(string_with_newlines))  # also matches
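The documentation's remark about the pos argument can also be illustrated with '^'. A minimal sketch, with an illustrative string and variable names: a compiled pattern's match() treats pos as its starting point, while '^' inside a pattern still refers to the real start of the string (or, in MULTILINE mode, the position after a newline).

```python
import re

s = "something"

plain = re.compile(r"thing")
anchored = re.compile(r"^thing")

at_zero = plain.match(s)             # tries index 0 only: "somet..." is not "thing"
at_four = plain.match(s, 4)          # tries index 4 only: "thing" is there
caret_from_four = anchored.search(s, 4)  # ^ does not move to index 4

print(at_zero)
print(at_four.span())
print(caret_from_four)
```

So match(s, pos) is not the same as prepending '^' and searching from pos; slicing the string first (s[4:]) would be needed to make '^' match there.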
