What do "lazy" and "greedy" mean in the context of regular expressions?

Posted on 2024-08-22 15:43:48

Can someone explain these two terms in an understandable way?

Comments (13)

棒棒糖 2024-08-29 15:43:48

Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>. Suppose you have the following:

<em>Hello World</em>

You may think that <.+> (. means any non-newline character and + means one or more) would match only the <em> and the </em>, when in reality it is greedy and goes from the first < to the last >. This means it will match <em>Hello World</em> instead of what you wanted.

Making it lazy (<.+?>) will prevent this. By adding the ? after the +, we tell it to repeat as few times as possible, so the first > it comes across is where the match stops.
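
As a quick check of the point above, here is a minimal sketch that runs both patterns against the sample string. Python's re module is an assumption made for illustration; the answer itself does not name a language.

```python
import re

text = "<em>Hello World</em>"

# Greedy: .+ runs to the end of the line, then backs off only as far as the last '>'.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? grows one character at a time and stops at the first '>' it can use.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<em>Hello World</em>']
print(lazy)    # ['<em>', '</em>']
```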

By the way, I suggest you download RegExr, a great tool that will help you explore regular expressions - I use it all the time.

岁月流歌 2024-08-29 15:43:48

'Greedy' means match longest possible string.

'Lazy' means match shortest possible string.

For example, the greedy h.+l matches 'hell' in 'hello' but the lazy h.+?l matches 'hel'.
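
The 'hello' example can be reproduced with a short sketch. Python's re module is used here as an assumption; the span() calls are added only to show where each match ends.

```python
import re

greedy = re.search(r"h.+l", "hello")
lazy = re.search(r"h.+?l", "hello")

# Both matches start at index 0; only the end position differs.
print(greedy.group(), greedy.span())  # hell (0, 4)
print(lazy.group(), lazy.span())      # hel (0, 3)
```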

只涨不跌 2024-08-29 15:43:48
Greedy quantifier   Lazy quantifier   Description
*                   *?                Star quantifier: 0 or more
+                   +?                Plus quantifier: 1 or more
?                   ??                Optional quantifier: 0 or 1
{n}                 {n}?              Quantifier: exactly n
{n,}                {n,}?             Quantifier: n or more
{n,m}               {n,m}?            Quantifier: between n and m

Add a ? to a quantifier to make it ungreedy, i.e. lazy.

Example:
test string: stackoverflow
greedy reg expression: s.*o matches: stackoverflo (up to the last o)
lazy reg expression: s.*?o matches: stacko (up to the first o)
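
To see each greedy/lazy pair side by side, the sketch below applies them to one subject string. The subject "aaaa" and the choice of Python's re module are assumptions for illustration, not part of the answer.

```python
import re

subject = "aaaa"  # hypothetical test string
pairs = [
    ("a*", "a*?"),
    ("a+", "a+?"),
    ("a?", "a??"),
    ("a{2}", "a{2}?"),
    ("a{2,}", "a{2,}?"),
    ("a{2,3}", "a{2,3}?"),
]

results = {}
for greedy, lazy in pairs:
    # re.match anchors at the start, so only the quantifier's appetite differs.
    results[greedy] = re.match(greedy, subject).group()
    results[lazy] = re.match(lazy, subject).group()
    print(f"{greedy!r:10} -> {results[greedy]!r:8} {lazy!r:10} -> {results[lazy]!r}")

# The stackoverflow example from the answer:
print(re.search(r"s.*o", "stackoverflow").group())   # stackoverflo
print(re.search(r"s.*?o", "stackoverflow").group())  # stacko
```

Note that {n} and {n}? give the same result, since an exact count leaves the quantifier no choice.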

空宴 2024-08-29 15:43:48

Greedy means your expression will match as large a group as possible, lazy means it will match the smallest group possible. For this string:

abcdefghijklmc

and this expression:

a.*c

The greedy a.*c will match the whole string, while the lazy version, a.*?c, will match just the first abc.
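
A small sketch of this example follows, using Python's re module as an assumption. The extra input "aabc" is hypothetical and is added to show one caveat: lazy does not mean "shortest substring anywhere", because the engine still starts at the leftmost position where a match is possible.

```python
import re

text = "abcdefghijklmc"
greedy_match = re.search(r"a.*c", text).group()
lazy_match = re.search(r"a.*?c", text).group()
print(greedy_match)  # abcdefghijklmc
print(lazy_match)    # abc

# The match starts at the first 'a', so the result is 'aabc', not 'abc'.
leftmost = re.search(r"a.*?c", "aabc").group()
print(leftmost)  # aabc
```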

远山浅 2024-08-29 15:43:48

As far as I know, most regex engines are greedy by default. Adding a question mark after a quantifier enables lazy matching.

As @Andre S mentioned in a comment:

  • Greedy: Keep searching until the condition is no longer satisfied.
  • Lazy: Stop searching once the condition is satisfied.

Refer to the example below for what is greedy and what is lazy.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]){
        String money = "100000000999";
        String greedyRegex = "100(0*)";
        Pattern pattern = Pattern.compile(greedyRegex);
        Matcher matcher = pattern.matcher(money);
        while(matcher.find()){
            System.out.println("I'm greedy and I want " + matcher.group() + " dollars. This is the most I can get.");
        }
        
        String lazyRegex = "100(0*?)";
        pattern = Pattern.compile(lazyRegex);
        matcher = pattern.matcher(money);
        while(matcher.find()){
            System.out.println("I'm too lazy to get so much money, only " + matcher.group() + " dollars is enough for me");
        }
    }
}

The result is:

I'm greedy and I want 100000000 dollars. This is the most I can get.

I'm too lazy to get so much money, only 100 dollars is enough for me
乙白 2024-08-29 15:43:48

Taken from www.regular-expressions.info

Greediness: A greedy quantifier first tries to repeat the token as many times as possible, and gradually gives up matches as the engine backtracks to find an overall match.

Laziness: A lazy quantifier first repeats the token as few times as required, and gradually expands the match as the engine backtracks through the regex to find an overall match.

深空失忆 2024-08-29 15:43:48

From Regular expression

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.

By using a lazy quantifier, the expression tries the minimal match first.
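
The "giving back" behaviour can be made visible with capture groups. This is a minimal sketch, assuming Python's re module and a hypothetical five-digit input; neither comes from the quoted text.

```python
import re

digits = "12345"  # hypothetical input

# Greedy \d+ grabs all five digits, then gives back two so \d{2} can match.
greedy = re.match(r"(\d+)(\d{2})", digits).groups()

# Lazy \d+? starts with one digit and would take more only if \d{2} failed.
lazy = re.match(r"(\d+?)(\d{2})", digits).groups()

print(greedy)  # ('123', '45')
print(lazy)    # ('1', '23')
```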

惟欲睡 2024-08-29 15:43:48

Greedy matching. The default behavior of regular expressions is to be greedy. That means it tries to extract as much as possible until it conforms to a pattern even when a smaller part would have been syntactically sufficient.

Example:

import re
text = "<body>Regex Greedy Matching Example </body>"
re.findall('<.*>', text)
#> ['<body>Regex Greedy Matching Example </body>']

Instead of matching till the first occurrence of ‘>’, it extracted the whole string. This is the default greedy or ‘take it all’ behavior of regex.

Lazy matching, on the other hand, 'takes as little as possible'. This is done by adding a ? directly after the quantifier (here, .* becomes .*?).

Example:

re.findall('<.*?>', text)
#> ['<body>', '</body>']

If you want only the first match to be retrieved, use the search method instead.

re.search('<.*?>', text).group()
#> '<body>'

Source: Python Regex Examples

云淡月浅 2024-08-29 15:43:48

Greedy Quantifiers are like the IRS

They’ll take as much as they can. e.g. matches with this regex: .*

$50,000

Kiss your bank balance goodbye with that regex!

See here for an example: Greedy-example

Non-greedy quantifiers - they take as little as they can

Ask for a tax refund: the IRS suddenly becomes non-greedy and returns as little as possible, i.e. they use this quantifier:

(.{2,5}?)([0-9]*) against this input: $50,000

The first group is non-greedy and only matches $5, so I get a $5 refund against the $50,000 input.

See here: Non-greedy-example.
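
Since the linked examples are not reproduced here, the refund pattern can be checked with a short sketch. Python's re module is an assumption, and the greedy variant of the pattern is added only for comparison.

```python
import re

# Lazy first group: takes the minimum of 2 characters, then [0-9]* takes what it can.
m = re.match(r"(.{2,5}?)([0-9]*)", "$50,000")
print(m.group(1))  # $5
print(m.group(2))  # 0
print(m.group(0))  # $50

# Greedy first group (added for comparison): takes the maximum of 5 characters.
g = re.match(r"(.{2,5})([0-9]*)", "$50,000")
print(g.group(1))  # $50,0
print(g.group(2))  # 00
```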

Why do we need greedy vs non-greedy?

It becomes important if you are trying to match certain parts of an expression. Sometimes you don't want to match everything - as little as possible. Sometimes you want to match as much as possible. Nothing more to it.

You can play around with the examples in the links posted above.

(Analogy used to help you remember).

悲喜皆因你 2024-08-29 15:43:48

This is best shown by example. Take the string 192.168.1.1 and the greedy regex \b.+\b
You might think this would give you the first octet, but it actually matches the whole string. Why? Because the .+ is greedy, and a greedy match consumes every character in 192.168.1.1 until it reaches the end of the string. This is the important bit! Only then does it start to backtrack one character at a time until it finds a match for the third token (\b). Here the end of the string is itself a word boundary, so the whole string is the result.

If the string were a 4GB text file and 192.168.1.1 was at the start, you could easily see how this backtracking would cause an issue.

To make a regex non-greedy (lazy), put a question mark after your greedy quantifier, e.g.

*?
??
+?

What happens now is that token 2 (.+?) matches a single character, and the regex then tries the next token (\b) before going back to token 2 for another character. So it creeps along gingerly.
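
Both variants can be compared directly on the IP address. This is a minimal sketch, assuming Python's re module, which supports \b in the same way for this input.

```python
import re

ip = "192.168.1.1"

# Greedy: .+ runs to the end of the string, where \b is already satisfied.
greedy_ip = re.search(r"\b.+\b", ip).group()

# Lazy: .+? extends one character at a time until \b matches, just before the first dot.
lazy_ip = re.search(r"\b.+?\b", ip).group()

print(greedy_ip)  # 192.168.1.1
print(lazy_ip)    # 192
```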

靖瑶 2024-08-29 15:43:48

Greedy means the quantifier keeps consuming matches for its token until there are none left and it can go no further.

Lazy will stop as soon as it encounters the first match for the part of the pattern that follows.

One common example that I often encounter is the \s*-\s*? in the regex ([0-9]{2}\s*-\s*?[0-9]{7})

The first \s* is greedy because of the * and will match as many whitespace characters as possible after the digits, and then look for a dash character "-". The second \s*? is lazy because of the *?, which means it will match as few whitespace characters as it can, taking more only when the following [0-9]{7} would otherwise fail.
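
The sketch below runs this pattern on a made-up input with several spaces after the dash. Both the input string and the use of Python's re module are assumptions. It shows that the lazy \s*? still has to consume every space, because the seven digits cannot match otherwise, so the overall match is the same as with a greedy \s*.

```python
import re

sample = "12 -   3456789"  # hypothetical input: three spaces after the dash

lazy_version = re.search(r"([0-9]{2}\s*-\s*?[0-9]{7})", sample).group(1)
greedy_version = re.search(r"([0-9]{2}\s*-\s*[0-9]{7})", sample).group(1)

print(repr(lazy_version))    # '12 -   3456789'
print(repr(greedy_version))  # '12 -   3456789'
```

Laziness changes the overall result only when the rest of the pattern could succeed at more than one position.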

孤凫 2024-08-29 15:43:48

To give extra clarification on laziness, here is an example that may not be intuitive at first glance but illustrates the idea of "gradually expands the match" from Suganthan Madhavan Pillai's answer.

input -> [email protected]@
regex -> ^.*?@$

The regex will match this input. At first glance, somebody could say that the lazy part (".*?@") will stop at the first @, after which the engine checks that the input string ends ("$"). Following this logic, someone would conclude there is no match, because the input string doesn't end after the first @.

But as you can see, this is not the case: even though we are using a non-greedy (lazy) quantifier, the regex keeps going until it hits the second @, and that is the MINIMAL match.
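
The input shown above is obscured in this copy, so the sketch below substitutes a hypothetical string with the same shape (two @ characters, the second at the very end). That string, and the use of Python's re module, are assumptions.

```python
import re

text = "user@example.com@"  # hypothetical stand-in for the obscured input

m = re.search(r"^.*?@$", text)
print(m.group())  # user@example.com@

# A capture group shows how far the lazy .*? had to expand:
# past the first '@', because '$' cannot match there.
lazy_part = re.search(r"^(.*?)@$", text).group(1)
print(lazy_part)  # user@example.com
```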

携余温的黄昏 2024-08-29 15:43:48

Try to understand the following behavior:

var input = "0014.2";

// r1 requires at least one digit on each side; r2 lets every part match nothing.
Regex r1 = new Regex("\\d+.{0,1}\\d+");
Regex r2 = new Regex("\\d*.{0,1}\\d*");

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // "0014.2"

input = " 0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " 0014"

input = "  0014.2";

Console.WriteLine(r1.Match(input).Value); // "0014.2"
Console.WriteLine(r2.Match(input).Value); // " " (a single space, matched by .{0,1})
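
For readers without a .NET environment, here is a rough Python port of the same experiment. It assumes Python's re module treats these particular patterns the same way .NET does. It shows why r2 behaves oddly: every part of r2 may match nothing, so the engine accepts the first position where anything matches, and there the greedy .{0,1} takes the leading space.

```python
import re

r1 = re.compile(r"\d+.{0,1}\d+")
r2 = re.compile(r"\d*.{0,1}\d*")

outputs = {}
for text in ["0014.2", " 0014.2", "  0014.2"]:
    # search() returns the leftmost match, like Regex.Match in the C# snippet.
    outputs[text] = (r1.search(text).group(), r2.search(text).group())
    print(repr(text), "->", repr(outputs[text][0]), repr(outputs[text][1]))
```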