Regex matching a string - positive lookahead

Posted 2024-12-25 20:03:20

Regexp: (?=(\d+))\w+\1
String: 456x56

Hi,

I don't understand how this regex matches "56x56" in the string "456x56". My reasoning:

  1. The lookahead (?=(\d+)) captures 456 and stores it in \1, for (\d+)
  2. The word-character pattern \w+ matches the whole string ("456x56")
  3. \1, which is 456, must then follow \w+
  4. After backtracking over the string, no match should be found, because there is no "456" preceded by a word character

However, the regexp matches 56x56.

Comments (5)

李白 2025-01-01 20:03:20

As has been said, your regex is not anchored. Another issue is that \w also matches digits. Here is how the regex engine proceeds on your input:

# begin
regex: |(?=(\d+))\w+\1
input: |456x56
# lookahead (first group = '456')
regex: (?=(\d+))|\w+\1
input: |456x56 
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack on \w+
regex: (?=(\d+))\w+|\1
input: 456x5|6 
# And again, and again... Until the beginning of the input: \1 cannot match
# Regex engine therefore decides to start from the next character:
regex: |(?=(\d+))\w+\1
input: 4|56x56
# lookahead (first group = '56')
regex: (?=(\d+))|\w+\1
input: 4|56x56
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x5|6
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x|56
# \1 satisfied: match
regex: (?=(\d+))\w+\1|
input: 4<56x56>
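
The trace above can be reproduced with a small sketch. It assumes Python's re module, which the thread does not mention, and the helper name attempts is made up for illustration. Pattern.match(text, start) tries exactly one start offset, which is what an unanchored search does one offset at a time.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"(?=(\d+))\w+\1")

def attempts(text):
    """Try the pattern at every start offset, one offset at a time.

    Hypothetical helper: returns (offset, matched text, group 1) tuples,
    with None in place of the strings when that offset fails.
    """
    results = []
    for start in range(len(text)):
        m = PATTERN.match(text, start)  # match() does not slide forward
        if m:
            results.append((start, m.group(0), m.group(1)))
        else:
            results.append((start, None, None))
    return results

for start, matched, group1 in attempts("456x56"):
    print(start, matched, group1)
# Offset 0 fails (group 1 would have to be 456); offset 1 gives 56x56 with group 1 = 56.
```

re.search reports the leftmost offset that succeeds, which is why the overall result is 56x56 starting at offset 1.
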

下雨或天晴 2025-01-01 20:03:20

5) The regex engine concludes that it cannot find a match when it starts searching from 4, so it skips one character and searches again. This time it captures two digits into \1 and ends up matching 56x56.

If you want to match only whole strings, use ^(?=(\d+))\w+\1$

^ matches beginning of string
$ matches end of string
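
As a quick check of the anchored version, here is a minimal sketch assuming Python's re module (the thread names no language). The function name whole_string_match is made up for illustration.

```python
import re

UNANCHORED = re.compile(r"(?=(\d+))\w+\1")
ANCHORED = re.compile(r"^(?=(\d+))\w+\1$")

def whole_string_match(text):
    """Return True only if the anchored pattern accepts the entire string."""
    return ANCHORED.search(text) is not None

for text in ("456x56", "56x56", "456x456"):
    print(text, whole_string_match(text))
# 456x56 is rejected once anchored; 56x56 and 456x456 are accepted.
```

With the anchors in place the engine can no longer skip the leading 4, so group 1 has to be 456 and the string 456x56 is rejected.
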

策马西风 2025-01-01 20:03:20

The + operator is greedy and backtracks as necessary. The lookahead (?=(\d+)) will capture 456 on the first attempt, then 56 if that attempt fails, then 6 if that one fails too. First attempt: 456. The lookahead matches and group 1 contains 456. Then \w+, being greedy, takes all of 456x56; nothing is left, but we still have to match \1, i.e. 456. Thus: failure. \w+ then backtracks one character at a time until it is down to a single character, and \1 still fails every time.

The engine then advances one character in the string. At the new position the lookahead matches the substring 56, so group 1 contains 56. \w+ runs to the end of the string and takes 56x56, and then we try to match \1, i.e. 56: failure. So \w+ backtracks until 56 is left in the string; \1 now matches, which gives an overall match, and the regex succeeds.

You should try it with RegexBuddy's debug mode.

北方。的韩爷 2025-01-01 20:03:20

Well, that is what makes it a positive lookahead:

 (?=(\d+))\w+\1

You are correct that the first \d+ will match 456 when the engine starts at the beginning of the string, so \1 must also be 456; but in that case the expression cannot match the string.

The only digits common to the part before the x and the part after it are 56, so that is what the engine settles on to get a match.
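
One point worth seeing in isolation is that the lookahead captures digits without consuming them, so \w+ starts at the same position the digits start. A minimal sketch, assuming Python's re module (not named in the thread); the variable names are illustrative only.

```python
import re

text = "456x56"

# The lookahead on its own: it succeeds at offset 0 but consumes nothing.
peek = re.match(r"(?=(\d+))", text)
peek_span = peek.span()      # zero-width match at the start
peek_group = peek.group(1)   # the digits seen ahead of that position

# The full pattern: group 1 ends up as 56, not 456.
found = re.search(r"(?=(\d+))\w+\1", text)

print(peek_span, peek_group)            # (0, 0) 456
print(found.group(0), found.group(1))   # 56x56 56
```
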

离旧人 2025-01-01 20:03:20

The points you listed are almost entirely, but not quite, wrong!

 1) The group (?=(\d+)) matches a sequence of one or more digits,
    not necessarily 456
 2) \w matches any "word" character (a letter, a digit, or an underscore)
 3) \1 is a back reference to the text captured by the group

So the whole expression means: find a sequence of digits at the start of a run of word characters, where that run is followed by the same digit sequence again. Hence the match 56x56.
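
Point 2 is easy to confirm directly. The sketch below assumes Python's re module, which the thread does not mention; the sample characters are arbitrary.

```python
import re

samples = ["4", "x", "_", "-", " "]

# Keep only the characters that \w accepts.
word_chars = [ch for ch in samples if re.fullmatch(r"\w", ch)]
print(word_chars)  # ['4', 'x', '_']

# Because digits count as word characters, \w+ alone can swallow the whole input.
run = re.match(r"\w+", "456x56").group(0)
print(run)  # 456x56
```

This is why \w+ in the original pattern first grabs the entire string and only gives characters back when \1 fails.
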
