Regex matching a string - positive lookahead

Posted 2024-12-25 20:03:20

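Regex: (?=(\d+))\w+\1
String: 456x56

Hi,

I don't understand how this regex matches "56x56" in the string "456x56".

1) The lookaround (?=(\d+)) captures 456 and puts it into \1, for (\d+)
2) The word characters \w+ match the whole string ("456x56")
3) \1, which is 456, should follow \w+
4) After backtracking over the string, it should find no match, since there is no "456" preceded by a word character

But the regex matches 56x56.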

李白 2025-01-01 20:03:20

As has been said, you don't anchor your regex. Another problem is that \w also matches digits. Here is how the regex engine proceeds with your input:

# begin
regex: |(?=(\d+))\w+\1
input: |456x56
# lookahead (first group = '456')
regex: (?=(\d+))|\w+\1
input: |456x56 
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack on \w+
regex: (?=(\d+))\w+|\1
input: 456x5|6 
# And again, and again... Until the beginning of the input: \1 cannot match
# Regex engine therefore decides to start from the next character:
regex: |(?=(\d+))\w+\1
input: 4|56x56
# lookahead (first group = '56')
regex: (?=(\d+))|\w+\1
input: 4|56x56
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x5|6
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x|56
# \1 satisfied: match
regex: (?=(\d+))\w+\1|
input: 4<56x56>
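
The trace above can be checked with a few lines of Python. This is only a sketch: the thread never says which regex flavor is in use, so Python's `re` module is an assumption here, and the names `PATTERN`, `TEXT`, `found` and `at_start` are made up for the example. `search()` retries at every start position the way the trace does, while `match()` only tries position 0.

```python
import re

# Assumption: Python's re module stands in for the (unnamed) engine in the thread.
PATTERN = re.compile(r"(?=(\d+))\w+\1")
TEXT = "456x56"

# search() moves on to the next start position when an attempt fails.
found = PATTERN.search(TEXT)
print(found.group(0), found.span(), found.group(1))

# match() only tries start position 0, where group 1 would have to be '456'.
at_start = PATTERN.match(TEXT)
print(at_start)
```

The first line printed should show the match 56x56 starting at index 1 with group 1 equal to 56, and the second attempt should find nothing.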

下雨或天晴 2025-01-01 20:03:20

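5) The regex engine concludes that it cannot find a match when it starts searching from 4, so it skips one character and searches again. This time it captures two digits into \1 and ends up matching 56x56.

If you want to match only whole strings, use ^(?=(\d+))\w+\1$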

^ matches beginning of string
$ matches end of string
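
A quick way to see the effect of the anchors, again assuming Python's `re` module (the thread does not name an engine). `ANCHORED` and `whole_match` are hypothetical names used only for this sketch.

```python
import re

# The anchored pattern from the answer above.
ANCHORED = re.compile(r"^(?=(\d+))\w+\1$")

def whole_match(text):
    """Return the matched text if the entire string fits the pattern, else None."""
    m = ANCHORED.search(text)
    return m.group(0) if m else None

print(whole_match("456x56"))  # the engine can no longer skip the leading 4
print(whole_match("56x56"))   # starts and ends with the same digits
```

With the anchors in place, the only start position left is the beginning of the string, so 456x56 should no longer match while 56x56 still does.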

策马西风 2025-01-01 20:03:20

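The + operator is greedy and backtracks as needed. The lookahead (?=(\d+)) will capture 456 at the first start position, then 56 at the next one if the overall match fails, then 6 if that fails too. First attempt: 456. The lookahead succeeds and group 1 contains 456. Then \w+, being greedy, takes all of 456x56. Nothing is left, but \1 (that is, 456) still has to match, so this fails. \w+ then backtracks one character at a time until it holds only the leading 4, and \1 still fails at every step.

So the engine advances one character in the string. The next attempt runs the lookahead against the substring 56x56. It succeeds and group 1 contains 56. \w+ matches to the end of the string and takes 56x56, then we try to match \1 as 56: failure. So \w+ backtracks until 56 is left at the end of the string, and then \1 matches and the regex as a whole succeeds.

You should try it with RegexBuddy debug mode.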
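
The greedy-then-backtrack behavior described above can be imitated by hand. The sketch below is not how any real engine is implemented; it is a simplified model written for this one pattern. It assumes ASCII digits and word characters only, and it assumes the lookahead always captures the full run of digits and is never re-entered. The function name `simulate` and its helpers are made up for the example.

```python
def simulate(text):
    r"""Walk through (?=(\d+))\w+\1 by hand.

    Returns (start, end, group1) for the first match, or None.
    Simplified model: ASCII only, lookahead captures the full digit run.
    """
    def is_digit(ch):
        return "0" <= ch <= "9"

    def is_word(ch):
        return ch == "_" or (ch.isascii() and ch.isalnum())

    for start in range(len(text)):
        # Lookahead: read digits at `start` without consuming them.
        d = start
        while d < len(text) and is_digit(text[d]):
            d += 1
        if d == start:
            continue  # no digit here, so the lookahead fails
        group1 = text[start:d]

        # \w+ is greedy: take the longest run of word characters first.
        w = start
        while w < len(text) and is_word(text[w]):
            w += 1

        # Backtrack: give back one character at a time, keeping at least one.
        for words_end in range(w, start, -1):
            if text.startswith(group1, words_end):
                return start, words_end + len(group1), group1
        # Every split failed: the outer loop moves to the next start position.
    return None

print(simulate("456x56"))
```

For 456x56 the loop fails at start 0 with group 1 set to 456, then succeeds at start 1 with group 1 set to 56 once \w+ has given back the final two characters.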

北方。的韩爷 2025-01-01 20:03:20

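Well, that's what makes it a positive lookahead

 (?=(\d+))\w+\1

You are correct that at the first position \d+ will match 456, so \1 must also be 456. But in that case the expression cannot match the string, because 456 does not occur a second time.

The only characters common to the part before the x and the part after the x are 56, so that is what the engine settles on to get a match.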

离旧人 2025-01-01 20:03:20

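The points you listed are almost entirely, but not quite, wrong!

 1) The group (?=(\d+)) matches a sequence of one or more digits,
    not necessarily 456
 2) \w matches any "word" character (a letter, a digit, or an underscore)
 3) \1 is a back reference to the match in the group

So the whole expression means: find a sequence of digits, followed by a sequence of word characters, which is followed by the same sequence of digits that was found in front of those characters. Because the lookahead consumes nothing, the leading digits are themselves part of the word characters. Hence the match 56x56.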
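
Each of the three points can be checked on its own. As before, this is a sketch that assumes Python's `re` module, since the thread does not name an engine; `TEXT` and `peek` are names invented for the example.

```python
import re

TEXT = "456x56"

# Point 1: the lookahead captures digits but is zero-width, so nothing is consumed.
peek = re.match(r"(?=(\d+))", TEXT)
print(peek.group(1), peek.end())

# Point 2: \w matches digits as well as letters, so \w+ can take the whole input.
print(re.match(r"\w+", TEXT).group(0))

# Point 3: \1 must repeat exactly what group 1 captured.
print(re.search(r"(\d+)x\1", "56x56") is not None)
print(re.search(r"(\d+)x\1", "56x57") is not None)
```

The lookahead alone captures 456 yet ends at index 0, \w+ alone takes all six characters, and the back reference accepts 56x56 but rejects 56x57.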
