Regular expressions: difference between negative lookbehind and negation

Posted on 2024-12-03 01:03:49

From regular-expressions.info:

\b\w+(?<!s)\b. This is definitely not the same as \b\w+[^s]\b. When applied to Jon's, the former will match Jon and the latter Jon' (including the apostrophe). I will leave it up to you to figure out why. (Hint: \b matches between the apostrophe and the s). The latter will also not match single-letter words like "a" or "I".
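The quote's claims can be checked directly with a minimal sketch that uses Python's re module. The quote does not name a regex flavor, so running it in Python is an assumption, although Python does support this fixed-width lookbehind. The helper name whole_word_matches and the sample words are illustrative. The helper tests each pattern against whole words with re.fullmatch, which covers the single-letter words the quote mentions.

```python
import re

# The two patterns from the quote.
LOOKBEHIND = re.compile(r"\b\w+(?<!s)\b")  # a word that does not end in "s"
NEGATED = re.compile(r"\b\w+[^s]\b")       # needs at least two characters

def whole_word_matches(pattern: re.Pattern, words: list[str]) -> dict[str, bool]:
    """Report, for each word, whether the pattern matches the entire word."""
    return {w: pattern.fullmatch(w) is not None for w in words}

samples = ["a", "I", "dog", "cats"]
print(whole_word_matches(LOOKBEHIND, samples))
# {'a': True, 'I': True, 'dog': True, 'cats': False}
print(whole_word_matches(NEGATED, samples))
# {'a': False, 'I': False, 'dog': True, 'cats': False}
```

The single-letter words fail the second pattern because \w+ and [^s] each need at least one character of their own.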

Can you explain why?

Also, can you make clear what exactly \b does, and why it matches between the apostrophe and the s?

鱼窥荷 2024-12-10 01:03:49

\b is a zero-width assertion that matches at a word boundary. These character positions (as listed on regular-expressions.info) are considered word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.
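The three rules above can be sketched directly in Python. This assumes Python 3's re module, whose \w also covers Unicode letters. The hypothetical helper word_boundaries applies the rules position by position, and you can compare its result with the positions where re itself reports a \b match.

```python
import re

def is_word_char(c: str) -> bool:
    # Defer to the engine's own definition of \w (Unicode-aware for str patterns).
    return re.fullmatch(r"\w", c) is not None

def word_boundaries(s: str) -> list[int]:
    """Return every index 0..len(s) that is a word boundary under the three rules."""
    positions = []
    for i in range(len(s) + 1):
        before = i > 0 and is_word_char(s[i - 1])   # word char to the left?
        after = i < len(s) and is_word_char(s[i])   # word char to the right?
        # Start of string, end of string and "between two characters" all reduce
        # to: exactly one side of position i is a word character.
        if before != after:
            positions.append(i)
    return positions

text = "Jon's"
print(word_boundaries(text))                          # [0, 3, 4, 5]
print([m.start() for m in re.finditer(r"\b", text)])  # [0, 3, 4, 5]
```

Position 4 is the boundary between ' and s: the apostrophe is not a word character, but s is.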

Word characters are the ones matched by \w (letters, digits and the underscore). s is a word character, but ' is not, so in Jon's the position between the ' and the s is a word boundary.

If I highlight the anchors and boundaries, the string "Jon's" looks like this (the first and last \b fall at the same positions as ^ and $, so only those anchors are shown): ^Jon\b'\bs$
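One way to reproduce this highlighting is to substitute a visible marker at every position where \b matches. This is a Python sketch. The helper mark_boundaries and the "|" marker are arbitrary choices. Unlike the notation above, it also marks the first and last boundaries explicitly.

```python
import re

def mark_boundaries(s: str, marker: str = "|") -> str:
    # \b is zero-width, so substituting it inserts the marker without removing text.
    # A function replacement inserts the marker literally (no backslash processing).
    return re.sub(r"\b", lambda _m: marker, s)

print(mark_boundaries("Jon's"))  # |Jon|'|s|
```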

The negative lookbehind assertion in (?<!s)\b means the word boundary only matches if it is not preceded by the letter s (i.e. the character just before the boundary is not an s). So it looks for a word boundary under a certain condition.
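To see the condition at work, the Python sketch below lists the positions where plain \b matches in Jon's. It then compares them with the positions where (?<!s)\b matches. Positions are 0-based string indices.

```python
import re

text = "Jon's"
plain = [m.start() for m in re.finditer(r"\b", text)]
conditional = [m.start() for m in re.finditer(r"(?<!s)\b", text)]
print(plain)        # [0, 3, 4, 5]
print(conditional)  # [0, 3, 4]  (position 5 is dropped: it follows an "s")
```

Position 0 still qualifies because nothing precedes it, so "not preceded by s" holds there.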

Therefore the first regex works like this:

  1. \b\w+ matches the first three letters J o n.

  2. There is another word boundary between n and ' (as shown above), so (?<!s)\b matches there, because that boundary is preceded by n, not s.

  3. Since the end of the pattern has been reached, the resulting match is Jon.
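You can confirm the steps above with a short Python sketch, assuming Python 3's re module. It prints the match and its span.

```python
import re

pattern = r"\b\w+(?<!s)\b"
m = re.search(pattern, "Jon's")
# \b, (?<!s) and the final \b are zero-width, so only \w+ consumes text:
# the match ends right after "n", at the boundary before the apostrophe.
print(m.group(), m.span())           # Jon (0, 3)
# The trailing "s" yields no second match: (?<!s) rejects the boundary after it.
print(re.findall(pattern, "Jon's"))  # ['Jon']
```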

The complementary (negated) character class in [^s]\b matches any single character other than the letter s, followed by a word boundary. Unlike the lookbehind above, this consumes one character before checking for the boundary.

Therefore the second regex works like this:

  1. \b\w+ matches the first three letters J o n.

  2. Since ' is not the letter s (it satisfies the character class [^s]) and it is followed by a word boundary (between ' and s), it is matched.

  3. Since the end of the pattern has been reached, the resulting match is Jon'. The letter s is not part of the match: the pattern ends with the zero-width \b between ' and s, which consumes nothing.
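A similar Python sketch for the second pattern shows that the match includes the apostrophe, because [^s] consumes it. This again assumes Python 3's re module.

```python
import re

pattern = r"\b\w+[^s]\b"
m = re.search(pattern, "Jon's")
# [^s] consumed the apostrophe, so the match is one character longer.
print(repr(m.group()), m.span())     # "Jon'" (0, 4)
# The remaining "s" cannot start another match: [^s] has nothing left to consume.
print(re.findall(pattern, "Jon's"))  # ["Jon'"]
```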

没有伤那来痛 2024-12-10 01:03:49

The example is trying to demonstrate that lookaheads and lookbehinds can be used to create "and" conditions.


\b\w+(?<!s)\b

could also be written as

\b\w*\w(?<!s)\b
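Since \w+ and \w*\w both mean "one or more word characters", the two forms should behave identically. Here is a quick Python spot-check on a few arbitrary sample strings. It is a sanity check, not a proof of equivalence.

```python
import re

ORIGINAL = re.compile(r"\b\w+(?<!s)\b")
REWRITTEN = re.compile(r"\b\w*\w(?<!s)\b")

# Compare every match found in each sample string.
samples = ["Jon's", "a I dogs", "cats and dog", "x_s y_t", "its it's"]
for s in samples:
    assert ORIGINAL.findall(s) == REWRITTEN.findall(s), s
print(ORIGINAL.findall("cats and dog"))  # ['and', 'dog']
```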

That gives us

\b\w*[^s]\b    vs    \b\w*\w(?<!s)\b

I did that so we can ignore the irrelevant parts. (The \b assertions are simply distractions in this example.) That leaves

[^s]    vs    \w(?<!s)

On the left, we can match any character except "s", including non-word characters like the apostrophe.

On the right, we can match any word character except "s".
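The difference shows up when each piece is tested against single characters. This Python sketch uses a hypothetical helper, single_char, that requires the pattern to match exactly the one given character.

```python
import re

def single_char(pattern: str, c: str) -> bool:
    # True if the pattern matches exactly the one character c.
    return re.fullmatch(pattern, c) is not None

for c in ["n", "'", "s", " "]:
    print(repr(c), single_char(r"[^s]", c), single_char(r"\w(?<!s)", c))
# 'n' True True
# "'" True False
# 's' False False
# ' ' True False
```

The apostrophe is the case from the question: [^s] accepts it, while \w(?<!s) does not.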

By the way,

\w(?<!s)

could also be written as

(?!s)\w      # Not followed by "s" and followed by \w
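Both forms say "a word character and not s". The lookahead checks the condition before \w consumes the character, and the lookbehind checks it afterwards. The Python sketch below compares which printable ASCII characters each form accepts.

```python
import re
import string

LOOKBEHIND_FORM = re.compile(r"\w(?<!s)")
LOOKAHEAD_FORM = re.compile(r"(?!s)\w")

# Collect the single characters each form accepts and check the sets agree.
accepted_lb = {c for c in string.printable if LOOKBEHIND_FORM.fullmatch(c)}
accepted_la = {c for c in string.printable if LOOKAHEAD_FORM.fullmatch(c)}
assert accepted_lb == accepted_la
print("s" in accepted_lb, "S" in accepted_lb, "_" in accepted_lb)  # False True True
```

The check is case-sensitive, so uppercase S is still accepted.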