Regular expressions: the difference between negative lookbehind and negation
From regular-expressions.info:
`\b\w+(?<!s)\b`. This is definitely not the same as `\b\w+[^s]\b`. When applied to `Jon's`, the former will match `Jon` and the latter `Jon'` (including the apostrophe). I will leave it up to you to figure out why. (Hint: `\b` matches between the apostrophe and the `s`). The latter will also not match single-letter words like "a" or "I".
Can you explain why?
Also, can you make clear what exactly `\b` does, and why it matches between the apostrophe and the `s`?
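`\b` is a zero-width assertion that means word boundary. These character positions (taken from that link) are considered word boundaries: before the first character in the string, if the first character is a word character; after the last character in the string, if the last character is a word character; and between two characters in the string, where one is a word character and the other is not.
Word characters are of course any `\w`. `s` is a word character, but `'` is not. In the above example, the area between the `'` and the `s` is a word boundary.
The string `"Jon's"` looks like this if I highlight the anchors and boundaries (the first and last `\b`s occur in the same positions as `^` and `$`):
^Jon\b'\bs$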
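To check those positions, here is a minimal sketch using Python's `re` module (the choice of Python and the `|` marker are assumptions made for illustration; the original discussion is engine-agnostic). It lists every index in `Jon's` where `\b` matches.

```python
import re

text = "Jon's"

# \b is zero-width, so each match is an empty string at some position.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 4, 5]

# Render the string with a "|" at every boundary position.
marked = "".join(
    ("|" if i in boundaries else "") + ch for i, ch in enumerate(text)
) + ("|" if len(text) in boundaries else "")
print(marked)  # |Jon|'|s|
```

Positions 0 and 5 coincide with `^` and `$`, and positions 3 and 4 sit on either side of the apostrophe.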
The negative lookbehind assertion `(?<!s)\b` means it will only match a word boundary if it's not preceded by the letter `s` (i.e. the last word character is not an `s`). So it looks for a word boundary under a certain condition.
Therefore the first regex works like this:
`\b\w+` matches the first three letters `J` `o` `n`.
There's actually another word boundary between `n` and `'` as shown above, so `(?<!s)\b` matches this word boundary because it's preceded by an `n`, not an `s`.
Since the end of the pattern has been reached, the resultant match is `Jon`.
The complementary character class `[^s]\b` means it will match any character that is not the letter `s`, followed by a word boundary. Unlike the above, this looks for one character followed by a word boundary.
Therefore the second regex works like this:
`\b\w+` matches the first three letters `J` `o` `n`.
Since the `'` is not the letter `s` (it fulfills the character class `[^s]`), and it's followed by a word boundary (between `'` and `s`), it's matched.
Since the end of the pattern has been reached, the resultant match is `Jon'`. The letter `s` is not matched because the word boundary before it has already been matched.
The example is trying to demonstrate that lookaheads and lookbehinds can be used to create "and" conditions.
[…] could also be written as […]
That gives us […]
I did that so we can ignore the irrelevant. (The `\b` are simply distractions in this example.) We have […]
On the left, we can match any character except "s".
On the right, we can match any word character except "s".
By the way, […] could also be written […]
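As a rough illustration of the "and" idea, the Python sketch below compares single-character patterns. The expressions `[^s]`, `\w(?<!s)`, `(?!s)\w` and `[^\Ws]` are assumed stand-ins chosen for this example and are not necessarily the ones the answer used.

```python
import re

text = "Jon's"

# Any character except "s": a plain negated character class.
any_not_s = [ch for ch in text if re.fullmatch(r"[^s]", ch)]

# A word character AND not "s": \w consumes the character, then the
# lookbehind rejects it if the character just consumed was "s".
word_not_s = [ch for ch in text if re.fullmatch(r"\w(?<!s)", ch)]

# The same "and" condition written with a lookahead, and as a single class.
word_not_s_ahead = [ch for ch in text if re.fullmatch(r"(?!s)\w", ch)]
word_not_s_class = [ch for ch in text if re.fullmatch(r"[^\Ws]", ch)]

print(any_not_s)   # ['J', 'o', 'n', "'"]
print(word_not_s)  # ['J', 'o', 'n']
```

The apostrophe passes the first test but none of the others, which is the difference that shows up in the `Jon's` example.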