Negative lookahead regex not working
input1 = "caused/VBN by/IN thyroid disorder"
Requirement: find the word "caused" followed by a slash and one or more capital letters, but only when that is not followed by whitespace + "by/IN".
In the example above, "caused/VBN" is followed by " by/IN", so 'caused' should not match.
input2 = "caused/VBN thyroid disorder"
Here "by/IN" doesn't follow "caused", so it should match.
regex = "caused/[A-Z]+(?![\\s]+by/IN)"
caused/[A-Z]+ -- the word 'caused' + / + one or more capital letters
(?![\\s]+by/IN) -- negative lookahead: not followed by whitespace and "by/IN"
Below is a simple method that I used to test:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest { // class name is arbitrary
    public static void main(String[] args) {
        String input = "caused/VBN by/IN thyroid disorder";
        String regex = "caused/[A-Z]+(?![\\s]+by/IN)";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        // Print every match found in the input
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
Output: caused/VB
I don't understand why my negative lookahead regex is not working.
Comments (2)
You need to include a word boundary in your regular expression:
Without it you can get a match, but not what you were expecting:
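The snippets that accompanied this answer are missing here. As a minimal sketch of what it seems to describe, the following assumes the word boundary `\b` goes right after `[A-Z]+`; that placement, the class name `WordBoundaryDemo`, and the helper `firstMatch` are assumptions for illustration, not taken from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input1 = "caused/VBN by/IN thyroid disorder";
        String input2 = "caused/VBN thyroid disorder";

        String original = "caused/[A-Z]+(?![\\s]+by/IN)";
        // Assumed fix: \b forces [A-Z]+ to end at the end of the tag.
        String bounded = "caused/[A-Z]+\\b(?![\\s]+by/IN)";

        System.out.println(firstMatch(original, input1)); // caused/VB (partial tag)
        System.out.println(firstMatch(bounded, input1));  // null (no match, as required)
        System.out.println(firstMatch(bounded, input2));  // caused/VBN
    }
}
```

With the boundary in place, the engine can no longer shorten `VBN` to `VB`, because there is no word boundary between `B` and `N`.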
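The [A-Z]+ match will be adjusted (via backtracking) so that the negative lookahead succeeds: the engine gives back the final "N" and matches "caused/VB", after which the next character is "N" rather than whitespace, so (?![\\s]+by/IN) no longer rejects the match.
What you have to do is stop the backtracking so that [A-Z]+ is always matched in full.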
This can be done in a few different ways.
A positive lookahead, followed by a consumption (a lookahead is atomic, so the captured group cannot be shortened afterwards):
"caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"
An independent (atomic) group:
"caused(?>/[A-Z]+)(?!\\s+by/IN)"
A possessive quantifier:
"caused/[A-Z]++(?!\\s+by/IN)"