Negative lookahead regex not working

Posted on 2024-10-26 03:01:02

input1="caused/VBN by/IN thyroid disorder"

Requirement: find the word "caused" followed by a slash and one or more uppercase letters, but only when that is not followed by whitespace + "by/IN".

In the example above "caused/VBN" is followed by " by/IN", so 'caused' should not match.

input2="caused/VBN thyroid disorder"

"by/IN" does not follow "caused/VBN", so it should match.

regex="caused/[A-Z]+(?![\\s]+by/IN)"

caused/[A-Z]+ -- the word 'caused' + / + one or more uppercase letters
(?![\\s]+by/IN) -- negative lookahead: must not be followed by whitespace and by/IN

Below is a simple method that I used to test:

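import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadTest {
    public static void main(String[] args) {
        String input = "caused/VBN by/IN thyroid disorder";

        // "caused/", one or more uppercase letters, not followed by whitespace + "by/IN"
        String regex = "caused/[A-Z]+(?![\\s]+by/IN)";

        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);

        // Print every match found in the input
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}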

Output: caused/VB

I don't understand why my negative lookahead regex is not working.

Comments (2)

妳是的陽光 2024-11-02 03:01:02

You need to include a word boundary in your regular expression:

String regex = "caused/[A-Z]+\\b(?![\\s]+by/IN)";

Without it you can get a match, but not what you were expecting:

"caused/VBN by/IN thyroid disorder";
 ^^^^^^^^^
 this matches because the lookahead then sees "N by/IN", which doesn't match "[\\s]+by/IN"
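
To see the effect of the word boundary, here is a minimal sketch that runs the original regex and the \\b version against both inputs from the question. The class name WordBoundaryDemo and the helper firstMatch are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String original = "caused/[A-Z]+(?![\\s]+by/IN)";
        String withBoundary = "caused/[A-Z]+\\b(?![\\s]+by/IN)";
        String input1 = "caused/VBN by/IN thyroid disorder";
        String input2 = "caused/VBN thyroid disorder";

        // Without \b the engine backtracks from "VBN" to "VB" and reports a partial tag.
        System.out.println(firstMatch(original, input1));     // caused/VB
        // With \b the tag cannot be cut between "B" and "N", so there is no match.
        System.out.println(firstMatch(withBoundary, input1)); // null
        // When "by/IN" does not follow, the whole tag still matches.
        System.out.println(firstMatch(withBoundary, input2)); // caused/VBN
    }
}
```

Note that \\b only helps because the character after the tag is a space here; if a tag could be followed directly by a digit or underscore, the boundary would not be found.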

别再吹冷风 2024-11-02 03:01:02

The [A-Z]+ match is shortened by backtracking (from "VBN" to "VB") until the negative lookahead succeeds.

What you have to do is stop the backtracking so that [A-Z]+ always consumes the whole tag.
This can be done in a few different ways.

  1. A positive lookahead with a capture group, followed by a backreference that consumes the capture
    "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)"

  2. An atomic group (independent sub-expression)
    "caused(?>/[A-Z]+)(?!\\s+by/IN)"

  3. A possessive quantifier
    "caused/[A-Z]++(?!\\s+by/IN)"
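
As a rough check, the three patterns can be run against both inputs from the question. This is only a sketch: the class name NoBacktrackDemo and the helper found are invented for the example, and it relies on java.util.regex treating lookaheads as atomic (the engine does not backtrack into them once they have matched).

```java
import java.util.regex.Pattern;

public class NoBacktrackDemo {
    // Hypothetical helper: true if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] regexes = {
            "caused(?=(/[A-Z]+))\\1(?!\\s+by/IN)", // lookahead capture + backreference
            "caused(?>/[A-Z]+)(?!\\s+by/IN)",      // atomic group
            "caused/[A-Z]++(?!\\s+by/IN)"          // possessive quantifier
        };
        String input1 = "caused/VBN by/IN thyroid disorder"; // should NOT match
        String input2 = "caused/VBN thyroid disorder";       // should match

        for (String regex : regexes) {
            // Each line prints: <regex> -> false, true
            System.out.println(regex + " -> " + found(regex, input1) + ", " + found(regex, input2));
        }
    }
}
```

The possessive quantifier is the shortest of the three to write; the lookahead-plus-backreference form is mainly useful in regex flavors that lack atomic groups and possessive quantifiers.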
