Finding the absence of words in a regular expression

Posted 2024-10-31 12:28:33

I've seen examples of matching on the absence of characters in a regular expression. I'm trying to match on the absence of whole words instead (likely using a negative lookbehind).

I have lines of code like this:

Example One:

protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";

And here's another one:

mainWindow.Id = "MainWindow";

Final one:

mainStoLabel.Text = "#stb_entry_clah";

I want to capture only the middle one by finding all strings like these where a.) the text between the quotes does not start with a "#", and b.) the word "readonly" does not appear anywhere before the string.

My current Regular Expression is this:

.*\W\=\W"[^#].*"

It captures the top two examples. Now I just want to exclude the top one. How do I match on the absence of a whole word rather than a single character?
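The behavior described above can be reproduced with a short sketch. Perl is an assumption here, chosen only because the answer below uses it; the names `@samples`, `$current`, and `@hits` are illustrative and not part of the original code.

```perl
use strict;
use warnings;

# The three sample lines from above, in order.
my @samples = (
    'protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";',
    'mainWindow.Id = "MainWindow";',
    'mainStoLabel.Text = "#stb_entry_clah";',
);

# The current pattern, copied unchanged.
my $current = qr/.*\W\=\W"[^#].*"/;

# 1 where the pattern matches, 0 where it does not.
my @hits = map { ($_ =~ $current) ? 1 : 0 } @samples;

print "sample ", $_ + 1, ": ", ($hits[$_] ? "match" : "no match"), "\n"
    for 0 .. $#samples;
```

The character class `[^#]` already rejects the third line; nothing in the pattern looks at the word "readonly", which is why the first line still gets through.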

Thanks.

浅忆 2024-11-07 12:28:33

The bug in your negative lookahead assertion is that it isn't put together to cover the general case. The assertion has to apply at every character position as you crawl ahead. The way you've written it, it applies at only one position, whereas you need it to apply at all of them. See below for how to do this correctly.
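To make that difference concrete, here is a minimal sketch contrasting a lookahead checked at a single position with one re-checked before every character. The first pattern is a hypothetical reconstruction of the mistake, not a pattern taken from the question, and the shortened sample line and variable names are likewise assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample line containing the forbidden word.
my $line = 'protected static readonly string NAME = "Music";';

# Lookahead tested once, at the position right before the equals sign.
# "readonly" sits earlier in the line, so the single check never sees it.
my $checked_once = qr{
    ^ .*                    # consume anything, unchecked
    (?! \b readonly \b )    # one check, at one position only
    \s* = \s* " [^#"] [^"]* "
}x;

# Lookahead re-tested before every character that gets consumed.
my $checked_everywhere = qr{
    ^
    (?: (?! \b readonly \b ) . )+   # check, then advance one character
    \s* = \s* " [^#"] [^"]* "
}x;

my $once_matches       = $line =~ $checked_once       ? 1 : 0;
my $everywhere_matches = $line =~ $checked_everywhere ? 1 : 0;

print "single check matches:       $once_matches\n";
print "per-position check matches: $everywhere_matches\n";
```

The single check still matches the readonly line, because the one position it inspects is not followed by "readonly". The per-position check rejects the line, because the group cannot advance past the start of the forbidden word.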

Here is a working demo that shows two different approaches:

The first uses a negative lookahead to ensure that the left-hand portion does not contain readonly and that the right-hand portion does not start with a number sign.
The second uses a simpler parse, then separately inspects the left- and right-hand sides for the individual constraints that apply to each.

The demo language is Perl, but the same patterns and logic should work virtually everywhere.

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
#
# First demo: use a complicated regex to get desired part only
#
    my($label) = m{
        ^                           # start at the beginning
        (?:                         # noncapture group:
            (?! \b readonly \b )    #   no "readonly" here
            .                       #   now advance one character
        ) +                         # repeated 1 or more times
        \s* = \s*                   # skip an equals sign w/optional spaces
        " ( [^#"] [^"]* ) "         # capture #1: quote-delimited text
                                    #   BUT whose first char isn't a "#"
    }x;

    if (defined $label) {
        print "Demo One: found label <$label> at line $.\n";
    }
#
# Second demo: This time use simpler patterns, several
#
    my($lhs, $rhs) = m{
        ^                       # from the start of line
        ( [^=]+ )               # capture #1: 1 or more non-equals chars
        \s* = \s*               # skip an equals sign w/optional spaces
        " ( [^"]+ ) "           # capture #2: all quote-delimited text
    }x;

    # Skip lines that did not parse at all, then test each side separately.
    if (defined $lhs && $lhs !~ /\b readonly \b/x && $rhs !~ /^#/) {
        print "Demo Two: found label <$rhs> at line $.\n";
    }

}
__END__
protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";
mainWindow.Id = "MainWindow";
mainStoLabel.Text = "#stb_entry_clah";

I have two bits of advice. The first is to make very sure you ALWAYS use /x mode so you can produce documented and maintainable regexes. The second is that it is much cleaner doing things a bit at a time as in the second solution rather than all at once as in the first.
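As one possible way to package the second approach, the sketch below wraps the parse and the two separate checks in a small helper. The name `extract_label`, the `$assignment` variable, and the extra line with no assignment are hypothetical additions for illustration; the patterns themselves are the ones from the second demo.

```perl
use strict;
use warnings;

# Same parse as the second demo, precompiled and documented with /x.
my $assignment = qr{
    ^ ( [^=]+ )         # capture 1: left-hand side, no equals sign
    \s* = \s*           # the equals sign with optional spaces
    " ( [^"]+ ) "       # capture 2: the quote-delimited text
}x;

# Hypothetical helper: returns the quoted text, or nothing when the
# line does not parse or violates either constraint.
sub extract_label {
    my ($line) = @_;
    my ($lhs, $rhs) = $line =~ $assignment
        or return;                          # not an assignment line
    return if $lhs =~ /\b readonly \b/x;    # constraint b: no "readonly"
    return if $rhs =~ /^#/;                 # constraint a: no leading "#"
    return $rhs;
}

my @lines = (
    'protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";',
    'mainWindow.Id = "MainWindow";',
    'mainStoLabel.Text = "#stb_entry_clah";',
    'return;',      # hypothetical line with no assignment at all
);

my @labels;
for my $line (@lines) {
    my $label = extract_label($line);
    push @labels, $label if defined $label;
}

print "found label <$_>\n" for @labels;
```

Keeping each constraint on its own line means a third rule can be added later without touching the parsing pattern.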
