Finding the absence of words in a regular expression

Posted 2024-10-31 12:28:33 · 570 words · 6 views · 0 comments

I've seen examples of finding the absence of characters in a regular expression. I'm trying to find the absence of whole words instead (likely using a negative lookbehind).

I have lines of code like this:

Example One:

protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";

And here's another one:

mainWindow.Id = "MainWindow";

Final one:

mainStoLabel.Text = "#stb_entry_clah";

I want to capture only the middle one by finding all strings like these that a.) aren't preceded by a "#" in the actual string between the quotes, and b.) aren't preceded at all by the word "readonly".

My current Regular Expression is this:

.*\W\=\W"[^#].*"

It captures the top two examples. Now I just want to exclude the top one as well. How do I match on the absence of a whole word, rather than the absence of a character?
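For reference, that behaviour can be reproduced with a short Python check. This is only a sketch: the original code is C#, and the names `LINES` and `CURRENT` are chosen here for illustration.

```python
import re

# The three sample lines from the question.
LINES = [
    'protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";',
    'mainWindow.Id = "MainWindow";',
    'mainStoLabel.Text = "#stb_entry_clah";',
]

# The pattern as posted: it rejects a leading "#" inside the quotes,
# but nothing in it looks at the word "readonly".
CURRENT = re.compile(r'.*\W\=\W"[^#].*"')

# One boolean per sample line: does the pattern match it?
matched = [bool(CURRENT.search(line)) for line in LINES]
print(matched)
```

The first two lines match and the third does not, which is the situation described above.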

Thanks.

Comments (4)

浅忆 2024-11-07 12:28:33

The bug in your negative lookahead assertion is that it isn't built for the general case. The assertion has to hold at every character position as you crawl ahead. The way you've written it, it is tested at only one position, whereas it needs to be tested at all of them. See below for how to do this correctly.

Here is a working demo that shows two different approaches:

  1. The first uses a negative lookahead to ensure that the left-hand portion does not contain readonly and that the right-hand portion does not start with a number sign.

  2. The second uses a simpler pattern to split the line, then separately inspects the left- and right-hand sides for the individual constraints that apply to each.

The demo language is Perl, but the same patterns and logic should work virtually everywhere.

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
#
# First demo: use a complicated regex to get desired part only
#
    my($label) = m{
        ^                           # start at the beginning
        (?:                         # noncapture group:
            (?! \b readonly \b )    #   no "readonly" here
            .                       #   now advance one character
        ) +                         # repeated 1 or more times
        \s* = \s*                   # skip an equals sign w/optional spaces
        " ( [^#"] [^"]* ) "         # capture #1: quote-delimited text
                                    #   BUT whose first char isn't a "#"
    }x;

    if (defined $label) {
        print "Demo One: found label <$label> at line $.\n";
    }
#
# Second demo: This time use simpler patterns, several
#
    my($lhs, $rhs) = m{
        ^                       # from the start of line
        ( [^=]+ )               # capture #1: 1 or more non-equals chars
        \s* = \s*               # skip an equals sign w/optional spaces
        " ( [^"]+ ) "           # capture #2: all quote-delimited text
    }x;

    # Report only when the line parsed at all AND both constraints hold
    if (defined $lhs && $lhs !~ /\b readonly \b/x && $rhs !~ /^#/) {
        print "Demo Two: found label <$rhs> at line $.\n";
    }

}
__END__
protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";
mainWindow.Id = "MainWindow";
mainStoLabel.Text = "#stb_entry_clah";

I have two bits of advice. The first is to make very sure you ALWAYS use /x mode so you can produce documented and maintainable regexes. The second is that it is much cleaner doing things a bit at a time as in the second solution rather than all at once as in the first.
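As a rough illustration of how the same two approaches carry over to another engine, here is a minimal Python sketch. It assumes `re.VERBOSE` as the counterpart of Perl's /x mode; the names `ONE_PASS`, `SPLIT`, `labels_one_pass` and `labels_two_step` are made up for this example and are not part of the Perl demo.

```python
import re

LINES = [
    'protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";',
    'mainWindow.Id = "MainWindow";',
    'mainStoLabel.Text = "#stb_entry_clah";',
]

# Approach one: a single pattern whose lookahead is tested at every position.
ONE_PASS = re.compile(r'''
    ^                          # start of the line
    (?: (?! \b readonly \b )   # "readonly" must not begin at this position
        .                      # then advance one character
    )+                         # repeated one or more times
    \s* = \s*                  # equals sign with optional spaces
    " ( [^#"] [^"]* ) "        # capture 1: quoted text not starting with "#"
''', re.VERBOSE)

# Approach two: split the line first, check each side afterwards.
SPLIT = re.compile(r'^([^=]+)\s*=\s*"([^"]+)"')


def labels_one_pass(lines):
    """Return the quoted text of every line accepted by the single pattern."""
    found = []
    for line in lines:
        m = ONE_PASS.search(line)
        if m:
            found.append(m.group(1))
    return found


def labels_two_step(lines):
    """Parse each line loosely, then apply the two constraints separately."""
    found = []
    for line in lines:
        m = SPLIT.search(line)
        if not m:
            continue  # not an assignment of a quoted string
        lhs, rhs = m.groups()
        if re.search(r'\breadonly\b', lhs) or rhs.startswith('#'):
            continue  # one of the two exclusions applies
        found.append(rhs)
    return found


print(labels_one_pass(LINES))
print(labels_two_step(LINES))
```

Both functions keep only the middle sample line. The two-step version is longer, but each constraint can be read and changed on its own.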

撩动你心 2024-11-07 12:28:33

I don't completely understand your question, but a negative lookahead would look like this:

(?!.*readonly)(?:.*\s\=\s"[^#].*")

The first part succeeds only if the word "readonly" does not appear in the rest of the string.
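One caveat worth checking: the lookahead is only tested where the match attempt starts, so the outcome depends on whether the pattern is anchored. The Python sketch below is an assumption-laden illustration (the names `UNANCHORED` and `ANCHORED` are made up here, and other engines or APIs that anchor implicitly may behave differently). With an unanchored search, the engine can simply start the match one character past the "r" of "readonly".

```python
import re

LINE = 'protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";'

# The pattern as posted, and the same pattern with an explicit ^ anchor.
UNANCHORED = re.compile(r'(?!.*readonly)(?:.*\s\=\s"[^#].*")')
ANCHORED = re.compile(r'^(?!.*readonly)(?:.*\s\=\s"[^#].*")')

# search() retries at later start positions, so the unanchored pattern
# still finds a match that begins inside the word "readonly".
unanchored_hit = UNANCHORED.search(LINE)

# With ^ the lookahead is evaluated once, at the start of the line.
anchored_hit = ANCHORED.search(LINE)

print(unanchored_hit.group() if unanchored_hit else None)
print(anchored_hit)
```

With the anchor in place, the line containing "readonly" is rejected, while a line such as mainWindow.Id = "MainWindow"; is still accepted.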

Which language are you using?

What do you want to match? Only the second example, if I understood correctly?

诗笺 2024-11-07 12:28:33

^[^"=]*(?<!(^|\s)readonly\s.*)\s*=\s*"[^#].*" seems to fit your needs:

  • everything before the first equal sign should not contain readonly or quotes
  • readonly is recognized not with word boundaries but with whitespace (except at beginning of line)
  • the equal sign can be surrounded by arbitrary whitespace
  • the equal sign must be followed by a quoted string
  • the quoted string should not start with #

You can work with lookarounds or capture groups if you only want the strings or quoted strings.

Note: as per your own regex, this discards anything after the last quote (not matching the semi-colon in your examples)
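The lookbehind in this pattern contains `.*`, so it has variable length. .NET accepts that, but some engines, Python's `re` among them, only allow fixed-width lookbehind. For such engines, a roughly equivalent pattern can move the check into a lookahead at the start of the line. The following is a sketch under that assumption, not a drop-in translation; `NO_READONLY` and `quoted_value` are names invented for this example.

```python
import re

# Same idea as the lookbehind version, rewritten with a leading lookahead:
# before the first "=", there must be no quote and no whitespace-delimited
# "readonly"; the quoted string after "=" must not start with "#".
NO_READONLY = re.compile(
    r'^(?![^"=]*(?:^|\s)readonly\s)'  # reject "readonly " before the first "="
    r'[^"=]*\s*=\s*'                  # left-hand side, then the equals sign
    r'"([^#].*)"'                     # capture 1: text up to the last quote
)


def quoted_value(line):
    """Return the quoted text if the line qualifies, otherwise None."""
    m = NO_READONLY.search(line)
    return m.group(1) if m else None


print(quoted_value('mainWindow.Id = "MainWindow";'))
print(quoted_value('protected static readonly string BACKGROUND_MUSIC_NAME = "Music_Mission_Complete_Loop_audio";'))
print(quoted_value('mainStoLabel.Text = "#stb_entry_clah";'))
```

Only the first call returns a value; the capture group hands back the string without its surrounding quotes.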
