Regex problem: matching in context

Posted 2024-08-05 21:14:29

I have a structured file with hierarchical text that describes a GUI in Delphi (a DFM file).

Let's assume I have this file and need to match all "Color = xxx" lines that are in the context of a TMyButton (marked below), but not those in any other context. Inside a TMyButton context there won't be any deeper hierarchy level.

object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072           <<<<<<MATCH THIS
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072         <<<<<<MATCH THIS
      OnClick = ButtonClick
    end
  end
end

I spent two days on this, with many, many different attempts.
Here are some of my incomplete pieces of the pattern:

/^[ ]{2,}object [A-Za-z0-9]+: TmyButton\r\n/mi  <<<Matches the needed context
/^[ ]{4,}Color = [A-Za-z0-9]+\r\n/mi            <<<Matches the needed result
/^[ ]{2,}end\r\n/mi                             <<<Matches the end of the context

(I don't know why, but I had to use "\r\n" instead of "$".) I need to put these pieces together while ignoring every line in between, except other "object xxx: yyy" and "end" lines.

I would be glad to have some help!

Comments (3)

玩物 2024-08-12 21:14:29

Matching a line in a complex context with a single regex requires a feature called lookaround. Specifically, you'd need variable-length lookbehind, which PCRE doesn't offer.

So there are two possibilities: use a scripting approach like Rorick suggested, or use a regex that matches everything from the start of the needed context up to the actual match, and extract the part you want with a capturing group. That could be done with

[ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)

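(brackets around the space inserted for clarity). Your match would then be in capturing group 1 (\1). The pattern needs the m modifier so that ^ matches at the start of each line, and the s modifier so that . can cross line breaks. Note that .*? is not stopped by an "end" line, so a TMyButton without a Color line would pick up the Color of a later object.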

Nested structures are generally not well suited to regexes (parsers handle them better), but if you're sure of the structure of your data, as you mentioned, it might work OK.
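
As a rough illustration of the capturing-group approach, here is a minimal sketch in Python. Python's `re` module is not PCRE, so treat this as an assumption-laden translation rather than the pattern above verbatim. The sample text, the name `BUTTON_COLOR`, and the helper `button_colors` are made up for the example. Instead of `.*?`, the sketch skips whole lines that are neither `end` nor a `Color` line, so it cannot run past the end of the button. It uses `\r?$` because with CRLF line endings `$` only matches just before the `\n`, which is probably why a plain `$` failed in the question.

```python
import re

# Sample DFM text with CRLF line endings, shortened from the question.
DFM = "\r\n".join([
    "object frmMain: TfrmMain",
    "  Color = clBtnFace",
    "  object MyFirstButton: TMyButton",
    "    Left = 555",
    "    Color = 16645072",
    "    OnClick = ButtonClick",
    "  end",
    "  object MyLabel: TLabel",
    "    Color = 16772831",
    "  end",
    "  object Panel2: TLTPanel",
    "    Color = clRed",
    "    object MyOtherButton: TMyButton",
    "      Color = 255",
    "    end",
    "  end",
    "end",
    "",
])

# After the TMyButton header, skip lines that are neither "end" nor a Color
# line, then capture the Color line (group 1) and its value (group 2).
BUTTON_COLOR = re.compile(
    r"^[ ]{2,}object \w+: TMyButton[ \t]*\r?\n"
    r"(?:(?![ ]*end\b)(?![ ]*Color = ).*\r?\n)*?"
    r"([ ]{4,}Color = (\w+))[ \t]*\r?$",
    re.MULTILINE,
)

def button_colors(text):
    """Return the Color values found directly inside TMyButton objects."""
    return [m.group(2) for m in BUTTON_COLOR.finditer(text)]

print(button_colors(DFM))  # ['16645072', '255']
```

The colors of the form, the label and the panel are ignored because their header lines never match the `TMyButton` part of the pattern.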

陌上芳菲 2024-08-12 21:14:29

If I understand you correctly, you are trying to create a single regexp for this. There is no reason to do so.

  1. Find a line matching the pattern object [A-Za-z0-9]+: TmyButton
  2. Then check each following line against Color = [A-Za-z0-9]+ until you find it or reach the end keyword.
  3. Repeat these steps until the end of the file

If you need to modify a large number of source files, you could use a script for this.
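
The steps above could be sketched like this in Python. This is only one possible reading of them: the function name `find_button_colors`, the three compiled patterns and the sample text are assumptions for illustration, and the sketch relies on the question's statement that a TMyButton never contains nested objects.

```python
import re

# Hypothetical helper patterns; \s*$ also tolerates a trailing \r\n.
OBJECT_START = re.compile(r"^\s*object \w+: (\w+)\s*$")
COLOR_LINE = re.compile(r"^\s*Color = (\w+)\s*$")
END_LINE = re.compile(r"^\s*end\s*$")

def find_button_colors(lines, class_name="TMyButton"):
    """Return (line_number, value) for each Color line inside a class_name object."""
    hits = []
    in_button = False
    for number, line in enumerate(lines, start=1):
        start = OBJECT_START.match(line)
        if start:
            # Step 1: a new object begins; remember whether it is a button.
            in_button = start.group(1) == class_name
        elif END_LINE.match(line):
            # Reaching "end" closes the current context.
            in_button = False
        elif in_button:
            # Step 2: inside a button, test each line for Color.
            color = COLOR_LINE.match(line)
            if color:
                hits.append((number, color.group(1)))
    return hits

sample = """object frmMain: TfrmMain
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Color = 16645072
  end
  object MyLabel: TLabel
    Color = 16772831
  end
end
"""
print(find_button_colors(sample.splitlines()))  # [(5, '16645072')]
```

Returning line numbers rather than just values makes it easier to rewrite the matching lines afterwards when many files have to be changed.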

束缚m 2024-08-12 21:14:29

I know this is not PCRE, but it is a good alternative for software archaeology.

You could always use AWK if you are doing this from a command prompt. The script would look like this:

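BEGIN       { inObj = 0 }    # not really necessary
/TMyButton/ { inObj = 1 }
/end\r?$/   { inObj = 0 }
# The pattern and its action must start on the same line. awk strips the
# record's trailing \n, so only an optional \r (CRLF files) is left to match.
inObj == 1 && /^[ ]{4,}Color = [A-Za-z0-9]+[ \t]*\r?$/ {
              sub(/\r$/, "", $3)    # drop the CR from the value
              print $3              # do whatever you need to do
            }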

AWK implementations are widely available. I would try GAWK.
