Regex problem: matching in context
I have a structured file with hierarchical text that describes a GUI in Delphi (a DFM file).
Suppose I have the file below and need to match all "Color = xxx" lines that sit inside a TMyButton object (marked), but not those in any other context. A TMyButton object never contains a deeper hierarchy level.
object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072 <<<<<<MATCH THIS
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072 <<<<<<MATCH THIS
      OnClick = ButtonClick
    end
  end
end
I have spent two days on this, with many, many different attempts.
Here are some incomplete pieces of my pattern:
/^[ ]{2,}object [A-Za-z0-9]+: TmyButton\r\n/mi <<<Matches the needed context
/^[ ]{4,}Color = [A-Za-z0-9]+\r\n/mi <<<Matches the needed result
/^[ ]{2,}end\r\n/mi <<<Matches the end of the context
(I don't know why, but I had to use "\r\n" instead of "$".) I need to put these pieces together while ignoring every other line, except for other "object xxx: yyy" and "end" lines.
I would be glad to have some help!
Comments (3)
If you want (or have) to match a line in a complex context with a single regex, you need a regex feature called lookaround. Specifically, you would need variable-length lookbehind, which PCRE does not offer.
So there are two possibilities:
Use a scripting approach like the one Rorick suggested, or use a regex that matches everything from the start of the needed context up to the actual match and extract the match with a capturing group. That could be done with
(brackets around the space inserted for clarity). Your match would then be in capturing group \1.
Nested structures are generally not well suited to regexes (parsers handle them better), but if you are sure of the structure of your data, as you mentioned, it might work OK.
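To make the capturing-group idea concrete, here is a minimal sketch in Python's re module. It is not the pattern from this answer and not PCRE; the pattern, the name find_button_colors, and the shortened sample text are all assumptions for illustration. It matches from the TMyButton header, skips property lines until it hits another "object" or "end" line, and captures the Color line in group 1. It uses "\r?\n" rather than "$", because "$" in multiline mode stops before "\n" only, so on files with Windows line endings a stray "\r" sits in front of it, which is likely why "$" failed in the question.

```python
import re

# Hypothetical, shortened version of the DFM text from the question.
SAMPLE = """\
object frmMain: TfrmMain
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Color = 16645072
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Color = 16645072
    end
  end
end
"""

# Assumed pattern, built for this sketch only.
BUTTON_COLOR = re.compile(
    # Header line that opens the wanted context.
    r"^[ ]*object [A-Za-z0-9_]+: TMyButton[ ]*\r?\n"
    # Lazily skip whole lines, but never step over an "object" or "end" line.
    r"(?:(?![ ]*(?:object|end)\b)[^\r\n]*\r?\n)*?"
    # The wanted line; group 1 holds "Color = xxx" without the indentation.
    r"[ ]*(Color = [A-Za-z0-9]+)",
    re.MULTILINE | re.IGNORECASE,
)


def find_button_colors(text):
    """Return the 'Color = xxx' text found directly inside TMyButton objects."""
    return BUTTON_COLOR.findall(text)


if __name__ == "__main__":
    for color_line in find_button_colors(SAMPLE):
        print(color_line)
```

Because the skipping group consumes whole lines, the captured text always starts at the beginning of a line, so "Font.Color = ..." is not picked up. A button without a Color property produces no match, since the scan cannot pass its "end" line.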
If I understand you correctly, you are trying to build a single regexp for this. There is no reason to do so.
For each line that matches object [A-Za-z0-9]+: TmyButton, check the following lines for Color = [A-Za-z0-9]+ until you find it or reach the end keyword.
If you need to modify a large number of source files, you could use a script for this.
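The line-by-line approach could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not code from this answer: the function name button_color_lines, the three helper patterns, and the shortened sample are hypothetical, and it relies on the question's guarantee that a TMyButton never contains nested objects.

```python
import re

# Hypothetical, shortened version of the DFM text from the question.
SAMPLE = """\
object frmMain: TfrmMain
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Color = 16645072
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Color = clRed
    object MyOtherButton: TMyButton
      Color = 16645072
    end
  end
end
"""

# Assumed helper patterns, one per kind of line the scan cares about.
OBJECT_RE = re.compile(r"^\s*object\s+\w+\s*:\s*(\w+)\s*$", re.IGNORECASE)
END_RE = re.compile(r"^\s*end\s*$", re.IGNORECASE)
COLOR_RE = re.compile(r"^\s*(Color = [A-Za-z0-9]+)\s*$", re.IGNORECASE)


def button_color_lines(lines, wanted_type="TMyButton"):
    """Yield (line_number, text) for Color lines directly inside wanted_type objects."""
    inside = False  # True only between a wanted header and the next object/end line
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # tolerate both LF and CRLF input
        header = OBJECT_RE.match(line)
        if header:
            # Any object header switches the context on or off.
            inside = header.group(1).lower() == wanted_type.lower()
        elif END_RE.match(line):
            inside = False
        elif inside:
            found = COLOR_RE.match(line)
            if found:
                yield number, found.group(1)


if __name__ == "__main__":
    for number, text in button_color_lines(SAMPLE.splitlines()):
        print(number, text)
```

Keeping a single boolean flag is enough here only because the wanted objects have no children; a file where the target type can contain nested objects would need a stack of open objects instead.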
I know this is not PCRE, but it is a good alternative for software archaeology.
If you are working from a command prompt, you can always use AWK. The script would look like this:
AWK implementations can be found all over the internet. I would try GAWK.