How to extract the string between matching curly braces in Perl?
My input file is as follows:
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
{ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
{ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
}
TRAILER
I want to extract the file into an array as follows:
$array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}"
$array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}"
$array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}"
..
..
$array[5] = "{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
}"
That is, I need to match the first opening curly brace with its closing curly brace and extract the string between them.
I have checked the question linked below, but it doesn't apply to my problem:
Regex to get string between curly braces "{I want what's between the curly braces}"
I am still trying, but it would really help if someone could assist me with their expertise ...
Thanks
Sri ...
Comments (7)
This can certainly be done with a regex, at least in modern versions of Perl.
Such a regex matches a curly brace block that contains either non-brace characters or a recursion into itself (which matches the nested braces).
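The code from this answer did not survive on this page, so the following is only a minimal sketch of the idea described above, not the answerer's original. It assumes the whole file has already been read into a scalar; the names `$text` and `@array` and the shortened sample data are illustrative.

```perl
use strict;
use warnings;
use 5.010;    # recursive patterns such as (?1) need Perl 5.10 or later

# Shortened sample taken from the question; a real script would slurp the file.
my $text = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2}}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10}}}}
}
TRAILER
END

# Group 1 is one balanced block: an opening brace, then any mix of
# non-brace runs and nested blocks (recursion into group 1), then a
# closing brace. In list context with /g, each match contributes $1.
my @array = $text =~ / ( \{ (?: [^{}]++ | (?1) )* \} ) /xg;

say scalar(@array), " top-level blocks found";
```

Because `[^{}]` also matches newlines, the block that spans several lines is captured as a single element.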
Edit: the recursive syntax needs Perl 5.10+; in earlier versions the recursion is a bit more verbose.
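For older Perls, one idiom documented in perlre is the postponed subexpression `(??{ ... })`, which interpolates a pattern at match time and so lets a `qr//` object refer to itself. The sketch below assumes that idiom; it is not necessarily what the answer originally showed, and `$block`, `$text`, and `@array` are illustrative names.

```perl
use strict;
use warnings;

# Shortened sample taken from the question.
my $text = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2}}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10}}}}
}
TRAILER
END

# Declare the variable first so the pattern can refer to itself.
my $block;
$block = qr/
    \{
    (?:
        (?> [^{}]+ )        # a run of non-brace characters, no backtracking
      |
        (??{ $block })      # a nested block, resolved at match time
    )*
    \}
/x;

# Collect every top-level balanced block.
my @array = $text =~ /($block)/g;

print scalar(@array), " top-level blocks found\n";
```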
Use Text::Balanced
I second ysth's suggestion to use the Text::Balanced module. A few lines will get you on your way.
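The original code and its output are missing from this page. As a hedged sketch of what "a few lines" could look like, the loop below calls `extract_bracketed` from Text::Balanced repeatedly, using the prefix pattern `[^{]*` to skip over `HEADER`, `TRAILER`, and the newlines between blocks. The variable names and the shortened sample data are assumptions.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Shortened sample taken from the question.
my $text = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2}}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10}}}}
}
TRAILER
END

my @array;
my $rest = $text;
while (1) {
    # In list context the call returns the extracted block (braces
    # included) followed by the remaining text.
    my ($extracted, $remainder) = extract_bracketed($rest, '{}', '[^{]*');
    last unless defined $extracted && length $extracted;
    push @array, $extracted;
    $rest = $remainder;
}

print scalar(@array), " top-level blocks found\n";
```

The loop stops when no further balanced block can be extracted, which is what happens once only `TRAILER` is left.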
You can always count braces:
This is old, plain Perl style (and ugly, probably).
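The brace-counting code itself is not preserved here. A minimal sketch of the approach, assuming the file content is in `$text` (an illustrative name, as are `$depth` and `$current`), walks the text one character at a time and tracks the nesting depth:

```perl
use strict;
use warnings;

# Shortened sample taken from the question.
my $text = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2}}}}}
{
ABC|*|XYZ:abc:pqr {GHI 0 68 0}
{{Points {{F1 11.11} {F2 12.10}}}}
}
TRAILER
END

my @array;
my $depth   = 0;     # number of braces currently open
my $current = '';    # text of the block being collected

for my $char (split //, $text) {
    $depth++ if $char eq '{';

    # Text outside any block (HEADER, TRAILER, newlines) is ignored.
    $current .= $char if $depth > 0;

    if ($char eq '}' && $depth > 0) {
        $depth--;
        if ($depth == 0) {    # the top-level block just closed
            push @array, $current;
            $current = '';
        }
    }
}

print scalar(@array), " top-level blocks found\n";
```

A stray closing brace at depth 0 is skipped rather than driving the counter negative; whether that is the right policy depends on how malformed input should be handled.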
I don't think pure regular expressions are what you want to use here (IMHO this might not even be parsable using regex).
Instead, build a small parser, similar to what's shown here: http://www.perlmonks.org/?node_id=308039
(see the answer by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC)
UPDATE: It seems this might be doable with a regex after all. I saw a reference to matching nested parentheses in Mastering Regular Expressions (it is available on Google Books, so you can search for it if you don't have the book; see Chapter 5, section "Matching balanced sets of parentheses").
You're much better off using a state machine than a regex for this type of parsing.
Regular expressions are actually pretty bad at matching braces. Depending on how deep you want to go, you could write a full grammar (which is a lot easier than it sounds!) for Parse::RecDescent. Or, if you just want to get the blocks, scan for opening '{' and closing '}' marks and keep count of how many are open at any given time.