Python: parsing bracketed blocks
What would be the best way in Python to parse out chunks of text contained in matching brackets?
"{ { a } { b } { { { c } } } }"
should initially return:
[ "{ a } { b } { { { c } } }" ]
putting that as an input should return:
[ "a", "b", "{ { c } }" ]
which should return:
[ "{ c }" ]
[ "c" ]
[]
Or this pyparsing version:
Pseudocode:
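One way such pseudocode might translate into Python is a single pass with a depth counter: remember where a top-level "{" opens, and cut out the chunk when the counter drops back to zero. This is only a sketch under that assumption; the name split_top_level and the choice to strip surrounding whitespace are hypothetical, not taken from the answer.

```python
def split_top_level(text, open_ch="{", close_ch="}"):
    """Return the contents of each top-level bracket group in text."""
    chunks = []
    depth = 0
    start = None  # index just past the "{" that opened the current group
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError(f"unmatched {close_ch!r} at index {i}")
            depth -= 1
            if depth == 0:
                # Back at the top level: everything since `start` is one chunk.
                chunks.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unbalanced brackets: missing closing bracket")
    return chunks


print(split_top_level("{ { a } { b } { { { c } } } }"))
print(split_top_level("{ a } { b } { { { c } } }"))
```

Feeding each returned chunk back into the function peels off one level at a time, and a chunk with no brackets yields an empty list.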
I'm kind of new to Python, so go easy on me, but here is an implementation that works:
Output:
Parse using lepl (installable via $ easy_install lepl):
Output:
Cleaner solution. This will return the string enclosed in the outermost bracket. If None is returned, there was no match.
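A function with that behaviour could look roughly like the sketch below. It assumes "{" and "}" as the only brackets and treats an unclosed "{" as "no match"; the name find_outermost is made up for illustration.

```python
def find_outermost(text):
    """Return the text inside the first outermost {...} pair, or None."""
    start = text.find("{")
    if start == -1:
        return None  # no opening bracket at all
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # This "}" closes the bracket found at `start`.
                return text[start + 1:i]
    return None  # the opening bracket was never closed


print(find_outermost("{ { a } { b } { { { c } } } }"))
print(find_outermost("no brackets here"))
```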
You could also parse them all at once, though I find it slightly weird that {a} means "a" rather than ["a"]. If I've understood the format correctly:
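For comparison, the "all at once" idea can also be sketched without any parsing library as a small recursive descent over tokens. This is an illustration, not the code from either answer: parse_nested is a hypothetical name, and it assumes every bracket group becomes a list (so {a} comes out as ["a"]) and that whitespace between tokens is optional.

```python
import re


def parse_nested(text):
    """Parse '{ ... }' groups into nested lists; bare words become strings."""
    # Tokens are single brackets or runs of non-bracket, non-space characters.
    tokens = re.findall(r"[{}]|[^{}\s]+", text)
    pos = 0

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "{":
                items.append(parse_items(depth + 1))  # descend one level
            elif tok == "}":
                if depth == 0:
                    raise ValueError("unmatched '}'")
                return items  # close the current level
            else:
                items.append(tok)
        if depth != 0:
            raise ValueError("missing '}'")
        return items

    return parse_items(0)


print(parse_nested("{ { a } { b } { { { c } } } }"))
print(parse_nested("{{a}{b}}"))
```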
If you want to use a parser (lepl in this case), but still want the intermediate results rather than a final parsed list, then I think this is the kind of thing you were looking for:
That might look opaque at first, but it's fairly simple really :o)
nested is a recursive definition of a matcher for nested brackets (the "+" and [...] in the definition keep everything as a single string after it has been matched). Then split says match as many as possible ("[:]") of something that is surrounded by "{" ... "}" (which we discard with "Drop") and contains either a nested expression or any letter.
Finally, here's a lepl version of the "all in one" parser that gives a result in the same format as the pyparsing example above, but which (I believe) is more flexible about how spaces appear in the input:
Using Grako (grammar compiler):
Output
Here is a solution I came up with for a similar use case. This was loosely based on the accepted pseudocode answer. I didn't want to add any dependencies on external libraries:
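A dependency-free variant in that spirit can be sketched with an explicit stack instead of a counter, which records where every bracket pair starts and ends. The function name match_brackets and the dict return format are assumptions made for this example, not the poster's code.

```python
def match_brackets(text, open_ch="{", close_ch="}"):
    """Map the index of each opening bracket to the index of its closing bracket."""
    pairs = {}
    stack = []  # indices of opening brackets that are still unclosed
    for i, ch in enumerate(text):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            if not stack:
                raise ValueError(f"unmatched {close_ch!r} at index {i}")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError(f"unmatched {open_ch!r} at index {stack[-1]}")
    return pairs


text = "{ { a } { b } }"
for start, end in sorted(match_brackets(text).items()):
    # Slice out the text between each matched pair.
    print(start, end, repr(text[start + 1:end].strip()))
```

Having the index pairs makes it easy to slice out any level of nesting afterwards, at the cost of keeping a little more state than the counter approach.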