How can I use pyparsing to parse nested expressions that have multiple opener/closer types?

Posted 2024-10-13 22:47:28

I want to use pyparsing to parse expressions of the form expr = '(gimme [some {nested [lists]}])' and get back a Python list of the form [[['gimme', ['some', ['nested', ['lists']]]]]]. Right now my grammar looks like this:

from pyparsing import nestedExpr

nestedParens = nestedExpr('(', ')')
nestedBrackets = nestedExpr('[', ']')
nestedCurlies = nestedExpr('{', '}')
enclosed = nestedParens | nestedBrackets | nestedCurlies

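Currently, enclosed.searchString(expr) returns a list of the form [[['gimme', ['some', '{nested', '[lists]}']]]]. That is not what I want, because it doesn't recognize the square or curly brackets, and I don't know why.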


深海夜未眠 2024-10-20 22:47:28


Here's a pyparsing solution that uses a self-modifying grammar to dynamically match the correct closing brace character.

from pyparsing import *

data = '(gimme [some {nested, nested [lists]}])'

opening = oneOf("( { [")
# every printable character except the six bracket characters
nonBracePrintables = ''.join(c for c in printables if c not in '(){}[]')
closingFor = dict(zip("({[",")}]"))
closing = Forward()
# initialize closing with an expression that never matches
closing << NoMatch()
closingStack = []
def pushClosing(t):
    # save the current closing expression, then expect the closer for this opener
    closingStack.append(closing.expr)
    closing << Literal( closingFor[t[0]] )
def popClosing():
    # restore the closing expression of the enclosing level
    closing << closingStack.pop()
opening.setParseAction(pushClosing)
closing.setParseAction(popClosing)

matchedNesting = nestedExpr( opening, closing, Word(alphas) | Word(nonBracePrintables) )

print(matchedNesting.parseString(data).asList())

prints:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
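The self-modifying grammar above is essentially a stack of expected closers. As a rough, pyparsing-free illustration of that idea (not the answer's code), the sketch below tokenizes with `re`, pushes the expected closer whenever it sees an opener, and checks every closer against the top of the stack. The names `parse_nested` and `CLOSING_FOR` and the tokenizing regex are assumptions; the regex is only meant to approximate `Word(alphas) | Word(nonBracePrintables)`.

```python
import re

# Hypothetical mapping, same pairs as closingFor in the answer
CLOSING_FOR = {"(": ")", "[": "]", "{": "}"}

# Assumed tokenizer: a bracket, a run of letters (like Word(alphas)),
# or a run of other non-space, non-bracket characters
TOKEN = re.compile(r"[()\[\]{}]|[A-Za-z]+|[^\s()\[\]{}]+")


def parse_nested(text):
    """Hypothetical helper: parse '(a [b {c}])' style text into nested lists."""
    root = []
    # Each entry is (list being filled, closer that level is waiting for)
    stack = [(root, None)]
    for tok in TOKEN.findall(text):
        if tok in CLOSING_FOR:
            child = []
            stack[-1][0].append(child)
            stack.append((child, CLOSING_FOR[tok]))  # push the expected closer
        elif tok in ")]}":
            if len(stack) == 1:
                raise ValueError(f"unmatched closer {tok!r}")
            _, expected = stack.pop()  # pop the closer this level expects
            if tok != expected:
                raise ValueError(f"expected {expected!r}, found {tok!r}")
        else:
            stack[-1][0].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


print(parse_nested('(gimme [some {nested, nested [lists]}])'))
# [['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
```

One design difference: the stack here is a local variable, whereas the global `closingStack` in the parse-action version may be left holding entries if a parse fails partway through.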

Updated: I posted the solution above because I had actually written it as an experiment over a year ago. Taking a closer look at your original post, I was reminded of the recursive grammar definition created by the operatorPrecedence method (since renamed infixNotation), so I redid the solution using your original approach. It is much simpler to follow, though it might have a left-recursion issue with the right input data; I have not tested it thoroughly:

from pyparsing import *

# enclosed is defined recursively: any bracket type may contain any enclosed item
enclosed = Forward()
nestedParens = nestedExpr('(', ')', content=enclosed)
nestedBrackets = nestedExpr('[', ']', content=enclosed)
nestedCurlies = nestedExpr('{', '}', content=enclosed)
enclosed << (Word(alphas) | ',' | nestedParens | nestedBrackets | nestedCurlies)


data = '(gimme [some {nested, nested [lists]}])'

print(enclosed.parseString(data).asList())

Gives:

[['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]]
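The recursive `enclosed` grammar maps directly onto a hand-written recursive-descent parser: one function parses one item, and any bracket type may contain any item. The sketch below is a stdlib-only illustration of that structure, not a pyparsing feature; `parse_enclosed`, `parse_recursive`, `PAIRS`, `WORD`, and the tokenizer regex are hypothetical names. It returns the matched group itself, so its result has one less level of list than `enclosed.parseString(data).asList()`.

```python
import re

# Hypothetical opener -> closer table, the same three pairs as the grammar
PAIRS = {"(": ")", "[": "]", "{": "}"}
WORD = re.compile(r"[A-Za-z]+")  # plays the role of Word(alphas)


def parse_enclosed(tokens, pos):
    """Parse one item starting at tokens[pos]; return (item, next_pos).

    enclosed := word | ',' | '(' enclosed* ')' | '[' enclosed* ']' | '{' enclosed* '}'
    """
    tok = tokens[pos]
    if WORD.fullmatch(tok) or tok == ",":
        return tok, pos + 1
    if tok in PAIRS:
        closer = PAIRS[tok]
        items = []
        pos += 1  # consume the opener before recursing
        # Any enclosed item may appear inside any bracket type
        while pos < len(tokens) and tokens[pos] != closer:
            item, pos = parse_enclosed(tokens, pos)
            items.append(item)
        if pos == len(tokens):
            raise ValueError(f"missing closing {closer!r}")
        return items, pos + 1  # skip the matching closer
    raise ValueError(f"unexpected token {tok!r}")


def parse_recursive(text):
    """Parse the first enclosed item in text; trailing input is ignored."""
    # Runs of letters, or any other single non-space character
    tokens = re.findall(r"[A-Za-z]+|\S", text)
    if not tokens:
        raise ValueError("empty input")
    item, _ = parse_enclosed(tokens, 0)
    return item


print(parse_recursive('(gimme [some {nested, nested [lists]}])'))
# ['gimme', ['some', ['nested', ',', 'nested', ['lists']]]]
```

Because every recursive call first consumes an opening bracket, this hand-written version cannot recurse without consuming input.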

EDITED:
Here is a diagram of the updated parser, using the railroad diagramming support coming in pyparsing 3.0.


天荒地未老 2024-10-20 22:47:28


This should do the trick for you. I tested it on your example:

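import re
import ast

def parse(s):
    # Map every kind of opener to '[' and every kind of closer to ']'
    s = re.sub(r"[{(\[]", '[', s)
    s = re.sub(r"[})\]]", ']', s)
    # Split into bracket characters and bare words
    tokens = re.findall(r"[\[\]]|[^\s\[\]]+", s)
    answer = ''
    for i, tok in enumerate(tokens):
        if tok in ('[', ']'):
            answer += tok
        else:
            answer += repr(tok)  # quote each word as a Python string literal
        # Separate elements with a comma, except right after an opener
        # or right before a closer
        if tok != '[' and i + 1 < len(tokens) and tokens[i + 1] != ']':
            answer += ', '
    # answer is now a Python list literal, e.g. "['gimme', ['some', ...]]"
    return ast.literal_eval(answer)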

Comment if you need more
