“Deparing”使用 pyparsing 的列表

发布于 2024-09-08 02:16:48 字数 40 浏览 4 评论 0原文

是否可以给 pyparsing 一个解析列表并让它返回原始字符串?

Is it possible to give pyparsing a parsed list and have it return the original string?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

暖树树初阳… 2024-09-15 02:16:48

是的,如果您已指示解析器不要丢弃任何输入,则可以。您可以使用Combine 组合器来完成此操作。

假设您的输入是:

>>> s = 'abc,def,  ghi'

这是一个解析器,它获取列表的确切文本:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

“deparse”:

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

它为您提供初始输入。

但这是有代价的:将所有额外的空格作为标记浮动通常并不方便,而且您会注意到我们必须在 myList 中显式关闭空格跳过关闭 >。这是一个删除空格的版本:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

请注意,此时您还没有得到文字输入,但这对您来说可能已经足够了。另请注意,我们必须明确告诉合并允许跳过空格。

但实际上,在许多情况下您甚至不关心分隔符;您希望解析器专注于项目本身。有一个名为 commaSeparatedList 的函数,可以方便地为您去除分隔符和空格:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

不过,在这种情况下,“解析”步骤没有足够的信息来使重构的字符串有意义:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})

Yes, you can if you've instructed the parser not to throw away any input. You do it with the Combine combinator.

Let's say your input is:

>>> s = 'abc,def,  ghi'

Here's a parser that grabs the exact text of the list:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

To "deparse":

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

which gives you the initial input back.

But this comes at a cost: having all that extra whitespace floating around as tokens is usually not convenient, and you'll note that we had to explicitly turn whitespace skipping off in myList. Here's a version that strips whitespace:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

Note you're not getting the literal input back at this point, but this may be good enough for you. Also note we had to explicitly tell Combine to allow the skipping of whitespace.

Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a function called commaSeparatedList that conveniently strips both delimiters and whitespace for you:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文