
"Deparsing" a list using pyparsing

Posted on 2024-09-08 02:16:48

Is it possible to give pyparsing a parsed list and have it return the original string?


暖树树初阳… 2024-09-15 02:16:48


Yes, you can, provided you've instructed the parser not to throw away any input. You do it with the Combine class, which joins the matched tokens into a single string.

Let's say your input is:

>>> s = 'abc,def,  ghi'

Here's a parser that grabs the exact text of the list:

>>> from pyparsing import *
>>> myList = Word(alphas) + ZeroOrMore(',' + Optional(White()) + Word(alphas))
>>> myList.leaveWhitespace()
>>> myList.parseString(s)
(['abc', ',', 'def', ',', '  ', 'ghi'], {})

To "deparse":

>>> reconstitutedList = Combine(myList)
>>> reconstitutedList.parseString(s)
(['abc,def,  ghi'], {})

which gives you the initial input back.
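The reason this round trip works can be sketched in plain Python, without pyparsing: since the whitespace-preserving parser kept every character of the input as a token, concatenating the tokens in order gives back the original string. The token list below is copied from the parse result shown above; the variable names are only for illustration.

```python
# Input and token list copied from the whitespace-preserving parse above.
s = 'abc,def,  ghi'
tokens = ['abc', ',', 'def', ',', '  ', 'ghi']

# No character of the input was dropped, so joining the tokens
# in order reproduces the original string exactly.
rebuilt = ''.join(tokens)
print(rebuilt == s)
```

This is essentially what Combine does with the matched tokens, which is why it can only return the literal input when nothing was skipped during parsing.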

But this comes at a cost: having all that extra whitespace floating around as tokens is usually not convenient, and you'll note that we had to explicitly turn whitespace skipping off in myList by calling leaveWhitespace(). Here's a version that strips whitespace:

>>> myList = Word(alphas) + ZeroOrMore(',' + Word(alphas))
>>> myList.parseString(s)
(['abc', ',', 'def', ',', 'ghi'], {})
>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abc,def,ghi'], {})

Note that you're not getting the literal input back at this point (the two spaces after the second comma are gone), but this may be good enough for you. Also note that we had to explicitly tell Combine to allow the skipping of whitespace by passing adjacent=False.
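If the literal input is needed while the grammar still skips whitespace, one option that is not part of the answer above is pyparsing's originalTextFor helper, which returns the slice of the input that an expression matched. The sketch below assumes a pyparsing version that still provides the camelCase names used in this thread (newer releases also offer original_text_for and parse_string); the names my_list, exact_list, and exact_text are illustrative.

```python
from pyparsing import Word, ZeroOrMore, alphas, originalTextFor

s = 'abc,def,  ghi'

# Same whitespace-skipping grammar as above: no White() tokens,
# no call to leaveWhitespace().
my_list = Word(alphas) + ZeroOrMore(',' + Word(alphas))

# originalTextFor records where the match starts and ends and
# returns that slice of the input instead of the individual tokens.
exact_list = originalTextFor(my_list)
exact_text = exact_list.parseString(s)[0]
print(repr(exact_text))
```

The trade-off is that the result is a single string, so the individual items are no longer available from this expression; parse with my_list itself when the items are needed.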

Really, though, in many cases you don't even care about the delimiters; you want the parser to focus on the items themselves. There's a predefined expression called commaSeparatedList that conveniently strips both delimiters and whitespace for you:

>>> myList = commaSeparatedList
>>> myList.parseString(s)
(['abc', 'def', 'ghi'], {})

In this case, though, the "deparsing" step doesn't have enough information for the reconstituted string to make sense:

>>> reconstitutedList = Combine(myList, adjacent=False)
>>> reconstitutedList.parseString(s)
(['abcdefghi'], {})
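When only the items survive parsing, a readable string can still be rebuilt by supplying the delimiter yourself. This is a minimal plain-Python sketch, assuming a comma followed by one space is an acceptable separator; it yields a normalized string, not the literal input.

```python
# Items as returned by the commaSeparatedList parse above.
items = ['abc', 'def', 'ghi']

# The parser discarded the delimiters, so we choose one ourselves.
normalized = ', '.join(items)
print(normalized)  # abc, def, ghi
```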


About the author: 起风了
