Parsing escape-split words across multiple lines with pyparsing
I'm trying to use pyparsing to parse words that can be broken up over multiple lines with a backslash-newline combination (a backslash immediately followed by a line break). Here's what I have done:
```python
from pyparsing import *

# A backslash followed by the end of the line
continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
# A word followed by a (suppressed) line continuation
split_word = word + Suppress(continued_ending)
multi_line_word = Forward()
multi_line_word << (word | (split_word + multi_line_word))
print(multi_line_word.parseString(
'''super\\
cali\\
fragi\\
listic'''))
```
The output I get is ['super'], while the expected output is ['super', 'cali', 'fragi', 'listic']. Better still would be to have them all joined into one word, which I think I can do with multi_line_word.setParseAction(lambda t: ''.join(t)).
I tried looking at this code in the pyparsing helper, but it gives me an error: maximum recursion depth exceeded.
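The reason only ['super'] comes back is that the | operator builds a MatchFirst, which returns the first alternative that succeeds. Plain word already matches "super", so the recursive alternative is never tried. The sketch below contrasts the question's ordering with one that lists the recursive alternative first. It assumes Python 3 and a current pyparsing, reuses the question's names, and is an illustration rather than one of the fixes suggested in the answers.

```python
from pyparsing import Forward, LineEnd, Literal, Suppress, Word, alphas

text = "super\\\ncali\\\nfragi\\\nlistic"

continued_ending = Literal("\\") + LineEnd()
word = Word(alphas)
split_word = word + Suppress(continued_ending)

# The question's ordering: "|" builds a MatchFirst, which stops at the first
# alternative that succeeds. `word` alone already matches "super", so the
# recursive alternative is never attempted.
word_first = Forward()
word_first <<= (word | (split_word + word_first))

# Listing the longer, recursive alternative first lets each level consume one
# escape-split piece; plain `word` is only the fallback for the last piece.
split_first = Forward()
split_first <<= ((split_word + split_first) | word)

print(word_first.parseString(text).asList())   # ['super']
print(split_first.parseString(text).asList())  # ['super', 'cali', 'fragi', 'listic']
```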
EDIT 2009-11-15: I realized later that pyparsing is a little generous with regard to whitespace, and that led me to some poor assumptions: what I was actually parsing for was a lot looser than I thought. That is to say, we want no whitespace between any of the portions of the word, the escape, and the EOL character.
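To make the whitespace point concrete, here is a small sketch of pyparsing's default whitespace skipping and of leaveWhitespace(), which switches it off. It assumes Python 3 and a current pyparsing, and the expression names are only illustrative.

```python
from pyparsing import LineEnd, Literal, ParseException, Word, alphas

# By default every element skips leading whitespace before matching
# (spaces, tabs, carriage returns and, except for LineEnd, newlines).
loose = Literal("\\") + LineEnd()
loose_word = Word(alphas) + Literal("\\")

print(loose.parseString("\\ \n").asList())          # ['\\', '\n']: the space is skipped
print(loose_word.parseString("shiny \\").asList())  # ['shiny', '\\']

# leaveWhitespace() turns the skipping off for the expression and its parts,
# so the backslash must be followed directly by the line end.
strict = (Literal("\\") + LineEnd()).leaveWhitespace()
try:
    strict.parseString("\\ \n")
except ParseException:
    print("strict version rejects the space")
```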
I realized that the little example string above is insufficient as a test case, so I wrote the following unit tests. Code that passes these tests should match what I intuitively think of as an escape-split word, and only an escape-split word. It should not match a basic word that is not escape-split. We can (and, I believe, should) use a different grammatical construct for that; keeping the two separate keeps everything tidy.
```python
import unittest

import pyparsing

# Assumes you named your module 'multiline.py'
import multiline


class MultiLineTests(unittest.TestCase):

    def test_continued_ending(self):
        case = '\\\n'
        expected = ['\\', '\n']
        result = multiline.continued_ending.parseString(case).asList()
        self.assertEqual(result, expected)

    def test_continued_ending_space_between_parse_error(self):
        case = '\\ \n'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.continued_ending.parseString,
            case
        )

    def test_split_word(self):
        cases = ('shiny\\', 'shiny\\\n', ' shiny\\')
        expected = ['shiny']
        for case in cases:
            result = multiline.split_word.parseString(case).asList()
            self.assertEqual(result, expected)

    def test_split_word_no_escape_parse_error(self):
        case = 'shiny'
        self.assertRaises(
            pyparsing.ParseException,
            multiline.split_word.parseString,
            case
        )

    def test_split_word_space_parse_error(self):
        cases = ('shiny \\', 'shiny\r\\', 'shiny\t\\', 'shiny\\ ')
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.split_word.parseString,
                case
            )

    def test_multi_line_word(self):
        cases = (
            'shiny\\',
            'shi\\\nny',
            'sh\\\ni\\\nny\\\n',
            ' shi\\\nny\\',
            'shi\\\nny ',
            'shi\\\nny captain',
        )
        expected = ['shiny']
        for case in cases:
            result = multiline.multi_line_word.parseString(case).asList()
            self.assertEqual(result, expected)

    def test_multi_line_word_spaces_parse_error(self):
        cases = (
            'shi \\\nny',
            'shi\\ \nny',
            'sh\\\n iny',
            'shi\\\n\tny',
        )
        for case in cases:
            self.assertRaises(
                pyparsing.ParseException,
                multiline.multi_line_word.parseString,
                case
            )


if __name__ == '__main__':
    unittest.main()
```
After poking around a bit more, I came upon this help thread, which contained this notable bit:
With that, I got the idea to change these two lines
To
This got it to output what I was looking for:
['super', 'cali', 'fragi', 'listic'].
Next, I added a parse action that would join these tokens together:
This gives a final output of
['supercalifragilistic'].
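Here is a minimal sketch of such a joining parse action, assuming Python 3 and a current pyparsing. The grammar uses ZeroOrMore instead of recursion. That is an illustrative choice, not necessarily the author's exact change.

```python
from pyparsing import LineEnd, Literal, Suppress, Word, ZeroOrMore, alphas

continued_ending = Literal("\\") + LineEnd()
word = Word(alphas)
split_word = word + Suppress(continued_ending)

# Iteration instead of recursion: any number of escape-split pieces
# followed by one final piece.
multi_line_word = ZeroOrMore(split_word) + word

# The parse action receives the matched tokens and replaces them with
# a single joined string.
multi_line_word.setParseAction(lambda tokens: "".join(tokens))

text = "super\\\ncali\\\nfragi\\\nlistic"
print(multi_line_word.parseString(text).asList())  # ['supercalifragilistic']
```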
The take-home message I learned is that one doesn't simply walk into Mordor.
Just kidding.
The take-home message is that one can't simply implement a one-to-one translation of BNF with pyparsing. Some tricks using the iterative types (repetition classes such as OneOrMore and ZeroOrMore) should be brought into play.
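As a generic illustration of that point, here is a hypothetical comma-separated list that is not from the thread. A recursive BNF rule can usually be rewritten with a repetition class instead of a recursive Forward (this assumes Python 3 and a current pyparsing):

```python
from pyparsing import Suppress, Word, ZeroOrMore, alphas

# BNF, written recursively:
#     item_list ::= item | item "," item_list
# A literal transcription with Forward and "|" would need the longer
# alternative listed first; a repetition class expresses the same
# language without any recursion.
item = Word(alphas)
item_list = item + ZeroOrMore(Suppress(",") + item)

print(item_list.parseString("apple, banana, cherry").asList())
# ['apple', 'banana', 'cherry']
```

pyparsing also ships a delimitedList helper for this particular pattern.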
EDIT 2009-11-25: To compensate for the more strenuous test cases, I modified the code to the following:
This has the benefit of making sure that no space comes between any of the elements (with the exception of newlines after the escaping backslashes).
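Here is a sketch of one grammar that passes the unit tests from the question, assuming Python 3 and a current pyparsing. The combination of leaveWhitespace, Combine and StringEnd is my own choice for illustration and not necessarily the code the author ended up with.

```python
from pyparsing import (Combine, LineEnd, Literal, OneOrMore, ParseException,
                       StringEnd, Suppress, Word, alphas)

# A backslash directly followed by a newline (or by the end of the input);
# leaveWhitespace() stops both parts from skipping spaces or tabs first.
continued_ending = (Literal("\\") + LineEnd()).leaveWhitespace()

word = Word(alphas)

# A word directly followed by the suppressed continuation. Whitespace
# before the word is still skipped; whitespace after it is not.
split_word = word + Suppress(continued_ending)

# Combine joins the pieces into one token and disables whitespace skipping
# inside itself (it still skips leading whitespace). After the last
# continuation, a word must follow immediately or the input must end.
multi_line_word = Combine(OneOrMore(split_word) + (word | StringEnd()))

for sample in ["sh\\\ni\\\nny\\\n", "shi\\\nny captain", "sh\\\n iny"]:
    try:
        print(repr(sample), "->", multi_line_word.parseString(sample).asList())
    except ParseException:
        print(repr(sample), "-> ParseException")
```

The (word | StringEnd()) alternative is what makes a continuation followed by a space (the 'sh\\\n iny' test case) raise ParseException instead of quietly returning 'sh'.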
You are pretty close with your code. Any of these mods would work:
As you found in your pyparsing googling, BNF->pyparsing translations should be done with a special view to using pyparsing features in place of BNF, um, shortcomings. I was actually in the middle of composing a longer answer, going into more of the BNF translation issues, but you have already found this material (on the wiki, I assume).