Pyparsing - literal text with line breaks in different locations
I'm using pyparsing to parse documents containing text in which the line ends vary in location. I need to write a parser expression that matches the text regardless of line break location. The following does NOT work:
from __future__ import print_function
from pyparsing import *
string_1 = """The quick brown
fox jumps over the lazy dog.
"""
string_2 = """The quick brown fox jumps
over the lazy dog.
"""
my_expr = Literal(string_1)
print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))
This results in the following being displayed on the console:
[['The quick brown \nfox jumps over the lazy dog.\n']]
[]
Since line breaks are included in ParserElement.DEFAULT_WHITE_CHARS, I don't understand why both strings do not match my expression. How do I create a parser element which DOES match text regardless of where the line breaks occur?
Comments (1)
Your question is a good example of why I discourage people from defining literals with embedded whitespace: doing so defeats pyparsing's built-in whitespace skipping. Pyparsing skips over whitespace between expressions. In your case, you are specifying only a single expression, a Literal comprising an entire string of words, including the whitespace between them, so that whitespace has to match character for character.
You can get whitespace skipped by breaking your string up into separate Literals (adding a string to a pyparsing expression automatically constructs a Literal from that string):
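A minimal sketch of this approach, assuming pyparsing is installed and reusing the two strings from the question. The variable names result_1 and result_2 are illustrative, not taken from the answer. Newer pyparsing releases also offer snake_case names such as search_string; the camelCase names from the question are used here.

```python
from pyparsing import Literal

string_1 = "The quick brown\nfox jumps over the lazy dog.\n"
string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

# Each word is its own expression, so pyparsing skips whatever whitespace
# (spaces or newlines) appears between consecutive words.
# Adding a plain string to a Literal promotes that string to a Literal too.
my_expr = (Literal("The") + "quick" + "brown" + "fox" + "jumps"
           + "over" + "the" + "lazy" + "dog.")

result_1 = my_expr.searchString(string_1)
result_2 = my_expr.searchString(string_2)
print(result_1)
print(result_2)
```

With this expression both searches should report one match each, and each match is a list of the nine word tokens rather than a single string.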
If you don't like typing all those separate quoted strings, you can have Python split the string up for you, map them to Literals, and feed the whole list to make up a pyparsing And:
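One way this could look, as a sketch rather than the answer's exact code: split the text on whitespace, wrap each word in a Literal, and pass the list to And. The names text, words, result_1 and result_2 are hypothetical.

```python
from pyparsing import And, Literal

text = "The quick brown fox jumps over the lazy dog."

# str.split() with no arguments splits on any run of whitespace,
# so the same word list comes out wherever the line breaks fall.
words = text.split()

# And matches its sub-expressions in sequence, skipping whitespace between them.
my_expr = And([Literal(word) for word in words])

string_1 = "The quick brown\nfox jumps over the lazy dog.\n"
string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

result_1 = my_expr.searchString(string_1)
result_2 = my_expr.searchString(string_2)
print(result_1)
print(result_2)
```

A list comprehension is used instead of map() only so that And receives a concrete list; the effect is the same as mapping the words to Literals.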
If you want to preserve the original whitespace, wrap your expression in originalTextFor:
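A sketch of the wrapped expression, assuming the same word-splitting approach as above. The names words, matched and result are illustrative. originalTextFor is the camelCase helper; newer pyparsing releases also expose it as original_text_for.

```python
from pyparsing import And, Literal, originalTextFor

words = "The quick brown fox jumps over the lazy dog.".split()

# originalTextFor returns the slice of the input that the wrapped expression
# matched, so the line break inside the match is kept as it appeared.
my_expr = originalTextFor(And([Literal(word) for word in words]))

string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

result = my_expr.searchString(string_2)
matched = result[0][0]
print(repr(matched))
```

Here the match should come back as a single string that still contains the newline from the input, instead of a list of separate word tokens.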