Pyparsing - literal text with line breaks in different positions

Posted 2024-12-17 00:52:52

I'm using pyparsing to parse documents containing text in which the line ends vary in location. I need to write a parser expression that matches the text regardless of line break location. The following does NOT work:

from __future__ import print_function
from pyparsing import *

string_1 = """The quick brown 
fox jumps over the lazy dog.
"""

string_2 = """The quick brown fox jumps
over the lazy dog.
"""

my_expr = Literal(string_1)
print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))

This results in the following being displayed on the console:

[['The quick brown \nfox jumps over the lazy dog.\n']]
[]

Since line breaks are included in ParserElement.DEFAULT_WHITE_CHARS, I don't understand why string_2 does not match my expression as well. How do I create a parser element that DOES match the text regardless of where the line breaks occur?

蓦然回首 2024-12-24 00:52:52

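Your question is a good example of why I discourage people from defining literals with embedded whitespace: doing so defeats pyparsing's built-in whitespace skipping. Pyparsing skips whitespace (by default, the characters in ParserElement.DEFAULT_WHITE_CHARS) before each expression, that is, between expressions, but a Literal itself is matched character for character. In your case, you are specifying only a single expression, a Literal comprising an entire string of words including the whitespace between them, so every space and newline inside it must appear at exactly the same position in the input. string_2 breaks its lines elsewhere, so it does not match.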
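A minimal sketch of that difference, assuming pyparsing is installed (the names `single`, `separate` and `text` are illustrative only). It parses input with a newline between two words, once with a single Literal containing a space and once with two Literals joined by `+`:

```python
import pyparsing as pp

# One Literal containing a space: the characters are compared exactly,
# so the space inside the literal cannot match a newline in the input.
single = pp.Literal("quick brown")

# Two Literals joined with "+": whitespace (including "\n") is skipped
# before the second element, so the line break between the words is accepted.
separate = pp.Literal("quick") + pp.Literal("brown")

text = "quick\nbrown"

try:
    single.parseString(text)
    single_matched = True
except pp.ParseException:
    single_matched = False

separate_tokens = separate.parseString(text).asList()

print(single_matched)   # False
print(separate_tokens)  # ['quick', 'brown']
```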

You can get whitespace skipped by breaking your string up into separate Literals (adding a string to a pyparsing expression automatically constructs a Literal from that string):

from pyparsing import *

my_expr = Literal("The") + "quick" + "brown" + "fox" + "jumps" + "over" + "the" + "lazy" + "dog"

string_1 = """The quick brown 
fox jumps over the lazy dog.
"""

string_2 = """The quick brown fox jumps
over the lazy dog.
"""

for test in (string_1, string_2):
    print('-' * 40)
    print(test)
    print(my_expr.parseString(test))
    print()
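The automatic string-to-Literal conversion mentioned above can be checked directly. This is a small sketch assuming pyparsing 2.x or 3.x; it inspects the `exprs` attribute, where an And keeps its sub-expressions, which is an implementation detail rather than a documented guarantee:

```python
import pyparsing as pp

# Adding a plain str to a ParserElement makes pyparsing wrap the str in a Literal.
expr = pp.Literal("The") + "quick"

# The combined expression is an And, and both of its elements are Literals.
is_and = isinstance(expr, pp.And)
all_literals = all(isinstance(e, pp.Literal) for e in expr.exprs)

# Because they are separate expressions, a line break between them is skipped.
tokens = expr.parseString("The\nquick").asList()

print(is_and, all_literals)  # True True
print(tokens)                # ['The', 'quick']
```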

If you don't like typing all those separate quoted strings, you can have Python split the string up for you, map them to Literals, and feed the whole list to make up a pyparsing And:

my_expr = And(map(Literal, "The quick brown fox jumps over the lazy dog".split()))
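The same split-and-map idea can be packaged as a reusable function. `phrase` is a hypothetical helper name, not part of pyparsing, and it uses a list comprehension in place of `map`, which is equivalent here. Applied to both inputs from the question:

```python
import pyparsing as pp


def phrase(text):
    """Hypothetical helper: build an And with one Literal per
    whitespace-separated word in `text`, so any whitespace (spaces,
    tabs, newlines) may appear between the words in the input."""
    return pp.And([pp.Literal(word) for word in text.split()])


fox = phrase("The quick brown fox jumps over the lazy dog")

string_1 = "The quick brown \nfox jumps over the lazy dog.\n"
string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

for test in (string_1, string_2):
    # Both lines print the same nine words; the trailing "." is simply
    # left unparsed because parseAll is not requested.
    print(fox.parseString(test).asList())
```

Note that Literal does not check word boundaries, so this would also accept run-together input such as "Thequick brown ..."; pyparsing's Keyword class can be used in place of Literal if that matters.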

If you want to preserve the original whitespace, wrap your expression in originalTextFor:

my_expr = originalTextFor(my_expr)
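A sketch of the originalTextFor variant run on both inputs from the question, assuming the word expression is built as in the split-and-map line above. The result is the slice of the input that the words matched, with its original spacing and line breaks, rather than a list of words:

```python
import pyparsing as pp

words = "The quick brown fox jumps over the lazy dog".split()
my_expr = pp.originalTextFor(pp.And([pp.Literal(w) for w in words]))

string_1 = "The quick brown \nfox jumps over the lazy dog.\n"
string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

# originalTextFor returns the matched span of the input as one string,
# so each result keeps the line break where that input had it.
text_1 = my_expr.parseString(string_1)[0]
text_2 = my_expr.parseString(string_2)[0]

print(repr(text_1))  # 'The quick brown \nfox jumps over the lazy dog'
print(repr(text_2))  # 'The quick brown fox jumps\nover the lazy dog'
```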