PLY: quickly parsing long lists?
I'm working with a fairly simple parser in PLY, and one of my rules takes on the following form:
def p_things(p):
    '''
    things : thing things
    things : thing
    '''
    p[0] = [p[1]]
    if len(p) == 3:
        p[0] += p[2]
Input files are generally simple lists of things, and so the parsing itself is not complex. Some of my input files are very large, however (exceeding 100,000 lines fairly regularly, and over 1,000,000 in extreme cases). In profiling (via cProfile and pstats), the bulk of the runtime is taken up by repeated calls to p_things - presumably, one call for each item in a things list.
Is there a way to reduce this time, or a more efficient way to structure this rule? Most answers I've seen so far (and the canonical compilers info I've found) have listed this method as the generally accepted way to construct a list of parseable items, no matter the length.
3 Answers
Turns out I'm forgetting some of my basic compilers theory. PLY is an LALR(1) parser, and so it's better to write the rule in left-recursive form.

Though it may look more verbose, there's actually a significant improvement - somewhere in either PLY or Python, the parser is able to apply some optimization on the left-recursive form. I've seen performance drop from exponential to linear on my larger input files; one sample, with over a million items in the things list, ran in under 20% of the time.
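A left-recursive version of the rule might look like the following sketch (my reconstruction from the grammar in the question, not the answer's verbatim code):

```python
def p_things(p):
    '''
    things : things thing
    things : thing
    '''
    if len(p) == 2:
        # Base case: a single thing starts a new list.
        p[0] = [p[1]]
    else:
        # Extend the already-built list in place instead of copying it.
        p[0] = p[1]
        p[0].append(p[2])
```

Because the recursion is on the left, an LALR(1) parser can reduce after every thing it shifts, so the list grows incrementally with constant work per item instead of being rebuilt at each step.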
Without changing your code, you could try using the "PyPy" version of python with just-in-time compilation -- it might run your code way faster than normal CPython.
So to summarize (this is related to the use of +=): to better understand Ioannis Filippidis' comment, it's easier to actually visualize it. This is what I think was meant, and approximately what I ended up with as well.
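To make the difference concrete, here is a toy simulation (my own illustration, not PLY itself) of the work the two action bodies do: the right-recursive action effectively builds a fresh list and copies the tail at every reduction, while the left-recursive action appends in place.

```python
def simulate_right_recursive(items):
    """Mimic `p[0] = [p[1]]; p[0] += p[2]` under `things : thing things`:
    the innermost list reduces first, and every outer reduction copies
    the whole tail into a new list."""
    result = []
    elements_copied = 0
    for item in reversed(items):
        result = [item] + result          # fresh list plus full tail copy
        elements_copied += len(result)
    return result, elements_copied

def simulate_left_recursive(items):
    """Mimic the left-recursive action: one in-place append per item."""
    result = []
    elements_copied = 0
    for item in items:
        result.append(item)               # amortized O(1)
        elements_copied += 1
    return result, elements_copied
```

Both produce the same list, but for n items the right-recursive simulation touches roughly n*(n+1)/2 elements (quadratic), while the left-recursive one touches exactly n (linear) - which is consistent with the speedup reported in the accepted answer above.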