Interpreters: How much simplification?
In my interpreter, code like the following
x=(y+4)*z
echo x
parses and "optimizes" down to four single operations performed by the interpreter, pretty much assembly-like:
add 4 to y
multiply <last operation result> with z
set x to <last operation result>
echo x
In modern interpreters (for example CPython, Ruby, PHP), how simplified are the "opcodes" that the interpreter ultimately runs?
Could I achieve better performance by keeping the interpreter's structures and commands more complex and high-level? That would surely be a lot harder, wouldn't it?
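The four operations listed in the question can be sketched as a tiny interpreter loop. This is only an illustration of the described design, not the asker's actual code: the opcode names (add, mul_last, set_last, echo) and the single "last result" slot are assumptions made up for this sketch.

```python
# Hypothetical encoding of the question's four operations. The opcode names
# and the single "last result" slot are assumptions for illustration only.
def run(program, env):
    """Execute (opcode, operand, ...) tuples against a dict of variables."""
    last = None   # plays the role of "<last operation result>"
    output = []   # values produced by echo, collected instead of printed
    for op, *args in program:
        if op == "add":            # add a constant to a variable
            const, name = args
            last = env[name] + const
        elif op == "mul_last":     # multiply the last result with a variable
            (name,) = args
            last = last * env[name]
        elif op == "set_last":     # store the last result in a variable
            (name,) = args
            env[name] = last
        elif op == "echo":         # emit the value of a variable
            (name,) = args
            output.append(env[name])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# x=(y+4)*z followed by echo x, as four operations.
program = [
    ("add", 4, "y"),
    ("mul_last", "z"),
    ("set_last", "x"),
    ("echo", "x"),
]

print(run(program, {"y": 2, "z": 3}))  # [18]
```

Each tuple costs one trip through the dispatch chain, which is the overhead the question about "more complex and high-level" commands is really asking about.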
Comments (4)
In Python's case, you can have it tell you the bytecode for a given function with the dis module. Disassembling a function foo() that contains the two statements from the question gives you a short listing of stack operations.
Some of that listing is extraneous (e.g. the LOAD_CONST and RETURN_VALUE at the end are for the implicit return None in foo()), but Python appears to push y and 4 onto the stack, add, push z, multiply, and write to x. Then it pushes x and prints it.
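A minimal way to try the dis approach described above is sketched here. The body of foo() is an assumption reconstructed from the question's two statements, written in Python 3 syntax. Opcode names differ between CPython versions (older releases show BINARY_ADD and BINARY_MULTIPLY, newer ones a generic BINARY_OP), so no expected listing is shown.

```python
import dis


def foo():
    # y and z are not defined here; they are compiled as global lookups,
    # which is fine because foo() is only disassembled, never called.
    x = (y + 4) * z
    print(x)


# Human-readable listing, one line per bytecode instruction.
dis.dis(foo)

# The same instructions as data, in case you want to inspect them in code.
opnames = [ins.opname for ins in dis.get_instructions(foo)]
print(opnames)
```

The listing should show the same overall shape the answer describes: loads for y and 4, an add, a load for z, a multiply, a store to x, and then the call that prints it.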
Try it and see :) It really depends on lots of factors you didn't provide (and the area and task are so huge that, had you provided enough of them, they would already contain a few obvious answers). One such major factor is whether (and how) you are going to implement certain language features (or, to put it another way, whether you are going to make these things first-class), for example:
Try modeling your opcodes so that they mimic the internal workings of your interpreter. This page has an article about how .NET generates an interpreted language out of regexes. In .NET, the regex is first compiled to an intermediate language, and that intermediate code is then interpreted. The intermediate code looks very much like the internal data structures of that specific regex engine.
A rule of thumb: if there are repeating patterns in your bytecode (e.g., a common pattern for every GC-controlled heap allocation), there should be a special high-level operation for each pattern.
Anyway, nowadays, with all the .NET, JVM, and LLVM infrastructure available, it's really cheap and easy to plug in a proper JIT compiler if you're really interested in the performance of your interpreter.
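One way to read this rule of thumb is as a peephole pass that replaces a recurring instruction sequence with a single fused opcode. The sketch below is an assumption-laden illustration: the stack VM, its opcode names, and the fused ADD_VAR_CONST instruction are all hypothetical, and the pattern chosen ("push variable, push constant, add") simply comes from the question's example.

```python
# Hypothetical stack VM; every opcode name here is an illustrative assumption.
def run(code, env):
    """Interpret a list of (opcode, operand, ...) tuples; returns env."""
    stack = []
    for op, *args in code:
        if op == "PUSH_VAR":
            stack.append(env[args[0]])
        elif op == "PUSH_CONST":
            stack.append(args[0])
        elif op == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == "STORE":
            env[args[0]] = stack.pop()
        elif op == "ADD_VAR_CONST":
            # Fused form of PUSH_VAR v, PUSH_CONST c, ADD: one dispatch, not three.
            stack.append(env[args[0]] + args[1])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return env


def fuse(code):
    """Peephole pass: rewrite PUSH_VAR, PUSH_CONST, ADD into ADD_VAR_CONST."""
    out = []
    i = 0
    while i < len(code):
        window = code[i:i + 3]
        if [ins[0] for ins in window] == ["PUSH_VAR", "PUSH_CONST", "ADD"]:
            out.append(("ADD_VAR_CONST", window[0][1], window[1][1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


# x = (y + 4) * z in the low-level instruction set.
plain = [
    ("PUSH_VAR", "y"),
    ("PUSH_CONST", 4),
    ("ADD",),
    ("PUSH_VAR", "z"),
    ("MUL",),
    ("STORE", "x"),
]
fused = fuse(plain)

print(len(plain), len(fused))               # 6 4
print(run(fused, {"y": 2, "z": 3})["x"])    # 18
```

The fused program computes the same result with fewer dispatched instructions. Whether that is actually faster in a given interpreter is something to measure rather than assume.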