How can bytecode be used to optimise the execution time of a dynamic language?

Published 2024-09-16 04:57:59


I am interested in some optimization methods or general bytecode designs, which might help speed up execution using VM in comparison to interpretation of an AST.


Comments (2)

说好的呢 2024-09-23 04:57:59


The main win of bytecode over AST interpretation is operation dispatch cost; for highly optimised interpreters this starts to become a real problem. "Dispatch" is the term used to describe the overhead required to start executing an operation (such as arithmetic, property access, etc.).

A fairly normal AST based interpreter would look something like this:

class ASTNode {
public:
    virtual ~ASTNode() = default;
    virtual double execute() = 0;
};

class NumberNode : public ASTNode {
public:
    double execute() override { return m_value; }
    double m_value;
};

class AddNode : public ASTNode {
public:
    double execute() override { return m_left->execute() + m_right->execute(); }
    ASTNode* m_left;
    ASTNode* m_right;
};

So executing the code for something as simple as 1+1 requires 3 virtual calls. Virtual calls are very expensive (in the grand scheme of things) due to the multiple indirections needed to make the call, and the general cost of making a call in the first place.

In a bytecode interpreter you have a different dispatch model -- rather than virtual calls you have an execution loop, akin to:

while (1) {
    switch (op.type) {
        case op_add:
            // Efficient interpreters use "registers" rather than
            // a stack these days, but the example code would be more
            // complicated
            push(pop() + pop());
            continue;
        case op_end:
            return pop();
    }
}

This still has a reasonably expensive dispatch cost vs native code, but is much faster than virtual dispatch. You can further improve performance using a GCC extension called "computed goto", which allows you to remove the switch dispatch, reducing total dispatch cost to essentially a single indirect branch.

In addition to improving dispatch costs, bytecode-based interpreters have a number of additional advantages over AST interpreters, mostly due to the bytecode's ability to "directly" jump to other locations as a real machine would. For example, imagine a snippet of code like this:

while (1) {
    ...statements...
    if (a)
        break;
    else
        continue;
}

To implement this correctly, every time a statement is executed you would need to indicate whether execution is meant to stay in the loop or stop, so the execution loop becomes something like:

bool broke = false;
while (!broke && condition->execute() == true) {
    for (i = 0; i < statements->length(); i++) {
        result = statements[i]->execute();
        if (result.type == BREAK) {
            broke = true;  // exit the outer loop as well
            break;
        }
        if (result.type == CONTINUE)
            break;         // abandon this pass and re-check the condition
    }
}

As you add more forms of flow control this signalling becomes more and more expensive. Once you add exceptions (i.e. flow control that can happen anywhere) you start needing to check for these things in the middle of even basic arithmetic, leading to ever-increasing overhead. If you want to see this in the real world, I encourage you to look at the ECMAScript spec, where the execution model is described in terms of an AST interpreter.

In a bytecode interpreter these problems basically go away, as the bytecode is able to express control flow directly rather than indirectly through signalling; e.g. continue is simply converted into a jump instruction, and you only pay that cost if it is actually hit.
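To make that concrete, here is one hypothetical compilation of the `while (1) { ...; if (a) break; else continue; }` snippet into jump-based bytecode (the opcode names and offsets are invented for illustration). Both `break` and `continue` become plain jumps, and the fall-through path pays nothing for them:

```
0: ...statements...
5: load a
6: jump_if_true 9    ; if (a) break    -> jump past the loop
7: jump 0            ; else continue   -> jump back to the loop top
9: ...code after the loop...
```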

Finally, an AST interpreter is by definition recursive, and so has to be prevented from overflowing the system stack, which puts very heavy restrictions on how deeply your code can recurse; something like:

1+(1+(1+(1+(1+(1+(1+(1+1)))))))

has 8 levels of recursion (at least) in the interpreter -- this can be a very significant cost. Older versions of Safari (pre-SquirrelFish) used an AST interpreter, and for this reason JS was allowed only a couple of hundred levels of recursion, vs the thousands allowed in modern browsers.

記柔刀 2024-09-23 04:57:59


Perhaps you could look at the various methods which the llvm "opt" tool provides. Those are bytecode-to-bytecode optimisations, and the tool itself will provide analysis on the benefits of applying a particular optimisation.
