Dynamic vs. static compiler (JavaScript)

Posted on 2024-12-01 03:41:55

I'm currently writing a JavaScript compiler in ANTLR+Java.

I've read questions here on Stack Overflow about how to proceed with execution, and the answer is always that static compilation (without JIT information) of a dynamic language would be far too hard. But why is that, exactly? There is of course the obvious "type resolving" problem, and in JavaScript perhaps a problem with the eval function, but are there other reasons? (Those two don't seem too hard to overcome purely statically, with no JIT.)

I'm excluding JIT-based compilation because I figure it would be too hard for me to implement.

I have some experience in writing static compilers with a byte-code execution.

UPDATE:

All your answers are really helpful for understanding the problem.
To clarify, does this mean that JavaScript is harder to implement than other dynamic languages?

And does this also mean that I'm better off using a tree-based interpreter than, e.g., bytecode (if we set aside the fact that JS is always shipped as raw source code, which adds extra time for generating an IR and then executing it)? Or should they be about equally easy or hard to do?

(I'm new to SO; I don't know if this is the preferred way to update a question.)

Comments (3)

浪菊怪哟 2024-12-08 03:41:55

There are lots of ways this conversation could go. Here's one direction. In JavaScript, nearly everything is an object, and properties or methods can be added to any object at run time. As a result, you don't know at compile time which methods or properties will or won't be attached to an object, so everything has to be looked up at run time.

For example:

var myObj = {};

function configureObject() {
    // someEnvironmentCondition is a placeholder for any check whose result is only known at run time
    if (someEnvironmentCondition) {
        myObj.myfunc = function () {alert("Hi");};
    } else {
        myObj.myfunc = function () {document.write("Hello");};
    }
}

Now, sometime later in the code, you call myObj.myfunc(). It is not known at compile time what myfunc is, or whether it is even a property of myObj. It has to be a run-time lookup.
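As a rough illustration of what that run-time lookup involves, here is a minimal sketch of the kind of helper a statically compiled program might call at every method call site. The names lookupProperty and callMethod are made up for this example, and configureObject takes a flag and returns strings instead of calling alert or document.write so the snippet can run outside a browser.

```javascript
// Hypothetical runtime helper: walk the prototype chain looking for `name`.
function lookupProperty(obj, name) {
  for (let o = obj; o !== null; o = Object.getPrototypeOf(o)) {
    if (Object.prototype.hasOwnProperty.call(o, name)) {
      // Read through the original receiver so getters see the right `this`.
      return { found: true, value: obj[name] };
    }
  }
  return { found: false, value: undefined };
}

// Hypothetical code a compiler could emit for `obj.name(args...)`.
function callMethod(obj, name, ...args) {
  const result = lookupProperty(obj, name);
  if (!result.found || typeof result.value !== "function") {
    throw new TypeError(name + " is not a function");
  }
  return result.value.apply(obj, args);
}

const myObj = {};

// Variant of the example above: the flag stands in for the environment check.
function configureObject(sayHi) {
  myObj.myfunc = sayHi
    ? function () { return "Hi"; }
    : function () { return "Hello"; };
}

configureObject(true);
console.log(callMethod(myObj, "myfunc"));
```

The point is only that the target of the call is discovered when the call executes; nothing about myObj.myfunc is fixed when the code is compiled.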

In another example, take this line of code:

var c = a + b;

What this means depends entirely upon the types of a and b, and those types are not known at compile time.

If a and b are both numbers, then this is an addition statement and c will be a number.

If either a or b is a string, then the other will be coerced to a string and c will be a string.

You can't precompile this kind of logic into native code. The execution environment has to record that this is a request for the addition operator between these two operands and it has to (at runtime) examine the types of the two operands and decide what to do.
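To make that run-time check concrete, here is a simplified sketch of a helper that compiled code could call for every `a + b`. The names toPrimitive and runtimeAdd are invented for this example, and the logic is deliberately reduced: it ignores Symbol.toPrimitive, the special handling of Date, BigInt, and Symbol operands, so it is an approximation of the language rules rather than a faithful implementation.

```javascript
function isPrimitive(v) {
  return v === null || (typeof v !== "object" && typeof v !== "function");
}

// Simplified object-to-primitive conversion: try valueOf, then toString.
function toPrimitive(v) {
  if (isPrimitive(v)) return v;
  for (const method of ["valueOf", "toString"]) {
    if (typeof v[method] === "function") {
      const result = v[method]();
      if (isPrimitive(result)) return result;
    }
  }
  throw new TypeError("Cannot convert object to primitive value");
}

// Hypothetical helper emitted in place of `a + b`.
function runtimeAdd(a, b) {
  const pa = toPrimitive(a);
  const pb = toPrimitive(b);
  if (typeof pa === "string" || typeof pb === "string") {
    // Either operand is a string: concatenate.
    return String(pa) + String(pb);
  }
  // Otherwise: numeric addition.
  return Number(pa) + Number(pb);
}

console.log(runtimeAdd(1, 2));   // numeric addition
console.log(runtimeAdd("1", 2)); // string concatenation
```

A static compiler can still emit native code for runtimeAdd itself; what it cannot do without type information is replace the call with a single machine add instruction.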

梦里梦着梦中梦 2024-12-08 03:41:55

The challenge with writing a static JavaScript compiler is that it is, in general, undecidable which objects are being referenced at any program point or which functions are being called. I could use the fact that JavaScript is dynamic to decide which function to call based on the output of some Turing machine. For example:

var functionName = RunTuringMachineAndReportOutputOnTape(myTM, myInput);
eval(functionName + "();");

At this point, unless you have advance knowledge of what myTM and myInput are, it is provably impossible to decide which function will be invoked by the call to eval, since it is undecidable what is on a Turing machine's tape if it halts (you can reduce the halting problem to this problem). Consequently, no matter how clever you are, and no matter how good a static analyzer you build, you will never be able to correctly resolve all function calls statically. You can't even bound the set of functions that might be called here, since the Turing machine's output might define some function that is then executed by the above code.
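A runnable stand-in for the snippet above, with the Turing machine replaced by a small computation whose behavior is hard to predict statically. The helper names collatzSteps, onEven, onOdd, and callComputedFunction are made up for this sketch. The Collatz iteration is believed to terminate for every positive integer, but that has not been proven, which is exactly the kind of question a static analyzer cannot be expected to settle.

```javascript
// Count how many Collatz steps it takes to reach 1 (assumes n is a positive integer).
function collatzSteps(n) {
  let steps = 0;
  while (n !== 1) {
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    steps += 1;
  }
  return steps;
}

function callComputedFunction(n) {
  function onEven() { return "even"; }
  function onOdd() { return "odd"; }
  // The callee's name only exists as a string computed at run time.
  const functionName = collatzSteps(n) % 2 === 0 ? "onEven" : "onOdd";
  // Direct eval sees the enclosing function's local declarations.
  return eval(functionName + "();");
}

console.log(callComputedFunction(6));
console.log(callComputedFunction(3));
```

Here the set of candidates happens to be small, but nothing stops the computed string from naming, or even defining, a function the compiler has never seen.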

What you can do is compile code that, whenever a function is called, includes extra logic to resolve the call, and possibly uses techniques like inline caching to speed things up. Additionally, in some cases you might be able to prove that a certain function is being called (or that one of a small number of functions will be called) and can then hardcode in those calls. You could also compile multiple versions of a piece of code, one for each common type (object, numeric, etc.), then emit code to jump to the appropriate compiled trace based on the dynamic type.
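Inline caching can be sketched in a few lines. The toy object model below is an assumption made purely for illustration: each object carries an explicit `shape` field that points at a frozen method table, whereas real engines track object layouts internally and also have to invalidate caches when objects change. The names makeShape and createCallSite are invented here.

```javascript
// Toy "shape": an immutable table of methods shared by similar objects.
function makeShape(methods) {
  return Object.freeze({ methods: Object.freeze({ ...methods }) });
}

// One cache per call site: remembers the last shape seen and its target.
function createCallSite(methodName) {
  let cachedShape = null;
  let cachedTarget = null;
  const stats = { hits: 0, misses: 0 };

  function invoke(receiver, ...args) {
    if (receiver.shape === cachedShape) {
      // Fast path: same shape as last time, skip the lookup.
      stats.hits += 1;
      return cachedTarget.apply(receiver, args);
    }
    // Slow path: full lookup, then refill the cache.
    stats.misses += 1;
    const target = receiver.shape.methods[methodName];
    if (typeof target !== "function") {
      throw new TypeError(methodName + " is not a function");
    }
    cachedShape = receiver.shape;
    cachedTarget = target;
    return target.apply(receiver, args);
  }

  invoke.stats = stats;
  return invoke;
}

const greeterShape = makeShape({ myfunc() { return "Hi " + this.name; } });
const writerShape = makeShape({ myfunc() { return "Hello " + this.name; } });

const callMyfunc = createCallSite("myfunc");
console.log(callMyfunc({ shape: greeterShape, name: "a" })); // miss, fills the cache
console.log(callMyfunc({ shape: greeterShape, name: "b" })); // hit
console.log(callMyfunc.stats);
```

This is a monomorphic cache (one remembered shape). Keeping a short list of shapes per call site is the usual next step when a site sees a few different receiver types.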

左秋 2024-12-08 03:41:55

V8 does that. See Compile JavaScript to Native Code with V8

With ECMAScript 3 and ECMAScript 5 non-strict mode, there are a number of wrinkles around scopes that you don't run into in other dynamic languages. You might think that it is easy to do compiler optimizations on local variables, but there are edge cases in the language where it is not, even ignoring eval's scope introspection.

Consider

function f(o, x, y) {
  with (o) { return x + y + z; }
}

when called with each of the following values of o:

o = {};
o = { z: 3 };
o = { x: 1, z: 2 };
Object.prototype.z = 3, o = {};

Also, according to ECMAScript 3,

x = (function () { return toString(); })()

should produce quite a different result from

x = toString();

because ECMAScript 3 defines an activation record as an object with a prototype chain.
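To see how much the `with` example varies, the sketch below runs f against the four values of o listed above. Two things here are assumptions of the sketch rather than part of the original answer: f is built with `new Function` so that it is non-strict code even when the surrounding file is strict or a module (where `with` is a syntax error), and the helper names tryCall and runWithDemo are made up. It also assumes there is no global variable named z.

```javascript
// Same body as f above, created as sloppy-mode code via the Function constructor.
const f = new Function("o", "x", "y", "with (o) { return x + y + z; }");

// Call f with fixed x and y; report the error name if resolution fails.
function tryCall(o) {
  try {
    return f(o, 10, 20);
  } catch (e) {
    return e.name;
  }
}

function runWithDemo() {
  const results = [];
  results.push(tryCall({}));             // x, y are the parameters; z is not found anywhere
  results.push(tryCall({ z: 3 }));       // z comes from o
  results.push(tryCall({ x: 1, z: 2 })); // x and z both come from o, shadowing the parameter x
  Object.prototype.z = 3;                // now every object "has" z through its prototype
  try {
    results.push(tryCall({}));
  } finally {
    delete Object.prototype.z;           // undo the global side effect
  }
  return results;
}

console.log(runWithDemo());
```

Each of x, y, and z can resolve to a parameter, a property of o, an inherited property, or nothing at all, depending on the argument, so a compiler cannot assign them fixed stack slots inside the `with` block.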
