Languages and VMs: features that are hard to optimize, and why

Posted 2024-08-26 06:53:17


I'm doing a survey of features in preparation for a research project.

Name a mainstream language or language feature that is hard to optimize, and why the feature is or isn't worth the price paid, or instead, just debunk my theories below with anecdotal evidence. Before anyone flags this as subjective, I am asking for specific examples of languages or features, and ideas for optimization of these features, or important features that I haven't considered. Also, any references to implementations that prove my theories right or wrong.

Top of my list of hard-to-optimize features, and my theories (some of my theories are untested and based on thought experiments):

1) Runtime method overloading (aka multi-method dispatch or signature-based dispatch). Is it hard to optimize only when combined with features that allow runtime recompilation or method addition? Or is it just hard anyway? Call-site caching is a common optimization in many runtime systems, but multi-methods add extra complexity and make it less practical to inline methods.
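A minimal sketch in Python of what a multi-method call site has to do: the cache key is the full tuple of argument types rather than a single receiver type, and any runtime method addition must invalidate it. All names here are illustrative, not from any real runtime, and the resolver uses exact types only (a real implementation also needs subtype-aware resolution):

```python
# Sketch of multi-method (signature-based) dispatch with a call-site
# cache. Names are illustrative; exact-type matching only.

class MultiMethod:
    def __init__(self):
        self.impls = {}   # (type, ...) -> function
        self.cache = {}   # stands in for a per-call-site cache

    def register(self, *types):
        def deco(fn):
            self.impls[types] = fn
            self.cache.clear()   # method addition invalidates the cache
            return fn
        return deco

    def __call__(self, *args):
        key = tuple(type(a) for a in args)   # full signature, not one receiver
        fn = self.cache.get(key)
        if fn is None:
            fn = self.impls[key]             # slow path: full lookup
            self.cache[key] = fn
        return fn(*args)

combine = MultiMethod()

@combine.register(int, int)
def _(a, b):
    return a + b

@combine.register(str, str)
def _(a, b):
    return a + "/" + b

combine(1, 2)        # slow path, then cached
combine("a", "b")
```

Note that the key computation alone touches every argument, which is part of why inlining through such a site is less practical than through a single-dispatch monomorphic cache.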

2) Type morphing / variants (aka value-based typing as opposed to variable-based)
Traditional optimizations simply cannot be applied when you don't know whether the type of something can change within a basic block. Combined with multi-methods, inlining must be done carefully, if at all, and probably only below a given threshold of callee size; i.e., it is easy to consider inlining simple property fetches (getters/setters), but inlining complex methods may result in code bloat. The other issue is that I cannot just assign a variant to a register and JIT it to native instructions, because I have to carry the type info around, or every variable needs two registers instead of one. On IA-32 this is inconvenient, even if x64's extra registers improve matters. This is probably my favorite feature of dynamic languages, as it simplifies so many things from the programmer's perspective.

3) First-class continuations - There are multiple ways to implement them, and I have done so with both of the most common approaches: one being stack copying, and the other implementing the runtime in continuation-passing style with cactus stacks, copy-on-write stack frames, and garbage collection. First-class continuations have resource management issues; i.e., we must save everything in case the continuation is resumed, and I'm not aware of any languages that support leaving a continuation with "intent" (i.e., "I am not coming back here, so you may discard this copy of the world"). Having programmed in both the threading model and the continuation model, I know both can accomplish the same things, but continuations' elegance imposes considerable complexity on the runtime and may also hurt cache efficiency (stack locality changes more with the use of continuations and co-routines). The other issue is that they just don't map to hardware. Optimizing continuations is optimizing for the less common case, and as we know, the common case should be fast and the less common cases should be correct.
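To make the continuation-passing-style option concrete, here is a tiny direct-style computation rewritten into CPS in Python. Every function name is hypothetical; the point is only that in CPS the "rest of the program" becomes an ordinary value `k`, which is exactly what forces the runtime to stop assuming a frame dies on return:

```python
# The same computation written direct-style and in continuation-passing
# style (CPS). In CPS the "rest of the program" is a first-class value.

def hypot_direct(a, b):
    return (a * a + b * b) ** 0.5

def square_cps(x, k):
    return k(x * x)

def add_cps(x, y, k):
    return k(x + y)

def sqrt_cps(x, k):
    return k(x ** 0.5)

def hypot_cps(a, b, k):
    # each lambda is "everything that remains to be done"
    return square_cps(a, lambda a2:
           square_cps(b, lambda b2:
           add_cps(a2, b2, lambda s:
           sqrt_cps(s, k))))

captured = []
hypot_cps(3, 4, captured.append)   # the final continuation is just a value
```

Because `k` can be stashed and invoked later (or more than once), none of the intermediate environments here can live on a conventional stack, which is where cactus stacks and copy-on-write frames come in.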

4) Pointer arithmetic and the ability to mask pointers (storing them in integers, etc.). I had to throw this in, but I could actually live without it quite easily.
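CPython happens to make this cost observable, which may be a useful illustration: `id()` exposes an object's address as an integer, and `ctypes` can turn it back into a reference. This trick is CPython-specific and unsafe in general, and it only works because CPython's collector never moves objects; a masked pointer like this is exactly what rules out a compacting GC:

```python
# CPython-specific demonstration of "masking" a pointer in an integer.
# Only legal because CPython's GC never moves objects; under a moving
# collector every stored address would silently go stale.
import ctypes

obj = ["some", "payload"]
addr = id(obj)                                    # pointer hidden in an int
same = ctypes.cast(addr, ctypes.py_object).value  # pointer recovered
assert same is obj
```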

My feeling is that many high-level features, particularly in dynamic languages, just don't map to hardware. Microprocessor implementations have billions of dollars of research behind the optimizations on the chip, yet the choice of language features may marginalize many of them (features like caching, aliasing the top of stack to a register, instruction-level parallelism, return address buffers, loop buffers, and branch prediction). Macro-applications of micro-features don't necessarily pan out like some developers like to think, and implementing many languages on a VM ends up mapping native ops into function calls (i.e., the more dynamic a language is, the more we must look up and cache at runtime; nothing can be assumed, so our instruction mix contains a higher percentage of non-local branches than traditional, statically compiled code), and the only thing we can really JIT well is expression evaluation over non-dynamic types and operations on constant or immediate types. It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this.
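The "native ops become function calls" point can be sketched for a single `+` under simplified Python-like semantics (this deliberately ignores the subclass-precedence rule of the real protocol): two attribute lookups and up to two calls stand where statically compiled code emits one ADD instruction.

```python
# What a single "+" costs a dynamic runtime (simplified Python-like
# semantics): type lookups and dispatch where static code emits one ADD.

def dynamic_add(a, b):
    meth = getattr(type(a), "__add__", None)
    if meth is not None:
        result = meth(a, b)
        if result is not NotImplemented:
            return result
    rmeth = getattr(type(b), "__radd__", None)   # fallback: reflected op
    if rmeth is not None:
        result = rmeth(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types")

dynamic_add(2, 3)     # int.__add__ succeeds
dynamic_add(2, 0.5)   # int.__add__ declines, float.__radd__ handles it
```

Every branch here is the non-local control flow the paragraph describes; an inline cache can shortcut the lookups, but only after the runtime has observed the types at least once.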

I welcome your answers.


Comments (2)

雨后咖啡店 2024-09-02 06:53:17


A couple of comments:

  • All languages with dynamic dispatch, even based on a single object, seem pretty hard to implement efficiently. Look at all the effort that has gone into run-time optimization of Self (or more recently, JavaScript, with SpiderMonkey).

  • Don't overlook delimited continuations. The jury is still out, but they are significantly easier to optimize than classic undelimited continuations. Read the paper by Gasbichler and Sperber.
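The Self and SpiderMonkey effort mentioned above centers on hidden classes (maps/shapes) combined with inline caches. A toy Python sketch of the idea, with all names illustrative: objects sharing a layout share a shape, and each access site caches a (shape, slot) pair so the fast path is one identity compare plus one indexed load.

```python
# Toy sketch of the hidden-class / inline-cache technique pioneered in
# Self and used by JavaScript engines. Illustrative names only.

class Shape:
    def __init__(self, slots):
        self.slots = slots                 # name -> slot index

class Obj:
    def __init__(self, shape, values):
        self.shape = shape
        self.values = values               # flat slot array

class AccessSite:
    """One property-access site with a monomorphic inline cache."""
    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_index = None

    def load(self, obj):
        if obj.shape is self.cached_shape:       # fast path: compare + load
            return obj.values[self.cached_index]
        index = obj.shape.slots[self.name]       # slow path: dict lookup
        self.cached_shape, self.cached_index = obj.shape, index
        return obj.values[index]

point_shape = Shape({"x": 0, "y": 1})
site = AccessSite("x")
p1 = Obj(point_shape, [3, 4])
p2 = Obj(point_shape, [7, 1])
site.load(p1)   # slow path fills the cache
site.load(p2)   # same shape: fast path
```

The fast path is branch-predictor friendly, which is much of why these runtimes recover performance despite the dynamism.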

人事已非 2024-09-02 06:53:17


"It is my gut feeling that bytecode virtual machines and JIT cores are perhaps not always justified for certain languages because of this."

Didn't IronPython get written to prove that a virtual machine could not do as well as the native implementation of the language (Python)? And then the author of IronPython got rather a shock when they found out that IronPython performed really well for a dynamic language on a bytecode VM.

Microsoft's own .Net internals group are on record as stating that they think ultimately the JITter will outperform a "normal" compiler/linker (for say C/C++).

I think the jury is still out on this. Hard to call it either way. Choose the language that best fits the job...
