How can interpreted code be even a little efficient? (theoretical)

OK, first, I don't want any kind of flame war here or anything like it. My bigger question is more theoretical, and it will include a few examples.

So, as I wrote, I cannot understand how an interpreted language can be even a little efficient. And since it's modern, I will take Java as an example.

Let's go back to the days when there were no JIT compilers. Java has its virtual machine, which is basically its hardware. You write code, and then it is compiled into bytecode, which takes at least some work off the virtual machine; that's fine. But considering how complex even a RISC instruction set can be in hardware, I cannot even think of a way to do it on software-emulated hardware.

I have no experience writing virtual machines, so I don't know how it's done at the most efficient level, but I cannot think of anything more efficient than testing every instruction for a match and then doing the appropriate action. You know, something like: if (instruction == "something") { (do it) } else if (instruction == "something_different") { (do it) } etc.
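(For concreteness, that naive scheme in C might look like the following sketch; the opcode names are made up.)

 #include <string.h>

 /* the naive dispatch described above: one comparison per opcode */
 static void execute(const char *instruction) {
     if (strcmp(instruction, "push") == 0) {
         /* do it */
     } else if (strcmp(instruction, "add") == 0) {
         /* do it */
     } /* ...one branch, and one failed comparison, per remaining opcode... */
 }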

But this has to be terribly slow. And still, even though there are articles saying that Java was slow before JIT compilers, they still say that it was not so slow. But for emulation, it must take many clock cycles of real hardware to perform one bytecode instruction.

And still, entire platforms are based on Java. For example, Android. The first versions of Android had no JIT compiler; they were interpreted. But shouldn't Android then be terribly slow? And yet it is not. I know that when you call some API function from the Android library, it is written in machine code, so it is efficient, and that helps a lot.

But imagine that you wrote your own game engine from scratch, using the API just for displaying images. You would need to do many array-copy operations and many calculations, which would be terribly slow when emulated.

And now some examples, as I promised. Since I mainly work with MCUs, I found a JVM for the Atmel AVR MCU. They state that an 8 MHz MCU can do 20K Java opcodes per second. But since the AVR can do most instructions in one or two cycles, let's say 6,000,000 native instructions per second on average. That means a JVM without a JIT compiler is 300 times slower than machine code. So why did Java become so popular without a JIT compiler? Isn't that too big a performance loss? I just cannot understand it. Thanks.

Comments (4)

总攻大人 2024-09-24 23:37:57

We've had byte code around for a long time. On the old Apple II, the UCSD p-System was very popular; it compiled Pascal into byte code, which would be interpreted by an 8-bit 6502 running at maybe 2 MHz. Those programs did run reasonably fast.

A bytecode interpreter would generally be based on a jump table rather than a chain of if/then/else statements. In C or C++, this would involve a switch statement. Fundamentally, the interpreter has the equivalent of an array of processing code, and uses the opcode in the bytecode instruction as an index into that array.
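For instance, here is a minimal sketch of such a switch-based dispatch loop in C (the opcodes and the tiny stack machine are hypothetical, invented just to show the shape):

 #include <stdint.h>
 #include <stdio.h>

 /* hypothetical opcodes for a tiny stack machine */
 enum { OP_PUSH, OP_ADD, OP_HALT };

 static int32_t stack[64];
 static int sp = 0;

 static void run(const uint8_t *code) {
     for (;;) {
         uint8_t op = *code++;              /* fetch */
         switch (op) {                      /* decode + dispatch */
         case OP_PUSH:
             stack[sp++] = (int8_t)*code++; /* one-byte immediate operand */
             break;
         case OP_ADD:
             sp--;
             stack[sp - 1] += stack[sp];
             break;
         case OP_HALT:
             return;
         }
     }
 }

 int main(void) {
     const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT };
     run(prog);
     printf("%d\n", stack[0]);              /* prints 5 */
     return 0;
 }

A compiler will typically turn that switch into exactly the kind of jump table described above, so each opcode costs one indexed indirect jump rather than a chain of comparisons.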

It's also possible to have byte code that's higher-level than the machine instructions, so that one byte code instruction translates into several, sometimes numerous, machine code instructions. A byte code that was constructed for a particular language can do this fairly easily, as it only has to match the control and data structures of that particular language. This amortizes the interpretation overhead over more work per instruction and makes the interpreter more efficient.
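For example (a hypothetical handler, not taken from any real VM), a single high-level "copy an array" opcode can hand the entire loop to native code, so one dispatch pays for many elements:

 #include <stdint.h>
 #include <string.h>

 /* Hypothetical handler for a high-level OP_ARRAYCOPY bytecode:
    one interpreted instruction, one entire native copy loop. */
 static void op_arraycopy(int32_t *dst, const int32_t *src, uint32_t len) {
     memcpy(dst, src, len * sizeof *dst); /* runs at full machine speed */
 }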

An interpreted language is likely to have some speed penalty when compared to a compiled language, but this is often unimportant. Many programs process input and output at human speed, and that leaves a tremendous amount of performance that can be wasted. Even a network-bound program is likely to have far more CPU power available than it needs. There are programs that can use all the CPU efficiency they can get, and for obvious reasons they tend not to be written in interpreted languages.

And, of course, there's the question of what you get for some inefficiency that may or may not make a difference. Interpreted language implementations tend to be easier to port than compiled implementations, and the actual byte code is often portable. It can be easier to put higher-level functionality in the language. It allows the compilation step to be much shorter, meaning that execution can start much faster. It may allow better diagnostics if something goes wrong.

瞄了个咪的 2024-09-24 23:37:57

"But shouldn't Android then be terribly slow?"

Define "terribly slow". It's a phone. It has to process "Dial first digit" before you dial the second digit.

In any interactive application, the limiting factor is always human reaction time. It could be 100 times slower and still be faster than the user.

So, to answer your question: yes, interpreters are slow, but they are usually fast enough, particularly as hardware keeps getting faster.

Remember that when Java was introduced, it was sold as a web applet language (replacing, and now replaced by, JavaScript, which is also interpreted). It was only after JIT compilation that it became popular on servers.

Bytecode interpreters can be faster than a chain of if()s by using a jump table:

 /* jump table: an array of 256 function pointers, one per opcode */
 void (*jmp_tbl[256])(void) = { /* ... one handler per opcode ... */ };
 uint8_t op = *program_counter++;  /* fetch the next opcode */
 jmp_tbl[op]();                    /* dispatch through the table */

哀由 2024-09-24 23:37:57

There are two different ways to approach this question.

(i) "why is it OK to run slow code"

As James already mentioned above, sometimes speed of execution is not all you're interested in. For lots of apps, running in interpreted mode can be "fast enough". You have to take into account how the code you're writing will be used.

(ii) "why is interpreted code inneficient"

There are many ways you can implement an interpreter. In your question you talk about the most naïve approach: basically a big switch, interpreting each JVM instruction as it's read.

But you can optimize that: for example, instead of looking at a single JVM instruction, you can look at a sequence of them and look for patterns for which you have more efficient interpretations available. Sun's JVM actually does some of these optimizations in the interpreter itself. In a previous job, a guy took some time to optimize the interpreter that way, and interpreted Java bytecode was running noticeably faster after his changes.
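As a rough sketch of that idea (the opcodes and the fusion pass are hypothetical, not Sun's actual implementation), a peephole pass can rewrite a common two-opcode sequence into a single fused "superinstruction", so the pair costs one dispatch instead of two:

 #include <stddef.h>
 #include <stdint.h>

 /* hypothetical opcodes; OP_LOAD_ADD is a fused "superinstruction" */
 enum { OP_NOP, OP_LOAD, OP_ADD, OP_LOAD_ADD };

 /* Rewrite every LOAD immediately followed by ADD into LOAD_ADD.
    The second slot becomes a NOP so jump offsets stay valid. */
 static void fuse_load_add(uint8_t *code, size_t len) {
     for (size_t i = 0; i + 1 < len; i++) {
         if (code[i] == OP_LOAD && code[i + 1] == OP_ADD) {
             code[i]     = OP_LOAD_ADD; /* one dispatch does the work of two */
             code[i + 1] = OP_NOP;
         }
     }
 }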

But in modern JVMs that contain a JIT compiler, the interpreter is just a stepping stone until the JIT does its job, so people don't really spend that much time optimizing the interpreter.

池予 2024-09-24 23:37:57

12 MHz would be an ATtiny, which is an 8-bit microprocessor. That means (for example) that a native 'add' instruction can only add two 8-bit numbers together to get a 9-bit result. The JVM is basically a virtual 32-bit processor. That means its add instruction adds two 32-bit numbers together to produce a 33-bit result.

As such, when you're comparing instruction rates, you should expect a 4:1 reduction in instruction rate as an absolute minimum. In reality, while it's easy to simulate a 32-bit add with four 8-bit adds (with carries), some things don't scale quite like that. Just for example, according to Atmel's own app note, a 16x16 multiplication producing a 32-bit result executes in ~218 clock cycles. The same app note shows a 16/16-bit division (producing an 8-bit result) running in 255 cycles.

Assuming those scale linearly, we can expect 32-bit versions of the multiplication to take ~425-450 clock cycles, and the division ~510 cycles. In reality, we should probably expect a bit of overhead, which would reduce speed still more -- adding at least 10% to those estimates probably makes them more realistic.
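To put those estimates next to the question's numbers (a rough back-of-the-envelope using the figures above): at the question's 8 MHz clock, ~425-450-cycle 32-bit multiplies alone would limit the AVR to roughly 8,000,000 / 440 ≈ 18,000 multiplies per second in native code, which is already in the same ballpark as the ~20K opcodes per second quoted for the AVR JVM.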

Bottom line: when you compare apples to apples, it becomes apparent that a whole lot of the speed difference you're talking about isn't real at all (or isn't attributable to JVM overhead, anyway).
