Why are Virtual Machines necessary?
I was reading this question to find out the differences between the Java Virtual Machine and the .NET CLR and Benji's answer got me wondering why Virtual Machines are necessary in the first place.
From my understanding of Benji's explanation, the JIT compiler of a Virtual Machine translates the intermediate code into the actual assembly code that runs on the CPU. The reason it has to do this is that CPUs often have different numbers of registers and, according to Benji, "some registers are special-purpose, and each instruction expects its operands in different registers." It makes sense, then, that there is a need for an intermediary like the Virtual Machine so that the same code can run on any CPU.
But, if that's the case, then what I don't understand is why C or C++ code compiled into machine code is able to run on any computer as long as it is the correct OS. Why then would a C program I compiled on my Windows machine using a Pentium be able to run on my other Windows machine using an AMD?
If C code can run on any CPU then what is the purpose of the Virtual Machine? Is it so that the same code can be run on any OS? I know Java has VM versions on pretty much any OS but is there a CLR for other OS's besides Windows?
Or is there something else I'm missing? Does the OS do some other interpretation of assembly code it runs to adapt it to the particular CPU or something?
I'm quite curious about how all this works, so a clear explanation would be greatly appreciated.
Note: The reason I didn't just post my queries as comments in the JVM vs. CLR question is because I don't have enough points to post comments yet =b.
Edit: Thanks for all the great answers! So it seems what I was missing was that although all processors have differences, there is a common standardization, primarily the x86 architecture, which provides a large enough set of common features that C code compiled on one x86 processor will, for the most part, work on another x86 processor. This furthers the justification for Virtual Machines, not to mention that I forgot about the importance of garbage collection.
10 Answers
The AMD and Intel processors use the same instruction set and machine architecture (from the standpoint of executing machine code).
C and C++ compilers compile to machine code, with headers appropriate to the OS they target. Once compiled, the executables cease to associate in any way, shape, or form with the language they were written in and are merely binary executables. (There are artifacts that may show what language they were compiled from, but that isn't the point here.)
So once compiled, they are tied to the machine (x86, the Intel and AMD instruction set and architecture) and to the OS.
This is why they can run on any compatible x86 machine, and any compatible OS (Windows 95 through Windows Vista, for some software).
However, they cannot run on an OS X machine, even if it's running on an Intel processor - the binary isn't compatible unless you run additional emulation software (such as Parallels, or a VM with Windows).
Beyond that, if you want to run them on an ARM processor, or MIPS, or PowerPC, then you have to run a full machine-instruction-set emulator that interprets the x86 binary machine code for whatever machine you're running it on.
Contrast that with .NET.
The .NET virtual machine is fabricated as though there were much better processors out in the world - processors that understand objects, memory allocation and garbage collection, and other high level constructs. It's a very complex machine and can't be built directly in silicon now (with good performance) but an emulator can be written that will allow it to run on any existing processor.
Suddenly you can write one machine specific emulator for any processor you want to run .NET on, and then ANY .NET program can run on it. No need to worry about the OS or the underlying CPU architecture - if there's a .NET VM, then the software will run.
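A minimal sketch of the same idea in Java (Java rather than .NET, but the principle is identical; the commands in the comments assume a standard JDK): the compiled artifact contains no CPU- or OS-specific code, so it runs anywhere a VM is installed.

```java
public class Hello {
    public static void main(String[] args) {
        // The very same Hello.class reports whatever platform it happens to be running on.
        System.out.println("Hello from " + System.getProperty("os.name")
                + " / " + System.getProperty("os.arch"));
    }
}
// Compile once:  javac Hello.java   -> Hello.class (bytecode with no CPU or OS baked in)
// Run anywhere:  java Hello         on Windows/x86, Linux/ARM, macOS, ... wherever a VM exists
```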
But let's go a bit further - once you have this common language, why not make compilers that convert any other written language into it?
So now you can have a C, C#, C++, Java, JavaScript, Basic, Python, Lua, or any other language compiler that converts written code so it will run on this virtual machine.
You've disassociated the machine from the language by two degrees, and without too much work you enable anyone to write any code and have it run on any machine, as long as a compiler and a VM exist to bridge the two degrees of separation.
If you're still wondering why this is a good thing, consider early DOS machines, and what Microsoft's real contribution to the world was:
AutoCAD had to write drivers for each printer it could print to. So did Lotus 1-2-3. In fact, if you wanted your software to print, you had to write your own drivers. If there were 10 printers and 10 programs, then 100 different pieces of essentially the same code had to be written separately and independently.
What Windows 3.1 tried to accomplish (along with GEM and so many other abstraction layers) was to make it so the printer manufacturer wrote one driver for their printer, and the programmer wrote one driver for the Windows printer class.
Now, with 10 programs and 10 printers, only 20 pieces of code have to be written, and since the Microsoft side of the code was the same for everyone, the examples from MS meant you had very little work to do.
Now a program wasn't restricted to just the 10 printers it chose to support, but could use any printer whose manufacturer provided a driver for Windows.
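As a rough sketch of that abstraction layer (hypothetical names for illustration only, not any real Windows or GEM API): each vendor implements the common interface once, and every program talks only to that interface.

```java
// Hypothetical names for illustration only - not a real Windows or GEM API.
interface PrinterDriver {
    void print(String document);
}

// Written once by the printer's manufacturer.
class AcmeLaserDriver implements PrinterDriver {
    public void print(String document) {
        // talk to this particular printer's hardware
    }
}

// Written once by the application vendor; it never knows which printer it gets.
class WordProcessor {
    private final PrinterDriver printer;

    WordProcessor(PrinterDriver printer) {
        this.printer = printer;
    }

    void printDocument(String text) {
        printer.print(text);
    }
}

public class DriverDemo {
    public static void main(String[] args) {
        new WordProcessor(new AcmeLaserDriver()).printDocument("Hello, printer!");
    }
}
```

With 10 programs and 10 printers, that's 10 driver implementations plus 10 applications written against the one interface, instead of 100 bespoke printer routines.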
The same issue occurs in application development. There are really neat applications I can't use because I don't use a Mac. There is a ton of duplication (how many world-class word processors do we really need?).
Java was meant to fix this, but it had many limitations, some of which aren't really solved.
.NET is closer, but no one is developing world-class VMs for platforms other than Windows (Mono is so close... and yet not quite there).
So... That's why we need VMs. Because I don't want to limit myself to a smaller audience simply because they chose an OS/machine combination different from my own.
-Adam
Your assumption that C code can run on any processor is incorrect. Things like registers and endianness can make a compiled C program not work at all on one platform while it works fine on another.
However, there are certain similarities that processors share, for example, Intel x86 processors and AMD processors share a large enough set of properties that most code compiled against one will run on the other. However, if you want to use processor-specific properties, then you need a compiler or set of libraries which will do that for you.
As for why you would want a virtual machine, beyond the fact that it handles processor differences for you, virtual machines also offer services to your code that are not available to programs compiled to unmanaged C++ today.
The most prominent service offered is garbage collection, offered by the CLR and the JVM. Both of these virtual machines offer you this service for free. They manage the memory for you.
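As a small illustrative sketch (Java shown here; the CLR behaves analogously), memory is simply allocated and then abandoned, and the collector reclaims it:

```java
import java.util.ArrayList;
import java.util.List;

public class GcDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            List<Integer> temp = new ArrayList<>();  // allocate freely...
            temp.add(i);
        }                                            // ...no free()/delete anywhere;
                                                     // the collector reclaims unreachable lists
        System.out.println("Done; the GC cleaned up every temporary list for us.");
    }
}
```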
Things like bounds checking are also offered, and access violations, while still possible, are made extremely difficult.
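And a similarly minimal sketch of bounds checking: an out-of-range access becomes a well-defined exception rather than a silent read of arbitrary memory.

```java
public class BoundsDemo {
    public static void main(String[] args) {
        int[] data = new int[4];
        try {
            System.out.println(data[10]);            // the VM checks every index...
        } catch (ArrayIndexOutOfBoundsException e) { // ...and raises a defined exception
            System.out.println("Out-of-bounds access caught: " + e.getMessage());
        }
    }
}
```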
The CLR also offers a form of code security for you.
None of these are offered as part of the basic runtime environment for a number of other languages which don't operate with a virtual machine.
You might get some of these services by using libraries, but that forces you into that library's pattern of use, whereas in .NET and Java the services offered through the CLR and JVM are consistent in how you access them.
Most compilers, even native code compilers, use some sort of intermediate language.
This is mainly done to reduce compiler construction costs. There are many (N) programming languages in the world. There are also many (M) hardware platforms in the world. If compilers worked without using an intermediate language, the total number of "compilers" that would need to be written to support all languages on all hardware platforms would be N*M.
However, by defining an intermediate language and breaking a compiler up into two parts, a front end and a back end, with the front end compiling source code into IL and the back end compiling IL into machine code, you can get away with writing only N+M compilers. This ends up being a huge cost savings.
The big difference between CLR / JVM compilers and native code compilers is the way the front end and the back end compilers are linked to each other. In a native code compiler the two components are usually combined into the same executable, and both are run when the programmer hits "build" in the IDE.
With CLR / JVM compilers, the front end and the back end are run at different times. The front end is run at compile time, producing IL that is actually shipped to customers. The back end is then embodied in a separate component that is invoked at runtime.
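A small Java example of that split (the `javap` output shown in the comments is approximate and varies by compiler version):

```java
public class AddDemo {
    static int add(int a, int b) {
        return a + b;                   // the front end compiles this to stack-machine bytecode
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
// $ javac AddDemo.java    front end: source -> AddDemo.class (the IL that ships to customers)
// $ javap -c AddDemo      inspect the IL; the add method looks roughly like:
//     0: iload_0
//     1: iload_1
//     2: iadd
//     3: ireturn
// $ java AddDemo          back end: the JVM's JIT turns that bytecode into machine code as it runs
```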
So, this brings up the alternate question, "What are the benefits of delaying back end compilation until runtime"?
The answer is: "It depends".
By delaying back-end compilation until runtime, it becomes possible to ship one set of binaries that can run on multiple hardware platforms. It also makes it possible for programs to take advantage of improvements in back-end compilation technology without being redeployed. It can also provide a foundation for efficiently implementing many dynamic language features. Finally, it offers the ability to introduce security and reliability constraints between separately compiled, dynamically linked libraries (DLLs), which is not possible with up-front machine-code compilation.
However, there are also drawbacks. The analysis necessary to implement extensive compiler optimizations can be expensive. This means that "JIT" back ends will often do fewer optimizations than up-front back ends do, which can hurt performance. Also, the need to invoke the compiler at runtime increases the time it takes to load programs. Programs generated with "up-front" compilers don't have those problems.
Essentially it allows for 'managed code', which means exactly what it says - the virtual machine manages the code as it runs. Three main benefits of this are just-in-time compilation, managed pointers/garbage collection, and security control.
For just-in-time compilation, the virtual machine watches the code execute, so as the code runs more often it is re-optimised to run faster. You can't do this with native code.
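A hedged sketch of what that looks like on the JVM (the flag and the exact thresholds assume HotSpot and vary by VM and version):

```java
public class HotLoop {
    static long sum;

    static void work() {                // becomes "hot" because it is called many times
        for (int i = 0; i < 1_000; i++) {
            sum += i;
        }
    }

    public static void main(String[] args) {
        // Early calls are interpreted; once the method is hot, the VM's JIT compiles it
        // (and may recompile it later) using what it has observed at run time.
        for (int i = 0; i < 100_000; i++) {
            work();
        }
        System.out.println(sum);
    }
}
// e.g.  java -XX:+PrintCompilation HotLoop   shows HotSpot's compilation events
// (flag assumes the HotSpot JVM; other VMs have their own equivalents)
```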
Managed pointers are also easier to optimise because the virtual machine tracks them as they go around, managing them in different ways depending on their size and lifetime. It's difficult to do this in C++ because you can't really tell where a pointer is going to go just reading the code.
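For example, on VMs that perform escape analysis (HotSpot can, depending on version and flags), a short-lived object that never leaves a method may never be heap-allocated at all; a rough sketch:

```java
public class EscapeDemo {
    // A tiny value object that never escapes the method below.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long distSquared(int x, int y) {
        Point p = new Point(x, y);                   // 'p' is only used locally...
        return (long) p.x * p.x + (long) p.y * p.y;  // ...so the JIT may avoid allocating it
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += distSquared(i, i + 1);
        }
        System.out.println(total);
    }
}
```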
Security is self-explanatory: the virtual machine stops the code from doing things it shouldn't, because it's watching. Personally I think that's probably the biggest reason why Microsoft chose managed code for C#.
Basically my point is, because the virtual machine can watch the code as it happens, it can do things which make life easier on the programmer and make the code faster.
Firstly, machine code is not the lowest form of instructions for a CPU. Today's x86 CPUs themselves translate the x86 instruction set into another internal format using microcode. The only people who actually program microcode are the chip-engineer types, who faithfully and painstakingly emulate the legacy x86 instruction set to achieve maximum performance using today's technologies.
Developers have always been adding additional layers of abstraction because of the power and features they bring. After all, better abstractions allow new applications to be written more quickly and reliably. Businesses don't care what the code looks like or how it's written; they just want the job done reliably and quickly. Does it really matter if the C version of an application shaves off a few milliseconds but takes twice as long to develop?
The speed question is almost a non-argument, as many enterprise applications that serve millions of people are written on platforms/languages like Java (e.g. GMail, GMaps). Forget about which language/platform is fastest. What's more important is that you use the correct algorithms, write efficient code, and get the job done.
AMD and Intel processors both use the x86 architecture. If you want to run a C/C++ program on a different architecture, you have to use a compiler for that architecture; the same binary executable won't run across different processor architectures.
Mono
In a very simplified way, that's because Intel and AMD implement the same assembly language, with the same number of registers, and so on.
Say your C compiler compiles code to work on Linux. That assembly uses the Linux ABI, so as long as the compiled program runs on Linux, on x86, with the right function signatures, then all is dandy.
Now try taking that compiled code and sticking it on, say, Linux/PPC (e.g. Linux on an old iBook). That isn't going to work. A Java program, however, would work, because the JVM has been implemented on the Linux/PPC platform.
Assembly language nowadays is basically another interface that a programmer can program against. x86 (32-bit) lets you access eax, ebx, ecx, and edx as general-purpose integer registers, and f00-f07 for floating point. Behind the scenes, the CPU actually has hundreds more registers and juggles that stuff around to squeeze out performance.
You're right in your analysis: Java or C# could have been designed to compile directly to code that runs on any machine, and would probably be faster if they did. But the virtual machine approach gives complete control over the environment in which your code runs. The VM creates a secure sandbox that only allows commands with the right security access to perform potentially damaging actions, like changing a password or updating an HD boot sector. There are many other benefits, but that's the killer reason. You can't get a StackOverflow in C# ...
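For what it's worth, a managed runtime also turns something like a runaway stack into a well-defined error rather than undefined behaviour. A small Java sketch (catching StackOverflowError is fragile and shown only for illustration):

```java
public class SandboxDemo {
    static long depth = 0;

    static void recurse() {
        depth++;
        recurse();                       // deliberately unbounded recursion
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The VM detects the overflow and raises a defined error instead of
            // letting the program scribble over arbitrary memory.
            System.out.println("Stack overflow caught after depth " + depth);
        }
    }
}
```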
I think the premise of your question is valid; you're certainly not the first to ask it. So check out http://llvm.org to see an alternative approach (which is now a project run by, or at least sponsored by, Apple).