Performance of managed C++ versus unmanaged/native C++
I am writing a very high-performance application that handles and processes hundreds of events every millisecond.
Is unmanaged C++ faster than managed C++? And why?
Managed C++ targets the CLR instead of the OS, and the CLR takes care of memory management, which simplifies the code and is probably also more efficient than what "a programmer" would write by hand in unmanaged C++? Or is there some other reason?
When using managed code, how can one avoid dynamic memory allocation, which causes a performance hit, if it is all transparent to the programmer and handled by the CLR?
So, coming back to my question: is managed C++ more efficient in terms of speed than unmanaged C++, and why?
9 Answers
There is no one answer to this. As a really general rule, native code will usually be faster, but 1) that's not always the case, 2) sometimes the difference is too small to care about, and 3) how well the code is written will usually make more difference than managed vs. unmanaged.
Managed code runs in a virtual machine. Basically, you start with a compiler that produces byte codes as output, then feed that to the virtual machine. The virtual machine then re-compiles it to machine code and executes that. This can provide some real advantages under some circumstances. For one example, if you have a 64-bit processor running a 64-bit VM (pretty nearly a given any more) but an old program written before 64-bit processors were common, the VM will still compile that byte code to 64-bit machine code, which can give quite a substantial speed advantage for at least some code.
At the same time, it can also be a fairly noticeable disadvantage for some code. In particular, the compiler is running while the user waits. To accommodate that, the VM's compiler can't itself run very slowly. Although native code generators differ, there's a pretty fair chance that whatever native compiler you choose includes at least a few optimizations that were foregone in the VM's bytecode compiler to keep its resource usage reasonable.
The VM also uses a garbage collector. Garbage collectors have rather different characteristics from manually managing memory. With many manual managers, allocating memory is fairly expensive. Releasing memory is fairly cheap, but roughly linear on the number of items you release. Other manual managers roughly reverse that, doing extra work when freeing memory in order to make allocation faster. Either way, the cost structure is different from a typical collector.
With a garbage collector, allocating memory is typically very cheap. With a typical (copying) collector, the cost of releasing memory depends primarily upon the number of objects that have been allocated and are still (at least potentially) in use.
The allocations themselves also differ though. In native C++, you typically create most objects on the stack, where both allocating and releasing memory is extremely cheap. In managed code, you typically allocate a much larger percentage of memory dynamically, where it's garbage collected.
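To make that last contrast concrete, here is a minimal native C++ sketch; the Event type, the numbers, and the process function are invented purely for illustration:

```cpp
#include <memory>

struct Event {            // hypothetical payload, for illustration only
    int    id;
    double value;
};

volatile double sink = 0.0;                     // keeps the optimizer from deleting the work

void process(const Event& e) { sink = sink + e.value; }

void stack_style() {
    // Typical native C++: the object lives on the stack. "Allocation" is a
    // stack-pointer bump and release is automatic when the function returns.
    Event e{42, 3.14};
    process(e);
}

void heap_style() {
    // Closer to what managed code does for most objects: every Event is
    // allocated dynamically. In C++ this goes through the heap allocator;
    // in the CLR it would go through the GC heap instead.
    auto e = std::make_unique<Event>(Event{42, 3.14});
    process(*e);
}   // unique_ptr frees the object here; a GC would reclaim it later

int main() {
    for (int i = 0; i < 1'000'000; ++i) { stack_style(); heap_style(); }
}
```

The stack version costs essentially nothing per object, while the heap version pays the allocator (or GC) on every event, which is where the different cost structures described above come from.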
It all depends on the situation.
Things that make unmanaged code faster / managed code slower:
Things that make managed code faster / unmanaged code slower:
And probably there are many more reasons.
You can write slow code in any language; conversely, you can use decent algorithms that may well be fast in almost any language.
The common answer here would be to pick a language that you already know, use appropriate algorithms, then profile the heck out of it to determine the actual hot spots.
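If a real profiler is not at hand yet, a first rough pass can be done with std::chrono. This is only a sketch, with a made-up handleEvent standing in for whatever per-event work you actually do, and it is no substitute for a proper profiler:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the application's per-event work.
void handleEvent(int i) {
    volatile int x = i * i;   // volatile so the work is not optimized away
    (void)x;
}

int main() {
    using clock = std::chrono::steady_clock;
    constexpr int kEvents = 1'000'000;

    auto start = clock::now();
    for (int i = 0; i < kEvents; ++i)
        handleEvent(i);
    auto stop = clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("%d events in %lld ns (%.1f ns/event)\n",
                kEvents, static_cast<long long>(ns),
                static_cast<double>(ns) / kEvents);
}
```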
I am somewhat concerned about the hundreds of events every millisecond statement. That's an awful lot. Are you reasonably going to be able to do the processing you expect in any language?
As a C++ developer on high-performance systems, I tend to trust my ability to profile and optimize the emitted code. That said, there are very high-performance .NET applications where the writer has gone to great lengths not to do dynamic memory allocation inside the critical loops - mostly by using pools of objects allocated beforehand.
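The same trick works in native C++ too. Below is a minimal sketch of such a pre-allocated pool (the EventPool and Event names are made up here); the hot loop only moves pointers around and never touches new/delete:

```cpp
#include <cstddef>
#include <vector>

struct Event {             // hypothetical event type, for illustration
    long long timestamp{};
    double    payload{};
};

// Minimal fixed-size pool: all Events are created up front, so the
// critical loop never calls the allocator (or, in .NET, the GC heap).
class EventPool {
public:
    explicit EventPool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (auto& e : storage_) free_.push_back(&e);
    }

    Event* acquire() {                       // O(1), no allocation
        if (free_.empty()) return nullptr;   // pool exhausted
        Event* e = free_.back();
        free_.pop_back();
        return e;
    }

    void release(Event* e) {                 // O(1), no deallocation
        *e = Event{};                        // reset for reuse
        free_.push_back(e);
    }

private:
    std::vector<Event>  storage_;            // owns the objects
    std::vector<Event*> free_;               // currently unused objects
};

int main() {
    EventPool pool(1024);                    // sized up front, outside the hot loop
    for (int i = 0; i < 100000; ++i) {
        Event* e = pool.acquire();
        if (!e) continue;
        e->timestamp = i;
        e->payload   = i * 0.5;
        // ... process the event ...
        pool.release(e);
    }
}
```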
So to repeat my previous comment: pick what you already know, then tune. Even if you hit a dead end, you will likely learn much more about your problem space.
Managed code is in most cases slower than unmanaged code, even though the .NET CLR always does a JIT compilation before executing the code (it is not compiled multiple times while the program is running, but it will never interpret the code).
The problem is rather the many checks the CLR does, e.g. checking whether you run over the bounds of an array whenever you try to access it. This leads to fewer problems with buffer overflows, etc., but also means a performance hit due to the added overhead of those checks.
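Standard C++ happens to offer both flavours side by side, which makes the trade-off easy to see: vector::at does roughly the kind of bounds check the CLR performs on every array access, while operator[] skips it. A minimal sketch, purely for illustration:

```cpp
#include <cstdio>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> data(16, 1);

    // Unchecked access, as in typical native code: fastest, but reading
    // past the end is undefined behaviour rather than an error.
    int a = data[3];

    // Checked access: roughly what the CLR does on every array access.
    // The index is validated first, and an exception is thrown if it is
    // out of range - safer, but a little extra work on every access.
    int b = data.at(3);

    try {
        (void)data.at(99);   // out of bounds: throws instead of corrupting memory
    } catch (const std::out_of_range&) {
        std::puts("caught out-of-range access");
    }

    std::printf("%d %d\n", a, b);
}
```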
I've seen experiments where C# outperformed C++, but those were conducted with code taking heavy advantage of object hierarchies, etc. When it comes down to number crunching and you want to get the most out of your PC, you will have to go with unmanaged code.
Another point was also already mentioned - the GC leads to somewhat unpredictable pauses in the program's execution when memory must be freed. You need this time as well when doing memory management in unmanaged code, but it occurs more often and whenever you decide to destroy an object, which means it is not all done at once for the whole program, so you don't have one long pause.
There are many good answers here, but one aspect of managed code that may give it an advantage in the long term is runtime analysis. Since the code generated by the managed compiler is an intermediate format, the machine code that actually executes can be optimized based on actual usage. If a particular subset of functionality is heavily used, the JIT'er can localize the machine code all on the same memory page, increasing locality. If a particular sub-call is made repeatedly from a particular method, a JIT'er can dynamically inline it.
This is an improvement over unmanaged code, where inlining must be "guessed" ahead of time, and excessive inlining is harmful because it bloats code size and causes locality issues that cause (very time-expensive) L2/L1 cache misses. That information is simply not available to static analysis, so it is only possible in a JIT'ing environment. There's a goody basket of possible wins from runtime analysis such as optimized loop unwinding, etc.
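For comparison, this is what "guessing ahead of time" looks like in native C++: the programmer can only hint, via the portable inline keyword or compiler-specific attributes (the GCC/Clang attributes below are an assumption about the toolchain), and the decision is frozen at build time. Profile-guided optimization (-fprofile-generate / -fprofile-use in GCC and Clang) is the closest native counterpart to runtime analysis, but it needs a separate training run rather than adapting while the real workload executes.

```cpp
#include <cstdio>

// In native C++ the inlining decision is made at build time: the programmer
// can only hint, and the compiler decides without any knowledge of how often
// each call site actually runs.

inline int scale(int x) { return x * 3; }            // portable hint only

#if defined(__GNUC__) || defined(__clang__)
__attribute__((always_inline)) inline int scale_hot(int x)  { return x * 3; }
__attribute__((noinline))             int scale_cold(int x) { return x * 3; }
#else
int scale_hot(int x)  { return x * 3; }              // plain functions elsewhere
int scale_cold(int x) { return x * 3; }
#endif

int main() {
    long long sum = 0;
    for (int i = 0; i < 1000; ++i)
        sum += scale(i) + scale_hot(i) + scale_cold(i);
    std::printf("%lld\n", sum);                      // use the result so the loop survives
}
```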
I'm not claiming the .NET JIT'er is as smart as it could be, but I know I've heard about global analysis features and I know a lot of research into runtime analysis has been done at Hewlett-Packard and other companies.
First, your statement "processes hundreds of events every millisecond" sounds quite unrealistic. Unless you have a specially designed clock module in the computer, I do not think you can achieve that goal with a generic PC (the typical timer resolution is around 10 milliseconds). Secondly, native C++ is vastly better in terms of performance. There are a lot of optimizations that can be applied in C++ to speed things up which are not possible in managed code. Also be aware that garbage collection in managed code makes performance unpredictable: when the GC fires up, the whole process gets frozen. Once you run into that problem the solution is more painful, and all the "nice style" offered by managed code will be gone.
As for the claim that managed code can optimize for the CPU, it is true, but you can take advantage of CPU features (SSE2, MMX, etc.) in native C++ too. Based on my experience, the performance boost is negligible.
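As an illustration of using such CPU features directly from native C++, here is a minimal SSE2 sketch (x86 only; the array contents and sizes are arbitrary) that adds two doubles per instruction. Note that modern compilers will often auto-vectorize the equivalent scalar loop anyway:

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdio>

// Adds two arrays of doubles, two elements per instruction, using SSE2.
// n is assumed to be a multiple of 2 to keep the sketch short.
void add_sse2(const double* a, const double* b, double* out, int n) {
    for (int i = 0; i < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);            // load 2 doubles
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  // add and store 2 at once
    }
}

int main() {
    double a[4] = {1, 2, 3, 4};
    double b[4] = {10, 20, 30, 40};
    double c[4];
    add_sse2(a, b, c, 4);
    std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
}
```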
Writing fast code is always a pain. The main issue is that you can only optimize for one platform. That really is the case on consoles, embedded devices, or other platforms where the hardware is always the same. In the real PC world it isn't: different cores, different instruction sets, etc. make this a nightmare. This, IMHO, is the main issue that really makes the difference between managed and unmanaged code. Managed code can be opportunistically re-optimized for the new platform when it runs; unmanaged code cannot, as it is written in stone.
In order of speed and power: asm > C > C++ >= C++/CLI > C# >= all others. But creating a web service in asm is a long pain. So use the right language for the right job and the right audience to do the best job in the given time.
Isn't C++/CLI a half-interpreted language, like Java?
Also, didn't someone post a study just yesterday showing that GC systems are always slower than non-GC ones?