Performance of C++ vs Virtual Machine languages in high frequency finance

Posted on 2024-09-08 14:29:18

I thought the C/C++ vs C#/Java performance question was well trodden, meaning that I'd read enough evidence to suggest that the VM languages are not necessarily any slower than the "close-to-silicon" languages. Mostly because the JIT compiler can do optimizations that the statically compiled languages cannot.

However, I recently received a CV from a guy who claims that Java-based high frequency trading is always beaten by C++, and that he'd been in a situation where this was the case.

A quick browse on job sites indeed shows that HFT applicants need knowledge of C++, and a look at Wilmott forum shows all the practitioners talking about C++.

Is there any particular reason why this is the case? I would have thought that with modern financial business being somewhat complex, a VM language with type safety, managed memory, and a rich library would be preferred. Productivity is higher that way. Plus, JIT compilers are getting better and better. They can do optimizations as the program is running, so you'd think they'd use that run-time info to beat the performance of the unmanaged program.

Perhaps these guys are writing the critical bits in C++ and calling them from a managed environment (P/Invoke etc.)? Is that possible?

Finally, does anyone have experience with the central question in this, which is why in this domain unmanaged code is without doubt preferred over managed?

As far as I can tell, the HFT guys need to react as fast as possible to incoming market data, but this is not necessarily a hard realtime requirement. You're worse off if you're slow, that's for sure, but you don't need to guarantee a certain speed on each response, you just need a fast average.

EDIT

Right, a couple of good answers thus far, but pretty general (well-trodden ground). Let me specify what kind of program HFT guys would be running.

The main criterion is responsiveness. When an order hits the market, you want to be the first to be able to react to it. If you're late, someone else might take it before you, but each firm has a slightly different strategy, so you might be OK if one iteration is a bit slow.

The program runs all day long, with almost no user intervention. Whatever function is handling each new piece of market data is run dozens (even hundreds) of times a second.

These firms generally have no limit as to how expensive the hardware is.

Comments (15)

各空 2024-09-15 14:29:18

Firstly, 1 ms is an eternity in HFT. If you think it is not then it would be good to do a bit more reading about the domain. (It is like being 100 miles away from the exchange.) Throughput and latency are deeply intertwined as the formulae in any elementary queuing theory textbook will tell you. The same formulae will show jitter values (frequently dominated by the standard deviation of CPU queue delay if the network fabric is right and you have not configured quite enough cores).

One of the problems with HFT arbitrage is that once you decide to capture a spread, there are two legs (or more) to realize the profit. If you fail to hit all legs you can be left with a position that you really don't want (and a subsequent loss) - after all you were arbitraging not investing.

You don't want positions unless your strategy is predicting the (VERY near term!!!) future (and this, believe it or not, is done VERY successfully). If you are 1 ms away from exchange then some significant fraction of your orders won't be executed and what you wanted will be picked off. Most likely the ones that have executed one leg will end up losers or at least not profitable.

Whatever your strategy is, for argument's sake let us say it ends up with a 55%/45% win/loss ratio. Even a small change in the win/loss ratio can cause a big change in profitability.

Re: "run dozens (even hundreds)": this seems off by orders of magnitude. Even looking at 20000 ticks a second seems low, though this might be the average for the entire day for the instrument set that he is looking at.

There is high variability in the rates seen in any given second. I will give an example. In some of my testing I look at 7 OTC stocks (CSCO, GOOG, MSFT, EBAY, AAPL, INTC, DELL). In the middle of the day, the per-second rates for this stream can range from 0 mps (very, very rare) to almost 2000 quotes and trades per peak second. (See why I think the 20000 above is low.)

I build infrastructure and measurement software for this domain, and the numbers we talk about are hundreds of thousands and millions of messages per second. I have C++ producer/consumer infrastructure libraries that can push almost 5000000 (5 million) messages/second between producer and consumer (32-bit, 2.4 GHz cores). These are 64-byte messages with new, construct, enqueue, synchronize on the producer side and synchronize, dequeue, touch every byte, run virtual destructor, free on the consumer side. Now admittedly that is a simple benchmark with no socket I/O (and socket I/O can be ugly), as there would be at the end points of the pipe stages. It is ALL custom synchronization classes that only synchronize when empty, custom allocators, custom lock-free queues and lists, occasional STL (with custom allocators), but more often custom intrusive collections (of which I have a significant library). More than once I have given a vendor in this arena a quadruple (and more) in throughput without increased batching at the socket endpoints.
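For readers who have not seen one, a lock-free producer/consumer queue can be sketched roughly as below. This is not the author's library: it is a minimal single-producer/single-consumer ring buffer, `SpscQueue` is a hypothetical name, and it assumes C++17, one pushing thread, one popping thread, and an element type that is default-constructible and copyable.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded single-producer/single-consumer ring buffer.
// Exactly one thread may call try_push and exactly one may call try_pop.
// One slot is always left empty, so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot is kept empty to tell full from empty");

public:
    // Returns false instead of blocking when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Returns std::nullopt instead of blocking when the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T value = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to read, written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written only by the producer
};
```

Neither side ever takes a lock or makes a system call; each only reads the other side's index. A production version would typically also pad `head_` and `tail_` onto separate cache lines to avoid false sharing.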

I have OrderBook and OrderBook::Universe classes that take less than 2us for new, insert, find, partial fill, find, second fill, erase, delete sequence when averaged over 22000 instruments. The benchmark iterates over all 22000 instruments serially between the insert first fill and last fill so there are no cheap caching tricks involved. Operations into the same book are separated by accesses of 22000 different books. These are very much NOT the caching characteristics of real data. Real data is much more localized in time and consecutive trades frequently hit the same book.

All of this work involves careful consideration of the constants and caching characteristics in any of the algorithmic costs of the collections used. (Sometimes it seems that the K's in KO(n) KO(n*log n) etc., etc., etc. are dismissed a bit too glibly)

I work on the market data infrastructure side of things. It is inconceivable to even think of using Java or a managed environment for this work. And when you can get this kind of performance with C++ (and I think it is quite hard to get million+/mps performance with a managed environment), I can't imagine any of the significant investment banks or hedge funds (for whom a $250000 salary for a top-notch C++ programmer is nothing) not going with C++.

Is anybody out there really getting 2000000+/mps performance out of a managed environment? I know a few people in this arena and no one ever bragged about it to me. And I think 2mm in a managed environment would have some bragging rights.

I know of one major player's FIX order decoder doing 12000000 field decodes/sec (3 GHz CPU). It is C++, and the guy who wrote it almost challenged anybody to come up with something in a managed environment that is even half that speed.

Technologically it is an interesting area with lots of fun performance challenges. Consider the options market when the underlying security changes - there might be say 6 outstanding price points with 3 or 4 different expiration dates. Now for each trade there were probably 10-20 quotes. Those quotes can trigger price changes in the options.
So for each trade there might be 100 or 200 changes in options quotes. It is just a ton of data - not a Large Hadron Collider collision-detector-like amount of data but still a bit of a challenge. It is a bit different than dealing with keystrokes.

Even the debate about FPGAs goes on. Many people take the position that a well-coded parser running on 3 GHz commodity hardware can beat a 500 MHz FPGA. But even if a tiny bit slower (not saying they are), FPGA-based systems can tend to have tighter delay distributions. (Read "tend": this is not a blanket statement.) Of course, if you have a great C++ parser that you push through a Cfront and then push that through the FPGA image generator... But that is another debate...

世态炎凉 2024-09-15 14:29:18

A lot of it comes down to a simple difference between fact and theory. People have advanced theories to explain why Java should be (or at least might be) faster than C++. Most of the arguments have little to do with Java or C++ per se, but rather with dynamic versus static compilation, with Java and C++ really being little more than examples of the two (though, of course, it's possible to compile Java statically, or C++ dynamically). Most of these people have benchmarks to "prove" their claim. When those benchmarks are examined in any detail, it quickly becomes obvious that in quite a few cases, they took rather extreme measures to get the results they wanted (e.g., quite a number enabled optimization when compiling the Java, but specifically disabled optimization when compiling the C++).

Compare this to the Computer Language Benchmarks Game, where pretty much anybody can submit an entry, so all of the code tends to be optimized to a reasonable degree (and, in a few cases, even an unreasonable degree). It seems pretty clear that a fair number of people treat this as essentially a competition, with advocates of each language doing their best to "prove" that their preferred language is best. Since anybody can submit an implementation of any problem, a particularly poor submission has little effect on overall results. In this situation, C and C++ emerge as clear leaders.

Worse, if anything these results probably show Java in better light than is entirely accurate. In particular, somebody who uses C or C++ and really cares about performance can (and often will) use Intel's compiler instead of g++. This will typically give at least a 20% improvement in speed compared to g++.

Edit (in response to a couple of points raised by jalf, but really too long to fit reasonably in comments):

  1. Pointers being an optimizer writer's nightmare. This is really overstating things (quite) a bit. Pointers lead to the possibility of aliasing, which prevents certain optimizations under certain circumstances. That said, inlining prevents the ill effects much of the time (i.e., the compiler can detect whether there's aliasing rather than always generating code under the assumption that there could be). Even when the code does have to assume aliasing, caching minimizes the performance hit from doing so (i.e., data in L1 cache is only minutely slower than data in a register). Preventing aliasing would help performance in C++, but not nearly as much as you might think.

  2. Allocation being a lot faster with a garbage collector. It's certainly true that the default allocator in many C++ implementations is slower than what most (current) garbage collected allocators provide. This is balanced (to at least a degree) by the fact that allocations in C++ tend to be on the stack, which is also fast, whereas in a GC language nearly all allocations are usually on the heap. Worse, in a managed language you usually allocate space for each object individually whereas in C++ you're normally allocating space for all the objects in a scope together.

It's also true that C++ directly supports replacing allocators both globally and on a class-by-class basis, so when/if allocation speed really is a problem it's usually fairly easy to fix.
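As a rough sketch of what replacing the allocator on a class-by-class basis can look like, the hypothetical `Order` class below recycles freed blocks through a per-class free list. It assumes C++17 (for the inline static member) and single-threaded use; a real system would likely keep one pool per thread.

```cpp
#include <cstddef>
#include <new>

// A class whose heap allocations are served from a per-class free list.
// Not thread-safe, and recycled blocks are never handed back to the system.
class Order final {
public:
    Order(long id, double price) : id_(id), price_(price) {}
    long id() const { return id_; }
    double price() const { return price_; }

    // Reuse a previously freed block if one is available.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Order) && free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);
    }

    // Push the block onto the free list instead of releasing it.
    static void operator delete(void* p) noexcept {
        static_assert(sizeof(Order) >= sizeof(FreeNode) &&
                          alignof(Order) >= alignof(FreeNode),
                      "a freed Order must be able to hold a free-list node");
        if (p == nullptr) {
            return;
        }
        free_list_ = ::new (p) FreeNode{free_list_};
    }

private:
    struct FreeNode {
        FreeNode* next;
    };
    static inline FreeNode* free_list_ = nullptr;

    long id_;
    double price_;
};
```

Code that uses `new Order(...)` and `delete` does not change at all; after warm-up, allocation is a pointer pop and deallocation is a pointer push.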

Ultimately, jalf is right: both of these points undoubtedly do favor "managed" implementations. The degree of that improvement should be kept in perspective though: they're not enough to let dynamically compiled implementations run faster on much code -- not even benchmarks designed from the beginning to favor them as much as possible.

Edit2: I see Jon Harrop has attempted to insert his two (billionths of a) cent's worth. For those who don't know him, Jon's been a notorious troll and spammer for years, and seems to be looking for new ground into which to sow weeds. I'd try to reply to his comment in detail, but (as is typical for him) it consists solely of unqualified, unsupported generalizations containing so little actual content that a meaningful reply is impossible. About all that can be done is to give onlookers fair warning that he's become well known for being dishonest, self-serving, and best ignored.

花海 2024-09-15 14:29:18

A JIT compiler could theoretically perform a lot of optimizations, yes, but how long are you willing to wait? A C++ app can take hours to compile because it happens offline, and the user isn't sitting there tapping his fingers and waiting.

A JIT compiler has to finish within a couple of milliseconds.
So which do you think can get away with the most complex optimizations?

The garbage collector is a factor too. Not because it is slower than manual memory management per se (I believe its amortized cost is pretty good, definitely comparable to manual memory handling), but it's less predictable. It can introduce a stall at pretty much any point, which might not be acceptable in systems that are required to be extremely responsive.
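The gap between a good amortized cost and a predictable one shows up when the tail of the latency distribution is measured rather than the mean. The sketch below uses hypothetical helper names and a nearest-rank percentile; the numbers in the note after it are made up for illustration, not measurements.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Arithmetic mean of the latency samples (any consistent unit).
double mean_latency(const std::vector<double>& samples) {
    if (samples.empty()) {
        throw std::invalid_argument("no samples");
    }
    const double total = std::accumulate(samples.begin(), samples.end(), 0.0);
    return total / static_cast<double>(samples.size());
}

// Nearest-rank percentile, pct in (0, 100]. Takes a copy so the caller's
// sample order is left untouched.
double percentile_latency(std::vector<double> samples, double pct) {
    if (samples.empty() || pct <= 0.0 || pct > 100.0) {
        throw std::invalid_argument("need samples and 0 < pct <= 100");
    }
    std::sort(samples.begin(), samples.end());
    const double exact_rank = pct * static_cast<double>(samples.size()) / 100.0;
    const std::size_t rank = static_cast<std::size_t>(std::ceil(exact_rank));
    return samples[rank - 1];
}
```

With 98 responses at 10 us and 2 responses stalled to 5000 us, the median is still 10 us and the mean is 109.8 us, but the 99th percentile is 5000 us. A system judged only by its average would hide exactly the stalls this answer is worried about.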

And of course, the languages lend themselves to different optimizations. C++ allows you to write very tight code, with virtually no memory overhead, and where a lot of high level operations are basically free (say, class construction).

In C# on the other hand, you waste a good chunk of memory. And simply instantiating a class carries a good chunk of overhead, as the base Object has to be initialized, even if your actual class is empty.

C++ allows the compiler to strip away unused code aggressively. In C#, most of it has to be there so it can be found with reflection.

On the other hand, C# doesn't have pointers, which are an optimizing compiler's nightmare. And memory allocations in a managed language are far cheaper than in C++.

There are advantages either way, so it is naive to expect that you can get a simple "one or the other" answer. Depending on the exact source code, the compiler, the OS, the hardware it's running on, one or the other may be faster. And depending on your needs, raw performance might not be the #1 goal. Perhaps you're more interested in responsiveness, in avoiding unpredictable stalls.

In general, your typical C++ code will perform similarly to equivalent C# code. Sometimes faster, sometimes slower, but probably not a dramatic difference either way.

But again, it depends on the exact circumstances. And it depends on how much time you're willing to spend on optimization. If you're willing to spend as much time as it takes, C++ code can usually achieve better performance than C#. It just takes a lot of work.

And the other reason, of course, is that most companies who use C++ already have a large C++ code base which they don't particularly want to ditch. They need that to keep working, even if they gradually migrate (some) new components to a managed language.

月亮邮递员 2024-09-15 14:29:18

These firms generally have no limit as to how expensive the hardware is.

If they also don't care how expensive the software is, then I'd think that of course C++ can be faster: for example, the programmer might use custom-allocated or pre-allocated memory; and/or they can run code in the kernel (avoiding ring transitions), or on a real-time O/S, and/or have it closely coupled to the network protocol stack.
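One way to get pre-allocated memory in standard C++17 is `std::pmr::monotonic_buffer_resource` over a fixed buffer. The sketch below is only an illustration of the idea: `PreallocatedScratch` and `first_n` are hypothetical names, and the 4096-byte size is arbitrary.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <vector>

// A fixed buffer plus a bump allocator that carves allocations out of it.
// null_memory_resource() as the upstream means that running out of space
// throws std::bad_alloc instead of silently falling back to the heap.
struct PreallocatedScratch {
    alignas(std::max_align_t) std::array<std::byte, 4096> storage{};
    std::pmr::monotonic_buffer_resource arena{
        storage.data(), storage.size(), std::pmr::null_memory_resource()};

    // True if p points into the pre-allocated buffer.
    bool owns(const void* p) const {
        const auto* byte = static_cast<const std::byte*>(p);
        const std::byte* begin = storage.data();
        std::less<const std::byte*> before;
        return !before(byte, begin) && before(byte, begin + storage.size());
    }
};

// Builds the vector {0, 1, ..., n - 1} with its elements stored in the arena.
std::pmr::vector<std::size_t> first_n(PreallocatedScratch& scratch, std::size_t n) {
    std::pmr::vector<std::size_t> values(&scratch.arena);
    values.reserve(n);  // a single bump allocation, no heap call
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(i);
    }
    return values;
}
```

The vector must not outlive the `PreallocatedScratch` it was built from. A monotonic resource never reuses freed memory, so this pattern suits per-message or per-batch scratch space that is discarded all at once.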

农村范ル 2024-09-15 14:29:18

There are reasons to use C++ other than performance. There is a HUGE existing library of C and C++ code. Rewriting all of that in alternate languages would not be practical. In order for things like P/Invoke to work correctly, the target code has to be designed to be called from elsewhere. If nothing else you'd have to write some sort of wrapper around things exposing a completely C API because you can't P/Invoke to C++ classes.
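The kind of wrapper meant here can be sketched as a flat C API around a C++ class: an opaque handle plus free functions with C linkage. `PriceAverager` and the `averager_*` functions are hypothetical names invented for this example, and platform-specific export annotations (such as `__declspec(dllexport)` on Windows) are left out.

```cpp
#include <new>

// An ordinary C++ class. Managed code cannot bind to it directly because
// its member functions have C++ linkage and mangled names.
class PriceAverager {
public:
    void add(double price) {
        sum_ += price;
        ++count_;
    }
    double mean() const { return count_ == 0 ? 0.0 : sum_ / count_; }

private:
    double sum_ = 0.0;
    long count_ = 0;
};

// Flat C API: the object is passed around as an opaque void* handle.
extern "C" {

// Returns nullptr on allocation failure; no exception crosses the boundary.
void* averager_create() { return new (std::nothrow) PriceAverager(); }

void averager_add(void* handle, double price) {
    static_cast<PriceAverager*>(handle)->add(price);
}

double averager_mean(const void* handle) {
    return static_cast<const PriceAverager*>(handle)->mean();
}

void averager_destroy(void* handle) {
    delete static_cast<PriceAverager*>(handle);
}

}  // extern "C"
```

The managed side would then declare these four functions for P/Invoke and hold the handle as an opaque pointer-sized value, creating and destroying it explicitly.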

Finally, P/Invoke is a very expensive operation.

JIT compilers are getting better and better. They can do optimizations as the program is running

Yes, they can do this. But you forget that any C++ compiler is able to do the same optimizations. Sure, compile time will be worse, but the very fact that such optimizations have to be done at runtime is overhead. There are cases where managed languages can beat C++ at certain tasks, but this is usually because of their memory models and not the result of runtime optimizations. Strictly speaking, you could of course have such a memory model in C++, EDIT: such as C#'s handling of strings, /EDIT but few C++ programmers spend as much time optimizing their code as JIT guys do.

There are some performance issues that are an inherent downside to managed languages -- namely disk I/O. It's a one-time cost, but depending on the application it can be significant. Even with the best optimizers, you still need to load 30MB+ of JIT compiler from disk when the program starts; whereas it's rare for a C++ binary to approach that size.

浮生面具三千个 2024-09-15 14:29:18

The simple fact is that C++ is designed for speed. C#/Java aren't.

Take the innumerable inheritance hierarchies endemic to those languages (such as IEnumerable), compared to the zero-overhead of std::sort or std::for_each being generic. C++'s raw execution speed isn't necessarily any faster, but the programmer can design fast or zero-overhead systems. Even things like buffer overruns- you can't turn their detection off. In C++, you have control. Fundamentally, C++ is a fast language- you don't pay for what you don't use. In contrast, in C#, if you use, say, stackalloc, you can't NOT do buffer overrun checking. You can't allocate classes on the stack, or contiguously.
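The contrast between generic and interface-based dispatch can be sketched in C++ itself. In the hypothetical example below, `sort_quotes` takes the comparator as a template parameter, so the compiler sees its exact type and can inline it into `std::sort`; `sort_quotes_dynamic` goes through a runtime interface, so each comparison is a virtual call unless the optimizer manages to devirtualize it.

```cpp
#include <algorithm>
#include <vector>

struct Quote {
    long instrument_id;
    double price;
};

// Static dispatch: the comparator type is known at compile time.
template <typename Compare>
void sort_quotes(std::vector<Quote>& quotes, Compare compare) {
    std::sort(quotes.begin(), quotes.end(), compare);
}

// Dynamic dispatch: the comparison is hidden behind a runtime interface.
struct QuoteComparer {
    virtual ~QuoteComparer() = default;
    virtual bool less(const Quote& a, const Quote& b) const = 0;
};

struct ByPrice final : QuoteComparer {
    bool less(const Quote& a, const Quote& b) const override {
        return a.price < b.price;
    }
};

void sort_quotes_dynamic(std::vector<Quote>& quotes, const QuoteComparer& comparer) {
    std::sort(quotes.begin(), quotes.end(),
              [&comparer](const Quote& a, const Quote& b) { return comparer.less(a, b); });
}
```

Both functions produce the same ordering; the difference is only in how much the compiler can see. Whether it matters in practice depends on the workload and should be measured.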

There's also the whole compile-time thing, where C++ apps can take much longer, both to compile, and to develop.

年华零落成诗 2024-09-15 14:29:18

This might be kinda off topic, but I watched a video a couple of weeks ago which might be of interest to you: http://ocaml.janestreet.com/?q=node/61

It comes from a trading company which decided to use OCaml as its main language for trading, and I think their motivations should be enlightening to you (basically, they valued speed of course, but also strong typing and functional style for quicker increments as well as easier comprehension).

时光病人 2024-09-15 14:29:18

Most of our code ends up having to be run on a grid of thousands of machines.

I think this environment changes the argument. If the difference between C++ and C# execution speed is 25%, for example, then other factors come into play. When the code is run on a grid, it may make no difference how it is coded: once the whole process is spread across machines, the speed gap may not be an issue, or it may be solved by allocating or purchasing a few more machines. The most important issue and cost may become "time to market", where C# may prove the winner and the faster option.

Which is faster, C++ or C#?

C# by six months......

素手挽清风 2024-09-15 14:29:18

It's not only a matter of programming language; the hardware and operating system are relevant too.
You will get the best overall performance with a realtime operating system, a realtime programming language and efficient (!) programming.

So you have quite a few possibilities in choosing an operating system, and a few in choosing the language.
There's C, Realtime Java, Realtime Fortran and a few others.

Or maybe you will get the best results by programming an FPGA/processor directly, to eliminate the cost of an operating system.

The biggest choice you have to make is how many possible performance optimizations you will give up in favor of a language that eases development and runs more stably because you write fewer bugs, which results in higher availability of the system. This shouldn't be overlooked. You gain nothing by developing an application that performs 5% faster than any other application but crashes every so often due to some minor, hard-to-find bugs.

烂柯人 2024-09-15 14:29:18

In HFT, latency is a bigger issue than throughput. Given the inherent parallelism in the data source, you can always throw more cores at the problem, but you can't make up for response time with more hardware. Whether the language is compiled beforehand or just in time, garbage collection can destroy your latency. There exist realtime JVMs with guaranteed garbage collection latency. It's a fairly new technology, a pain to tune, and ridiculously expensive, but if you have the resources, it can be done. It'll probably become much more mainstream in coming years, as the early adopters fund the R&D that's going on now.

葵雨 2024-09-15 14:29:18

One of the most interesting things about C++ is that its performance numbers are not better, but more reliable.

It's not necessarily faster than Java/C#/..., but it is consistent across runs.

Like in networking, sometimes the throughput isn't as important as a stable latency.

很酷又爱笑 2024-09-15 14:29:18

A huge reason to prefer c++ (or lower level) in this case other than what has already been said, is that there are some adaptability benefits of being low level.

If hardware technology changes, you can always drop into an __asm { } block and actually use it before languages/compilers catch up.

For example, there is still no support for SIMD in Java.

愿与i 2024-09-15 14:29:18

Virtual execution engines (the JVM, or the CLR of .Net) do not permit structuring the work in a time-efficient way, as process instances cannot run on as many threads as might be needed.

In contrast, plain C++ enables execution of parallel algorithms and construction of objects outside the time-critical execution paths. That’s pretty much everything – simple and elegant. Plus, with C++ you pay only for what you use.

赢得她心 2024-09-15 14:29:18

The elephant in the room here is the fact that C++ is faster than Java.

We all know it. But we also know that if we state it plainly, as I just did, we can't pretend to engage in a meaningful debate about this undebatable topic. How much faster is C++ than Java for your application? That has the ring of a debatable topic, but, alas, it will always be hypothetical unless you implement your application in both languages, at which point there will be no room for debate.

Let's go back to your first design meeting: The hard requirement for your project is high performance. Everyone in the room will think "C++" and a handful of other compiled languages. The guy in the room who suggests Java or C# will have to justify it with evidence (i.e., a prototype), not with hypotheticals, not with claims made by the vendors, not with statements on programmer gossip sites, and certainly not with "hello world" benchmarks.

As it stands now, you have to move forward with what you know, not with what is hypothetically possible.

诗笺 2024-09-15 14:29:18

Nikie wrote: “Could you explain what you can do with C++ threads and not with e.g. .NET threads?”

Threading with .Net could perform virtually everything C++ threading can, except:

  1. Efficient execution of COM-encapsulated binary code. For example, sensitive algorithms that might have to be kept secret from application developers. (Might be relevant in HFT)
  2. Creation of lean threads that do not exhaust system resources with chunky building blocks – wrapped OS APIs and synchronization & signaling OS primitives. (Extremely relevant with parallel algorithms for time-optimization of performance in HFT)
  3. Scaling up the throughput of a business process application 10 or more times on the same hardware and with the same latency. (Not relevant in HFT)
  4. Scaling up 100 and more times the number of concurrently handled user interactions per unit of hardware. (Not relevant in HFT)

Using more CPU cores cannot fully compensate for the exhaustion of system resources by the building blocks of .Net, since more CPU cores guarantee memory contention.
