Performance of C++ vs. virtual machine languages in high-frequency finance

Posted 2024-09-08 14:29:18 · 964 words · 4 views · 0 comments


I thought the C/C++ vs C#/Java performance question was well trodden, meaning that I'd read enough evidence to suggest that the VM languages are not necessarily any slower than the "close-to-silicon" languages. Mostly because the JIT compiler can do optimizations that the statically compiled languages cannot.

However, I recently received a CV from a guy who claims that Java-based high frequency trading is always beaten by C++, and that he'd been in a situation where this was the case.

A quick browse on job sites indeed shows that HFT applicants need knowledge of C++, and a look at Wilmott forum shows all the practitioners talking about C++.

Is there any particular reason why this is the case? I would have thought that with modern financial business being somewhat complex, a VM language with type safety, managed memory, and a rich library would be preferred. Productivity is higher that way. Plus, JIT compilers are getting better and better. They can do optimizations as the program is running, so you'd think they'd use that run-time info to beat the performance of the unmanaged program.

Perhaps these guys are writing the critical bits in C++ and calling them from a managed environment (P/Invoke etc.)? Is that possible?

Finally, does anyone have experience with the central question in this, which is why in this domain unmanaged code is without doubt preferred over managed?

As far as I can tell, the HFT guys need to react as fast as possible to incoming market data, but this is not necessarily a hard real-time requirement. You're worse off if you're slow, that's for sure, but you don't need to guarantee a certain speed on each response; you just need a fast average.

EDIT

Right, a couple of good answers thus far, but pretty general (well-trodden ground). Let me specify what kind of program HFT guys would be running.

The main criterion is responsiveness. When an order hits the market, you want to be the first to be able to react to it. If you're late, someone else might take it before you, but each firm has a slightly different strategy, so you might be OK if one iteration is a bit slow.

The program runs all day long, with almost no user intervention. Whatever function is handling each new piece of market data is run dozens (even hundreds) of times a second.

These firms generally have no limit as to how expensive the hardware is.


各空 2024-09-15 14:29:18


Firstly, 1 ms is an eternity in HFT. If you think it is not, then it would be good to do a bit more reading about the domain. (It is like being 100 miles away from the exchange.) Throughput and latency are deeply intertwined, as the formulae in any elementary queuing theory textbook will tell you. The same formulae will show jitter values (frequently dominated by the standard deviation of CPU queue delay if the network fabric is right and you have not configured quite enough cores).
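As a rough illustration of that coupling, here is the textbook single-server (M/M/1) result. The answer does not name a queuing model, so the choice of M/M/1, the symbols, and the 10 µs service time are assumptions made only for this sketch. With arrival rate λ, service rate μ, and utilization ρ = λ/μ < 1:

```latex
W = \frac{1}{\mu - \lambda} = \frac{1/\mu}{1 - \rho},
\qquad
\sigma_W = \frac{1}{\mu - \lambda}
```

Here W is the mean time a message spends in the system (queueing plus service) and σ_W is its standard deviation. If the service time 1/μ is 10 µs, then at ρ = 0.5 both come to 20 µs, and at ρ = 0.9 both come to 100 µs. Pushing more throughput through the same core raises the mean latency and the jitter together.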

One of the problems with HFT arbitrage is that once you decide to capture a spread, there are two legs (or more) to realize the profit. If you fail to hit all legs you can be left with a position that you really don't want (and a subsequent loss) - after all, you were arbitraging, not investing.

You don't want positions unless your strategy is predicting the (VERY near term!!!) future (and this, believe it or not, is done VERY successfully). If you are 1 ms away from the exchange then some significant fraction of your orders won't be executed and what you wanted will be picked off. Most likely the ones that have executed one leg will end up losers or at least not profitable.

Whatever your strategy is, for argument's sake let us say it ends up with a 55%/45% win/loss ratio. Even a small change in the win/loss ratio can cause a big change in profitability.
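To make that sensitivity concrete, here is a small sketch of the arithmetic. The function name and the assumption that the average win and the average loss have the same size are mine, not the answer's; real strategies will have asymmetric payoffs.

```cpp
#include <cassert>

// Hypothetical helper: expected profit per trade for a strategy that wins
// with probability win_rate, gaining avg_win, and otherwise loses avg_loss.
double expected_profit_per_trade(double win_rate, double avg_win, double avg_loss) {
    assert(win_rate >= 0.0 && win_rate <= 1.0);
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss;
}

// Relative change in expected profit when the win rate moves from
// old_rate to new_rate (assumption: wins and losses are both 1 unit).
double relative_profit_change(double old_rate, double new_rate) {
    const double before = expected_profit_per_trade(old_rate, 1.0, 1.0);
    const double after = expected_profit_per_trade(new_rate, 1.0, 1.0);
    return (after - before) / before;
}
```

Under these assumptions a 55%/45% split earns 0.10 units per trade and a 56%/44% split earns 0.12, so `relative_profit_change(0.55, 0.56)` is about 0.2: one percentage point of win rate is roughly a 20% change in expected profit.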

Re: "run dozens (even hundreds)" of times a second: that seems off by orders of magnitude. Even 20000 ticks a second seems low, though this might be the average for the entire day for the instrument set that he is looking at.

There is high variability in the rates seen in any given second. I will give an example. In some of my testing I look at 7 OTC stocks (CSCO, GOOG, MSFT, EBAY, AAPL, INTC, DELL). In the middle of the day the per-second rates for this stream can range from 0 mps (very, very rare) to almost 2000 quotes and trades per peak second. (See why I think the 20000 above is low.)

I build infrastructure and measurement software for this domain and the numbers we talk about are 100000s and millions per second. I have C++ producer/consumer infrastructure libraries that can push almost 5000000 (5 million) messages/second between producer and consumer (32-bit, 2.4 GHz cores). These are 64-byte messages with new, construct, enqueue, synchronize on the producer side and synchronize, dequeue, touch every byte, run virtual destructor, free on the consumer side. Now admittedly that is a simple benchmark with no socket IO (and socket IO can be ugly), as there would be at the end points of the pipeline stages. It is ALL custom synchronization classes that only synchronize when empty, custom allocators, custom lock-free queues and lists, occasional STL (with custom allocators) but more often custom intrusive collections (of which I have a significant library). More than once I have given a vendor in this arena a quadruple (and more) in throughput without increased batching at the socket endpoints.
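For readers unfamiliar with the idea of a lock-free queue between one producer and one consumer, a minimal sketch follows. This is not the author's library: the class name `SpscRing`, the fixed power-of-two capacity, and the copy-in/copy-out interface are all assumptions chosen to keep the example short, and it omits the custom allocators and the "synchronize only when empty" blocking the answer describes.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical single-producer/single-consumer ring buffer.
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side: returns false instead of blocking when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = value;
        // Release: the slot write above becomes visible before the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: returns an empty optional when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = slots_[tail & (Capacity - 1)];
        // Release: the slot read above completes before the slot is reused.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer only
};
```

A producer thread would loop on `try_push` and a consumer thread on `try_pop`; no mutex is taken on either path. The sketch needs C++17 for `std::optional`.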

I have OrderBook and OrderBook::Universe classes that take less than 2 µs for a new, insert, find, partial fill, find, second fill, erase, delete sequence when averaged over 22000 instruments. The benchmark iterates over all 22000 instruments serially between the insert, first fill, and last fill, so there are no cheap caching tricks involved. Operations into the same book are separated by accesses of 22000 different books. These are very much NOT the caching characteristics of real data. Real data is much more localized in time and consecutive trades frequently hit the same book.

All of this work involves careful consideration of the constants and caching characteristics in any of the algorithmic costs of the collections used. (Sometimes it seems that the K's in K·O(n), K·O(n log n), etc. are dismissed a bit too glibly.)
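The operation sequence being timed can be sketched with a toy book. Everything here is hypothetical: `ToyOrderBook`, the `Order` fields, and the use of `std::unordered_map` are stand-ins for illustration, whereas the answer's own classes rely on custom intrusive collections and allocators and will look quite different.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical order record; field names are assumptions for this sketch.
struct Order {
    std::uint64_t id;
    std::int64_t price;      // price in integer ticks
    std::uint32_t open_qty;  // quantity still unfilled
};

// Toy order book for a single instrument, keyed by order id.
class ToyOrderBook {
public:
    // Returns false if an order with the same id is already present.
    bool insert(const Order& order) {
        return orders_.emplace(order.id, order).second;
    }

    // Returns a pointer to the stored order, or nullptr if it is not found.
    const Order* find(std::uint64_t id) const {
        const auto it = orders_.find(id);
        return it == orders_.end() ? nullptr : &it->second;
    }

    // Fills up to qty against the order and returns the quantity filled.
    // A fully filled order is erased from the book.
    std::uint32_t fill(std::uint64_t id, std::uint32_t qty) {
        const auto it = orders_.find(id);
        if (it == orders_.end()) return 0;
        const std::uint32_t filled = std::min(qty, it->second.open_qty);
        it->second.open_qty -= filled;
        if (it->second.open_qty == 0) orders_.erase(it);
        return filled;
    }

    std::size_t size() const { return orders_.size(); }

private:
    std::unordered_map<std::uint64_t, Order> orders_;
};
```

One pass of the benchmark sequence would then be: `insert`, `find`, a partial `fill`, `find` again, and a second `fill` that completes the order and erases it. A benchmark in the spirit of the one described would interleave these steps across many books rather than finishing one book before touching the next.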

I work on the market data infrastructure side of things. It is inconceivable to even think of using Java or a managed environment for this work. When you can get this kind of performance with C++ (and I think it is quite hard to get million+ mps performance with a managed environment), I can't imagine any of the significant investment banks or hedge funds (for whom a $250000 salary for a top-notch C++ programmer is nothing) not going with C++.

Is anybody out there really getting 2000000+ mps performance out of a managed environment? I know a few people in this arena and no one has ever bragged about it to me. And I think 2mm in a managed environment would have some bragging rights.

I know of one major player's FIX order decoder doing 12000000 field decodes/sec (3 GHz CPU). It is C++ and the guy who wrote it almost challenged anybody to come up with something in a managed environment that is even half that speed.

Technologically it is an interesting area with lots of fun performance challenges. Consider the options market when the underlying security changes - there might be, say, 6 outstanding price points with 3 or 4 different expiration dates. Now for each trade there are probably 10-20 quotes. Those quotes can trigger price changes in the options. So for each trade there might be 100 or 200 changes in options quotes. It is just a ton of data - not a Large Hadron Collider collision-detector-like amount of data, but still a bit of a challenge. It is a bit different from dealing with keystrokes.

Even the debate about FPGAs goes on. Many people take the position that a well-coded parser running on 3 GHz commodity hardware can beat a 500 MHz FPGA. But even if a tiny bit slower (not saying they are), FPGA-based systems can tend to have tighter delay distributions. (Read "tend" - this is not a blanket statement.) Of course, if you have a great C++ parser that you push through Cfront and then push that through the FPGA image generator... But that is another debate...
