Which programming language for compute-intensive trading portfolio simulation?

Posted 2024-09-05 15:27:16 · 1045 words · 7 views · 0 comments

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).

I plan on employing Amazon web services to take on the entire load of the application.

I am considering four languages:

  1. Java
  2. C++
  3. C#
  4. Python

Here is the extreme end of the project's scope. It won't look like this at first, and maybe never will, but it's within the requirements:

  • Weekly simulation of 10,000,000 trading systems.
  • (Each trading system is expected to have its own data mining methods, including feature selection algorithms that are extremely computationally expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration.)
  • Real-time production of a portfolio with 100,000 trading strategies
  • Taking in 1 min or 3 min data from every stock/futures market around the globe (approx. 100,000)
  • Portfolio optimization of portfolios with up to 100,000 strategies (a rather intensive algorithm)

Speed is a concern, but I believe that Java can handle the load.

I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.

C# is on the list because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java, all else being equal.

Python - I've read some things on PyPy and Psyco that claim Python can be optimized with JIT compiling to run at near-C speeds... That's pretty much the only reason it is on this list, besides the fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.

To sum up:

  • real time production
  • weekly simulations of a large number of systems
  • weekly/monthly optimizations of portfolios
  • large numbers of connections to collect data from

There is no dealing with millisecond- or even second-based trades. The only consideration is whether Java can deal with this kind of load when spread out over the necessary number of EC2 servers.
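To put the weekly target in perspective, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the function names are made up for illustration, the CPU cost per simulation is an assumed input (the question gives no such figure), and perfect scaling across cores is assumed.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds


def required_throughput(simulations_per_week: int) -> float:
    """Simulations per second needed to finish the weekly batch on time."""
    return simulations_per_week / SECONDS_PER_WEEK


def cores_needed(simulations_per_week: int, cpu_seconds_per_simulation: float) -> int:
    """Cores required if every core stays busy all week (assumes perfect scaling)."""
    total_cpu_seconds = simulations_per_week * cpu_seconds_per_simulation
    return math.ceil(total_cpu_seconds / SECONDS_PER_WEEK)


if __name__ == "__main__":
    # 10,000,000 simulations per week comes from the question.
    print(f"{required_throughput(10_000_000):.1f} simulations/second")
    # The per-simulation costs below are hypothetical, not measured.
    for cpu_seconds in (1.0, 10.0, 60.0):
        print(cpu_seconds, "CPU-s per simulation ->", cores_needed(10_000_000, cpu_seconds), "cores")
```

Under these assumptions, the core count scales linearly with the per-simulation cost, so the language choice matters mainly through that one number.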

Thank you guys so much for your wisdom.

Comments (7)

小苏打饼 2024-09-12 15:27:16

Pick the language you are most familiar with. If you know them all equally well and speed is a real concern, pick C.

俏︾媚 2024-09-12 15:27:16

While I am a huge fan of Python and personally I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.

For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.

For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer, enterprise-grade frameworks for parallel processing and cross-server deployment, and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as JavaSpaces.

I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable in that, if you are very familiar with those languages, I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.

C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.

居里长安 2024-09-12 15:27:16

I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each object has an overhead of 8 bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
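As a rough illustration of how that per-object overhead adds up, here is a small calculation that takes the 8-byte figure quoted above at face value. The function name and the scenario (one object per bar, 390 one-minute bars per instrument per day) are hypothetical, chosen only to make the arithmetic concrete.

```python
HEADER_BYTES = 8  # per-object overhead as quoted in the answer above


def header_overhead_mib(object_count: int, header_bytes: int = HEADER_BYTES) -> float:
    """Memory spent on object headers alone, in MiB."""
    return object_count * header_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical scenario: 100,000 instruments (from the question) times
    # an assumed 390 one-minute bars per day, one object per bar.
    bars_per_day = 100_000 * 390
    print(f"{bars_per_day:,} bar objects -> {header_overhead_mib(bars_per_day):.1f} MiB of headers")
```

Storing the same values in a primitive array avoids this cost, because the array carries one header for the whole collection rather than one per element.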

So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.

In C++ you can have collections of primitive types, which Java doesn't have. You would have to use external libraries to get them.

If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that your program can pause for garbage collection when you don't expect it.

雪化雨蝶 2024-09-12 15:27:16

Why only one language for your system? If I were you, I would build the entire system in Python and use C or C++ for the performance-critical components. This way, you will have a very flexible and extensible system with fast-enough performance. You can even find tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing with each other; they complement each other.

梦忆晨望 2024-09-12 15:27:16

Write it in your preferred language. To me that sounds like Python. When you start running the system, you can profile it and see where the bottlenecks are. Once you have done some basic optimisations, if it's still not acceptable, you can rewrite portions in C.
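A minimal sketch of that profiling step, using the standard library's `cProfile` and `pstats`. The `simulate_strategy` and `moving_average` functions are hypothetical stand-ins for a real trading-system simulation, and the moving average is written naively on purpose so that it shows up as the hot spot.

```python
import cProfile
import io
import pstats
import random


def moving_average(prices, window):
    """Naive O(n * window) moving average; deliberately slow."""
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]


def simulate_strategy(seed, bars=2_000, window=50):
    """Hypothetical stand-in for one trading-system simulation."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(bars - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, 0.001)))
    averages = moving_average(prices, window)
    # Count the bars that close above their trailing average.
    return sum(1 for p, a in zip(prices[window - 1:], averages) if p > a)


def profile_report(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(simulate_strategy, 42))
```

The functions near the top of the report are the candidates for basic optimisation first, and for a rewrite in C only if that is not enough.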

A consideration could be writing this in IronPython to take advantage of the CLR and DLR in .NET. Then you can leverage .NET 4 and the parallel extensions. If anything will give you performance increases, it'll be some flavour of threading, which .NET does extremely well.

Edit:

Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.
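Since the simulations described in the question are independent of each other, the parallel part can be sketched with the standard library's `concurrent.futures`. This is an illustration in standard CPython rather than the IronPython/.NET setup suggested above; `simulate` is a hypothetical placeholder workload. A process pool is used because CPU-bound pure-Python code in standard CPython generally does not speed up with threads (the global interpreter lock).

```python
import os
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate(seed):
    """Hypothetical CPU-bound simulation: final value of a seeded random walk."""
    rng = random.Random(seed)
    value = 100.0
    for _ in range(10_000):
        value *= 1.0 + rng.gauss(0.0, 0.001)
    return seed, value


def run_all(seeds, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run independent simulations on a pool of workers; return {seed: result}.

    Pass executor_cls=ThreadPoolExecutor for I/O-bound work such as
    collecting data over many connections.
    """
    with executor_cls(max_workers=max_workers or os.cpu_count()) as pool:
        return dict(pool.map(simulate, seeds))
```

On platforms that start worker processes by spawning a fresh interpreter, call `run_all(range(1000))` from under an `if __name__ == "__main__":` guard. Spreading the work over several EC2 machines would need a job queue or similar on top of this, which is outside the scope of the sketch.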

清秋悲枫 2024-09-12 15:27:16

It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.

If the inner loop is a matrix operation, then I suggest Python and SciPy, but if the inner loop is not a matrix operation, then I would worry about Python being slow. (Or maybe I would wrap C++ in Python using SWIG or boost::python.)

The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.

云朵有点甜 2024-09-12 15:27:16

I would go with PyPy. If not, http://lolcode.com/.
