What do I need to know about maintaining a Java application with a very large number of threads?

Posted 2024-11-02 09:39:59

Background Information

I have a distributed processing application that does data analysis. It is designed to process many sets of data in parallel as they are updated in real time. As part of the design, the analysis has been broken up into analytic nodes. Each node takes source data and processes it to create other data, which can in turn be used by other nodes. Running our current full set of analyses on one data set requires about 200 nodes.

In the current design, each node runs on its own thread. Most of the time these threads are asleep. Whenever data is updated, they wake up one after another like a waterfall, and then they go back to sleep. The application is currently in production running on 40 sets of data, each requiring 200 nodes, for a total of 8,000 threads. When there is no data coming in, there is no load on the server. When data comes in at its busiest times, the server spikes to about 25% CPU. This is all within the design and production parameters of the project.
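For concreteness, here is a minimal sketch of the thread-per-node pattern described above. It is not the production code: the class name `ThreadPerNode`, the use of `BlockingQueue` to link nodes, and the "add one" processing step are all assumptions made for illustration. Each node thread stays parked in `take()` until its upstream node publishes a value, which gives the waterfall wake-up behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ThreadPerNode {

    /**
     * Pushes one update through a chain of nodeCount nodes, one thread per node,
     * and returns the value produced by the last node.
     * The chain topology and the "+1" step are hypothetical placeholders.
     */
    static int runChain(int nodeCount, int input) throws InterruptedException {
        // queues.get(i) feeds node i; queues.get(nodeCount) holds the final output.
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        for (int i = 0; i <= nodeCount; i++) {
            queues.add(new LinkedBlockingQueue<>());
        }

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            BlockingQueue<Integer> source = queues.get(i);
            BlockingQueue<Integer> sink = queues.get(i + 1);
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int value = source.take(); // thread sleeps here until data arrives
                        sink.put(value + 1);       // stand-in for the node's real analysis
                    }
                } catch (InterruptedException e) {
                    // Interrupted: treat as a shutdown request and let the thread exit.
                }
            }, "node-" + i);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }

        queues.get(0).put(input);                  // a single data update enters the chain
        int result = queues.get(nodeCount).take(); // wait for the last node to finish

        for (Thread t : threads) {
            t.interrupt();
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runChain(200, 0)); // prints 200
    }
}
```

With 200 nodes this starts 200 threads for one data set, and all but one of them are idle at any given moment while an update moves down the chain.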

Now for the next step, we are scaling from 40 sets of data to 200. Each set requires 200 nodes, which means a total of 40,000 nodes and therefore 40,000 threads. This exceeds the max PID of our server, so I requested that our server admins increase the cap. They did it, and the application works, but they gave me some push-back about the number of threads. I'm not denying that the number of threads is unusual, but it is expected and warranted at this stage of our design.

I am planning some small tweaks to the design to separate the thread from the node. This would allow us to configure one thread to run multiple nodes, and reduce our thread count. For data sets that do not get updated frequently, there will be very little performance effect of having one thread execute the data updates in every node. For data sets that are updated hundreds of times per second, we can configure each node to run on its own thread. In fact, I don't doubt that this design change will be made -- it's only a matter of when. In the meantime, I'd like as much information as I can about the consequences of using this design.
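A minimal sketch of what separating the thread from the node could look like, assuming a fixed-size `ExecutorService` shared by all nodes of a data set. The names `PooledNodes`, `Node`, and `runWave` are hypothetical, and the sketch deliberately ignores the dependencies between nodes: it only shows many nodes being executed by a configurable, smaller number of threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledNodes {

    /** Hypothetical stand-in for one analytic node. */
    interface Node {
        void process();
    }

    /**
     * Runs nodeCount nodes on a pool of threadCount threads and returns
     * how many nodes were processed. Node dependencies are not modeled.
     */
    static int runWave(int nodeCount, int threadCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        AtomicInteger processed = new AtomicInteger();
        try {
            List<Callable<Void>> wave = new ArrayList<>();
            for (int i = 0; i < nodeCount; i++) {
                Node node = processed::incrementAndGet; // placeholder for real analysis
                wave.add(() -> {
                    node.process();
                    return null;
                });
            }
            pool.invokeAll(wave); // blocks until every node in the wave has run
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 200 nodes served by 4 threads instead of 200.
        System.out.println(runWave(200, 4)); // prints 200
    }
}
```

Setting `threadCount` equal to `nodeCount` approximates the current one-thread-per-node behavior for the data sets that update hundreds of times per second, while a small `threadCount` suits the data sets that update rarely.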

Question

What are the costs of running with over 40,000 threads on one machine? How much performance am I losing by having the JVM / Linux OS manage this many threads? Please remember that they are all configured properly to sleep when there is no work. So, I'm just talking about extra overhead and problems caused by the sheer number of threads.
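One cost that can be roughly estimated up front is the address space reserved for thread stacks. The sketch below assumes 1 MiB per thread, which is only an assumption: the real figure depends on the `-Xss` setting and the platform default, and reserved stack space is not the same as resident memory. It also reads the live thread count through the standard `ThreadMXBean` so the estimate can be compared against a running JVM. The class name `ThreadFootprint` and the method `reservedStackBytes` are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadFootprint {

    /** Upper-bound estimate of stack space reserved for the given number of threads. */
    static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    public static void main(String[] args) {
        long assumedStackBytes = 1L << 20; // ASSUMPTION: 1 MiB per thread; check -Xss on the real server

        long bytes = reservedStackBytes(40_000, assumedStackBytes);
        System.out.printf("Estimated reserved stack space for 40,000 threads: %.1f GiB%n",
                bytes / (double) (1L << 30));

        // Standard JMX bean: reports threads currently alive in this JVM.
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        System.out.println("Live threads in this JVM: " + threadBean.getThreadCount());
        System.out.println("Peak threads in this JVM: " + threadBean.getPeakThreadCount());
    }
}
```

This covers only stack reservation; it says nothing about scheduler overhead, context switching, or garbage collection, which are the parts I am asking about.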

Please note - I know that I can reduce the number of threads, and I know that it's a good idea to make this design change. I'll do it as soon as I can, but it has to be balanced against other work and design considerations. I'm asking this question to gather information in order to make a good decision. Your thoughts and comments of this nature are much appreciated.
