What do I need to know about maintaining a Java application with a large number of threads?

Posted on 2024-11-02 09:39:59

Background Information

I have a distributed processing application that does data analysis. It is designed to do parallel processing of many sets of data updated in real time. As part of the design, the analysis has been broken up into analytic nodes. Each node takes source data and processes it to create other data, which can then in turn be used by other nodes. To do our current full set of analysis on one data set requires about 200 nodes.

In the current design, each node runs with its own thread. Now, most of the time these threads are asleep. They wake up each in turn like a waterfall whenever data is updated, and then they go back to sleep. The application is currently in production running on 40 sets of data, each requiring 200 nodes, using 8000 threads. When there is no data coming in, there is no load on the server. When the data comes in at its busiest times, the server spikes to about 25% CPU. This is all within the design and production parameters of the project.

Now for the next step, we are scaling the 40 sets of data to 200. Each set requires 200 nodes, which means a total of 40,000 nodes and therefore 40,000 threads. This exceeds the max PID of our server, so I requested that our server admins increase the cap. They did it, and the application works, but they gave me some push-back about the number of threads. I'm not denying that the number of threads is unusual, but it is expected and warranted by this stage of our design.

I am planning some small tweaks to the design to separate the thread from the node. This would allow us to configure one thread to run multiple nodes, and reduce our thread count. For data sets that do not get updated frequently, there will be very little performance effect of having one thread execute the data updates in every node. For data sets that are updated hundreds of times per second, we can configure each node to run on its own thread. In fact, I don't doubt that this design change will be made -- it's only a matter of when. In the meantime, I'd like as much information as I can about the consequences of using this design.

Question

What are the costs of running with over 40,000 threads on one machine? How much performance am I losing by having the JVM / Linux OS manage this many threads? Please remember that they are all configured properly to sleep when there is no work. So, I'm just talking about extra overhead and problems caused by the sheer number of threads.

Please note - I know that I can reduce the number of threads, and I know that it's a good idea to make this design change. I'll do it as soon as I can, but it has to be balanced against other work and design considerations. I'm asking this question to gather information in order to make a good decision. Your thoughts and comments of this nature are much appreciated.

Comments (2)

没︽人懂的悲伤 2024-11-09 09:39:59

What are the costs of running with over 40,000 threads on one machine? How much performance am I losing by having the JVM / Linux OS manage this many threads? Please remember that they are all configured properly to sleep when there is no work. So, I'm just talking about extra overhead and problems caused by the sheer number of threads.

In the JVM space, each thread needs a thread stack (256 KB assumed here; the actual default varies by JVM and OS, and is commonly 1 MB on 64-bit Linux), plus the Thread object and the objects connected to it. The default thread stack size can be changed using the -Xss option, but I believe that 64 KB is the lower limit. (40,000 x 256 KB is about 10 GB ...)
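
The stack arithmetic above can be sketched in a few lines. This is only an illustration: the class name StackBudget and the helper totalStackBytes are made up for this example, and the 64 KB / 256 KB / 1 MB sizes are candidate values, not measurements from the asker's server. The figure is reserved address space; on Linux the resident memory is normally only the stack pages each thread has actually touched. The four-argument Thread constructor accepts a per-thread stack size, which the Javadoc describes as a hint that some platforms may ignore.

```java
final class StackBudget {
    /** Address space reserved for stacks if every thread gets stackBytes of stack. */
    static long totalStackBytes(int threads, long stackBytes) {
        return (long) threads * stackBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 40_000;
        // Candidate stack sizes in KB (assumed values for comparison only).
        for (long kb : new long[] {64, 256, 1024}) {
            long total = totalStackBytes(threads, kb * 1024);
            System.out.printf("%d threads x %d KB = %.2f GiB reserved%n",
                    threads, kb, total / (1024.0 * 1024 * 1024));
        }

        // Requesting a smaller stack for a single thread; the JVM treats
        // the size as a hint and may round it up or ignore it.
        Thread t = new Thread(null, () -> { }, "node-0", 128 * 1024);
        t.start();
        t.join();
    }
}
```

The same effect for all threads comes from the -Xss option mentioned above; the per-thread constructor is only useful if some nodes need deeper stacks than others.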

On Linux, each thread also occupies an OS thread descriptor, which holds the thread's register context when the thread is not executing ... and other stuff. These descriptors are preallocated, and I believe they are not paged. This is the resource that your admins needed to increase.

These resources are used whether the thread is awake or sleeping.

Another issue is that you need to be a bit careful about synchronizing using wait / notifyAll. If there are lots of threads waiting on the same object, then a notifyAll will cause a flurry of activity as each thread gets woken up. (But you can avoid this by not having lots of threads waiting on the same object.)
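
One way to avoid many threads waiting on the same object is to give each node its own queue to block on. The sketch below assumes that approach; PerNodeInbox, Node, publish and stop are hypothetical names for this example and are not taken from the asker's application.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class PerNodeInbox {
    /** Hypothetical analytic node that blocks on its own inbox rather than a shared monitor. */
    static final class Node implements Runnable {
        private static final int STOP = Integer.MIN_VALUE; // sentinel that ends the loop
        private final BlockingQueue<Integer> inbox = new LinkedBlockingQueue<>();
        final AtomicInteger sum = new AtomicInteger();

        void publish(int update) { inbox.add(update); }
        void stop() { inbox.add(STOP); }

        @Override
        public void run() {
            try {
                while (true) {
                    int update = inbox.take(); // sleeps until this node's inbox has data
                    if (update == STOP) {
                        return;
                    }
                    sum.addAndGet(update);     // stand-in for the real analysis
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Sends two updates to node a only, then stops both nodes and returns a's sum. */
    static int demo() {
        Node a = new Node();
        Node b = new Node();
        Thread ta = new Thread(a, "node-a");
        Thread tb = new Thread(b, "node-b");
        ta.start();
        tb.start();
        a.publish(1); // node-b stays asleep for these two updates
        a.publish(2);
        a.stop();
        b.stop();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.sum.get();
    }

    public static void main(String[] args) {
        System.out.println("node-a sum = " + demo());
    }
}
```

With this layout a data update wakes only the nodes that consume it, so the cost of an update scales with the number of interested nodes rather than with the total thread count.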

See the Oracle Java Threading page for more info on the consequences of using huge numbers of threads.


My feeling is that 40,000 threads is excessive. The ideal number of threads is proportional to the number of physical processors / cores you have. While you won't necessarily see a decrease in performance by having huge numbers of threads, you will be tying down lots of resources, and that could have indirect performance issues; e.g. longer GC times, potential VM thrashing.

A better architecture for your application would be to implement a thread pool and work queues to farm the work out to a much smaller number of active threads.
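
A minimal sketch of the pool-plus-queue idea, assuming each node update can be expressed as an independent task. NodePool and runUpdates are hypothetical names, the counter increment stands in for the real analysis, and the sketch does not model dependencies or ordering between nodes, which a real design would have to handle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class NodePool {
    /** Runs one update per node on a fixed pool and returns how many updates ran. */
    static long runUpdates(int nodeCount, int poolSize) {
        // The executor's internal queue is the work queue; poolSize threads drain it.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicLong completed = new AtomicLong();
        for (int i = 0; i < nodeCount; i++) {
            pool.execute(() -> completed.incrementAndGet()); // stand-in for node work
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long done = runUpdates(40_000, cores);
        System.out.println(done + " node updates ran on " + cores + " threads");
    }
}
```

Sizing the pool from availableProcessors() follows the point above that the useful number of threads is tied to the number of cores; work that blocks on I/O might justify a larger pool.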

只是我以为 2024-11-09 09:39:59

Now you said that threads will sleep when there is no work. How often will there be work? How many units of work are being done concurrently? If that number is greater than the number of processors, and the work as stated is mostly CPU based, you will actually see overall performance degradation.

But let's assume the amount of work done at any given time equals the number of processors. If that's the case, the number one issue I can see is the amount of context switching that will occur. A context switch in Java is, generally speaking, around 100 instructions. If all your threads get switched in (awakened) within a short period of time to do some of their work, then we are talking about more than 4,000,000 extra instructions (40,000 threads x 100 instructions).

Here is a bit more information on the cost of context switches, as they will probably affect your program more than anything. An excerpt from this document explains the cost of validating the thread's local cache when switching in:

"When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context switch causes a flurry of cache misses, and thus threads run a little more slowly when they are first scheduled. This is one of the reasons that schedulers give each runnable thread a certain minimum time quantum even when many other threads are waiting."

Aside from that, you have the added stack space that needs to be allocated, as well as heap for the 40,000 thread objects (which is only around 7 MB of shallow heap for the threads).
