Java threads vs. Java processes: performance degradation

Posted 2024-11-27 10:55:25

Here I want to focus on a specific application in which I observed performance degradation (no need for a general discussion of whether threads are faster than processes).

I have an MPI application in Java that solves a problem using an iterative method. A schematic view of the application is shown below; let's call it MyProcess(n), where "n" is the number of processes:

    double[] myArray = new double[M * K];

    for (int iter = 0; iter < iterationCount; ++iter)
    {
        // some communication between processes

        // main loop
        for (int m = 0; m < M; ++m)
            for (int k = 0; k < K; ++k)
            {
                // linear sequence of arithmetic instructions
            }

        // some communication between processes
    }

To improve performance, I decided to use Java threads instead (let's call this version MyThreads(n)). The code is almost the same: myArray becomes a matrix in which each row holds the array for the corresponding thread.

    double[][] myArray = new double[threadNumber][M * K];

    public void run()
    {
        for (int iter = 0; iter < iterationCount; ++iter)
        {
            // some synchronization primitives

            // main loop
            for (int m = 0; m < M; ++m)
                for (int k = 0; k < K; ++k)
                {
                    // linear sequence of arithmetic instructions

                    counter++;
                }

            // some synchronization primitives
        }
    }

The threads are created and started using Executors.newFixedThreadPool(threadNumber).
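To make the threaded setup concrete, here is a minimal runnable sketch of MyThreads(n) under stated assumptions. The class name MyThreadsSketch, the placeholder arithmetic and the sizes are hypothetical. The sketch uses Java 8+ syntax (lambdas, diamond) rather than Java 1.5. Each task owns one row of myArray. Here counter is a local variable per thread, which is an assumption because the question does not say where it is declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical skeleton of MyThreads(n): each pool task owns one row of
// myArray and runs the same iteration loop as the MPI version.
class MyThreadsSketch {

    // Runs threadNumber tasks and returns each thread's inner-loop counter.
    static long[] run(int threadNumber, int M, int K, int iterationCount) {
        final double[][] myArray = new double[threadNumber][M * K];
        final long[] counters = new long[threadNumber];
        ExecutorService pool = Executors.newFixedThreadPool(threadNumber);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threadNumber; ++t) {
                final int id = t;
                futures.add(pool.submit(() -> {
                    double[] row = myArray[id];   // this thread's private row
                    long counter = 0;             // local to the thread (assumption)
                    for (int iter = 0; iter < iterationCount; ++iter) {
                        for (int m = 0; m < M; ++m)
                            for (int k = 0; k < K; ++k) {
                                // placeholder for the "linear sequence of arithmetic instructions"
                                row[m * K + k] = row[m * K + k] * 0.5 + 1.0;
                                counter++;
                            }
                    }
                    counters[id] = counter;
                }));
            }
            // Wait for every task; Future.get() also makes the tasks' writes visible here.
            for (Future<?> f : futures) f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return counters;
    }
}
```

For example, MyThreadsSketch.run(4, 10, 20, 3) returns four counters of 600 each. This mirrors the observation in the question that every thread does the same amount of work. Keeping counter local is a deliberate choice in this sketch. If it were a single field shared by all threads, every increment in the innermost loop would write to the same cache line.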

The problem is that MyProcess(n) performs adequately for n in [1, 8], whereas MyThreads(n) degrades substantially: on my system, performance drops by a factor of n.

Hardware: Intel(R) Xeon(R) CPU X5355 (2 processors, 4 cores each)

Java version: 1.5 (using the -d32 option).

At first I thought the threads had different workloads, but that is not the case: the variable "counter" shows that the number of iterations is identical across runs of MyThreads(n) for n in [1, 8].

Nor is it a synchronization problem: I temporarily commented out all synchronization primitives, and the degradation persisted.

Any suggestions/ideas would be appreciated.

Thanks.

半仙 2024-12-04 10:55:25

I see two issues in your code.

First, a caching problem. Since you are parallelizing this across threads/processes, I assume M * K is large. When you do

    double[][] myArray = new double[threadNumber][M*K];

you are essentially creating an array of threadNumber references, each pointing to a separate double[] of size M*K. The point here is that these threadNumber arrays are not necessarily allocated in one contiguous block of memory: each is a separate object that can be placed anywhere in the JVM heap. As a result, when multiple threads run, you might incur many cache misses and end up reading from main memory repeatedly, which slows down your program.

If this is the root cause, you can try enlarging your JVM heap size and then doing

    double[] myArray = new double[threadNumber * M * K];

and have the threads operate on different segments of the same array. You should then see better performance.
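The suggestion above can be sketched as follows. It assumes Java 8+ and a hypothetical class FlatArraySketch. One flat double[] holds all threadNumber * M * K elements, and thread t works only on the segment starting at offset t * M * K. Java arrays are int-indexed, so threadNumber * M * K must stay below Integer.MAX_VALUE. The single allocation must also fit in the heap, which is why the heap may need to grow (for example with -Xmx). Whether this layout actually beats double[][] on a given JVM is something to measure; the sketch only shows the indexing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one flat array of threadNumber * M * K doubles,
// where thread t only touches [t * M * K, (t + 1) * M * K).
class FlatArraySketch {

    static double[] run(int threadNumber, int M, int K) {
        final int segment = M * K;                         // elements per thread
        final double[] myArray = new double[threadNumber * segment];
        ExecutorService pool = Executors.newFixedThreadPool(threadNumber);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threadNumber; ++t) {
                final int offset = t * segment;            // start of this thread's segment
                final int id = t;
                futures.add(pool.submit(() -> {
                    for (int m = 0; m < M; ++m)
                        for (int k = 0; k < K; ++k)
                            myArray[offset + m * K + k] = id;  // placeholder arithmetic
                }));
            }
            for (Future<?> f : futures) f.get();           // wait for all segments
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return myArray;
    }
}
```

For example, FlatArraySketch.run(3, 4, 5) returns a 60-element array. Elements 0 to 19 hold 0.0, elements 20 to 39 hold 1.0, and elements 40 to 59 hold 2.0.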

Second, a synchronization issue. Note that the elements of a double (or any primitive) array are NOT volatile, so results written by one thread aren't guaranteed to be visible to other threads. If you use synchronized blocks, this resolves the issue, because a side effect of synchronization is guaranteeing visibility across threads. If not, always read and write the array with Unsafe.putXXXVolatile() and Unsafe.getXXXVolatile(), so that you perform volatile operations on the array elements.
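A sketch of volatile element access follows. Instead of sun.misc.Unsafe, it swaps in java.lang.invoke.VarHandle (Java 9+), which is the supported public API for this. The class VolatileDoubleArray and its handOff demo are hypothetical. A volatile write to an array element, followed by a volatile read of the same element in another thread, gives the same visibility guarantee as a volatile field. On an old JVM such as the question's Java 1.5, an AtomicLongArray storing Double.doubleToRawLongBits(...) values provides similar volatile semantics.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Volatile reads/writes of individual double[] elements through a VarHandle,
// used here in place of Unsafe.putDoubleVolatile/getDoubleVolatile.
class VolatileDoubleArray {
    private static final VarHandle ELEM =
            MethodHandles.arrayElementVarHandle(double[].class);

    private final double[] data;

    VolatileDoubleArray(int length) {
        data = new double[length];
    }

    // Volatile write of element i.
    void setVolatile(int i, double v) {
        ELEM.setVolatile(data, i, v);
    }

    // Volatile read of element i; the cast fixes the call-site return type to double.
    double getVolatile(int i) {
        return (double) ELEM.getVolatile(data, i);
    }

    int length() {
        return data.length;
    }

    // Demo: a writer thread publishes a value, and the caller spins until it sees it.
    static double handOff(final double value) {
        final VolatileDoubleArray a = new VolatileDoubleArray(2);
        Thread writer = new Thread(() -> {
            a.setVolatile(0, value);   // payload
            a.setVolatile(1, 1.0);     // "ready" flag, written after the payload
        });
        writer.start();
        while (a.getVolatile(1) != 1.0) {
            Thread.onSpinWait();       // spin-loop hint (Java 9+)
        }
        return a.getVolatile(0);       // seeing the flag guarantees seeing the payload
    }
}
```

For example, handOff(42.5) returns 42.5. Once the reading thread observes the flag through a volatile read, the earlier volatile write of the payload is guaranteed to be visible.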

To take this further, Unsafe can also be used to allocate a contiguous segment of memory to hold your data structure and achieve better performance. In your case, though, I think the first fix alone should do the trick.
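The contiguous-memory idea can also be sketched without Unsafe. This version swaps in a direct ByteBuffer from java.nio, which is a supported API, in place of Unsafe's raw allocation. The class OffHeapSegments is hypothetical, and the sketch assumes Java 8+ (for Double.BYTES). It allocates one off-heap block, and each thread gets a DoubleBuffer view of its own segment.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

// Hypothetical sketch: one contiguous off-heap block shared by all threads,
// carved into per-thread views. Uses a direct ByteBuffer, not sun.misc.Unsafe.
class OffHeapSegments {

    // Returns threadNumber views of segmentLength doubles each, all backed by one block.
    // Assumes threadNumber * segmentLength * 8 fits in an int.
    static DoubleBuffer[] allocate(int threadNumber, int segmentLength) {
        DoubleBuffer all = ByteBuffer
                .allocateDirect(threadNumber * segmentLength * Double.BYTES)
                .order(ByteOrder.nativeOrder())   // avoid byte swapping on access
                .asDoubleBuffer();
        DoubleBuffer[] views = new DoubleBuffer[threadNumber];
        for (int t = 0; t < threadNumber; ++t) {
            DoubleBuffer d = all.duplicate();     // own position/limit, same memory
            d.position(t * segmentLength);
            d.limit((t + 1) * segmentLength);
            views[t] = d.slice();                 // index 0 = start of thread t's segment
        }
        return views;
    }
}
```

Each view is indexed from 0 with absolute get(i) and put(i, v), so thread t can treat views[t] much like its own row of myArray.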
