Multithreading is not faster than a single thread (simple loop test)

Posted 2024-09-25 01:52:21

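I'm experimenting with some multithreading constructions, but somehow multithreading does not seem to be faster than a single thread. I narrowed it down to a very simple test with a nested loop (10000x10000, as in the code below) in which the system only counts.
Below I have posted the code for both the single-threaded and the multithreaded version, and how they are executed.
The result is that the single thread completes the loop in about 110 ms, while the two threads together also take about 112 ms.
I don't think the problem is the overhead of multithreading. If I submit only one of the two Runnables to the ThreadPoolExecutor, it executes in half the time of the single thread, which makes sense. But adding the second Runnable makes it 10 times slower. Both 3.00 GHz cores are running at 100%.
I think it may be PC-specific, as someone else's PC showed double-speed results with multithreading. But then, what can I do about it? I have an Intel Pentium 4 3.00 GHz (2 CPUs) and Java jre6.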

Test code:

// Single thread:
long start = System.nanoTime(); // Start timer
final int[] i0 = new int[1];    // Result holder, to keep the test fair (see below)
int i = 0;
for(int x=0; x<10000; x++)
{
    for(int y=0; y<10000; y++)
    {
        i++; // Just counting...
    }
}
i0[0] = i;                      // Single store after the loop
long end = System.nanoTime();   // Stop timer

This code executes in about 110 ms.

// Two threads:

start = System.nanoTime(); // Start timer

// Two of the same kind of variables to count with as in the single thread.
final int[] i1 = new int [1];
final int[] i2 = new int [1];

// First partial task (0-5000)
Thread t1 = new Thread() {
    @Override
    public void run() 
    {
        int i = 0;
        for(int x=0; x<5000; x++)
            for(int y=0; y<10000; y++)
                i++;
        i1[0] = i;
    }
};

// Second partial task (5000-10000)  
Thread t2 = new Thread() {
    @Override
    public void run() 
    {
        int i = 0;
        for(int x=5000; x<10000; x++)
            for(int y=0; y<10000; y++)
                i++;
        i2[0] = i;
    }
};

// Start threads
t1.start();
t2.start();

// Wait for completion
try{
    t1.join();
    t2.join();
}catch(Exception e){
    e.printStackTrace();
}

end = System.nanoTime(); // Stop timer

This code executes in about 112 ms.

Edit: I changed the Runnables to Threads and got rid of the ExecutorService (to simplify the problem).

Edit: tried some of the suggestions.

深府石板幽径 2024-10-02 01:52:21

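You definitely don't want to keep polling Thread.isAlive() - this burns a lot of CPU cycles for no reason. Use Thread.join() instead.

Also, it's probably not a good idea to have the threads increment the result arrays directly, because of cache lines and so on. Update local variables, and do a single store once the computation is done.

Edit:

I totally overlooked that you're using a Pentium 4. As far as I know, there are no multi-core versions of the P4 - to give the illusion of multiple cores, it has Hyper-Threading: two logical cores share the execution units of one physical core. If your threads depend on the same execution units, your performance will be the same as (or worse than!) single-threaded performance. You would need, for example, floating-point calculations in one thread and integer calculations in another to see a performance improvement.

The P4 HT implementation has been criticized a lot; newer implementations (recent core2) should be better.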

叹沉浮 2024-10-02 01:52:21

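Try increasing the size of the array somewhat. No, really.

Small objects allocated one after another in the same thread tend to be laid out one after another in memory at first. That probably puts them in the same cache line. If two cores access the same cache line (and the micro-benchmark is essentially just a sequence of writes to the same address), they will have to fight for access.

There is a class in java.util.concurrent that has a bunch of unused long fields. Their purpose is to separate objects that may be used frequently by different threads into different cache lines.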
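
The padding idea above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name PaddedCounters, the method countInTwoThreads and the stride of 16 ints (an assumed 64-byte cache line) are all hypothetical choices. Each thread deliberately writes to its own slot on every iteration, and the two slots sit one stride apart so they should not share a line.

```java
public class PaddedCounters {
    // Assumption: 64-byte cache lines, i.e. 16 ints. Adjust for the actual CPU.
    static final int STRIDE = 16;

    // Runs two counting threads; each one writes only to its own padded slot.
    static int[] countInTwoThreads(final int rowsPerThread, final int cols) {
        final int[] slots = new int[2 * STRIDE];
        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int slot = t * STRIDE; // slots 0 and STRIDE are 64 bytes apart
            threads[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int x = 0; x < rowsPerThread; x++)
                        for (int y = 0; y < cols; y++)
                            slots[slot]++; // hot write, kept away from the other thread's slot
                }
            });
        }
        for (Thread th : threads) th.start();
        try {
            for (Thread th : threads) th.join(); // join makes the writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return new int[] { slots[0], slots[STRIDE] };
    }

    public static void main(String[] args) {
        int[] r = countInTwoThreads(5000, 10000);
        System.out.println(r[0] + " + " + r[1] + " = " + (r[0] + r[1]));
    }
}
```

Setting STRIDE to 1 puts both slots next to each other, which gives a way to compare timings with and without the padding on a given machine.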

日裸衫吸 2024-10-02 01:52:21

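I'm not at all surprised by the difference. You are using Java's concurrency framework to create your threads (although I don't see any guarantee that two threads are even created, since the first job might complete before the second one begins).

There is probably all sorts of locking and synchronization going on behind the scenes. In short, I do think the problem is the overhead of multithreading.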

戴着白色围巾的女孩 2024-10-02 01:52:21

You don't do anything with i, so your loop is probably just being optimized away.
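
One way to rule this out is to make sure the counted value is actually consumed. The following is a minimal sketch under that assumption; the class name CountBench, the count helper and the five warm-up runs are hypothetical choices for illustration. Note that printing the result only prevents the JIT from discarding an unused value; it may still simplify the loop itself.

```java
public class CountBench {
    // The counting loop from the question, returning its result so a caller can use it.
    static int count(int rows, int cols) {
        int i = 0;
        for (int x = 0; x < rows; x++)
            for (int y = 0; y < cols; y++)
                i++;
        return i;
    }

    public static void main(String[] args) {
        long sink = 0;
        // Warm-up runs (arbitrary number) so the timed run is more likely to be JIT-compiled.
        for (int w = 0; w < 5; w++)
            sink += count(10000, 10000);

        long start = System.nanoTime();
        int result = count(10000, 10000);
        long end = System.nanoTime();
        sink += result;

        // Printing result and sink consumes them, so they cannot be treated as dead values.
        System.out.println("result=" + result + " sink=" + sink
                + " time=" + (end - start) / 1000000 + " ms");
    }
}
```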

阿楠 2024-10-02 01:52:21

Have you checked the number of available cores on your PC with Runtime.getRuntime().availableProcessors()?
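
A quick way to run that check is sketched below; the class name CoreCount and the helper logicalProcessors are hypothetical. Keep in mind that the reported number is the count of logical processors visible to the JVM, so a single hyper-threaded core can show up as 2.

```java
public class CoreCount {
    // Returns the number of logical processors the JVM reports as available.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors visible to the JVM: " + logicalProcessors());
    }
}
```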

时光沙漏 2024-10-02 01:52:21

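Your code simply increments a variable, which is a very fast operation anyway, so you are not gaining much from multithreading here. Performance gains are more pronounced when thread 1 has to wait for some external response or perform more complex calculations, while your main thread or other threads can continue processing instead of waiting. You may see more of a gain if you count higher or use more threads (a safe number is probably the number of CPUs/cores in the machine).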
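
As a rough sketch of sizing the thread count to the machine, the counting work can be split across a fixed pool with one task per reported processor. The class name SplitWork and the method countWithPool are hypothetical, and using availableProcessors() as the pool size is an assumption taken from the suggestion above, not a measured optimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitWork {
    // Splits the outer loop into one slice per logical processor and sums the partial counts.
    static long countWithPool(final int rows, final int cols) {
        int n = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            for (int t = 0; t < n; t++) {
                // Slice boundaries; the last slice always ends exactly at rows.
                final int from = (int) ((long) rows * t / n);
                final int to = (int) ((long) rows * (t + 1) / n);
                parts.add(pool.submit(new Callable<Long>() {
                    @Override
                    public Long call() {
                        long local = 0; // thread-local accumulator, returned once at the end
                        for (int x = from; x < to; x++)
                            for (int y = 0; y < cols; y++)
                                local++;
                        return local;
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : parts)
                total += f.get(); // blocks until that slice has finished
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = countWithPool(10000, 10000);
        long end = System.nanoTime();
        System.out.println("total=" + total + " time=" + (end - start) / 1000000 + " ms");
    }
}
```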
