为什么这个 OpenMP 程序比单线程慢？

发布于 2024-11-19 12:14:44 字数 812 浏览 2 评论 0原文

请看一下这段代码。

单线程程序：http://pastebin.com/KAx4RmSJ。编译为：

g++ -lrt -O2 main.cpp -o nnlv2

使用 openMP 的多线程： http://pastebin.com/fbe4gZSn 编译为：

g++ -lrt -fopenmp -O2 main_openmp.cpp -o nnlv2_openmp

我在双核系统上测试了它（所以我们有两个线程并行运行）。但多线程版本比单线程版本慢（并且显示时间不稳定，尝试运行几次）。怎么了？我哪里做错了？

一些测试：

单线程：

Layers Neurons Inputs --- Time (ns)

10 200 200 --- 1898983

10 500 500 --- 11009094

10 1000 1000 --- 48116913

多线程：

Layers Neurons Inputs --- Time (ns)

10 200 200 --- 2518262

10 500 500 --- 13861504

10 1000 1000 --- 53446849

我不明白出了什么问题。

原文

Please look at this code.

Single-threaded program: http://pastebin.com/KAx4RmSJ. Compiled with:

g++ -lrt -O2 main.cpp -o nnlv2

Multithread with openMP: http://pastebin.com/fbe4gZSn
Compiled with:

g++ -lrt -fopenmp -O2 main_openmp.cpp -o nnlv2_openmp

I tested it on a dual core system (so we have two threads running in parallel). But multi-threaded version is slower than the single-threaded one (and shows unstable time, try to run it few times). What's wrong? Where did I make mistake?

Some tests:

Single-thread:

Layers Neurons Inputs --- Time (ns)

10 200 200 --- 1898983

10 500 500 --- 11009094

10 1000 1000 --- 48116913

Multi-thread:

Layers Neurons Inputs --- Time (ns)

10 200 200 --- 2518262

10 500 500 --- 13861504

10 1000 1000 --- 53446849

I don't understand what is wrong.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

怎樣才叫好 2024-11-26 12:14:44

您的目标是学习 OpenMP，还是让您的程序更快？如果是后者，则更值得编写乘加代码、减少传递次数并合并 SIMD。

步骤 1：组合循环并使用乘加：

// remove the variable 'temp' completely
for(int i=0;i<LAYERS;i++)
{
  for(int j=0;j<NEURONS;j++)
  {
    outputs[j] = 0;

    for(int k=0,l=0;l<INPUTS;l++,k++)
    {
      outputs[j] += inputs[l] * weights[i][k];
    }

    outputs[j] = sigmoid(outputs[j]);
  }

  std::swap(inputs, outputs);
}

Is your goal here to study OpenMP, or to make your program faster? If the latter, it would be more worthwhile to write multiply-add code, reduce the number of passes, and incorporate SIMD.

Step 1: Combine loops and use multiply-add:

// remove the variable 'temp' completely
for(int i=0;i<LAYERS;i++)
{
  for(int j=0;j<NEURONS;j++)
  {
    outputs[j] = 0;

    for(int k=0,l=0;l<INPUTS;l++,k++)
    {
      outputs[j] += inputs[l] * weights[i][k];
    }

    outputs[j] = sigmoid(outputs[j]);
  }

  std::swap(inputs, outputs);
}

回复收藏 0 原文