OpenMP and SSE: my program does not speed up
Here is the part of my code that is supposed to run in parallel:
timer.Start();
for(int i = 0; i < params.epochs; ++i)
{
    #pragma omp for
    for(int j = 0; j < min_net; ++j)
    {
        std::pair<CVectorSSE,CVectorSSE>& sample = data_set[j];
        nets[j]->Approximate(sample.first,net_outputs[j]);
        out_gradients[j].SetDifference(net_outputs[j],sample.second);
        nets[j]->BackPropagateGradient(out_gradients[j],net_gradients[j]);
    }
}
timer.Stop();
epochs = 100
I have an AMD Athlon X2 5000+.
When I run this code without the omp directive, the time is the same...
And when I look at Task Manager / Performance while running both programs (with and without omp), both cores are used in both cases... So it seems that VS (VS 2008) somehow optimizes the code the way omp would???
The code inside the parallel loop uses SSE instructions...
I was wondering whether multicore processors have only one SSE unit, but that would be stupid...
So maybe someone can tell me what I am doing wrong?
I know that it depends on the code inside the loop, but if that code can run in parallel, then it MUST speed up...
Okay, I am definitely doing something wrong - look at this code:
time_t start;
time_t stop;
start = time(NULL);
#pragma omp for
for(int i = 0; i < 10; ++i)
{
    Sleep(1000);
}
stop = time(NULL);
cout<<difftime(stop,start)<<endl;
Without omp it should sleep for 10 seconds (10 * 1000 ms).
With omp it should sleep for less than 10 seconds, because 2 threads can sleep at the same time, right?
BUT it again sleeps for 10 seconds - how is that possible?
Comments (1)
I tried the second example on Linux with gcc. My program runs for 3 seconds on a Core i3. I guess the problem you are having is that you have not configured OpenMP correctly. GCC needs the -fopenmp option to enable OpenMP. A similar setting may be necessary for VS.