OpenMP and SSE: my program does not speed up

Posted on 2024-11-07 04:51:47

Here is the part of my code that runs in parallel:

timer.Start();
for(int i = 0; i < params.epochs; ++i)
{
    #pragma omp for
    for(int j = 0; j < min_net; ++j)
    {
        std::pair<CVectorSSE,CVectorSSE>& sample = data_set[j];
        nets[j]->Approximate(sample.first,net_outputs[j]);
        out_gradients[j].SetDifference(net_outputs[j],sample.second);
        nets[j]->BackPropagateGradient(out_gradients[j],net_gradients[j]);
    }
}
timer.Stop();

epochs = 100
I have an AMD Athlon X2 5000+.
When I run this code without the omp directive, the time is the same.
When I look at Task Manager / Performance while running both programs (with and without omp), both cores are used in both cases, so it seems that VS (VS 2008) somehow optimizes the code the way omp would?
The code inside the parallel loop uses SSE instructions.
I wondered whether multicore processors might have only one SSE unit, but that would be stupid.
So maybe someone can tell me what I am doing wrong?
I know it depends on the code inside the loop, but if that code can run in parallel, then it MUST speed up...

Okay, I am definitely doing something wrong. Look at this code:

time_t start;
time_t stop;

start = time(NULL);
#pragma omp for
for(int i = 0; i < 10; ++i)
{
    Sleep(1000);
}
stop = time(NULL);

cout<<difftime(stop,start)<<endl;

Without omp it should sleep for 10 seconds (10 * 1000 ms).
With omp it should sleep for less than 10 seconds, because 2 threads can sleep at the same time, right?
BUT it again sleeps for 10 seconds. How is that possible?
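
One thing that may be worth checking in the timing test: as far as the OpenMP specification goes, a bare `#pragma omp for` only divides iterations among the threads of an enclosing parallel region. With no such region around it, the loop is executed by the single thread that encounters it, which would match the 10 seconds observed. The sketch below is a variation on the sleep test that uses the combined `parallel for` directive instead. It is not the original code: the helper name `timed_sleep_loop` is hypothetical, and `std::this_thread::sleep_for` with `std::chrono` stands in for `Sleep` and `time`, which assumes a C++11 compiler (on VS 2008 the original `Sleep`/`time` calls would have to stay).

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not from the original post): runs `iterations` sleeps
// of `ms` milliseconds each and returns the elapsed wall-clock time in seconds.
double timed_sleep_loop(int iterations, int ms)
{
    const auto start = std::chrono::steady_clock::now();

    // `parallel for` both creates the thread team and splits the iterations
    // among its threads. If OpenMP is not enabled at compile time, the pragma
    // is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    }

    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `timed_sleep_loop(10, 1000)` should take noticeably less than 10 seconds on a dual-core machine only if OpenMP support is actually switched on (`/openmp` in Visual Studio, `-fopenmp` in GCC); otherwise it still takes about 10 seconds.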
