Intel Parallel Studio 2011 - parallel sum
I have serial code that looks something like this:
sum = a;
sum += b;
sum += c;
sum += d;
I would like to parallelize it into something like this:
temp1 = a + b and, at the same time, temp2 = c + d
sum = temp1 + temp2
How do I do this using the Intel Parallel Studio tools?
Thanks!!!
Comments (1)
Assuming that all variables are of integral or floating-point types, it makes absolutely no sense to parallelize this code (in the sense of executing it on different threads/cores), as the overhead would be far higher than any benefit gained. The parallelism applicable in this example is at the level of multiple computation units and/or vectorization on a single CPU. Optimizing compilers are sophisticated enough nowadays to exploit this automatically, without code changes; however, if you wish, you may explicitly use temporary variables, as in the second part of the question.
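To make the temporary-variable form concrete, here is a minimal single-threaded sketch in C++. The function names `sum_serial` and `sum_paired` and the choice of `double` are assumptions for illustration only. The second version simply exposes two independent additions that the hardware can execute side by side.

```cpp
// Serial chain: every addition depends on the result of the previous one.
double sum_serial(double a, double b, double c, double d) {
    double sum = a;
    sum += b;
    sum += c;
    sum += d;
    return sum;
}

// Paired form: temp1 and temp2 do not depend on each other, so a
// superscalar CPU (or a vectorizing compiler) may compute them together.
// No threads are involved here.
double sum_paired(double a, double b, double c, double d) {
    double temp1 = a + b;
    double temp2 = c + d;
    return temp1 + temp2;
}
```

For floating-point types, the two forms can round differently, so a compiler will generally not make this rearrangement on its own unless relaxed floating-point options are enabled. Writing the temporaries explicitly is one way to ask for it.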
And if you are asking just out of curiosity: Intel Parallel Studio provides several ways to parallelize code. For example, let's use Cilk keywords together with C++11 lambda functions:
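A minimal sketch of that idea, assuming a compiler with Cilk support (one that defines `__cilk` and ships `<cilk/cilk.h>`). The function name `parallel_sum`, the `int` type, and the fallback macros are assumptions added for illustration; the fallback only makes the snippet build serially on compilers without Cilk.

```cpp
#if defined(__cilk)
#include <cilk/cilk.h>   // real cilk_spawn / cilk_sync keywords
#else
// Assumption for illustration: without Cilk support, define the keywords
// away so the code compiles and runs as an ordinary serial program.
#define cilk_spawn
#define cilk_sync
#endif

int parallel_sum(int a, int b, int c, int d) {
    int temp1 = 0;
    // Spawn a lambda that computes a + b; with Cilk enabled it may run
    // in parallel with the code that follows.
    temp1 = cilk_spawn [=] { return a + b; }();
    // Meanwhile, the spawning strand computes c + d.
    int temp2 = c + d;
    // Wait for the spawned lambda before reading temp1.
    cilk_sync;
    return temp1 + temp2;
}
```

Calling `parallel_sum(1, 2, 3, 4)` returns 10 whether or not the spawn actually runs in parallel.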
Don't expect any performance gain from that (see above), unless you use classes with a computationally heavy overloaded operator+.