How to optimize a SYCL kernel

Posted on 2025-02-12 00:42:30

I'm studying SYCL at university and I have a question about the performance of some code.
In particular, I have this C/C++ code:

c code

I need to translate it into a parallel SYCL kernel, and this is what I wrote:

#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>
using namespace sycl;
constexpr int size = 131072; // 2^17
int main(int argc, char** argv) {
  // Create a vector with size elements and initialize them to 1
  std::vector<float> dA(size, 1.0f);
  try {
    // SYCL 2020 selector object; the gpu_selector class is deprecated
    queue gpuQueue{ gpu_selector_v };
    buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
    gpuQueue.submit([&](handler& cgh) {
      accessor inA{ bufA, cgh }; // read-write access by default
      cgh.parallel_for(range<1>(size),
                       [=](id<1> i) { inA[i] = inA[i] + 2; });
    });
    gpuQueue.wait_and_throw();
  }
  catch (const std::exception& e) {
    // Report the error; `throw e;` would rethrow a sliced copy
    std::cerr << e.what() << '\n';
    return 1;
  }
}

So my question is about the value c: in this case I use the literal 2 directly. Will this affect performance when I run the code? Do I need to create a variable, or is this approach correct and does it perform well?

Comments (1)

扛刀软妹 2025-02-19 00:42:30

Interesting question. In this case the value 2 will be a literal in the instruction in your SYCL kernel - this is as efficient as it gets, I think! There's the slight complication that you have an implicit cast from int to float. My guess is that you'll probably end up with a float literal 2.0 in your device assembly. Your SYCL device won't have to fetch that 2 from memory or cast at runtime or anything like that, it just lives in the instruction.
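The int-to-float conversion can be checked on the host with plain C++, no SYCL toolchain needed. The following is a minimal sketch; `add_two` is a hypothetical helper that mirrors the kernel body `inA[i] + 2`. It only illustrates the language rule and says nothing by itself about what a particular device compiler emits.

```cpp
#include <type_traits>

// Hypothetical helper mirroring the kernel body `inA[i] + 2`.
constexpr float add_two(float x) { return x + 2; }

// By the usual arithmetic conversions, float + int is computed in float,
// so the int literal 2 is converted to 2.0f.
static_assert(std::is_same<decltype(1.0f + 2), float>::value,
              "float + int yields float");

// The whole expression can be evaluated at compile time, which is why
// no run-time conversion needs to survive in the generated code.
static_assert(add_two(1.0f) == 3.0f, "1.0f + 2 == 3.0f");
```

If this compiles, both checks passed: the result type is `float` and the addition with the literal was folded at compile time.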

Equally, if you had:

constexpr int c = 2;
// the rest of your code
[=](id<1> i) { inA[i] = inA[i] + c; }
// etc

The compiler is almost certainly smart enough to propagate the constant value of c into the kernel code. So, again, the 2.0 literal ends up in the instruction.
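One host-side way to see why the `constexpr` version costs nothing is to compare it with a value known only at run time. This is a sketch with hypothetical names (`make_constexpr_adder`, `make_runtime_adder`), not the code from the question. A `constexpr` `c` that is only read for its value does not need to be captured by the lambda, whereas a run-time value must be stored in the closure (in a SYCL kernel, captured values are typically passed to the device as kernel arguments). Closure layout is implementation-defined, so the sizes are an observation on a given compiler, not a guarantee.

```cpp
#include <cstddef>

// c is a compile-time constant. Reading its value inside the lambda is
// not an odr-use, so [=] does not have to capture anything.
inline auto make_constexpr_adder() {
  constexpr int c = 2;
  return [=](float x) { return x + c; };
}

// c is only known at run time, so the lambda stores a copy of it.
inline auto make_runtime_adder(int c) {
  return [=](float x) { return x + c; };
}

// Size of each closure object; sizeof does not evaluate its operand.
inline std::size_t constexpr_closure_size() {
  return sizeof(make_constexpr_adder());
}
inline std::size_t runtime_closure_size() {
  return sizeof(make_runtime_adder(0));
}
```

Both adders return the same result for `c == 2`; the difference is only in what the closure has to carry. On common compilers the first closure is typically empty, while the second holds at least an `int`.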

I compiled your example with DPC++ and extracted the LLVM IR, and found the following lines:

  %5 = load float, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17
  %add.i = fadd float %5, 2.000000e+00
  store float %add.i, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17

This shows a float load & store to/from the same address, with an 'add 2.0' instruction in between. If I modify to use the variable c like I demonstrated, I get the same LLVM IR.

Conclusion: you've already achieved maximum efficiency, and compilers are smart!
