How to use C++ templates in OpenCL kernels?

Posted on 2024-10-08 15:57:37

I'm new to OpenCL.

I have an algorithm that uses templates. It worked well with OpenMP parallelization, but the amount of data has grown and the only way to handle it now is to rewrite it for OpenCL.
I could easily use MPI to build it for a cluster, but a Tesla-class GPU is much cheaper than a cluster :)

Is there any way to use C++ templates in an OpenCL kernel?

Is it possible to expand the templates with a C++ compiler or some other tool, and then use the resulting kernel function?

EDIT: One workaround idea is to generate C99-compatible code from the templated C++ code.

I found the following about Comeau:

Comeau C++ 4.3.3 is a full and true compiler that performs full syntax checking, full semantic checking, full error checking and all other compiler duties. Input C++ code is translated into internal compiler trees and symbol tables looking nothing like C++ or C. As well, it generates an internal proprietary intermediate form. But instead of using a proprietary back end code generator, Comeau C++ 4.3.3 generates C code as its output. Besides the technical advantages of C++, the C generating aspects of products like Comeau C++ 4.3.3 have been touted as a reason for C++'s success since it was able to be brought to a large number of platforms due to the common availability of C compilers.

The C compiler is used merely and only for the sake of obtaining native code generation. This means that Comeau C++ is tailored for use with specific C compilers on each respective platform. Please note that it is a requirement that tailoring must be done by Comeau. Otherwise, the generated C code is meaningless as it is tied to a specific platform (where platform includes at least the CPU, OS, and C compiler) and furthermore, the generated C code is not standalone. Therefore, it cannot be used by itself (note that this is both a technical and legal requirement when using Comeau C++), and this is why there is not normally an option to see the generated C code: it's almost always unhelpful and the compile process, including its generation, should be considered as internal phases of translation.

Comments (6)

半仙 2024-10-15 15:57:37

There is an old way to emulate templates in pure C.
It is based on including a single file several times (without an include guard), with different macro definitions in effect each time.
Since OpenCL C has a fully functional preprocessor and supports #include, this trick can be used there too.

Here is a good explanation:
http://arnold.uthar.net/index.php?n=Work.TemplatesC

It is still much messier than C++ templates: the code has to be split into several parts, and you have to explicitly instantiate each instance of the template. Also, it seems that you cannot do some useful things, such as implementing factorial as a recursive template.
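If splitting the code into separate files is the main nuisance, a related single-file trick is to put the function body inside a function-like macro and expand that macro once per instantiation. This is a different technique from re-including a file, shown here only as a sketch: DEFINE_NEWTON_RAPHSON_RSQRT is a hypothetical name, and the CAT/TEMPLATE helpers mirror the ones in the code example. It is written as host C++ so it can be compiled and checked on a CPU, but it relies only on preprocessor features that OpenCL C also has, which can be convenient when the whole program is handed to the runtime as one source string. Each instantiation must still be written out explicitly, and because a macro is not re-expanded inside its own expansion, a recursive definition such as a compile-time factorial still cannot be written this way.

```cpp
// Name-pasting helpers, same idea as templates.h in the example.
#define CAT(X, Y, Z) X##_##Y##_##Z
#define TEMPLATE(X, Y, Z) CAT(X, Y, Z)

// Hypothetical generator macro: expands to one complete function definition
// named NewtonRaphsonRsqrt_<real>_<iters>.
#define DEFINE_NEWTON_RAPHSON_RSQRT(real, iters)                      \
    real TEMPLATE(NewtonRaphsonRsqrt, real, iters)(real x, real a) {  \
        for (int i = 0; i < (iters); i++) {                           \
            /* Newton-Raphson step towards 1/sqrt(a) */               \
            x *= (real)1.5 - ((real)0.5 * a) * x * x;                 \
        }                                                             \
        return x;                                                     \
    }

// Explicit "instantiations": each line defines one concrete function.
DEFINE_NEWTON_RAPHSON_RSQRT(float, 2)   // NewtonRaphsonRsqrt_float_2
DEFINE_NEWTON_RAPHSON_RSQRT(double, 4)  // NewtonRaphsonRsqrt_double_4
```

After these two lines, NewtonRaphsonRsqrt_double_4(1.5, 0.5) can be called in the same way as the TEMPLATE(NewtonRaphsonRsqrt, double, 4) call in the example.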

Code example

Let's apply the idea to OpenCL. Suppose that we want to calculate the inverse square root by Newton-Raphson iteration (generally not a good idea, since OpenCL C already provides rsqrt). However, the floating-point type and the number of iterations may vary.
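The update performed in the loop body comes from Newton's method applied to a function whose positive root is the inverse square root of a:

```latex
f(x) = \frac{1}{x^{2}} - a, \qquad f'(x) = -\frac{2}{x^{3}}

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}
        = x_n + \frac{x_n - a\,x_n^{3}}{2}
        = x_n\left(\frac{3}{2} - \frac{a}{2}\,x_n^{2}\right)
```

Once x is close to the root, each iteration roughly doubles the number of correct digits, which is why the iteration count is a natural template parameter: fewer iterations for float, more for double.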

First of all, we need a helper header ("templates.h"):

#ifndef TEMPLATES_H_
#define TEMPLATES_H_

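#define CAT(X,Y,Z) X##_##Y##_##Z   //concatenate words
// Extra level of indirection: arguments such as real and iters are
// macro-expanded (e.g. to float and 2) before CAT pastes them together.
#define TEMPLATE(X,Y,Z) CAT(X,Y,Z)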

#endif

Then we write the template function in "NewtonRaphsonRsqrt.cl":

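#include "templates.h"

// "real" and "iters" must be #defined by the file that includes this one.
real TEMPLATE(NewtonRaphsonRsqrt, real, iters) (real x, real a) {
    int i;
    for (i = 0; i < iters; i++) {
        // Newton-Raphson step towards 1/sqrt(a); casting 0.5 to real keeps
        // float instantiations from being promoted to double.
        x *= ((real)1.5 - ((real)0.5*a)*x*x);
    }
    return x;
}

// Clean up so the next instantiation can #define these again without a
// macro-redefinition diagnostic.
#undef real
#undef iters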

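In your main .cl file, instantiate the template as follows (if the program is built with clBuildProgram, you may need to pass -I <dir> in the build options so that the compiler can find the included files):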

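// The double instantiations need the cl_khr_fp64 extension; on older
// OpenCL versions it has to be enabled explicitly with this pragma.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

#define real float
#define iters 2
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_float_2

#define real double
#define iters 3
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_double_3

#define real double
#define iters 4
#include "NewtonRaphsonRsqrt.cl"  //defining NewtonRaphsonRsqrt_double_4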

It can then be used like this:

// inside a kernel or another function:
double prec = TEMPLATE(NewtonRaphsonRsqrt, double, 4) (1.5, 0.5);
float approx = TEMPLATE(NewtonRaphsonRsqrt, float, 2) (1.5f, 0.5f);
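For comparison, here is a sketch of the ordinary C++ function template that the preprocessor version emulates (it is not part of the original answer). A host-side version like this can also serve as a CPU reference when checking kernel results.

```cpp
// Real plays the role of the "real" macro, Iters that of "iters".
template <typename Real, int Iters>
Real NewtonRaphsonRsqrt(Real x, Real a) {
    // Same Newton-Raphson update as the OpenCL code:
    // x <- x * (3/2 - (a/2) * x^2), converging towards 1/sqrt(a).
    for (int i = 0; i < Iters; ++i) {
        x *= Real(1.5) - (Real(0.5) * a) * x * x;
    }
    return x;
}
```

Here NewtonRaphsonRsqrt<double, 4>(1.5, 0.5) corresponds to TEMPLATE(NewtonRaphsonRsqrt, double, 4)(1.5, 0.5), except that the compiler instantiates it implicitly at the call site instead of through #define/#include pairs.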
爱你不解释 2024-10-15 15:57:37

I have written an experimental C++-to-OpenCL C source transformation tool. The tool compiles C++ source (even some STL) into LLVM bitcode, and uses a modified version of the LLVM C backend to translate the bitcode into OpenCL C.

Please see http://dimitri-christodoulou.blogspot.com/2013/12/writing-opencl-kernels-in-c.html

For example, this code using C++11's std::enable_if can be converted into OpenCL 'C' and then executed on the GPU:

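#include <type_traits>

// Chosen by overload resolution only when T is an integral type.
template<class T>
T foo(T t, typename std::enable_if<std::is_integral<T>::value >::type* = 0)
{
    return 1;
}

// Chosen only when T is a floating-point type.
template<class T>
T foo(T t, typename std::enable_if<std::is_floating_point<T>::value >::type* = 0)
{
    return 0;
}

// extern "C" keeps the function name unmangled; it calls foo<int>, so it writes 1.
extern "C" void _Kernel_enable_if_int_argument(int* arg0, int* out)
{
    out[0] = foo(arg0[0]);
}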
叹沉浮 2024-10-15 15:57:37

You can have a look at VexCL, which uses expression templates to generate OpenCL kernels. It may give you some ideas on how to make OpenCL work nicely with templates.

Another library that is being actively worked on is Boost.Compute, which is a layer on top of OpenCL that allows generic C++ code.

The general idea is to generate the kernel source as a string (more or less plain C) and pass it to the OpenCL runtime for compilation and execution.
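A minimal sketch of that idea, assuming nothing beyond the standard library: a C++ function template produces OpenCL C source text for a given element type. ClTypeName and makeAxpyKernel are hypothetical names, not part of VexCL or Boost.Compute. The OpenCL calls themselves are left out so the sketch compiles without an OpenCL SDK.

```cpp
#include <string>

// Hypothetical trait: maps a host type to its spelling in OpenCL C.
template <typename T> struct ClTypeName;
template <> struct ClTypeName<float>  { static const char* get() { return "float"; } };
template <> struct ClTypeName<double> { static const char* get() { return "double"; } };
template <> struct ClTypeName<int>    { static const char* get() { return "int"; } };

// Hypothetical generator: returns OpenCL C source for y[i] = a * x[i] + y[i],
// specialised for element type T. Adjacent string literals are concatenated
// by the C++ compiler, so each kernel line can be written on its own line.
template <typename T>
std::string makeAxpyKernel() {
    const std::string t = ClTypeName<T>::get();
    return "__kernel void axpy_" + t + "(" + t + " a,\n"
           "    __global const " + t + "* x, __global " + t + "* y) {\n"
           "    size_t i = get_global_id(0);\n"
           "    y[i] = a * x[i] + y[i];\n"
           "}\n";
}
```

makeAxpyKernel<float>() yields the source of a kernel named axpy_float; a pointer to the string's c_str() is what would be passed to clCreateProgramWithSource before calling clBuildProgram. The double version additionally requires a device that supports cl_khr_fp64.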

没有心的人 2024-10-15 15:57:37

If you're really determined to get it done, you could retarget the C++ compiler of your choice to generate NVIDIA PTX (and Clang is likely to be able to do this soon anyway). But this way you would tie your code to NVIDIA hardware.

Another way is to implement a custom LLVM backend, based on the current C backend (CBE), that generates pure OpenCL C code instead of C.

谁的年少不轻狂 2024-10-15 15:57:37

Note that the new SYCL Khronos standard has native support for C++ templates in OpenCL.

非要怀念 2024-10-15 15:57:37

PyOpenCL now uses Mako as its template engine. http://www.makotemplates.org/
