OpenCL struct declarations in different memory spaces
In OpenCL, what are the consequences of, and the differences between, the following struct declarations? And if any of them are illegal, why?
struct gr_array
{
    int ndims;
    __global m_integer* dim_size;
    __global m_real* data;
};
typedef struct gr_array g_real_array;
struct lr_array
{
    int ndims;
    __local m_integer* dim_size;
    __local m_real* data;
};
typedef struct lr_array l_real_array;
__kernel void temp(...){
    __local g_real_array A;
    g_real_array B;
    __local l_real_array C;
    l_real_array D;
}
My question is: where will the structures (and their members) be allocated? Who can access them? And is this good practice or not?
EDIT
How about this?
struct r_array
{
    __local int ndims;
};
typedef struct r_array real_array;
__kernel void temp(...){
    __local real_array A;
    real_array B;
}
If a work-item modifies ndims in struct B, is the change visible to the other work-items in the work-group?
Comments (2)
I've rewritten your code as valid CL, or at least CL that will compile. Here:
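A minimal sketch of such a compilable version, assuming m_integer and m_real are typedefs for int and float (the question never defines them) and that the kernel takes no arguments. The #ifndef shim at the top is not part of OpenCL; it is a hypothetical convenience so the same file also builds with an ordinary C compiler.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption): erase the OpenCL qualifiers for plain C. */
#define __kernel
#define __global
#define __local
#endif

/* Assumed stand-ins; the question does not say what these types are. */
typedef int   m_integer;
typedef float m_real;

typedef struct gr_array {
    int ndims;
    __global m_integer *dim_size;   /* pointer to global memory */
    __global m_real    *data;       /* pointer to global memory */
} g_real_array;

typedef struct lr_array {
    int ndims;
    __local m_integer *dim_size;    /* pointer to local memory */
    __local m_real    *data;        /* pointer to local memory */
} l_real_array;

__kernel void temp(void)
{
    __local g_real_array A;   /* struct in local memory, pointers to global */
    g_real_array         B;   /* struct in private memory, pointers to global */
    __local l_real_array C;   /* struct in local memory, pointers to local */
    l_real_array         D;   /* struct in private memory, pointers to local */

    /* Nothing is done with them here; this only silences unused warnings. */
    (void)A; (void)B; (void)C; (void)D;
}
```

The qualifier on the variable decides where the struct itself lives; the qualifier on each pointer member only describes what that pointer may point to.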
One by one, here's how this breaks down:
A is in local space. It's a struct that is composed of one int and two pointers. These pointers point to data in global space, but are themselves allocated in local space.
B is in private space; it's an automatic variable. It is composed of an int and two pointers that point to stuff in global memory.
C is in local space. It contains an int and two pointers to stuff in local space.
D, you can probably guess at this point. It's in private space, and contains an int and two pointers that point to stuff in local space.
I cannot say whether either is preferable for your problem, since you haven't described what you are trying to accomplish.
EDIT: I realized I didn't address the second part of your question -- who can access the structure fields.
Well, you can access the fields anywhere the variable is in scope. I'm guessing that you were thinking that the fields you had marked as global in g_float_array were in global space (and in local space for l_float_array). But they're just pointing to stuff in global (or local) space.
So, you'd use them like this:
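As a rough illustration of that kind of use, here is a sketch in which a private struct is filled with pointers that come from kernel arguments. The kernel name scale_data, its parameters, and the int/float typedefs are assumptions made for this example, and the #ifndef shim (including host_gid and the stand-in get_global_id) exists only so the file can also be built and tried with an ordinary C compiler.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption, not part of OpenCL). */
#include <stddef.h>
#define __kernel
#define __global
#define __local
static size_t host_gid;   /* pretend work-item index, set by the caller */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

/* Assumed stand-ins; the question does not say what these types are. */
typedef int   m_integer;
typedef float m_real;

typedef struct gr_array {
    int ndims;
    __global m_integer *dim_size;
    __global m_real    *data;
} g_real_array;

/* Hypothetical kernel: doubles one element of a 1-D array per work-item. */
__kernel void scale_data(__global m_integer *dims,
                         __global m_real *values,
                         int ndims)
{
    g_real_array B;        /* private: every work-item has its own copy */
    B.ndims    = ndims;    /* stored in private memory */
    B.dim_size = dims;     /* pointer is private, its target is global */
    B.data     = values;   /* same here */

    size_t i = get_global_id(0);
    if (i < (size_t)B.dim_size[0])
        B.data[i] *= 2.0f; /* the write goes through to global memory */
}
```

Changing B.ndims affects only the work-item that owns B, while a write through B.data lands in the global buffer that all work-items share.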
By the way -- if you're hacking CL on a Mac running Lion, you can compile .cl files using the "offline" CL compiler, which makes experimenting with this kind of stuff a bit easier. It's located here:
There is some sample code here.
It probably won't work, because current GPUs have separate memory spaces for OpenCL kernels and for the ordinary host program. You have to make explicit calls to transfer data between the two, and this is often the bottleneck of the program (because the bandwidth of the PCI Express link to the graphics card is quite low compared with the GPU's own memory).