Struct alignment with PyOpenCL
Update: the int4 in my kernel was wrong.
I am using PyOpenCL but am unable to get struct alignment to work correctly. In the code below, which calls the kernel twice, the b value is returned correctly (as 1), but the c value has some "random" value.
In other words: I am trying to read two members of a struct. I can read the first but not the second. Why?
The same issue occurs whether I use numpy structured arrays or pack with struct. The __attribute__ settings in the comments don't help either.
I suspect I am doing something stupid elsewhere in the code, but can't see it. Any help appreciated.
import struct as s
import pyopencl as cl
import numpy as n

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

for use_struct in (True, False):

    if use_struct:
        a = s.pack('=ii',1,2)
        print(a, len(a))
        a_dev = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, len(a))
    else:
        # a = n.array([(1,2)], dtype=n.dtype('2i4', align=True))
        a = n.array([(1,2)], dtype=n.dtype('2i4'))
        print(a, a.itemsize, a.nbytes)
        a_dev = cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, a.nbytes)

    b = n.array([0], dtype='i4')
    print(b, b.itemsize, b.nbytes)
    b_dev = cl.Buffer(ctx, cl.mem_flags.READ_ONLY, b.nbytes)

    c = n.array([0], dtype='i4')
    print(c, c.itemsize, c.nbytes)
    c_dev = cl.Buffer(ctx, cl.mem_flags.READ_ONLY, c.nbytes)

    prg = cl.Program(ctx, """
        typedef struct s {
            int4 f0;
            int4 f1 __attribute__ ((packed));
            // int4 f1 __attribute__ ((aligned (4)));
            // int4 f1;
        } s;
        __kernel void test(__global const s *a, __global int4 *b, __global int4 *c) {
            *b = a->f0;
            *c = a->f1;
        }
        """).build()

    cl.enqueue_copy(queue, a_dev, a)
    event = prg.test(queue, (1,), None, a_dev, b_dev, c_dev)
    event.wait()
    cl.enqueue_copy(queue, b, b_dev)
    print(b)
    cl.enqueue_copy(queue, c, c_dev)
    print(c)
The output (I had to reformat while cut+pasting, so may have messed up line breaks slightly; I've also added comments indicating what the various print values are):
# first using struct
/home/andrew/projects/personal/kultrung/env/bin/python3.2 /home/andrew/projects/personal/kultrung/src/kultrung/test6.py
b'\x01\x00\x00\x00\x02\x00\x00\x00' 8 # the struct packed values
[0] 4 4 # output buffer 1
[0] 4 4 # output buffer 2
/home/andrew/projects/personal/kultrung/env/lib/python3.2/site-packages/pyopencl/cache.py:343: UserWarning: Build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz' at 0x1385a20> succeeded, but said:
Build started
Kernel <test> was successfully vectorized
Done.
  warn("Build succeeded, but resulted in non-empty logs:\n"+message)
[1] # the first value (correct)
[240] # the second value (wrong)
# next using numpy
[[1 2]] 4 8 # the numpy struct
[0] 4 4 # output buffer
[0] 4 4 # output buffer
/home/andrew/projects/personal/kultrung/env/lib/python3.2/site-packages/pyopencl/__init__.py:174: UserWarning: Build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Intel(R) Core(TM)2 CPU T5600 @ 1.83GHz' at 0x1385a20> succeeded, but said:
Build started
Kernel <test> was successfully vectorized
Done.
  warn("Build succeeded, but resulted in non-empty logs:\n"+message)
[1] # first value (ok)
[67447488] # second value (wrong)
Process finished with exit code 0
Comments (2)
In the OpenCL program, try the packed attribute on the struct itself, instead of on one of the members.
Because you only had the packed attribute on a single member of the struct, it might not have been packing the entire structure.
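As a rough illustration of the suggestion above, here is a minimal sketch of what putting the attribute on the struct itself might look like, together with the matching host-side packing. This is not the commenter's code: the kernel string `KERNEL_SRC` and the name `HOST_FORMAT` are hypothetical, and the members are assumed to be plain `int` (following the update in the question) rather than `int4`. The kernel source is only held as a string here; nothing is compiled.

```python
import struct

# Hypothetical kernel source (an assumption, not from the original post):
# the packed attribute is applied to the whole struct, and the members are
# plain 32-bit ints so that the struct is 8 bytes on the device side.
KERNEL_SRC = """
typedef struct __attribute__ ((packed)) s {
    int f0;
    int f1;
} s;

__kernel void test(__global const s *a, __global int *b, __global int *c) {
    *b = a->f0;
    *c = a->f1;
}
"""

# Host side: '=' in the struct module means standard sizes with no
# alignment padding, i.e. a packed layout.
HOST_FORMAT = "=ii"
payload = struct.pack(HOST_FORMAT, 1, 2)

# Round-trip check of the bytes that would be uploaded to the device.
f0, f1 = struct.unpack(HOST_FORMAT, payload)
print(len(payload), f0, f1)

# Packing only changes the size when members have different alignment,
# e.g. a char followed by an int: 1 + 4 bytes with no padding.
print(struct.calcsize("=bi"))
```

For two members of the same 4-byte type, a packed and an unpacked layout should come out identical, which is consistent with the attribute making no difference in the question.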
OK, I don't know where I got int4 from - I think it must be an Intel extension. Switching to AMD with int as the kernel type works as expected. I'll post more at http://acooke.org/cute/Somesimple0.html once I have cleaned things up.
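For what it's worth, in the OpenCL C specification `int4` is a built-in vector type holding four 32-bit ints (16 bytes) rather than a vendor extension, which would explain the symptoms. The sketch below only does the byte arithmetic on the host; the helper `bytes_available` and the constant names are made up for illustration, and padding is assumed to be absent.

```python
import struct

INT_BYTES = 4                # OpenCL C `int` is 32 bits
INT4_BYTES = 4 * INT_BYTES   # `int4` is a vector of four ints: 16 bytes

# What the question uploads to the device: two 32-bit ints, 8 bytes total.
host_payload = struct.pack("=ii", 1, 2)


def bytes_available(offset, member_bytes, buffer_len):
    """How many bytes of a member at `offset` fall inside the buffer."""
    end = min(buffer_len, offset + member_bytes)
    return max(0, end - offset)


# Hypothetical helper loop: compare a struct of two ints with a struct of
# two int4 members against the 8-byte buffer from the question.
for name, size in (("int", INT_BYTES), ("int4", INT4_BYTES)):
    offsets = [0, size]  # f0 at offset 0, f1 right after it
    covered = [bytes_available(o, size, len(host_payload)) for o in offsets]
    print(name, "struct bytes:", 2 * size, "offsets:", offsets,
          "bytes backed by host data:", covered)
```

With `int` members both fields are fully backed by the 8 uploaded bytes. With `int4`, `f0` starts at offset 0, so its first component still reads as 1, while `f1` starts at offset 16, entirely past the end of the buffer, so whatever lands in `c` is unrelated memory.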