Sorted list intersection on the GPU
I know how to intersect two sorted lists on the CPU using an O(n+m) algorithm, where n and m are the lengths of the two lists. However, is there a good algorithm for intersecting two lists on the GPU that avoids write conflicts? I am worried that while intersecting, two threads may try to write to the same output buffer, resulting in a conflict. I am not looking for a library; I want to know the basic idea, plus some code if possible.
3 Answers
I understand that you might not want your code to be tied to a library. However, I think Thrust has an algorithm that does exactly what you want, provided that you store your lists in traditional C arrays.
Have a look at
thrust::merge
here: http://wiki.thrust.googlecode.com/hg/html/group__merging.html

-- edit --
Having thought a little more about your question, intersecting two lists directly on the GPU seems quite complex to write in CUDA.
The following code snippet is an alternative solution (a merge of my previous solution and of @jmsu's). It is an example of how to intersect two integer lists stored in decreasing order. The lists are stored in device memory, but the computation cannot be performed within a kernel. Thus, you need to use it between kernel calls, if that is possible:
How about doing something like this:
If value A[i] exists in array B, store it in C[i]; otherwise set C[i] := DUMMY (a sentinel value).
And then perform parallel array compaction?
There are tools to do exactly that - check here for example - a library and a paper describing the algorithm used.
Following the Thrust idea: as @jHackTheRipper said, you don't need the whole of Thrust; you can use only what you need.
To do it with Thrust, you can use thrust::set_intersection:
http://wiki.thrust.googlecode.com/hg/html/group_set_operations.html#ga17277fec1491c8a916c9908a5ae40807
Example code from the Thrust documentation:
To execute it on the GPU, you can copy the arrays to device memory and use thrust::device_ptr, or, better, use thrust::device_vector. These are compatible with STL vectors.
I haven't checked the code, but it should be close to this. The Thrust website has great examples to get you started.