GLSL 中的快速排序?
我正在考虑使用 GLSL 着色器将大量处理移植到 GPU。 我偶然发现的直接问题之一是,在其中一个步骤中,算法需要维护一个元素列表,对它们进行排序并取出几个最大的元素(哪个数字取决于数据)。 在 CPU 上,这只需使用 STL 矢量和 qsort() 即可完成,但在 GLSL 中我没有这样的设施。 有没有办法弥补这个不足呢?
I'm considering porting a large chunk of processing to the GPU using a GLSL shader. One of the immediate problems I stumbled across is that in one of the steps, the algorithm needs to maintain a list of elements, sort them and take the few largest ones (which number is dependent on the data). On the CPU this is simply done using an STL vector and qsort() but in GLSL I don't have such facilities. Is there a way to deal with this deficiency?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
披露:我真的不知道 GLSL——我一直在使用 AMD Stream SDK 进行 GPGPU 编程,它有不同的编程语言。
从您对 Bjorn 的回答的评论来看,我认为您对使用 GPU 对大型数据库进行排序不感兴趣,例如创建反向电话簿或其他什么,但相反,您有一个小数据集,并且每个片段都有自己的数据集进行排序。 更像是尝试进行中值像素过滤?
我只能笼统地说:
对于小数据集,排序算法确实不重要。 虽然人们一生都在担心对于非常大的数据库来说哪种排序算法是最好的,但对于小 N 来说,是否使用快速排序、堆排序、基数排序、希尔排序、优化冒泡排序、未优化冒泡排序实际上并不重要,等等。至少对CPU来说没有多大关系。
GPU 是 SIMD 设备,因此它们喜欢让每个内核以锁步执行相同的操作。 计算很便宜,但分支很昂贵,并且每个内核以不同方式分支的数据依赖分支非常非常非常昂贵。
因此,如果每个内核都有自己的小数据集要排序,并且要排序的数据数量取决于数据,并且每个内核的数字可能不同,那么您最好选择最大大小(如果可以的话),填充具有无穷大或某个大数字的数组,并且让每个内核执行完全相同的排序,这将是未优化的无分支冒泡排序,如下所示:
伪代码(因为我不知道 GLSL),排序为 9 分
Disclosure: I really don't know GLSL -- I've been doing GPGPU programming with the AMD Stream SDK, which has different programming language.
From you comment on Bjorn's answer, I gather that you are not interested in using the GPU to sort a huge database -- like creating a reverse phone book or whatever, but instead, you have a small dataset and each fragment has it's own dataset to sort. More like trying to do median pixel filtering?
I can only say in general:
For small datasets, the sort algorithm really doesn't matter. While people have spent careers worrying about which is the best sort algorithm for very large databases, for small N it really doesn't matter whether you use Quick sort, Heap Sort, Radix Sort, Shell Sort, Optimized Bubble Sort, Unoptimized Bubble sort, etc. At least it doesn't matter much on a CPU.
GPUs are SIMD devices, so they like to have each kernel executing the same operations in lock step. Calculations are cheap but branches are expensive and data-dependent branches where each kernel branchs a different way is very, very, very, expensive.
So if each kernel has it's own small dataset to sort, and the # of data to sort is data dependent and it could be a different number for each kernel, you're probably better off picking a maximum size (if you can), padding the arrays with Infinity or some large number, and having each kernel perform the exact same sort, which would be an unoptimized branchless bubble sort, something like this:
Pseudocode (since I don't know GLSL), sort of 9 points
你看过这篇文章吗?
https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter46.html
我不确定您是否在寻找快速排序算法或快速排序算法。 文章中的算法使用归并排序...
Have you seen this article?
https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter46.html
I wasn't sure you were looking for a Quicksort algorithm or a quick sorting algorithm. The algorithm in the article uses merge sort...
我对 GPU 编程没有任何了解。
我会使用堆排序而不是快速排序,因为你说你只需要查看前几个值。 堆可以在
O(n)
时间内构建,但获取顶部值需要log(n)
。 因此,如果您需要的值的数量明显小于元素总数,您可以获得一些性能。I haven't got any knowledge about GPU programming.
I'd use heapsort rather than quicksort, because you said you only need to look at the top few values. The heap can be constructed in
O(n)
time, but getting the top value islog(n)
. Therefore if you the number of values you need is significantly smaller than the total number of elements you could gain some performance.