Efficient image pyramids in CUDA?
What's the most efficient way to do image pyramiding in CUDA? I have written my own kernels to do so but imagine we can do better.
Binding to an OpenGL texture using OpenGL interop and using the hardware mipmapping would probably be much faster. Any pointers on how to do this or other
Comments (1)
In OpenGL/DirectX, mipmaps are set up when the texture is initialized or first accessed. A CUDA kernel can do the same thing if you allocate a texture 50% wider (or taller) than the initial texture, use the kernel to down-sample the texture, and write the result beside the original. The kernel will probably work best when each thread evaluates one pixel of the next down-sampled image.

It's up to you to determine the sampling scheme and choose appropriate weights for combining the pixels. Try bilinear to start with (a 2x2 average); once that works you can set up a wider filter such as bicubic, or other schemes such as anisotropic filtering. Note that trilinear filtering is not a cubic filter: it blends between two adjacent mip levels at lookup time, so it is a way to read a pyramid rather than to build one. Simple filters with small, regular footprints will likely be more efficient, since neighboring threads read neighboring addresses and memory accesses can coalesce (refer to the CUDA C Programming Guide).

You will probably need to tile the kernel execution, since the number of threads per block is limited: split the output image into tiles and let each thread block process one tile. You might find Mesa3D useful as a reference (it's an open-source implementation of OpenGL).
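The approach described in this answer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: it assumes a single-channel `float` image, a plain 2x2 box filter (equivalent to a bilinear sample taken at the center of each 2x2 block), and an atlas 50% wider than the image in which level 1 sits to the right of level 0 and every later level is stacked below the previous one. The names `downsample2x2` and `buildPyramid`, and the layout itself, are hypothetical choices made for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per destination pixel: average a 2x2 block of the source level.
// Source and destination are disjoint rectangles inside the same atlas.
// `pitch` is the atlas row length in elements (not bytes).
__global__ void downsample2x2(float* atlas, int pitch,
                              int srcX, int srcY,
                              int dstX, int dstY,
                              int dstW, int dstH)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstW || y >= dstH) return;  // guard the partial tiles at the edges

    const float* s = atlas + (size_t)(srcY + 2 * y) * pitch + (srcX + 2 * x);
    float sum = s[0] + s[1] + s[pitch] + s[pitch + 1];
    atlas[(size_t)(dstY + y) * pitch + (dstX + x)] = 0.25f * sum;
}

// Builds every level in place and returns the level count (including level 0).
// Assumed layout: level 0 at (0,0); level 1 at (W,0); each further level
// directly below the previous one in the right-hand column.
int buildPyramid(float* dAtlas, int W, int H)
{
    const int pitch = W + W / 2;
    int srcX = 0, srcY = 0, srcW = W, srcH = H;
    int dstX = W, dstY = 0;
    int levels = 1;
    const dim3 block(16, 16);  // each block handles a 16x16 output tile

    while (srcW > 1 && srcH > 1) {
        int dstW = srcW / 2, dstH = srcH / 2;
        dim3 grid((dstW + block.x - 1) / block.x,
                  (dstH + block.y - 1) / block.y);
        // Launches on the default stream run in order, so each level
        // reads a fully written previous level.
        downsample2x2<<<grid, block>>>(dAtlas, pitch, srcX, srcY,
                                       dstX, dstY, dstW, dstH);
        srcX = dstX; srcY = dstY; srcW = dstW; srcH = dstH;
        dstY += dstH;  // the next level goes below the one just written
        ++levels;
    }
    return levels;
}

int main()
{
    const int W = 8, H = 8, pitch = W + W / 2;
    std::vector<float> host(pitch * H, 0.0f);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            host[y * pitch + x] = (float)x;  // horizontal ramp as test input

    float* dAtlas = nullptr;
    size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(&dAtlas, bytes) != cudaSuccess) return 1;
    cudaMemcpy(dAtlas, host.data(), bytes, cudaMemcpyHostToDevice);
    int levels = buildPyramid(dAtlas, W, H);
    cudaMemcpy(host.data(), dAtlas, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dAtlas);

    // For an 8x8 input the 1x1 level lands at column W, row H/2 + H/4.
    printf("levels=%d coarsest=%.2f\n",
           levels, host[(H / 2 + H / 4) * pitch + W]);
    return 0;
}
```

For the ramp input above, the coarsest level should come out as the mean of the row values 0 through 7. Keeping all levels in one allocation means a single pointer and pitch describe the whole pyramid, at the cost of some unused space in the right-hand column; separate allocations per level would work equally well with the same kernel.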