Downsides of using the texture cache / Image2D for 2D arrays?
When accessing 2D arrays in global memory, using the texture cache has many benefits, like filtering and not having to care as much about memory access patterns. The CUDA Programming Guide names only one downside:
However, within the same kernel call, the texture cache is not kept coherent with respect to global memory writes, so that any texture fetch to an address that has been written to via a global write in the same kernel call returns undefined data.
If I don't have a need for that, because I never write to the memory I read from, are there any downsides/pitfalls/problems when using the Texture Cache (or Image2D, as I am working in OpenCL) instead of plain global memory? Are there any cases where I will lose performance by using the Texture Cache?
Comments (1)
Textures can be faster, the same speed, or slower than "naked" global memory access. There are no general rules of thumb for predicting performance using textures, as the speed up (or lack of speed up) is determined by data usage patterns within your code and the texture hardware being used.
In the worst case, where cache hit rates are very low, using textures is slower than normal memory access. Each thread first has to take a cache miss, which then triggers a global memory fetch, so the total latency is higher than a direct read from memory. I almost always write two versions of any serious code I am developing where textures might be useful (one with and one without), and then benchmark them. Often it is possible to develop heuristics to select which version to use based on inputs. CUBLAS uses this strategy extensively.
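As a rough illustration of the "write both and benchmark" approach, here is a minimal CUDA sketch. Everything in it is an assumption made for the example rather than anything from the question: the 2x2 box-average kernels (`boxGlobal`, `boxTexture`), the `timeMs` helper, the 2048x2048 input and the 16x16 block size are all hypothetical, and error checking is left out for brevity. The same idea carries over to OpenCL with an `image2d_t` kernel next to a plain buffer kernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Version 1: plain global-memory reads, edges clamped by hand.
__global__ void boxGlobal(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    int x1 = min(x + 1, w - 1), y1 = min(y + 1, h - 1);
    out[y * w + x] = 0.25f * (in[y * w + x] + in[y * w + x1] +
                              in[y1 * w + x] + in[y1 * w + x1]);
}

// Version 2: the same reads through a texture object; clamping is done
// by the texture unit (cudaAddressModeClamp is set on the host side).
__global__ void boxTexture(cudaTextureObject_t tex, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float fx = x + 0.5f, fy = y + 0.5f;  // texel centers, unnormalized coords
    out[y * w + x] = 0.25f * (tex2D<float>(tex, fx, fy) +
                              tex2D<float>(tex, fx + 1.0f, fy) +
                              tex2D<float>(tex, fx, fy + 1.0f) +
                              tex2D<float>(tex, fx + 1.0f, fy + 1.0f));
}

// Hypothetical helper: average time per launch in ms, after one warm-up run.
template <class Launch>
float timeMs(Launch launch, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch();  // warm-up, not timed
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main() {
    const int w = 2048, h = 2048;  // arbitrary test size
    const size_t bytes = size_t(w) * h * sizeof(float);
    std::vector<float> host(size_t(w) * h, 1.0f);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    // Copy the same data into a CUDA array and wrap it in a texture object.
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &fmt, w, h);
    cudaMemcpy2DToArray(arr, 0, 0, host.data(), w * sizeof(float),
                        w * sizeof(float), h, cudaMemcpyHostToDevice);
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;    // no interpolation
    td.readMode = cudaReadModeElementType;  // return raw floats
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    float tGlobal = timeMs([&] { boxGlobal<<<grid, block>>>(dIn, dOut, w, h); }, 20);
    float tTexture = timeMs([&] { boxTexture<<<grid, block>>>(tex, dOut, w, h); }, 20);
    printf("global:  %.3f ms\ntexture: %.3f ms\nfaster here: %s\n", tGlobal,
           tTexture, tTexture < tGlobal ? "texture" : "global");

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Which version wins depends on the GPU, the input size and the access pattern, so the comparison is only meaningful when run on the target hardware with representative inputs. Repeating the measurement across a few sizes is one way to arrive at the kind of input-based selection heuristic described above.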