float vs. int in CUDA

Posted on 2024-09-14 19:21:20

Is it better to use a float instead of an int in CUDA?

Does using a float reduce bank conflicts and ensure coalescing? (Or does it have nothing to do with this?)


Comments (4)

七七 2024-09-21 19:21:21

Bank conflicts when reading shared memory are all about the amount of data read. Since int and float are the same size (both are 4-byte types on CUDA platforms), there's no difference.

Coalescing usually refers to global memory accesses; again, it depends on the number of bytes read, not the data type.
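To illustrate the point, here is a minimal sketch (the reverseTile kernel, its tile size, and the launch configuration are made up for this example, not taken from the answer): the shared-memory tile behaves identically whether T is int or float, because each element is a 4-byte word occupying one bank slot; only the indexing stride could introduce conflicts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stage a tile in shared memory and write it out reversed.
// Whether T is int or float makes no difference to bank conflicts: both are
// 4-byte words, so each thread in a warp touches a different 32-bit bank as
// long as the indexing stride is 1. The sketch assumes one block of 256 threads.
template <typename T>
__global__ void reverseTile(const T* in, T* out, int n)
{
    __shared__ T tile[256];                 // one 4-byte word per bank slot
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n) tile[threadIdx.x] = in[i];   // unit stride: conflict-free load
    __syncthreads();                        // every thread reaches the barrier
    if (i < n) out[i] = tile[blockDim.x - 1 - threadIdx.x];  // still one word per bank
}

int main()
{
    const int n = 256;
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    reverseTile<<<1, 256>>>(in, out, n);    // the same launch would work for int*
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);        // expect 255.0

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```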

于我来说 2024-09-21 19:21:21

Both int and float are four bytes, so if you access them in the same way it makes no difference which you use, either for coalescing your global memory accesses or for bank conflicts on shared memory accesses.

Having said that, you may get better performance with floats, since the devices are designed to crunch them as fast as possible; ints are often used for control and indexing, and hence have lower performance. Of course it's really more complicated than that: if you had nothing but floats, the integer hardware would sit idle, which would be a waste.
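As a small sketch of that division of labour (a plain SAXPY-style kernel invented here, not part of the answer): the int does the control and indexing work, the floating-point units do the arithmetic, and the global accesses are coalesced either way because every element is 4 bytes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// SAXPY-style kernel: int handles indexing and control, float does the math.
// Adjacent threads touch adjacent 4-byte elements, so the accesses are
// coalesced regardless of whether the payload type is int or float.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // integer work: index math
    if (i < n)
        y[i] = a * x[i] + y[i];                     // float work: the actual computation
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", y[0]);   // expect 5.0

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```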

邮友 2024-09-21 19:21:21

Bank conflicts and coalescing are all about memory access patterns (whether the threads within a warp all read/write different locations with a uniform stride). These concerns are therefore independent of data type (float, int, double, etc.).

Note that the data type does affect computation performance: single-precision float is faster than double precision, and so on. The beefy FPUs in GPUs generally mean that doing calculations in fixed point is unnecessary and may even be detrimental.
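A minimal sketch of the access-pattern point, with two made-up copy kernels: the data type is the same in both, but the first is fully coalesced while the second scatters each warp's requests by a stride, which is what actually hurts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two copy kernels with identical element types but different access patterns.
// copyCoalesced: adjacent threads read adjacent 4-byte elements, so each warp
// reads one contiguous chunk (fully coalesced).
// copyStrided: each thread jumps by `stride`, so a warp's 32 requests land in
// scattered memory segments and coalescing is lost. The type plays no role.
__global__ void copyCoalesced(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];                 // contiguous per-warp access
}

__global__ void copyStrided(const float* in, float* out, int n, int stride)
{
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];                 // spread-out access: poor coalescing
}

int main()
{
    const int n = 1 << 22, stride = 32;
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    copyCoalesced<<<(n + 255) / 256, 256>>>(in, out, n);
    copyStrided<<<(n / stride + 255) / 256, 256>>>(in, out, n, stride);
    cudaDeviceSynchronize();
    printf("done: %f %f\n", out[0], out[stride]);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```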

末が日狂欢 2024-09-21 19:21:21

Take a look at the "Mathematical Functions" section of the CUDA C Programming Guide. Using device runtime functions (intrinsics) may give better performance for various types; they can combine multiple operations into one, taking fewer clock cycles.

For some of the functions of Section C.1, a less accurate but faster version exists in the device runtime component; it has the same name prefixed with __ (such as __sinf(x)). The compiler has an option (-use_fast_math) that forces every function in the table to compile to its intrinsic counterpart... Selectively replace mathematical function calls by calls to intrinsic functions only where it is merited by the performance gains and where changed properties such as reduced accuracy and different special-case handling can be tolerated.

  • For example: use __fdividef(x, y) instead of x/y, and __sinf(x) instead of sinf(x).

And you may find more operations, like x + c*y, being performed by a single function.
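A short sketch of those substitutions, using a made-up kernel (the intrinsics __sinf, __fdividef, and fmaf are real CUDA device functions; everything else here is illustrative). fmaf(c, y, x) is the single-operation form of x + c*y, and building with nvcc -use_fast_math would apply the sinf and division substitutions automatically.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Contrast the standard math calls with their fast intrinsic counterparts.
// The intrinsics trade accuracy (and special-case handling) for speed;
// -use_fast_math makes nvcc apply substitutions such as sinf -> __sinf.
__global__ void fastMathDemo(const float* x, const float* y, float* out, int n, float c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float slow = sinf(x[i]) + x[i] / y[i] + (x[i] + c * y[i]);
        float fast = __sinf(x[i])                 // fast, less accurate sine
                   + __fdividef(x[i], y[i])       // fast single-precision division
                   + fmaf(c, y[i], x[i]);         // x + c*y as one fused multiply-add
        out[i] = slow - fast;                     // typically small: the accuracy trade-off
    }
}

int main()
{
    const int n = 256;
    float *x, *y, *out;
    cudaMallocManaged(&x,   n * sizeof(float));
    cudaMallocManaged(&y,   n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 0.5f + i; y[i] = 1.0f + i; }

    fastMathDemo<<<1, 256>>>(x, y, out, n, 2.0f);
    cudaDeviceSynchronize();
    printf("out[1] = %g\n", out[1]);   // difference between slow and fast paths

    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```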
