High-performance computing terminology: what is GF/s?
I'm reading this Dr. Dobb's article on CUDA:
In my system, the global memory bandwidth is slightly over 60 GB/s.
This is excellent until you consider that this bandwidth must service
128 hardware threads -- each of which can deliver a large number of
floating-point operations. Since a 32-bit floating-point value
occupies four (4) bytes, global memory bandwidth limited applications
on this hardware will only be able to deliver around 15 GF/s -- or
only a small percentage of the available performance capability.
Question: GF/s means Giga flops per second??
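
For reference, here is the arithmetic behind the quoted 15 GF/s figure as a minimal Python sketch. The assumption, taken from the article's reasoning, is that every floating-point operation needs one fresh 4-byte operand streamed from global memory, so the memory bus caps the operation rate.

    # Back-of-envelope check of the article's bandwidth-limited figure.
    # Assumption: each floating-point operation consumes one 4-byte value
    # that must be read from global memory (no reuse from caches/registers).

    bandwidth_bytes_per_s = 60e9   # ~60 GB/s global memory bandwidth
    bytes_per_float32 = 4          # a 32-bit float occupies 4 bytes

    values_per_s = bandwidth_bytes_per_s / bytes_per_float32
    print(f"{values_per_s / 1e9:.0f} Gvalues/s -> ~15 GF/s if 1 flop per value")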
3 Answers
Giga flops per second would be it!
GF/s or GFLOPS means gigaflops, i.e. 10^9 FLoating-point OPerations per Second. (GF/s is a slightly unusual abbreviation of GigaFLOP/S = GigaFLOPS; see e.g. here, "Gigaflops (GF/s) = 10^9 flops", or here, "gigaflops per second (GF/s)".)
And to be clear: GF/s is not GFLOPS/s (it is not an acceleration).
You should remember that floating-point operations on CPUs and GPUs are usually counted differently. For most CPUs, 64-bit (double-precision) floating-point operations are counted; for GPUs, 32-bit (single-precision) operations are, because GPUs have much higher performance in 32-bit floating point.
What types of operations are counted? Additions, subtractions and multiplications are. Loads and stores are not. But loading and storing data is necessary to move data to and from memory, and sometimes it limits the FLOPS achieved in a real application (the article you cited describes exactly this case, a "memory bandwidth limited application": the CPU/GPU can deliver a lot of FLOPS, but memory cannot supply the needed data fast enough).
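
To make the "memory bandwidth limited" case concrete, here is a small Python sketch (my own illustration, not from the article) that estimates the FLOPS ceiling for a streaming kernel like y = a*x + y, where only the adds and multiplies count as flops but every element still has to cross the memory bus.

    # Hypothetical streaming kernel: y[i] = a * x[i] + y[i]  (SAXPY-like)
    # Counted flops per element: 1 multiply + 1 add = 2
    # Bytes moved per element (float32): read x (4) + read y (4) + write y (4) = 12

    flops_per_element = 2
    bytes_per_element = 12
    arithmetic_intensity = flops_per_element / bytes_per_element  # flops per byte

    memory_bandwidth = 60e9                                       # bytes/s, from the article
    bandwidth_bound_flops = arithmetic_intensity * memory_bandwidth
    print(f"ceiling: {bandwidth_bound_flops / 1e9:.0f} GFLOPS")   # ~10 GFLOPS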
How are FLOPS counted for a given chip or computer? There are two different metrics. One is the theoretical upper limit of FLOPS for the chip: it is computed by multiplying the number of cores, the clock frequency, and the floating-point operations per CPU cycle (4 for Core2, 8 for Sandy Bridge CPUs).
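
As a sketch of that first metric, the theoretical peak is just the product of three numbers. The core count and clock frequency below are made-up example values; only the 8 flops/cycle figure comes from the paragraph above.

    def theoretical_peak_gflops(cores, ghz, flops_per_cycle):
        """Theoretical upper limit: cores * clock frequency * flops per cycle."""
        return cores * ghz * flops_per_cycle

    # Example: a hypothetical 4-core Sandy Bridge CPU at 3.0 GHz,
    # 8 floating-point operations per cycle per core.
    print(theoretical_peak_gflops(cores=4, ghz=3.0, flops_per_cycle=8))  # 96.0 GFLOPS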
The other metric is something like real-world flops, measured by running the LINPACK benchmark (solving a huge linear system of equations). This benchmark uses matrix-matrix multiplication heavily and is a reasonable approximation of real-world flops. The Top500 list of supercomputers is measured with HPL, the parallel version of the LINPACK benchmark. On a single CPU, LINPACK can reach up to 90-95% of theoretical flops; for huge clusters the figure is in the 50-85% range.
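
For the second metric, the measured-to-peak ratio is usually quoted as LINPACK efficiency. A tiny sketch, with hypothetical numbers chosen to land in the ranges mentioned above:

    def linpack_efficiency(rmax_gflops, rpeak_gflops):
        """Fraction of the theoretical peak actually achieved by LINPACK/HPL."""
        return rmax_gflops / rpeak_gflops

    # Hypothetical single CPU: 88 GFLOPS measured vs 96 GFLOPS peak -> ~92%
    print(f"{linpack_efficiency(88, 96):.0%}")
    # Hypothetical cluster: 7.0 PFLOPS measured vs 10.0 PFLOPS peak -> 70%
    print(f"{linpack_efficiency(7.0e6, 10.0e6):.0%}")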
GF in this case is GigaFLOPS, but FLOPS already means "floating-point operations per second". I'm fairly certain the author does not mean F/s to be "floating-point operations per second per second", so GF/s is technically an error (unless you are talking about a computer that increases its performance at runtime, I guess). The author probably means GFLOPS.