FLOPS means floating point operations per second. To measure FLOPS you first need code that performs such operations. Given such code, what you can actually measure is its execution time. You then need to sum up or estimate (not measure!) all the floating point operations it performs and divide that by the measured wall time. Count all the ordinary operations: additions, subtractions, multiplications and divisions (yes, even though divisions are slower and better avoided, they are still FLOPs). Be careful how you count, though! What you see in your source code is most likely not what the compiler produces after all its optimisations. To be sure, you will likely have to look at the assembly.
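A minimal sketch of that bookkeeping in pure Python (interpreter overhead dominates here, so the number is nowhere near hardware peak; the loop body and constants are made up purely for illustration):

```python
import time

# Count FLOPs by hand for a loop whose body we control:
# each iteration does 1 multiply + 1 add = 2 FLOPs.
n = 1_000_000
x = 1.0
start = time.perf_counter()
for _ in range(n):
    x = x * 1.0000001 + 0.5
elapsed = time.perf_counter() - start

flops = 2 * n / elapsed   # hand-counted FLOPs / measured wall time
print(f"{flops:.3e} FLOPS")
```

Even in this toy case the caveat above applies: a clever compiler or JIT could fold part of the loop away, which is exactly why checking the generated code matters.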
FLOPS is not the same as operations per second. Even though some architectures have a single MAD (multiply-and-add) instruction, it still counts as two FLOPs. The same goes for SSE instructions: you count each as one instruction, even though a single one performs more than one FLOP.
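The distinction matters when converting an instruction count into a FLOP count. A toy calculation, with entirely made-up instruction counts:

```python
# Hypothetical instruction mix for some program:
mad_instrs = 1_000_000       # each MAD (multiply-and-add) = 2 FLOPs
sse_mul_instrs = 500_000     # each packed SSE multiply on two doubles = 2 FLOPs
scalar_add_instrs = 250_000  # 1 FLOP each

flops = 2 * mad_instrs + 2 * sse_mul_instrs + scalar_add_instrs
instrs = mad_instrs + sse_mul_instrs + scalar_add_instrs
print(flops, instrs)  # 3250000 FLOPs from 1750000 instructions
```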
FLOPS figures are not entirely meaningless, but you need to be careful when comparing your FLOPS to somebody else's, especially a hardware vendor's. For example, NVIDIA quotes the peak FLOPS of its cards assuming MAD operations, so unless your code uses those, you will never reach that figure. Either rethink the algorithm, or scale the peak hardware FLOPS by a correction factor, which you need to work out for your own algorithm! For example, if your code performs only multiplications, you would divide the peak by 2. Counting correctly might take your code from looking suboptimal to looking quite efficient without changing a single line of it.
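With made-up numbers, that correction might look like this:

```python
# Vendor peak assumes MAD, i.e. 2 FLOPs per issued instruction.
peak_gflops_mad = 1000.0   # hypothetical spec-sheet figure

# A multiply-only kernel issues at most 1 FLOP where a MAD issues 2,
# so its attainable peak is half the quoted one.
attainable_gflops = peak_gflops_mad / 2

measured_gflops = 480.0    # hypothetical measurement
print(f"{measured_gflops / attainable_gflops:.0%} of attainable peak")
```

Against the raw spec-sheet number the same kernel would appear to run at only 48% of peak, which is exactly the "suboptimal to quite efficient" shift described above.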
You can use the CPU performance counters to have the CPU itself count the number of floating point operations it executes for your particular program. Then it is a simple matter of dividing this by the run time. On Linux the perf tools make this very easy; I have a write-up on the details on my blog here:
http://www.bnikolic.co.uk/blog/hpc-howto-measure-flops.html
FLOPs are not well defined: mul FLOPS are different from add FLOPS. You either have to come up with your own definition or take the definition from a well-known benchmark.
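In practice that means stating exactly what you count. For a dot product of two length-n vectors, one common convention (an assumption here, not a standard) is:

```python
def dot_flop_count(n):
    """FLOPs in a dot product of two length-n vectors:
    n multiplies plus n - 1 additions."""
    return {"mul": n, "add": n - 1}

print(dot_flop_count(1000))  # {'mul': 1000, 'add': 999}
```

Many benchmarks simply lump these together and report the total (here 2n - 1), which is one more way two FLOPS figures can silently disagree.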
Usually you use some well-known benchmark. Things like MIPS and megaFLOPS don't mean much to start with, and if you don't restrict them to specific benchmarks, even that tiny bit of meaning is lost.
Typically, for example, integer speed will be quoted in "Dhrystone MIPS" and floating point in "Linpack megaFLOPS". Here, "Dhrystone" and "Linpack" are the names of the benchmarks used to do the measurements.
IOPS are I/O operations. They're much the same, though in this case, there's not quite as much agreement about which benchmark(s) to use (though SPC-1 seems fairly popular).
This is a highly architecture-specific question. For a naive first estimate, I would recommend finding out how many operations one multiplication takes on your specific hardware, then doing a large matrix multiplication and seeing how long it takes. From that you can easily estimate the FLOPS of your particular hardware.
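A sketch of that idea in pure Python (a real measurement would use an optimized BLAS or Linpack; here interpreter overhead swamps everything, so treat the result only as an illustration of the arithmetic):

```python
import random
import time

# Classic operation count for an n x n matrix multiply: about
# 2*n**3 FLOPs (n multiplies and n-1 adds per output element).
n = 100
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

start = time.perf_counter()
c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
elapsed = time.perf_counter() - start

mflops = 2 * n**3 / elapsed / 1e6
print(f"~{mflops:.1f} MFLOPS")
```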
The industry standard for measuring FLOPS is the well-known Linpack, or HPL (High-Performance Linpack). Try looking at the source or running those yourself.
I would also point to this answer as an excellent reference.