What counts as a FLOP?

Posted 2024-09-16 09:02:30

Say I have a C program that, in pseudocode, is:

For i=0 to 10
    x++
    a=2+x*5
next

Is the number of FLOPs for this (1 [x++] + 1 [x*5] + 1 [2+(x*5)]) * 10 [loop], for 30 FLOPs? I am having trouble understanding what a FLOP is.

Note that the [...] indicate where my "operation" counts come from.

谁把谁当真 2024-09-23 09:02:30

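For the purposes of FLOPS measurements, usually only additions and multiplications are included. Things like divisions, reciprocals, square roots, and transcendental functions are too expensive to count as a single operation, while things like loads and stores are too trivial to count at all.

In other words, your loop body contains 2 adds and 1 multiply, so (assuming x is floating-point) each loop iteration is 3 operations; if you run the loop 10 times, you have done 30 operations.

Note that when measuring MIPS, your loop would be more than 3 instructions, because it also includes the loads and stores that the FLOPS measurement does not count.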
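The count above can be checked by hand with a small C sketch that tallies each floating-point add and multiply as the loop body runs. The names `flop_tally` and `count_loop_flops` are made up for this illustration, and the sketch assumes x starts at 0 and that both x and a are floats, neither of which the question states.

```c
#include <stddef.h>

/* Tally of the two operation kinds that this answer counts as FLOPs. */
struct flop_tally {
    size_t adds;
    size_t muls;
};

/* Runs the question's loop body `iterations` times on floats and
   tallies every floating-point add and multiply by hand.
   Assumption: x starts at 0.0f (the question never initializes it). */
static struct flop_tally count_loop_flops(size_t iterations, float *a_out)
{
    struct flop_tally tally = {0, 0};
    float x = 0.0f;
    float a = 0.0f;

    for (size_t i = 0; i < iterations; i++) {
        x = x + 1.0f;            /* x++     : 1 add      */
        tally.adds++;
        float product = x * 5.0f; /* x*5    : 1 multiply */
        tally.muls++;
        a = 2.0f + product;      /* 2 + ... : 1 add      */
        tally.adds++;
    }

    if (a_out != NULL) {
        *a_out = a;              /* hand back the final value of a */
    }
    return tally;
}
```

Calling `count_loop_flops(10, &a)` gives a tally whose adds and multiplies sum to 3 per iteration. The integer loop counter `i` is deliberately left out of the tally, since it is not a floating-point operation.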


烟凡古楼 2024-09-23 09:02:30

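FLOPS stands for floating-point operations per second. If you are dealing with integers, then you don't have any floating-point operations in your code.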


梦断已成空 2024-09-23 09:02:30

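The other posters have made it clear that FLOPS is concerned with floating-point (as opposed to integer) operations per second, so you need to count not only how many operations you perform, but also over what period of time you perform them.

If "x" and "a" are floats, you are making a good attempt at counting the number of operations in your code, but you would have to check the object code to be sure how many floating-point instructions are actually used. For instance, if "a" is not used subsequently, an optimizing compiler might not bother to calculate it.

Also, some floating-point operations (such as addition) might be much quicker than others (such as multiplication), so on the same machine a loop of only float adds could run at a much higher FLOPS rate than a loop of only float multiplies.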
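To make the "per second" part concrete, here is a minimal C sketch that times a loop of float adds with the standard `clock()` function and divides the operation count by the elapsed time. The helper names `flops_rate` and `time_float_adds` are hypothetical, and the accumulator is declared `volatile` only so that the compiler cannot delete the adds; that also forces a load and store per add, so the figure is a rough lower bound rather than the machine's peak.

```c
#include <stddef.h>
#include <time.h>

/* Operations per second for a given operation count and elapsed time.
   Returns 0.0 when the elapsed time is not positive, for example when
   the loop finished within one clock tick. */
static double flops_rate(double op_count, double seconds)
{
    return seconds > 0.0 ? op_count / seconds : 0.0;
}

/* Performs n float adds and returns the CPU time they took, in seconds.
   `volatile` keeps an optimizing compiler from removing the adds. */
static double time_float_adds(long n, float *sum_out)
{
    volatile float acc = 0.0f;

    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        acc = acc + 1.0f;        /* one floating-point add per pass */
    }
    clock_t stop = clock();

    if (sum_out != NULL) {
        *sum_out = acc;          /* expose the result so it counts as used */
    }
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A caller could compute `flops_rate((double)n, time_float_adds(n, NULL))` for a large `n`. Because `clock()` has coarse resolution on many systems, small values of `n` are likely to report zero elapsed time.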


财迷小姐 2024-09-23 09:02:30

FLOPs (the lowercase s indicates the plural of FLOP, per Martinho Fernandes' comment) refers to machine-language floating-point instructions, so the count depends on how many instructions your code compiles down to.

First off, if all of these variables are integers, then there are no FLOPs in this code. Let's assume, however, that your language recognizes all of these constants and variables as single-precision floating point variables (using single-precision makes loading the constants easier).

This code could compile to (on MIPS):

Assignment of variables: x is in $f1, a is in $f2, i is in $f3.
All other floating point registers are compiler-generated temporaries.
$f4 stores the loop exit condition of 10.0
$f5 stores the floating point constant 1.0
$f6 stores the floating point constant 2.0
$t1 is an integer register used for loading constants
    into the floating point coprocessor.

     lui $t1, *upper half of 0.0*
     ori $t1, $t1,  *lower half of 0.0*
     mtc1 $t1, $f3                       # i = 0.0
     lui $t1, *upper half of 10.0*
     ori $t1, $t1,  *lower half of 10.0*
     mtc1 $t1, $f4                       # loop limit = 10.0
     lui $t1, *upper half of 1.0*
     ori $t1, $t1,  *lower half of 1.0*
     mtc1 $t1, $f5                       # constant 1.0
     lui $t1, *upper half of 2.0*
     ori $t1, $t1,  *lower half of 2.0*
     mtc1 $t1, $f6                       # constant 2.0
st:  c.gt.s $f3, $f4                     # is i > 10.0?
     bc1t end                            # if so, leave the loop
     add.s $f1, $f1, $f5                 # x = x + 1.0
     lui $t1, *upper half of 5.0*
     ori $t1, $t1,  *lower half of 5.0*
     mtc1 $t1, $f2                       # a = 5.0
     mul.s $f2, $f2, $f1                 # a = 5.0 * x
     add.s $f2, $f2, $f6                 # a = a + 2.0
     add.s $f3, $f3, $f5                 # i = i + 1.0
     j st
end: # first statement after the loop

So according to Gabe's definition, there are 4 FLOPs inside the loop (3x add.s and 1x mul.s). There are 5 FLOPs if you also count the loop comparison c.gt.s. Multiply this by 10 for a total of 40 (or 50) FLOPs used by the program.

A better optimizing compiler might recognize that the value of a isn't used inside the loop, so it only needs to compute the final value of a. It could generate code that looks like this:

     lui $t1, *upper half of 0.0*
     ori $t1, $t1,  *lower half of 0.0*
     mtc1 $t1, $f3                       # i = 0.0
     lui $t1, *upper half of 10.0*
     ori $t1, $t1,  *lower half of 10.0*
     mtc1 $t1, $f4                       # loop limit = 10.0
     lui $t1, *upper half of 1.0*
     ori $t1, $t1,  *lower half of 1.0*
     mtc1 $t1, $f5                       # constant 1.0
     lui $t1, *upper half of 2.0*
     ori $t1, $t1,  *lower half of 2.0*
     mtc1 $t1, $f6                       # constant 2.0
st:  c.gt.s $f3, $f4                     # is i > 10.0?
     bc1t end                            # if so, leave the loop
     add.s $f1, $f1, $f5                 # x = x + 1.0
     add.s $f3, $f3, $f5                 # i = i + 1.0
     j st
end: lui $t1, *upper half of 5.0*
     ori $t1, $t1,  *lower half of 5.0*
     mtc1 $t1, $f2                       # a = 5.0
     mul.s $f2, $f2, $f1                 # a = 5.0 * x, computed once
     add.s $f2, $f2, $f6                 # a = a + 2.0

In this case, you have 2 adds and 1 comparison inside the loop (multiplied by 10, that gives you 20 or 30 FLOPs), plus 1 multiplication and 1 addition outside the loop. Thus, your program now takes 22 or 32 FLOPs, depending on whether we count comparisons.
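The same comparison can be sketched in C, with the hoisting done by hand instead of by a compiler. This is only an illustration of the counting argument: the function names `run_naive` and `run_hoisted` are invented here, x is assumed to start at 0, the loop counter is a float as in the MIPS listings, and comparisons are left out of the tally. The two versions agree only when the loop runs at least once.

```c
#include <stddef.h>

/* Straightforward version: a is recomputed on every pass.
   Tallies 4 FLOPs per iteration: 2 adds and 1 multiply in the body,
   plus the add that advances the float loop counter. */
static float run_naive(size_t iterations, size_t *flops)
{
    float x = 0.0f;              /* assumption: x starts at zero */
    float a = 0.0f;
    float limit = (float)iterations;

    *flops = 0;
    for (float i = 0.0f; i < limit; i = i + 1.0f) {
        x = x + 1.0f;            /* add      */
        a = 5.0f * x;            /* multiply */
        a = a + 2.0f;            /* add      */
        *flops += 4;             /* three above + the counter add */
    }
    return a;
}

/* Hand-hoisted version: a depends only on the final x, so it is
   computed once after the loop. Tallies 2 FLOPs per iteration plus 2.
   Assumes the loop body executes at least once. */
static float run_hoisted(size_t iterations, size_t *flops)
{
    float x = 0.0f;
    float limit = (float)iterations;

    *flops = 0;
    for (float i = 0.0f; i < limit; i = i + 1.0f) {
        x = x + 1.0f;            /* add */
        *flops += 2;             /* x add + the counter add */
    }
    float a = 5.0f * x + 2.0f;   /* 1 multiply + 1 add, outside the loop */
    *flops += 2;
    return a;
}
```

For 10 iterations both functions return the same value of a, while the tallies match the 40 and 22 figures worked out above.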
