Optimizing with IEEE floating point - guaranteed mathematical identities?
I am having some trouble with IEEE floating point rules preventing compiler optimizations that seem obvious. For example,
    char foo(float x) {
        if (x == x)
            return 1;
        else
            return 0;
    }
cannot be optimized to just return 1 because NaN == NaN is false. Okay, fine, I guess.
However, I want to write code in such a way that the optimizer can actually fix things up for me. Are there mathematical identities that hold for all floats? For example, I would be willing to write !(x - x) if it meant the compiler could assume that it held all the time (though that also isn't the case).
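To make the two failing cases concrete, here is a minimal C sketch. The function names are illustrative and not from the original post; the behavior shown assumes IEEE 754 semantics and a build without -ffast-math or -ffinite-math-only.

```c
#include <math.h>

/* x == x is true for every float except NaN. */
static int self_equal(float x) {
    return x == x;
}

/* x - x is 0 for finite x, but NaN when x is NaN or +/-infinity,
   and !NaN evaluates to 0 because NaN compares unequal to zero. */
static int self_difference_is_zero(float x) {
    return !(x - x);
}
```

Calling self_equal(NAN) yields 0, and self_difference_is_zero(INFINITY) also yields 0, which is why a conforming compiler cannot fold either expression to a constant.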
I see some reference to such identities on the web, for example here, but I haven't found any organized information, including in a light scan of the IEEE 754 standard.
It'd also be fine if I could get the optimizer to assume isnormal(x) without generating additional code (in gcc or clang).
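One possible way to pass such an assumption to GCC or clang is __builtin_unreachable(). The sketch below wraps it in a hypothetical ASSUME macro (the macro and function names are assumptions, not part of the question). It uses isfinite rather than isnormal, because isnormal(0.0f) is false and the question says x is often 0. Whether a given compiler version actually uses the hint to fold x == x, and whether the check itself disappears from the generated code, is not guaranteed and should be confirmed in the assembly.

```c
#include <math.h>

/* Hypothetical helper: promises the compiler that cond holds.
   If cond is false at run time the behavior is undefined. */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)
#endif

static inline char foo_assuming_finite(float x) {
    ASSUME(isfinite(x));        /* caller must never pass NaN or infinity */
    return (x == x) ? 1 : 0;    /* the compiler may, but need not, fold this to 1 */
}
```

Because a violated assumption is undefined behavior, this only suits call sites where NaN and infinity really cannot occur.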
Clearly I'm not actually going to write (x == x) in my source code, but I have a function that's designed for inlining. The function may be declared as foo(float x, float y), but often x is 0, or y is 0, or x and y are both z, etc. The floats represent onscreen geometric coordinates. These are all cases where if I were coding by hand without use of the function I'd never distinguish between 0 and (x - x), I'd just hand-optimize stupid stuff away. So, I really don't care about the IEEE rules in what the compiler does after inlining my function, and I'd just as soon have the compiler ignore them. Rounding differences are also not very important since we're basically doing onscreen drawing.
I don't think -ffast-math is an option for me, because the function appears in a header file, and it is not appropriate that the .c files that use the function compile with -ffast-math.
Comments (5)
Another reference that might be of some use to you is a really nice article on floating-point optimization in Game Programming Gems volume 2, by Yossarian King. You can read the article here. It discusses the IEEE format in considerable detail, taking implementations and architecture into account, and provides many optimization tricks.
I think that you are always going to struggle to make computer floating-point arithmetic behave like mathematical real-number arithmetic, and I suggest that you don't try, for any reason. I suggest that you are making a type error by trying to compare two fp numbers for equality. Since fp numbers are, in the overwhelming majority, approximations, you should accept this and use approximate equality as your test.
Computer integers exist for equality testing of numerical values.
Well, that's what I think, you go ahead and fight the machine (well, all the machines actually) if you wish.
Now, to answer some parts of your question:
-- for every mathematical identity you are familiar with from real-number arithmetic, there are counterexamples in the domain of floating-point numbers, whether IEEE or otherwise;
-- 'clever' programming almost always makes it more difficult for a compiler to optimise code than straightforward programming;
-- it seems that you are doing some graphics programming: in the end the coordinates of points in your conceptual space are going to be mapped to pixels on a screen; pixels always have integer coordinates; your translation from conceptual space to screen space defines your approximate-equality function
Regards
Mark
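The last point in the answer above, that the mapping to screen space defines approximate equality, can be sketched in C as follows. The scale parameter, the rounding rule and the function names are all assumptions made for illustration; a real renderer's mapping may differ, and the cast is only meaningful for values that fit in a long.

```c
#include <stdbool.h>

/* Hypothetical world-to-screen mapping: scale, then round to the nearest pixel. */
static long to_pixel(float coord, float pixels_per_unit) {
    float scaled = coord * pixels_per_unit;
    return (long)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Two coordinates are "approximately equal" when they land on the same pixel. */
static bool same_pixel(float a, float b, float pixels_per_unit) {
    return to_pixel(a, pixels_per_unit) == to_pixel(b, pixels_per_unit);
}
```

With one pixel per unit, same_pixel(10.2f, 10.4f, 1.0f) is true while same_pixel(10.4f, 10.6f, 1.0f) is false, since the comparison happens on integer pixel coordinates.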
If you can assume that floating-point numbers used in this module will not be Inf/NaN, you can compile it with
    -ffinite-math-only
(in GCC). This may "improve" the codegen for examples like the one you posted.
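As a rough sketch of how this might look in practice, the file below could be built with a command such as gcc -O2 -ffinite-math-only -c geometry.c, where geometry.c is a hypothetical file name. GCC and clang predefine the macro __FINITE_MATH_ONLY__, which a header can test to see whether the flag is in effect for the including translation unit. The helper name is an assumption.

```c
/* Reports whether this translation unit was compiled with
   -ffinite-math-only (which -ffast-math also implies). */
static inline int finite_math_only_enabled(void) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 1;
#else
    return 0;
#endif
}

/* With the flag, the compiler is allowed to assume x is never NaN and
   may reduce this to "return 1". Without it, a NaN argument yields 0. */
static inline char foo(float x) {
    return (x == x) ? 1 : 0;
}
```

Since the flag applies per translation unit, an inline function in a header is compiled under whatever flags the including .c file uses.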
You could compare for bitwise equality. Although you might get bitten by some values that are equivalent but bitwise different (for example +0.0 and -0.0), it will catch all those cases where you have a true equality as you mentioned. I am not sure the compiler will recognize what you are doing and remove it when inlining (which I believe is what you are after), but that can easily be checked.
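A minimal sketch of a bitwise comparison in C, assuming float is a 32-bit IEEE 754 type (the function names are illustrative). memcpy is used to read the bits without violating aliasing rules; optimizing compilers typically turn it into a plain register move.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumes sizeof(float) == sizeof(uint32_t), true on common platforms. */
static inline uint32_t float_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* True only when both values have identical bit patterns. */
static inline bool bitwise_equal(float a, float b) {
    return float_bits(a) == float_bits(b);
}
```

Note the difference from ==: bitwise_equal(x, x) is true even for NaN, while bitwise_equal(0.0f, -0.0f) is false although 0.0f == -0.0f is true.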
What happened when you tried it the obvious way and profiled it? Or examined the generated asm?
If the function is inlined with values known at the call site, the optimizer has this information available. For example:
    foo(0, y)
You may be surprised at the work you don't have to do, but at the very least profiling or looking at what the compiler actually does with the code will give you more information and help you figure out where to proceed next.
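To illustrate the inlining point, here is a small C sketch with a hypothetical stand-in for the inline helper (the real function body is not shown in the question, so manhattan and vertical_extent are invented names). Compiling with something like gcc -O2 -S and reading the output shows what the compiler actually did.

```c
/* Hypothetical inline helper with early-outs for zero coordinates. */
static inline float manhattan(float x, float y) {
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    if (x == 0.0f) return ay;   /* decidable at compile time when x is a literal 0 */
    if (y == 0.0f) return ax;
    return ax + ay;
}

/* After inlining, x is the constant 0, so the first test is known to be
   true and an optimizing compiler can drop the remaining branches. */
float vertical_extent(float y) {
    return manhattan(0.0f, y);
}
```

Comparisons against a known constant are safe to fold under IEEE rules, so this kind of simplification does not need -ffast-math.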
That said, if you know certain things that the optimizer can't figure out itself, you can write multiple versions of the function, and specify the one you want to call. This is something of a hassle, but at least with inline functions they will all be specified together in one header. It's also quite a bit easier than the next step, which is using inline asm to do exactly what you want.
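A sketch of the "multiple versions" idea, with all variants living side by side in one header. The general body is a placeholder because the original function is not shown; every name here is hypothetical.

```c
/* General version: placeholder body standing in for the real computation. */
static inline float foo_general(float x, float y) {
    return x * x + y * y;
}

/* Hand-specialised variant for call sites where x is known to be 0. */
static inline float foo_x0(float y) {
    return y * y;
}

/* Hand-specialised variant for call sites where x and y are both z. */
static inline float foo_same(float z) {
    return 2.0f * z * z;
}
```

The caller picks the variant, for example foo_x0(y) instead of foo_general(0, y), which encodes by hand the simplification that the compiler is not permitted to make on its own in every case.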