Is there a more performant implementation of powf(10, floorf(log10f(x)))?
I need to truncate a float to the nearest power of 10 at or below it. For example, 1.1 would truncate to 1.0 and 4.7e3 would truncate to 1e3. I am currently doing it with the seemingly complicated powf(10,floorf(log10f(x))). Is there a better-performing (as in faster-executing) solution? My target CPU architectures are x86-64 and arm64.
#include <stdio.h>
#include <math.h>

int main(void)
{
    float x = 1.1e5f;
    while (x > 1e-6f)
    {
        /* round x down to a power of 10 */
        float y = powf(10, floorf(log10f(x)));
        printf("%e ==> %g\n", x, y);
        x /= 5.0f;
    }
}
When run, this produces:
1.100000e+05 ==> 100000
2.200000e+04 ==> 10000
4.400000e+03 ==> 1000
8.800000e+02 ==> 100
1.760000e+02 ==> 100
3.520000e+01 ==> 10
7.040000e+00 ==> 1
1.408000e+00 ==> 1
2.816000e-01 ==> 0.1
5.632000e-02 ==> 0.01
1.126400e-02 ==> 0.01
2.252800e-03 ==> 0.001
4.505600e-04 ==> 0.0001
9.011199e-05 ==> 1e-05
1.802240e-05 ==> 1e-05
3.604480e-06 ==> 1e-06
Comments (2)
It is possible to use a lookup table to speed up the computation. This technique should work for all normal floating-point numbers. Subnormal numbers and NaN won't work without some dedicated logic; 0 and infinity can be handled by extreme values in the table.
Although I expect this technique to be faster than the original implementation, measurements are needed.
The code uses C++20 std::bit_cast to extract the exponent from the float value. If that is not available, older techniques such as frexpf exist.

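The answer's own code is not reproduced in this copy, so the following is only a rough sketch of how an exponent-indexed lookup table might look, written in C to match the question. It is an assumption-laden illustration, not the answerer's implementation: the names lo_tab, hi_tab, pow10_float, init_floor_pow10 and floor_pow10 are made up here, memcpy stands in for std::bit_cast, and the input is assumed to be a positive, normal, 32-bit IEEE-754 float. The idea is that each binary exponent interval [2^e, 2^(e+1)) contains at most one power of 10, so one table lookup plus one comparison is enough.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit IEEE-754 float");

/* Both tables are indexed by the 8-bit biased exponent E of the input. */
static float lo_tab[256];  /* largest power of 10 that is <= 2^(E-127) */
static float hi_tab[256];  /* the next power of 10 after lo_tab[E] */

/* Float nearest to 10^k, obtained by parsing the decimal string "1e<k>". */
static float pow10_float(int k)
{
    char buf[16];
    snprintf(buf, sizeof buf, "1e%d", k);
    return strtof(buf, NULL);
}

/* Builds the tables; call once before floor_pow10(). */
void init_floor_pow10(void)
{
    int k = -39;  /* 1e-39 is below the smallest normal float (about 1.18e-38) */
    for (uint32_t E = 1; E <= 254; ++E) {
        uint32_t bits = E << 23;  /* bit pattern of exactly 2^(E-127) */
        float two_e;
        memcpy(&two_e, &bits, sizeof two_e);
        while (pow10_float(k + 1) <= two_e)
            ++k;
        lo_tab[E] = pow10_float(k);
        hi_tab[E] = pow10_float(k + 1);  /* 1e39 overflows to +inf for E = 254 */
    }
    lo_tab[0] = hi_tab[0] = 0.0f;          /* zero maps to 0; subnormals are not handled */
    lo_tab[255] = hi_tab[255] = INFINITY;  /* infinity maps to infinity; NaN is not handled */
}

/* Assumes x is positive and normal (see the caveats above). */
float floor_pow10(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t E = (bits >> 23) & 0xFFu;  /* biased exponent */
    return x >= hi_tab[E] ? hi_tab[E] : lo_tab[E];
}
```

After a single call to init_floor_pow10(), floor_pow10(1.1f) yields 1.0f and floor_pow10(4.7e3f) yields 1e3f. Whether this beats the powf/log10f version on x86-64 or arm64 would still have to be measured, as the answer says.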
I would say don't sweat it. Unless the program spends a large proportion of its time doing this truncation, it's not worth optimising what is probably super-fast anyway. But if you want to optimise for your common cases (1e-2 <= x <= 10), you might try using 32-bit integer arithmetic to compare against the binary representations of 1e-2, 1e-1, 1, and 10 (for instance, 1e-1 is 0x3dcccccd); if x is outside that range, you can fall back on the floating-point version. Only experimentation will determine whether this actually runs faster.
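The integer-comparison idea above could be sketched roughly as follows. This is an illustration under assumptions rather than a tested optimisation: the function name floor_pow10_common and the BITS_* macros are hypothetical, float is assumed to be 32-bit IEEE-754, and the sketch relies on the fact that positive floats sort in the same order as their bit patterns read as unsigned integers. It covers 1e-2 <= x < 10 and reports everything else (including x = 10 itself, negative values, NaN and infinity) as not handled, so the caller can use the floating-point version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit IEEE-754 float");

/* IEEE-754 single-precision bit patterns of the range boundaries. */
#define BITS_1E_2 0x3C23D70Au  /* 1e-2f */
#define BITS_1E_1 0x3DCCCCCDu  /* 1e-1f, the value quoted in the answer */
#define BITS_1E0  0x3F800000u  /* 1.0f  */
#define BITS_1E1  0x41200000u  /* 10.0f */

/* Returns true and stores the result in *out when 1e-2 <= x < 10.
   Returns false and leaves *out untouched otherwise, so the caller
   can fall back on the floating-point computation. */
bool floor_pow10_common(float x, float *out)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);  /* reinterpret the float's bits */
    if (u < BITS_1E_2 || u >= BITS_1E1)
        return false;          /* a set sign bit makes u large, so negatives land here too */
    if (u >= BITS_1E0)
        *out = 1.0f;
    else if (u >= BITS_1E_1)
        *out = 0.1f;
    else
        *out = 0.01f;
    return true;
}
```

A caller would then write something like `if (!floor_pow10_common(x, &y)) y = powf(10, floorf(log10f(x)));`. As the answer notes, only experimentation will show whether the extra branches pay off.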