Fast float-to-integer conversion and floating-point precision on ARM (iPhone 3GS/4)
I read http://www.stereopsis.com/FPU.html, which is mentioned in "What is the fastest way to convert float to int on x86". Does anyone know whether the slowness of the simple cast (see snippet below) also applies to the ARM architecture?
inline int Convert(float x)
{
    int i = (int) x;
    return i;
}
To apply some of the tricks mentioned in the FPU article, you have to set the precision of floating-point operations. How do I do that on ARM?
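For readers who haven't seen the article: the best-known trick of this kind is the "magic number" conversion, which adds 1.5 * 2^52 to a double so that the FPU's own rounding leaves the integer in the low 32 bits of the result. A minimal sketch follows. It assumes IEEE 754 doubles, the default round-to-nearest mode, and that the addition really happens at double precision (with the x87 precision control set lower it does not, which is why precision comes up). The function name magic_round_to_int is hypothetical. Note that it rounds to nearest (ties to even) rather than truncating like the (int) cast, and on ARM it also needs double-precision arithmetic plus a 64-bit move out of the VFP registers, so it is shown only to make the question concrete.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1.5 * 2^52. Adding it to any value in (-2^31, 2^31) lands the sum in
   [2^52, 2^53), where consecutive doubles are exactly 1.0 apart, so the
   addition itself rounds to an integer. */
static const double kMagic = 6755399441055744.0;

/* Hypothetical helper: rounds x to the nearest integer (ties to even under
   the default rounding mode). This is NOT the truncation that (int)x does. */
static int32_t magic_round_to_int(float x)
{
    assert(x > -2147483648.0f && x < 2147483648.0f); /* must fit in int32 */

    /* Assigning to a double variable discards any excess precision, so the
       rounding happens at double precision as the trick requires. */
    double shifted = (double)x + kMagic;

    uint64_t bits;
    memcpy(&bits, &shifted, sizeof bits);   /* reinterpret the bit pattern */

    /* The mantissa holds 2^51 + n; since 2^51 is a multiple of 2^32, its
       low 32 bits are n in two's complement. */
    uint32_t low = (uint32_t)bits;
    int32_t result;
    memcpy(&result, &low, sizeof result);
    return result;
}
```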
What is the fastest float-to-int conversion on the ARM architecture?
Thanks!
Short version, "no".
That article is ancient and doesn't even apply to modern x86 systems, let alone ARM. A simple cast to integer is reasonably fast on ARMv7 (iPhone 3GS/4), though there is a modest stall when moving data from the VFP/NEON registers to the general-purpose registers. However, given that your float data is probably coming from a computation done in VFP/NEON registers, you will have to pay for that move no matter how you do the conversion.

I don't think this is a profitable path for optimization unless you have traces showing that it is a major bottleneck in your program. Even then, the fastest conversion is the conversion you don't do; you will almost always be better off finding algorithmic ways to eliminate conversions from your program.
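One way to follow that advice, sketched under assumptions: when a loop steps a float position and converts it on every iteration (for example, to compute sample indices), the start and step can be converted once to 16.16 fixed point and the position advanced with integer adds. The function names and the 16.16 format are hypothetical choices for illustration, and the fixed-point version only matches the float version when the inputs are non-negative, the positions stay below 32768, and the start and step are exactly representable in 16.16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: one float->int conversion per element. */
static void indices_float(float x0, float step, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (int32_t)(x0 + (float)i * step); /* conversion inside the loop */
    }
}

/* Hypothetical alternative: convert start and step to 16.16 fixed point
   once, then stay in integer registers. Right-shifting negative values is
   implementation-defined in C, hence the non-negative precondition; the
   caller must also keep x0 + n * step below 32768 to avoid overflow. */
static void indices_fixed(float x0, float step, int32_t *out, size_t n)
{
    assert(x0 >= 0.0f && step >= 0.0f);
    int32_t acc = (int32_t)(x0 * 65536.0f);   /* one conversion */
    int32_t inc = (int32_t)(step * 65536.0f); /* one conversion */
    for (size_t i = 0; i < n; ++i) {
        out[i] = acc >> 16; /* integer part, no float->int conversion */
        acc += inc;
    }
}
```

With x0 = 1.5 and step = 0.25 both functions produce 1, 1, 2, 2, 2, 2, 3, 3; with inputs that are not exact in 16.16 (such as a step of 0.1) the two can drift apart, so the trade-off is precision for fewer conversions.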
If you do genuinely need to optimize conversions, look into the vcvt.i32.f32 instruction, which converts a vector of two or four floating-point numbers to a vector of two or four integers without moving the data out of the NEON registers (and therefore without incurring the stall that I mentioned). Of course, you will need to do your subsequent integer computations on the NEON unit for this to be a profitable optimization.

Question: What are you really trying to do? Why do you think you need a faster float->int conversion?
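The NEON conversion suggested above can be sketched with the arm_neon.h intrinsics, assuming the compiler provides them (GCC and Clang do for ARMv7 with NEON enabled). The signed conversion is exposed as vcvtq_s32_f32, which corresponds to the VCVT.S32.F32 form of the instruction and truncates toward zero like a C cast. The helper name convert_and_offset and the "add an offset" step are hypothetical, standing in for whatever integer work follows; a scalar fallback keeps the sketch compiling on non-NEON targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: converts n floats to int32 (truncating toward zero)
   and adds a constant offset to each. Inputs are assumed to be in int32
   range; out-of-range values behave differently in the NEON and scalar
   paths. On NEON targets, both steps stay in NEON registers. */
static void convert_and_offset(const float *in, int32_t *out, size_t n,
                               int32_t offset)
{
    assert(n == 0 || (in != NULL && out != NULL));
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32x4_t voff = vdupq_n_s32(offset);       /* broadcast the offset */
    for (; i + 4 <= n; i += 4) {
        float32x4_t f = vld1q_f32(in + i);      /* load 4 floats */
        int32x4_t v = vcvtq_s32_f32(f);         /* float -> int32, toward zero */
        v = vaddq_s32(v, voff);                 /* integer work stays on NEON */
        vst1q_s32(out + i, v);                  /* store 4 ints */
    }
#endif
    /* Scalar tail, or the whole array on targets without NEON. */
    for (; i < n; ++i) {
        out[i] = (int32_t)in[i] + offset;
    }
}
```

On ARMv7 the NEON path is only compiled when NEON is enabled at build time (for example with GCC's -mfpu=neon); otherwise the scalar loop handles every element.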