How to do integer (signed or unsigned) division on ARM?
I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than converting to float, dividing, and converting back to integer? Or is that indeed the best solution?
Cheers! =)
Comments (5)
Division by a constant value is done quickly with a 64-bit multiply and a right shift. For example, to divide R1 by 1625, compute the 64-bit product R2:R3 = R1 * 0xA151C331 (R2 holds the upper 32 bits), then take the upper 32 bits and shift them right by 10. This works because 0xA151C331 is 2^(32+10) / 1625 rounded up, so R1 * 0xA151C331 / 2^(32+10) equals R1 / 1625 once the fraction is dropped.
You can calculate your own constant A for a divisor N from the same relation, x / N == (x * A) / 2^(32+n) with A = 2^(32+n) / N: select the largest n for which A < 2^32.
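The multiply-and-shift above can be sketched in C as follows. This is a minimal illustration, not the answer's original assembly: the function names `div1625` and `magic_constant` are made up for this example, and the constant is rounded up, which is what the value 0xA151C331 corresponds to. On ARM the 64-bit product is what a single UMULL instruction produces.

```c
#include <assert.h>
#include <stdint.h>

/* Divide x by 1625 with the constant quoted in the answer:
   take the upper 32 bits of the 64-bit product, then shift right by 10. */
static uint32_t div1625(uint32_t x)
{
    uint64_t product = (uint64_t)x * 0xA151C331u;   /* 32x32 -> 64 multiply */
    return (uint32_t)(product >> 32) >> 10;
}

/* Hypothetical helper: A = 2^(32+n) / divisor, rounded up, for a shift n. */
static uint64_t magic_constant(uint32_t divisor, unsigned n)
{
    assert(divisor != 0 && n < 32);
    uint64_t two_pow = (uint64_t)1 << (32 + n);
    return (two_pow + divisor - 1) / divisor;
}
```

For example, `magic_constant(1625, 10)` reproduces 0xA151C331. For a different divisor N, one sufficient check that the result is exact for every 32-bit input is A * N - 2^(32+n) <= 2^n; when no n passes that check with A < 2^32, the simple form above is not enough for that divisor.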
Some copy-pasta from elsewhere for an integer divide:
Basically, 3 instructions per bit. From this website, though I've seen it in many other places as well.
This site also has a nice version which may be faster in general.
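The "few instructions per bit" approach is the classic shift-and-subtract loop. As a rough C sketch of that technique (not the code the answer linked to; the name `udiv32` is hypothetical and only the unsigned case is covered):

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract division: produces one quotient bit per iteration. */
static uint32_t udiv32(uint32_t num, uint32_t den, uint32_t *rem_out)
{
    assert(den != 0 && rem_out != 0);
    uint32_t quot = 0;
    uint32_t rem = 0;
    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next bit of the numerator. */
        rem = (rem << 1) | ((num >> bit) & 1u);
        if (rem >= den) {            /* the divisor fits: subtract it */
            rem -= den;
            quot |= (uint32_t)1 << bit;
        }
    }
    *rem_out = rem;
    return quot;
}
```

In assembly, each iteration maps to roughly a shift, a compare, and a conditional subtract, which is where the per-bit instruction count comes from.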
The compiler normally includes a divide routine in its library, libgcc for example. I have extracted the routines from gcc and use them directly:
https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib
Going to float and back is probably not the best solution. You can try it and see how fast it is. This example is a multiply, but it could just as easily be made a divide:
https://github.com/dwelch67/stm32vld/ then stm32f4d/float01/vectors.s
I didn't time it to see how fast or slow it is, though. Understood, I am using a Cortex-M above and you are talking about a Cortex-A, which are different ends of the spectrum, but the float instructions are similar and the gcc lib stuff is similar. For the Cortex-M I have to build for Thumb, but you can just as easily build for ARM. Actually, with gcc it should all just work automagically; you should not need to do it the way I did. With other compilers as well, you should not need to do it the way I did in the adventure game above.
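To make the two options in this answer concrete, here is a small C sketch. The function names are hypothetical. The first function simply uses `/` and leaves the choice to the compiler, which on a core without a hardware divide typically calls a library helper (for GCC on ARM EABI, `__aeabi_uidiv`). The second goes through floating point; it assumes double precision, whose 53-bit significand holds any 32-bit integer exactly, so the truncated quotient is correct. Single-precision float has only 24 bits and would not be exact for all 32-bit inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Let the compiler pick: a hardware divide if available, else a library call. */
static uint32_t udiv_plain(uint32_t num, uint32_t den)
{
    assert(den != 0);
    return num / den;
}

/* Convert to double, divide, truncate back to an integer. */
static uint32_t udiv_via_double(uint32_t num, uint32_t den)
{
    assert(den != 0);
    return (uint32_t)((double)num / (double)den);
}
```

Which of the two is faster on a given Cortex-A part is something to measure rather than assume.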
I wrote my own routine to perform an unsigned division, as I could not find an unsigned version on the web. I needed to divide a 64-bit value by a 32-bit value to get a 32-bit result.
The inner loop is not as efficient as the signed solution provided above, but this does support unsigned arithmetic. The routine performs a 32-bit division if the high part of the numerator (hi) is smaller than the denominator (den); otherwise a full 64-bit division is performed (hi:lo/den). The result is in lo.
Extra checks for boundary conditions and powers of 2 can be added. Full details can be found at http://www.idwiz.co.za/Tips%20and%20Tricks/Divide.htm
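A C sketch of the hi:lo/den case where hi < den, so the quotient fits in 32 bits. This is an illustration of the general shift-and-subtract idea, not the routine from the linked page; the name `udiv64by32` and the remainder output are assumptions made for this example, and the hi >= den case is not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Divide the 64-bit value hi:lo by den, one quotient bit per iteration.
   Assumes hi < den, so the quotient fits in 32 bits. */
static uint32_t udiv64by32(uint32_t hi, uint32_t lo, uint32_t den,
                           uint32_t *rem_out)
{
    assert(den != 0 && hi < den && rem_out != 0);
    uint32_t rem = hi;
    uint32_t quot = 0;
    for (int bit = 31; bit >= 0; bit--) {
        uint32_t carry = rem >> 31;             /* bit about to be shifted out */
        rem = (rem << 1) | ((lo >> bit) & 1u);
        if (carry || rem >= den) {              /* the 33-bit value is >= den */
            rem -= den;                         /* wraps to the correct result */
            quot |= (uint32_t)1 << bit;
        }
    }
    *rem_out = rem;
    return quot;
}
```

The carry check matters once den exceeds 2^31, because the running remainder can then need 33 bits after the shift.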
I wrote the following functions for the ARM GNU assembler. If you don't have a CPU with udiv/sdiv machine support, just cut out the first few lines up to the "0:" label in either function.
There are two functions, udiv for unsigned integer division and sdiv for signed integer division. They both expect a 64-bit dividend (either signed or unsigned) in r1 (high word) and r0 (low word), and a 32-bit divisor in r2. They return the quotient in r0 and the remainder in r1, thus you can define them in a C header as extern functions returning a 64-bit integer and mask out the quotient and remainder afterwards. An error (division by 0 or overflow) is indicated by a remainder whose absolute value is greater than or equal to the absolute value of the divisor. The signed division algorithm uses case distinction by the signs of both dividend and divisor; it does not convert to positive integers first, since that wouldn't detect all overflow conditions properly.
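The C side of that calling convention might look like the sketch below. It assumes the usual little-endian AAPCS layout, where a 64-bit return value has its low word in r0 (the quotient) and its high word in r1 (the remainder). `udiv_model` is a hypothetical pure-C stand-in for the assembly routine so the example can run anywhere; the value it returns on error is an assumption chosen only to satisfy the "remainder >= divisor" rule, and only the unsigned case is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the assembly udiv: quotient in the low word, remainder in
   the high word. The error encoding here is assumed, not taken from the source. */
static uint64_t udiv_model(uint64_t dividend, uint32_t divisor)
{
    if (divisor == 0 || dividend / divisor > UINT32_MAX) {
        return (uint64_t)divisor << 32;     /* remainder == divisor: error */
    }
    uint64_t quot = dividend / divisor;
    uint64_t rem = dividend % divisor;
    return (rem << 32) | quot;
}

/* Mask out the two halves of the packed result. */
static uint32_t quotient_of(uint64_t packed)  { return (uint32_t)packed; }
static uint32_t remainder_of(uint64_t packed) { return (uint32_t)(packed >> 32); }

/* Error rule from the answer: remainder >= divisor means failure. */
static int udiv_failed(uint64_t packed, uint32_t divisor)
{
    return remainder_of(packed) >= divisor;
}

/* Example caller that insists on a valid result. */
static uint32_t udiv_checked(uint64_t dividend, uint32_t divisor, uint32_t *rem)
{
    uint64_t packed = udiv_model(dividend, divisor);
    assert(!udiv_failed(packed, divisor) && rem != 0);
    *rem = remainder_of(packed);
    return quotient_of(packed);
}
```

With the real assembly routine, the stand-in would be replaced by an extern declaration with the same signature.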