如何在 ARM 上进行整数(有符号或无符号)除法?

发布于 2024-12-18 20:10:27 字数 114 浏览 2 评论 0原文

我主要致力于 Cortex-A8 和 Cortex-A9 的研究。我知道有些架构没有整数除法,但是除了转换为浮点数、除法、转换为整数之外,最好的方法是什么?或者这确实是最好的解决方案?

干杯! =)

I'm working on Cortex-A8 and Cortex-A9 in particular. I know that some architectures don't come with integer division, but what is the best way to do it other than convert to float, divide, convert to integer? Or is that indeed the best solution?

Cheers! = )

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

喜你已久 2024-12-25 20:10:27

除以常数值可以通过执行 64 位乘法和右移来快速完成,例如,如下所示:

LDR     R3, =0xA151C331
UMULL   R3, R2, R1, R3
MOV     R0, R2,LSR#10

这里 R1 除以 1625。
计算如下: 64bitreg(R2:R3) = R1*0xA151C331,则结果是高 32bit 右移 10:

R1*0xA151C331/2^(32+10) = R1*0.00061538461545751488 = R1/1624.99999980

您可以根据以下公式计算您自己的常数:

x / N ==  (x*A)/2^(32+n)   -->       A = 2^(32+n)/N

选择最大的 n,其中 A < 2^32

Division by a constant value is done quickly by doing a 64bit-multiply and shift-right, for example, like this:

LDR     R3, =0xA151C331
UMULL   R3, R2, R1, R3
MOV     R0, R2,LSR#10

here R1 is divided by 1625.
The calculation is done like this: 64bitreg(R2:R3) = R1*0xA151C331, then the result is the upper 32bit right shifted by 10:

R1*0xA151C331/2^(32+10) = R1*0.00061538461545751488 = R1/1624.99999980

You can calculate your own constants from this formula:

x / N ==  (x*A)/2^(32+n)   -->       A = 2^(32+n)/N

select the largest n, for which A < 2^32

提赋 2024-12-25 20:10:27

一些从其他地方复制过来的整数除法:
基本上,每位 3 条指令。来自这个网站,尽管我也在其他很多地方看到过它。
网站还有一个不错的版本,一般来说可能会更快。


@ Entry  r0: numerator (lo) must be signed positive
@        r2: deniminator (den) must be non-zero and signed negative
idiv:
        lo .req r0; hi .req r1; den .req r2
        mov hi, #0 @ hi = 0
        adds lo, lo, lo
        .rept 32 @ repeat 32 times
          adcs hi, den, hi, lsl #1
          subcc hi, hi, den
          adcs lo, lo, lo
        .endr
        mov pc, lr @ return
@ Exit   r0: quotient (lo)
@        r1: remainder (hi)

Some copy-pasta from elsewhere for an integer divide:
Basically, 3 instructions per bit. From this website, though I've seen it many other places as well.
This site also has a nice version which may be faster in general.


@ Entry  r0: numerator (lo) must be signed positive
@        r2: deniminator (den) must be non-zero and signed negative
idiv:
        lo .req r0; hi .req r1; den .req r2
        mov hi, #0 @ hi = 0
        adds lo, lo, lo
        .rept 32 @ repeat 32 times
          adcs hi, den, hi, lsl #1
          subcc hi, hi, den
          adcs lo, lo, lo
        .endr
        mov pc, lr @ return
@ Exit   r0: quotient (lo)
@        r1: remainder (hi)
稀香 2024-12-25 20:10:27

编译器通常在其库中包含一个分隔符,例如 gcclib 我已从 gcc 中提取它们并直接使用它们:

https: //github.com/dwelch67/stm32vld/ 然后 stm32f4d/adventure/gcclib

浮动并返回可能不是最好的解决方案。你可以尝试一下,看看它有多快......这是一个乘法,但也可以很容易地使它成为一个除法:

https://github.com/dwelch67/stm32vld/ 然后是 stm32f4d/float01/vectors.s

我没有计时来看看有多快/慢。明白了,我在上面使用的是 cortex-m,而你正在谈论 cortex-a,频谱的不同端,类似的浮点指令,gcc lib 的东西是相似的,对于 cortex-m,我必须为拇指构建,但你可以同样轻松地为 Arm 构建。实际上,对于 gcc 来说,它应该自动工作,你不需要像我那样做。其他编译器也不需要像我在上面的冒险游戏中那样做。

The compiler normally includes a divide in its library, gcclib for example I have extracted them from gcc and use them directly:

https://github.com/dwelch67/stm32vld/ then stm32f4d/adventure/gcclib

going to float and back is probably not the best solution. you can try it and see how fast it is...This is a multiply but could as easily make it a divide:

https://github.com/dwelch67/stm32vld/ then stm32f4d/float01/vectors.s

I didnt time it though to see how fast/slow. Understood I am using a cortex-m above and you are talking about a cortex-a, different ends of the spectrum, similar float instructions and the gcc lib stuff is similar, for the cortex-m I have to build for thumb but you can just as easily build for arm. Actually with gcc it should all just work automagically you should not need to do it the way I did it. Other compilers as well you should not need to do it the way I did it in the adventure game above.

丘比特射中我 2024-12-25 20:10:27

我编写了自己的例程来执行未签名的除法,因为我在网络上找不到未签名的版本。我需要将 64 位值除以 32 位值以获得 32 位结果。

内部循环不如上面提供的有符号解决方案高效,但这确实支持无符号算术。如果分子的高位部分 (hi) 小于分母 (den),则此例程执行 32 位除法,否则执行完整的 64 位除法 (hi:lo/den)。结果在lo。

  cmp     hi, den                   // if hi < den do 32 bits, else 64 bits
  bpl     do64bits
  REPT    32
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    subscc  work, hi, den           // if carry not set, compare        
    subcs   hi, hi, den             // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR

  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

do64bits:
  mov     top, #0
  REPT    64
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    adcs    top, top, top
    subscc  work, top, den          // if carry not set, compare        
    subcs   top, top, den           // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR
  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

可以添加对边界条件和 2 的幂的额外检查。完整详细信息请访问 http://www.idwiz.co.za /Tips%20and%20Tricks/Divide.htm

I wrote my own routine to perform an unsigned division as I could not find an unsigned version on the web. I needed to divide a 64 bit value with a 32 bit value to get a 32 bit result.

The inner loop is not as efficient as the signed solution provided above, but this does support unsigned arithmetic. This routine performs a 32 bit division if the high part of the numerator (hi) is smaller than the denominator (den), otherwise a full 64 bit division is performed (hi:lo/den). The result is in lo.

  cmp     hi, den                   // if hi < den do 32 bits, else 64 bits
  bpl     do64bits
  REPT    32
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    subscc  work, hi, den           // if carry not set, compare        
    subcs   hi, hi, den             // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR

  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

do64bits:
  mov     top, #0
  REPT    64
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    adcs    top, top, top
    subscc  work, top, den          // if carry not set, compare        
    subcs   top, top, den           // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR
  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

Extra checking for boundary conditions and power of 2 can be added. Full details can be found at http://www.idwiz.co.za/Tips%20and%20Tricks/Divide.htm

请叫√我孤独 2024-12-25 20:10:27

我为 ARM GNU 汇编器编写了以下函数。如果您没有支持 udiv/sdiv 机器支持的 CPU,只需剪掉任一函数中直到“0:”标签的前几行即可。

.arm
.cpu    cortex-a7
.syntax unified

.type   udiv,%function
.globl  udiv
udiv:   tst     r1,r1
        bne     0f
        udiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      cmp     r1,r2
        movhs   r1,r2
        bxhs    lr
        mvn     r3,0
1:      adds    r0,r0
        adcs    r1,r1
        cmpcc   r1,r2
        subcs   r1,r2
        orrcs   r0,1
        lsls    r3,1
        bne     1b
        bx      lr
.size   udiv,.-udiv

.type   sdiv,%function
.globl  sdiv
sdiv:   teq     r1,r0,ASR 31
        bne     0f
        sdiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      mov     r3,2
        adds    r0,r0
        and     r3,r3,r1,LSR 30
        adcs    r1,r1
        orr     r3,r3,r2,LSR 31
        movvs   r1,r2
        ldrvc   pc,[pc,r3,LSL 2]
        bx      lr
        .int    1f
        .int    3f
        .int    5f
        .int    11f
1:      cmp     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
2:      adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        subge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     2b
        bx      lr
3:      cmn     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
4:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        addge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     4b
        rsb     r0,0
        bx      lr
5:      cmn     r1,r2
        blt     6f
        tsteq   r0,r0
        bne     7f
6:      mov     r1,r2
        bx      lr
7:      mvn     r3,1
8:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        blt     9f
        tsteq   r0,r3
        bne     10f
9:      add     r1,r2
        orr     r0,1
10:     lsls    r3,1
        bne     8b
        rsb     r0,0
        bx      lr
11:     cmp     r1,r2
        blt     12f
        tsteq   r0,r0
        bne     13f
12:     mov     r1,r2
        bx      lr
13:     mvn     r3,1
14:     adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        blt     15f
        tsteq   r0,r3
        bne     16f
15:     sub     r1,r2
        orr     r0,1
16:     lsls    r3,1
        bne     14b
        bx      lr

有两个函数,udiv 用于无符号整数除法,sdiv 用于有符号整数除法。它们都期望 r1(高位字)和 r0(低位字)中的 64 位被除数(有符号或无符号),以及 r1(低位字)中的 32 位除数>r2。它们在 r0 中返回商,在 r1 中返回余数,因此您可以在 C header 中将它们定义为 extern > 返回一个 64 位整数并随后屏蔽掉商和余数。错误(除以 0 或溢出)由绝对值大于或等于除数绝对值的余数表示。有符号除法算法通过被除数和除数的符号来区分大小写;它不会首先转换为正整数,因为这无法正确检测所有溢出情况。

I wrote the following functions for the ARM GNU assembler. If you don't have a CPU with udiv/sdiv machine support, just cut out the first few lines up to the "0:" label in either function.

.arm
.cpu    cortex-a7
.syntax unified

.type   udiv,%function
.globl  udiv
udiv:   tst     r1,r1
        bne     0f
        udiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      cmp     r1,r2
        movhs   r1,r2
        bxhs    lr
        mvn     r3,0
1:      adds    r0,r0
        adcs    r1,r1
        cmpcc   r1,r2
        subcs   r1,r2
        orrcs   r0,1
        lsls    r3,1
        bne     1b
        bx      lr
.size   udiv,.-udiv

.type   sdiv,%function
.globl  sdiv
sdiv:   teq     r1,r0,ASR 31
        bne     0f
        sdiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      mov     r3,2
        adds    r0,r0
        and     r3,r3,r1,LSR 30
        adcs    r1,r1
        orr     r3,r3,r2,LSR 31
        movvs   r1,r2
        ldrvc   pc,[pc,r3,LSL 2]
        bx      lr
        .int    1f
        .int    3f
        .int    5f
        .int    11f
1:      cmp     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
2:      adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        subge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     2b
        bx      lr
3:      cmn     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
4:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        addge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     4b
        rsb     r0,0
        bx      lr
5:      cmn     r1,r2
        blt     6f
        tsteq   r0,r0
        bne     7f
6:      mov     r1,r2
        bx      lr
7:      mvn     r3,1
8:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        blt     9f
        tsteq   r0,r3
        bne     10f
9:      add     r1,r2
        orr     r0,1
10:     lsls    r3,1
        bne     8b
        rsb     r0,0
        bx      lr
11:     cmp     r1,r2
        blt     12f
        tsteq   r0,r0
        bne     13f
12:     mov     r1,r2
        bx      lr
13:     mvn     r3,1
14:     adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        blt     15f
        tsteq   r0,r3
        bne     16f
15:     sub     r1,r2
        orr     r0,1
16:     lsls    r3,1
        bne     14b
        bx      lr

There are two functions, udiv for unsigned integer division and sdiv for signed integer division. They both expect a 64-bit dividend (either signed or unsigned) in r1 (high word) and r0 (low word), and a 32-bit divisor in r2. They return the quotient in r0 and the remainder in r1, thus you can define them in a C header as extern returning a 64-bit integer and mask out the quotient and remainder afterwards. An error (division by 0 or overflow) is indicated by a remainder having an absolute value greater than or equal the absolute value of the divisor. The signed division algorithm uses case distinction by the signs of both dividend and divisor; it does not convert to positive integers first, since that wouldn't detect all overflow conditions properly.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文