当前位置：文江博客话题详情

如何在 ARM 上进行整数（有符号或无符号）除法？

发布于 2024-12-18 20:10:27 字数 114 浏览 2 评论 0原文

我主要致力于 Cortex-A8 和 Cortex-A9 的研究。我知道有些架构没有整数除法，但是除了转换为浮点数、除法、转换为整数之外，最好的方法是什么？或者这确实是最好的解决方案？

干杯! =）

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

喜你已久 2024-12-25 20:10:27

除以常数值可以通过执行 64 位乘法和右移来快速完成，例如，如下所示：

LDR     R3, =0xA151C331
UMULL   R3, R2, R1, R3
MOV     R0, R2,LSR#10

这里 R1 除以 1625。
计算如下： 64bitreg(R2:R3) = R1*0xA151C331，则结果是高 32bit 右移 10：

R1*0xA151C331/2^(32+10) = R1*0.00061538461545751488 = R1/1624.99999980

您可以根据以下公式计算您自己的常数：

x / N ==  (x*A)/2^(32+n)   -->       A = 2^(32+n)/N

选择最大的 n，其中 A < 2^32

Division by a constant value is done quickly by doing a 64bit-multiply and shift-right, for example, like this:

LDR     R3, =0xA151C331
UMULL   R3, R2, R1, R3
MOV     R0, R2,LSR#10

here R1 is divided by 1625.
The calculation is done like this: 64bitreg(R2:R3) = R1*0xA151C331, then the result is the upper 32bit right shifted by 10:

R1*0xA151C331/2^(32+10) = R1*0.00061538461545751488 = R1/1624.99999980

You can calculate your own constants from this formula:

x / N ==  (x*A)/2^(32+n)   -->       A = 2^(32+n)/N

select the largest n, for which A < 2^32

回复收藏 0 原文

提赋 2024-12-25 20:10:27

一些从其他地方复制过来的整数除法：
基本上，每位 3 条指令。来自这个网站，尽管我也在其他很多地方看到过它。
此网站还有一个不错的版本，一般来说可能会更快。


@ Entry  r0: numerator (lo) must be signed positive
@        r2: deniminator (den) must be non-zero and signed negative
idiv:
        lo .req r0; hi .req r1; den .req r2
        mov hi, #0 @ hi = 0
        adds lo, lo, lo
        .rept 32 @ repeat 32 times
          adcs hi, den, hi, lsl #1
          subcc hi, hi, den
          adcs lo, lo, lo
        .endr
        mov pc, lr @ return
@ Exit   r0: quotient (lo)
@        r1: remainder (hi)

Some copy-pasta from elsewhere for an integer divide:
Basically, 3 instructions per bit. From this website, though I've seen it many other places as well.
This site also has a nice version which may be faster in general.


@ Entry  r0: numerator (lo) must be signed positive
@        r2: deniminator (den) must be non-zero and signed negative
idiv:
        lo .req r0; hi .req r1; den .req r2
        mov hi, #0 @ hi = 0
        adds lo, lo, lo
        .rept 32 @ repeat 32 times
          adcs hi, den, hi, lsl #1
          subcc hi, hi, den
          adcs lo, lo, lo
        .endr
        mov pc, lr @ return
@ Exit   r0: quotient (lo)
@        r1: remainder (hi)

回复收藏 0 原文

稀香 2024-12-25 20:10:27

编译器通常在其库中包含一个分隔符，例如 gcclib 我已从 gcc 中提取它们并直接使用它们：

https: //github.com/dwelch67/stm32vld/ 然后 stm32f4d/adventure/gcclib

浮动并返回可能不是最好的解决方案。你可以尝试一下，看看它有多快......这是一个乘法，但也可以很容易地使它成为一个除法：

https://github.com/dwelch67/stm32vld/ 然后是 stm32f4d/float01/vectors.s

我没有计时来看看有多快/慢。明白了，我在上面使用的是 cortex-m，而你正在谈论 cortex-a，频谱的不同端，类似的浮点指令，gcc lib 的东西是相似的，对于 cortex-m，我必须为拇指构建，但你可以同样轻松地为 Arm 构建。实际上，对于 gcc 来说，它应该自动工作，你不需要像我那样做。其他编译器也不需要像我在上面的冒险游戏中那样做。

回复收藏 0 原文

丘比特射中我 2024-12-25 20:10:27

我编写了自己的例程来执行未签名的除法，因为我在网络上找不到未签名的版本。我需要将 64 位值除以 32 位值以获得 32 位结果。

内部循环不如上面提供的有符号解决方案高效，但这确实支持无符号算术。如果分子的高位部分 (hi) 小于分母 (den)，则此例程执行 32 位除法，否则执行完整的 64 位除法 (hi:lo/den)。结果在lo。

  cmp     hi, den                   // if hi < den do 32 bits, else 64 bits
  bpl     do64bits
  REPT    32
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    subscc  work, hi, den           // if carry not set, compare        
    subcs   hi, hi, den             // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR

  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

do64bits:
  mov     top, #0
  REPT    64
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    adcs    top, top, top
    subscc  work, top, den          // if carry not set, compare        
    subcs   top, top, den           // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR
  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

可以添加对边界条件和 2 的幂的额外检查。完整详细信息请访问 http://www.idwiz.co.za /Tips%20and%20Tricks/Divide.htm

I wrote my own routine to perform an unsigned division as I could not find an unsigned version on the web. I needed to divide a 64 bit value with a 32 bit value to get a 32 bit result.

The inner loop is not as efficient as the signed solution provided above, but this does support unsigned arithmetic. This routine performs a 32 bit division if the high part of the numerator (hi) is smaller than the denominator (den), otherwise a full 64 bit division is performed (hi:lo/den). The result is in lo.

  cmp     hi, den                   // if hi < den do 32 bits, else 64 bits
  bpl     do64bits
  REPT    32
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    subscc  work, hi, den           // if carry not set, compare        
    subcs   hi, hi, den             // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR

  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

do64bits:
  mov     top, #0
  REPT    64
    adds    lo, lo, lo              // shift numerator through carry
    adcs    hi, hi, hi
    adcs    top, top, top
    subscc  work, top, den          // if carry not set, compare        
    subcs   top, top, den           // if carry set, subtract
    addcs   lo, lo, #1              // if carry set, and 1 to quotient
  ENDR
  mov     r0, lo                    // move result into R0
  mov     pc, lr                    // return

Extra checking for boundary conditions and power of 2 can be added. Full details can be found at http://www.idwiz.co.za/Tips%20and%20Tricks/Divide.htm

回复收藏 0 原文

请叫√我孤独 2024-12-25 20:10:27

我为 ARM GNU 汇编器编写了以下函数。如果您没有支持 udiv/sdiv 机器支持的 CPU，只需剪掉任一函数中直到“0:”标签的前几行即可。

.arm
.cpu    cortex-a7
.syntax unified

.type   udiv,%function
.globl  udiv
udiv:   tst     r1,r1
        bne     0f
        udiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      cmp     r1,r2
        movhs   r1,r2
        bxhs    lr
        mvn     r3,0
1:      adds    r0,r0
        adcs    r1,r1
        cmpcc   r1,r2
        subcs   r1,r2
        orrcs   r0,1
        lsls    r3,1
        bne     1b
        bx      lr
.size   udiv,.-udiv

.type   sdiv,%function
.globl  sdiv
sdiv:   teq     r1,r0,ASR 31
        bne     0f
        sdiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      mov     r3,2
        adds    r0,r0
        and     r3,r3,r1,LSR 30
        adcs    r1,r1
        orr     r3,r3,r2,LSR 31
        movvs   r1,r2
        ldrvc   pc,[pc,r3,LSL 2]
        bx      lr
        .int    1f
        .int    3f
        .int    5f
        .int    11f
1:      cmp     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
2:      adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        subge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     2b
        bx      lr
3:      cmn     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
4:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        addge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     4b
        rsb     r0,0
        bx      lr
5:      cmn     r1,r2
        blt     6f
        tsteq   r0,r0
        bne     7f
6:      mov     r1,r2
        bx      lr
7:      mvn     r3,1
8:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        blt     9f
        tsteq   r0,r3
        bne     10f
9:      add     r1,r2
        orr     r0,1
10:     lsls    r3,1
        bne     8b
        rsb     r0,0
        bx      lr
11:     cmp     r1,r2
        blt     12f
        tsteq   r0,r0
        bne     13f
12:     mov     r1,r2
        bx      lr
13:     mvn     r3,1
14:     adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        blt     15f
        tsteq   r0,r3
        bne     16f
15:     sub     r1,r2
        orr     r0,1
16:     lsls    r3,1
        bne     14b
        bx      lr

有两个函数，udiv 用于无符号整数除法，sdiv 用于有符号整数除法。它们都期望 r1（高位字）和 r0（低位字）中的 64 位被除数（有符号或无符号），以及 r1（低位字）中的 32 位除数>r2。它们在 r0 中返回商，在 r1 中返回余数，因此您可以在 C header 中将它们定义为 extern > 返回一个 64 位整数并随后屏蔽掉商和余数。错误（除以 0 或溢出）由绝对值大于或等于除数绝对值的余数表示。有符号除法算法通过被除数和除数的符号来区分大小写；它不会首先转换为正整数，因为这无法正确检测所有溢出情况。

I wrote the following functions for the ARM GNU assembler. If you don't have a CPU with udiv/sdiv machine support, just cut out the first few lines up to the "0:" label in either function.

.arm
.cpu    cortex-a7
.syntax unified

.type   udiv,%function
.globl  udiv
udiv:   tst     r1,r1
        bne     0f
        udiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      cmp     r1,r2
        movhs   r1,r2
        bxhs    lr
        mvn     r3,0
1:      adds    r0,r0
        adcs    r1,r1
        cmpcc   r1,r2
        subcs   r1,r2
        orrcs   r0,1
        lsls    r3,1
        bne     1b
        bx      lr
.size   udiv,.-udiv

.type   sdiv,%function
.globl  sdiv
sdiv:   teq     r1,r0,ASR 31
        bne     0f
        sdiv    r3,r0,r2
        mls     r1,r2,r3,r0
        mov     r0,r3
        bx      lr
0:      mov     r3,2
        adds    r0,r0
        and     r3,r3,r1,LSR 30
        adcs    r1,r1
        orr     r3,r3,r2,LSR 31
        movvs   r1,r2
        ldrvc   pc,[pc,r3,LSL 2]
        bx      lr
        .int    1f
        .int    3f
        .int    5f
        .int    11f
1:      cmp     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
2:      adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        subge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     2b
        bx      lr
3:      cmn     r1,r2
        movge   r1,r2
        bxge    lr
        mvn     r3,1
4:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        addge   r1,r2
        orrge   r0,1
        lsls    r3,1
        bne     4b
        rsb     r0,0
        bx      lr
5:      cmn     r1,r2
        blt     6f
        tsteq   r0,r0
        bne     7f
6:      mov     r1,r2
        bx      lr
7:      mvn     r3,1
8:      adds    r0,r0
        adcs    r1,r1
        cmnvc   r1,r2
        blt     9f
        tsteq   r0,r3
        bne     10f
9:      add     r1,r2
        orr     r0,1
10:     lsls    r3,1
        bne     8b
        rsb     r0,0
        bx      lr
11:     cmp     r1,r2
        blt     12f
        tsteq   r0,r0
        bne     13f
12:     mov     r1,r2
        bx      lr
13:     mvn     r3,1
14:     adds    r0,r0
        adcs    r1,r1
        cmpvc   r1,r2
        blt     15f
        tsteq   r0,r3
        bne     16f
15:     sub     r1,r2
        orr     r0,1
16:     lsls    r3,1
        bne     14b
        bx      lr

There are two functions, udiv for unsigned integer division and sdiv for signed integer division. They both expect a 64-bit dividend (either signed or unsigned) in r1 (high word) and r0 (low word), and a 32-bit divisor in r2. They return the quotient in r0 and the remainder in r1, thus you can define them in a C header as extern returning a 64-bit integer and mask out the quotient and remainder afterwards. An error (division by 0 or overflow) is indicated by a remainder having an absolute value greater than or equal the absolute value of the divisor. The signed division algorithm uses case distinction by the signs of both dividend and divisor; it does not convert to positive integers first, since that wouldn't detect all overflow conditions properly.