当前位置：文江博客话题详情

将 ADC（带进位相加）组装到 C++

发布于 2024-10-01 23:49:13 字数 177 浏览 0 评论 0原文

有一个 x86 汇编指令 ADC。我发现这意味着“带进位添加”。这是什么意思/做什么？如何在 C++ 中实现该指令的行为？

信息：
在 Windows 上编译。我使用的是 32 位 Windows 安装。我的处理器是 Intel 的 Core 2 Duo。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

诠释孤独 2024-10-08 23:49:13

ADC 与 ADD 相同，但如果设置了处理器的进位标志，则添加额外的 1。

回复收藏 0 原文

兔小萌 2024-10-08 23:49:13

ADC 行为可以用 C 和 C++ 进行模拟。以下示例将两个数字相加（存储为无符号数组，因为它们太大而无法放入单个无符号数组中）。

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;

希望这有帮助。

The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;

Hope this helps.

回复收藏 0 原文

意犹 2024-10-08 23:49:13

C++ 语言没有任何进位标志的概念，因此围绕制作一个内部函数包装器ADC 指令很笨拙。然而，英特尔还是这么做了： unsigned char _addcarry_u32（unsigned char c_in、unsigned a、unsigned b、unsigned * out）；。最后我检查了一下，gcc 在这方面做得很差（将进位结果保存到整数寄存器中，而不是将其保留在 CF 中），但希望英特尔自己的编译器能做得更好。

另请参阅 x86 汇编文档的标记 wiki。

当添加比单个寄存器更宽的整数时，编译器将使用 ADC，例如在 32 位代码中添加 int64_t 或在 64 位代码中添加 __int128_t。

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

Godbolt 编译器浏览器。 clang 的 -fverbose-asm 不是很冗长，但 gcc 5.3 / 6.1 浪费了两条 mov 指令，因此可读性较差。

有时，您可以手持编译器发出 adc 或以其他方式使用成语 uint64_t sum = a+b; 进行 add 的进位> / 进位=总和<一个；。但是，当前的编译器无法扩展此功能以从 adc 获取进位，而不是从 add 获取； c+d+carry_in 可以一路回绕，编译器无法优化对 c+d 中每个 + 执行的多重检查+携带如果你安全的话。

Clang `_ExtInt`

我知道有一种方法可以获取 add/adc/.../adc 链：Clang 的新 _ExtInt(width) 功能提供了固定的-bit-width 任何大小的类型，最多 16,777,215 位（博客帖子）。它于 2020 年 4 月 21 日添加到 clang 的开发版本中，因此尚未出现在任何发布版本中。

这有望在某个时候出现在 ISO C 和/或 C++ 中； N2472 提案显然正在“被ISO WG14 C 语言委员会积极考虑”

（更新：现在在 C23 中为 _BitInt(256)，尽管支持的最大位宽取决于实现，可能低至 128。 Clang 支持将其作为 C++ 中的扩展，但 GCC 和 MSVC 不支持。）

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

使用 x86-64 的 clang trunk -O2 编译如下（Godbolt)：

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

显然 _ExtInt< /code> 按整数寄存器中的值传递，直到调用约定用完寄存器为止。（至少在这个早期版本中；当 x86-64 SysV 比 2 个或 3 个寄存器宽时，也许 x86-64 SysV 应该将其归类为“内存”，例如大于 16 字节的结构。虽然比结构更多，但将其放在寄存器中很可能是只需将其他参数放在前面，这样它们就不会被替换。）

第一个 _ExtInt 参数位于 R8:RCX:RDX:RSI 中，第二个参数的低位 qword 位于 R9 中，其余的位于内存中。

指向返回值对象的指针作为 RDI 中隐藏的第一个参数传递； x86-64 System V 仅在最多 2 个整数寄存器 (RDX:RAX) 中返回，这不会改变这一点。

The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.

See also the x86 tag wiki for assembly documentation.

The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32bit code, or __int128_t in 64bit code.

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very vebose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.

Clang `_ExtInt`

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; The N2472 proposal is apparently being "being actively considered by the ISO WG14 C Language Committee"

(Update: this is now in C23 as _BitInt(256), although the max supported bit-width is implementation-dependent and might be as low as 128. Clang supports it as an extension in C++, but GCC and MSVC don't.)

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; Perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although moreso than structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.

A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.

回复收藏 0 原文

蘸点软妹酱 2024-10-08 23:49:13

来自此处（已损坏）或此处

但是，英特尔处理器
有一个特殊的指令叫做adc。
该命令的行为与
添加命令。唯一额外的事情是
它还添加了值进位标志
沿着。所以，这可能非常方便
添加大整数。假设你想要
将 32 位整数与 16 位整数相加
寄存器。我们怎样才能做到这一点？出色地，
假设第一个整数是
保存在寄存器对 DX:AX 上，并且
第二个是在 BX:CX 上。这是
如何：
添加 ax, cx
adc dx, bx
啊，所以首先，低 16 位加上
添加斧头、CX。那么高16位就是
使用 adc 而不是 add 添加。这是
因为：如果有溢出，
进位位自动添加在
较高的 16 位。所以，不麻烦
检查。这个方法可以扩展
到 64 位等等...请注意：如果
32位整数加法溢出
也是在较高的 16 位，结果
将不正确并且进位标志
已设置，例如 5 加上 50 亿
亿。

从这里开始的一切，请记住，它几乎属于实现定义的行为区域。

下面是一个适用于 VS 2010（32 位，WinXp）的小示例

注意：$7.4/1-“asm 声明是有条件支持的；其含义是实现定义的。[注：通常它用于通过汇编器的实现。-结束注]”

int main(){
   bool carry = false;
   int x = 0xffffffff + 0xffffffff;
   __asm {
      jc setcarry
setcarry:
      mov carry, 1
   }
}

From here (broken) or here

However, Intel processor
has a special instruction called adc.
This command behaves similarly as the
add command. The only extra thing is
that it also add the value carry flag
along. So, this may be very handy to
add large integers. Suppose you'd like
to add a 32-bit integers with 16-bit
registers. How can we do that? Well,
let's say that the first integer is
held on the register pair DX:AX, and
the second one is on BX:CX. This is
how:
add  ax, cx
adc  dx, bx
Ah, so first, the lower 16-bit is added by
add ax, cx. Then the higher 16-bit is
added using adc instead of add. It is
because: if there are overflows, the
carry bit is automatically added in
the higher 16-bit. So, no cumbersome
checking. This method can be extended
to 64 bits and so on... Note that: If
the 32-bit integer addition overflows
too at the higher 16-bit, the result
will not be correct and the carry flag
is set, e.g. Adding 5 billion to 5
billion.

Everything from here on, remember that it falls pretty much into the zone of implementation defined behavior.

Here's a small sample that works for VS 2010 (32-bit, WinXp)

Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"

int main(){
   bool carry = false;
   int x = 0xffffffff + 0xffffffff;
   __asm {
      jc setcarry
setcarry:
      mov carry, 1
   }
}

回复收藏 0 原文

你怎么这么可爱啊 2024-10-08 23:49:13

在 x86-64 中，ADD 指令将两个 64 位整数相加：add rax, rbx 的作用是 rax = rax + rbx。
当存在无符号溢出时（=当结果不适合 64 位时），它还将进位标志设置为 1，否则它将进位标志设置为 0。

在 C++ 中，您可以像这样模拟 ADD

uint64_t a, b;
bool carry;

a += b;
carry = (a < b);  // a+b can't be smaller than b: there must have been an overflow

： ADC指令类似于ADD，但将进位标志添加到结果中：
adc rax, rbx 的作用是rax = rax + rbx + Carry_flag。
如果存在无符号溢出，它还会设置进位标志。

在 C++ 中：

uint64_t tmp = b + carry;
a += tmp;
carry = (tmp < carry) + (a < tmp);  // only one overflow can happen

ADD 和 ADC 指令可用于添加大整数（带有n“数字”）。
对最低有效数字使用 ADD，然后使用 ADC (n – 1) 次将其余数字相加。
这就是“教科书添加算法”。

例如，将 256 位大整数与四个 64 位“数字”相加：

mov rax, [rsi]        ; load the least significant source digit
mov rbx, [rsi + 8]    ; ...
mov rcx, [rsi + 16]
mov rdx, [rsi + 24]
add [rdi], rax        ; add it to the least significant destination digit
adc [rdi + 8], rbx    ; ... propagate carry up
adc [rdi + 16], rcx
adc [rdi + 24], rdx

最新版本的 clang 编译器可以识别大整数加法和使用ADD/ADC来实现。

constexpr uint64_t n = 4;
uint64_t dst[n], src[n];

// Add src to dst.
uint64_t carry = 0;
for (int i = 0; i < n; i++) {
  uint64_t tmp = src[i] + carry;
  dst[i] += tmp;
  carry = (tmp < carry) + (dst[i] < tmp);
}

In x86-64, the ADD instruction adds two 64-bit integers: add rax, rbx does rax = rax + rbx.
It also sets the carry flag to 1 when there was unsigned overflow (= when the result didn't fit in 64 bits), otherwise it sets the carry flag to 0.

In C++, you can simulate ADD like this:

uint64_t a, b;
bool carry;

a += b;
carry = (a < b);  // a+b can't be smaller than b: there must have been an overflow

The ADC instruction is like ADD, but adds the carry flag to the result:
adc rax, rbx does rax = rax + rbx + carry_flag.
It also sets the carry flag if there was unsigned overflow.

In C++:

uint64_t tmp = b + carry;
a += tmp;
carry = (tmp < carry) + (a < tmp);  // only one overflow can happen

The ADD and ADC instructions can be used to add big integers (with n "digits").
Use ADD for the least significant digits, then use ADC (n – 1) times to add the rest of the digits.
This is the “schoolbook addition algorithm”.

For example, adding 256-bit big integers with four 64-bit "digits":

mov rax, [rsi]        ; load the least significant source digit
mov rbx, [rsi + 8]    ; ...
mov rcx, [rsi + 16]
mov rdx, [rsi + 24]
add [rdi], rax        ; add it to the least significant destination digit
adc [rdi + 8], rbx    ; ... propagate carry up
adc [rdi + 16], rcx
adc [rdi + 24], rdx

Recent versions of the clang compiler can recognize big integer addition and use ADD/ADC to implement it.

constexpr uint64_t n = 4;
uint64_t dst[n], src[n];

// Add src to dst.
uint64_t carry = 0;
for (int i = 0; i < n; i++) {
  uint64_t tmp = src[i] + carry;
  dst[i] += tmp;
  carry = (tmp < carry) + (dst[i] < tmp);
}

回复收藏 0 原文

方觉久 2024-10-08 23:49:13

这里面有一个bug。试试这个输入：

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

结果应该是 {0, 0, 1, ...} 但结果是 {0, 0, 0, ...}

将此行：更改

carry = (first[i] > result[i]);

为：

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

修复它。

There is a bug in this. Try this input:

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}

Changing this line:

carry = (first[i] > result[i]);

to this:

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

fixes it.

回复收藏 0 原文

溇涏 2024-10-08 23:49:13

有__builtin_uadd_overflow内在函数可以通过编译器最佳地使用add/adc来构建加法链。

来自旧文档（早为 gcc 5!!!!）

https： //gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html

https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html

两者clang< /strong> 和 GCC 确实实现了这些内置函数，并且我验证了生成的代码在 x86_64 和 aarch64 目标上都是最佳的。

此功能现已在最新的产品更新发布版本 Visual Studio 2022 中发布版本 17.7（例如 _add_overflow_i8 ....）

#include <stdint.h>

typedef unsigned __int128 uint128_t;


// carry_out = a + b + carry_in
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t * sum)
{
        bool c;
        uint64_t res;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res);
        c |= __builtin_uaddll_overflow (res, carry_in, (long long unsigned *)&res);
        *sum = res;
        return c;
}

// carry_out = a + b + carry_in
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t * sum)
{
        bool c;
        uint64_t res_lo, res_hi;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res_lo);
        c |= __builtin_uaddll_overflow (carry_in, res_lo, (long long unsigned *)&res_lo);
        c = __builtin_uaddll_overflow (a >> 64, c, (long long unsigned *)&res_hi);
        c |= __builtin_uaddll_overflow (b >> 64, res_hi, (long long unsigned *)&res_hi);
        *sum = ((uint128_t)res_hi << 64) + res_lo;
        return c;
}

该线程相当旧，我对原始问题“如何在 C++ 中实现该指令的行为”提供了答复。

There are __builtin_uadd_overflow intrinsics to build additions chains with the optimal use of add/adc by the compiler.

From old documentation (as old as gcc 5 !!!!)

https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html

https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html

Both clang and GCC do implement these builtins, and I verified the generated code is optimal on both x86_64 and aarch64 targets

This feature has now been released in the latest product update released version Visual Studio 2022 version 17.7 (e.g. _add_overflow_i8 ....)

#include <stdint.h>

typedef unsigned __int128 uint128_t;


// carry_out = a + b + carry_in
uint8_t my_addcarry_u64(uint8_t carry_in, uint64_t a, uint64_t b, uint64_t * sum)
{
        bool c;
        uint64_t res;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res);
        c |= __builtin_uaddll_overflow (res, carry_in, (long long unsigned *)&res);
        *sum = res;
        return c;
}

// carry_out = a + b + carry_in
uint8_t my_addcarry_u128(uint8_t carry_in, uint128_t a, uint128_t b, uint128_t * sum)
{
        bool c;
        uint64_t res_lo, res_hi;
        c = __builtin_uaddll_overflow (a, b, (long long unsigned *)&res_lo);
        c |= __builtin_uaddll_overflow (carry_in, res_lo, (long long unsigned *)&res_lo);
        c = __builtin_uaddll_overflow (a >> 64, c, (long long unsigned *)&res_hi);
        c |= __builtin_uaddll_overflow (b >> 64, res_hi, (long long unsigned *)&res_hi);
        *sum = ((uint128_t)res_hi << 64) + res_lo;
        return c;
}

This thread is quite old, I provide a response to the original question "How would one implement the behavior of this instruction in C++" .

回复收藏 0 原文

哭泣的笑容 2024-10-08 23:49:13

这是我最快的代码：

template <class Ty>
constexpr bool add_carry_ux(bool carry_in, Ty src1, Ty src2, Ty* sum_out)
{
    const Ty sum = src1 + src2;
    const bool carry_out = (sum < src1) | ((sum == ~static_cast<Ty>(0)) & carry_in);

    *sum_out = sum + carry_in;

    return carry_out;
}

ASM：

add_carry_ux(bool, unsigned long, unsigned long, unsigned long*):
        add     rsi, rdx
        movzx   eax, dil
        setc    dl
        add     rax, rsi
        cmp     rsi, -1
        mov     QWORD PTR [rcx], rax
        sete    al
        movzx   edx, dl
        and     eax, edi
        or      eax, edx
        ret

This is my fastest Code:

template <class Ty>
constexpr bool add_carry_ux(bool carry_in, Ty src1, Ty src2, Ty* sum_out)
{
    const Ty sum = src1 + src2;
    const bool carry_out = (sum < src1) | ((sum == ~static_cast<Ty>(0)) & carry_in);

    *sum_out = sum + carry_in;

    return carry_out;
}

ASM:

add_carry_ux(bool, unsigned long, unsigned long, unsigned long*):
        add     rsi, rdx
        movzx   eax, dil
        setc    dl
        add     rax, rsi
        cmp     rsi, -1
        mov     QWORD PTR [rcx], rax
        sete    al
        movzx   edx, dl
        and     eax, edi
        or      eax, edx
        ret

回复收藏 0 原文

笔落惊风雨 2024-10-08 23:49:13

int32_t adc(uint32_t first, uint32_t second, uint32_t *carry)
    {
        uint32_t res;
        uint32_t carry_out = 0;
    
        if (!*carry)
        {
            res = first + second;
            *carry = (res < first) && (res < second);
    
            return res;
        }
    
        res = adc(first, second, &carry_out);
        if (*carry)
        {
            res++;
            carry_out |= !res;
        }
        
        *carry = carry_out;
    
        return res;
    }

int32_t adc(uint32_t first, uint32_t second, uint32_t *carry)
    {
        uint32_t res;
        uint32_t carry_out = 0;
    
        if (!*carry)
        {
            res = first + second;
            *carry = (res < first) && (res < second);
    
            return res;
        }
    
        res = adc(first, second, &carry_out);
        if (*carry)
        {
            res++;
            carry_out |= !res;
        }
        
        *carry = carry_out;
    
        return res;
    }

回复收藏 0 原文

~没有更多了~