Assembly ADC (Add with carry) to C++
There is an x86 assembly instruction, ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?
INFO:
Compiled on Windows. I'm using a 32-bit Windows Installation. My processor is Core 2 Duo from Intel.
Comments (9)
ADC is the same as ADD, but adds an extra 1 if the processor's carry flag is set.
The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned, as they are too large to fit into a single unsigned).
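A minimal sketch of that idea (not the answer's original listing), assuming 32-bit limbs stored least significant first. The name add_limbs is hypothetical. Each step widens to 64 bits so that limb + limb + carry cannot wrap, and bit 32 of the intermediate result plays the role of the carry flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds two n-limb numbers (least significant limb first)
// and returns the carry out of the top limb (0 or 1).
inline unsigned add_limbs(const std::uint32_t* a, const std::uint32_t* b,
                          std::uint32_t* sum, std::size_t n) {
    assert(a && b && sum);
    std::uint64_t carry = 0;  // stands in for the CPU carry flag
    for (std::size_t i = 0; i < n; ++i) {
        // Widen to 64 bits so a[i] + b[i] + carry cannot wrap around.
        const std::uint64_t t = std::uint64_t{a[i]} + b[i] + carry;
        sum[i] = static_cast<std::uint32_t>(t & 0xFFFFFFFFu);  // low 32 bits
        carry = t >> 32;  // 0 or 1, consumed by the next limb like ADC would
    }
    return static_cast<unsigned>(carry);
}
```

With a = {0xFFFFFFFF, 0xFFFFFFFF, 0} and b = {1, 0, 0}, the carry ripples through the first two limbs and the sum is {0, 0, 1}.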
Hope this helps.
The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32(unsigned char c_in, unsigned a, unsigned b, unsigned *out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register instead of leaving it in CF), but hopefully Intel's own compiler does better.

See also the x86 tag wiki for assembly documentation.
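A minimal sketch of how the _addcarry_u32 intrinsic quoted above might be chained, assuming an x86 target where unsigned is 32 bits and where immintrin.h declares the intrinsic (as it does with GCC and Clang). The helper name add64_via_halves and the macro SKETCH_HAVE_ADDCARRY are hypothetical; the #else branch is a plain C++ fallback so the sketch still builds on other targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>          // declares _addcarry_u32 on GCC and Clang
#define SKETCH_HAVE_ADDCARRY 1  // hypothetical macro, used only by this sketch
#endif

// Hypothetical helper: 64-bit addition built from two 32-bit additions.
// Stores the carry out of bit 63 (0 or 1) in *carry_out.
inline std::uint64_t add64_via_halves(std::uint64_t a, std::uint64_t b,
                                      unsigned char* carry_out) {
    assert(carry_out != nullptr);
    const unsigned a_lo = static_cast<unsigned>(a & 0xFFFFFFFFu);
    const unsigned a_hi = static_cast<unsigned>(a >> 32);
    const unsigned b_lo = static_cast<unsigned>(b & 0xFFFFFFFFu);
    const unsigned b_hi = static_cast<unsigned>(b >> 32);
    unsigned lo = 0, hi = 0;
    unsigned char c = 0;
#ifdef SKETCH_HAVE_ADDCARRY
    c = _addcarry_u32(0, a_lo, b_lo, &lo);  // low halves, no carry-in
    c = _addcarry_u32(c, a_hi, b_hi, &hi);  // high halves plus the carry
#else
    // Portable fallback: widen to 64 bits and read the carry from bit 32.
    const std::uint64_t low = std::uint64_t{a_lo} + b_lo;
    lo = static_cast<unsigned>(low & 0xFFFFFFFFu);
    const std::uint64_t high = std::uint64_t{a_hi} + b_hi + (low >> 32);
    hi = static_cast<unsigned>(high & 0xFFFFFFFFu);
    c = static_cast<unsigned char>(high >> 32);
#endif
    *carry_out = c;
    return (std::uint64_t{hi} << 32) | lo;
}
```

Whether the compiler keeps the carry in CF between the two calls is up to the optimizer, as the answer notes.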
The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32-bit code, or __int128_t in 64-bit code.

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very verbose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.

Clang _ExtInt

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; the N2472 proposal is apparently "being actively considered by the ISO WG14 C Language Committee".

(Update: this is now in C23 as _BitInt(256), although the max supported bit-width is implementation-dependent and might be as low as 128. Clang supports it as an extension in C++, but GCC and MSVC don't.)

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although more so than structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.
A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.
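Going back to the sum < a idiom mentioned earlier in this answer, the sketch below illustrates why a single comparison stops working once a carry-in is involved, and what "doing it safely" with one check per + might look like. All three function names are hypothetical, and this says nothing about what instructions a given compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Carry-out of a plain add: the sum wrapped exactly when it is smaller than an operand.
inline bool add_carry_out(std::uint64_t a, std::uint64_t b, std::uint64_t* sum) {
    assert(sum != nullptr);
    *sum = a + b;
    return *sum < a;
}

// Naive extension to a carry-in with a single comparison. It misses the case
// where d + carry_in wraps all the way around to 0, leaving the sum equal to c.
inline bool naive_adc_carry_out(std::uint64_t c, std::uint64_t d, bool carry_in,
                                std::uint64_t* sum) {
    assert(sum != nullptr);
    *sum = c + d + (carry_in ? 1u : 0u);
    return *sum < c;
}

// Safe version: check each + separately. At most one of the two can wrap.
inline bool safe_adc_carry_out(std::uint64_t c, std::uint64_t d, bool carry_in,
                               std::uint64_t* sum) {
    assert(sum != nullptr);
    const std::uint64_t partial = c + d;
    const bool wrapped_first = partial < c;
    *sum = partial + (carry_in ? 1u : 0u);
    const bool wrapped_second = *sum < partial;
    return wrapped_first || wrapped_second;
}
```

For c = 5, d = UINT64_MAX and carry_in = 1, the true result is 5 with a carry out; the naive version computes 5 but reports no carry, while the safe version reports it.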
From here (broken) or here
Everything from here on, remember that it falls pretty much into the zone of implementation defined behavior.
Here's a small sample that works for VS 2010 (32-bit, WinXp)
Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"
In x86-64, the ADD instruction adds two 64-bit integers: add rax, rbx does rax = rax + rbx. It also sets the carry flag to 1 when there was unsigned overflow (= when the result didn't fit in 64 bits), otherwise it sets the carry flag to 0.
In C++, you can simulate ADD like this:
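A minimal sketch (not the answer's original listing), assuming the carry flag is modelled as a bool returned next to the sum. The names AddResult, simulate_add and simulate_add_demo are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of "register value" and "carry flag".
struct AddResult {
    std::uint64_t value;
    bool carry;
};

// Models add rax, rbx: unsigned arithmetic wraps modulo 2^64, and the
// result is smaller than an operand exactly when it wrapped.
inline AddResult simulate_add(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t sum = a + b;
    return {sum, sum < a};
}

// Usage example: going past 2^64 - 1 wraps to 0 and sets the carry.
inline void simulate_add_demo() {
    const AddResult r = simulate_add(UINT64_MAX, 1);
    assert(r.value == 0 && r.carry);
}
```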
The ADC instruction is like ADD, but adds the carry flag to the result: adc rax, rbx does rax = rax + rbx + carry_flag. It also sets the carry flag if there was unsigned overflow.
In C++:
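A minimal sketch (not the answer's original listing), again modelling the carry flag as a bool. The names AdcResult, simulate_adc and simulate_adc_demo are hypothetical. The addition is done in two steps because either step can wrap, and the carry-out is set if one of them did.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of "register value" and "carry flag".
struct AdcResult {
    std::uint64_t value;
    bool carry;
};

// Models adc rax, rbx with the incoming carry flag passed explicitly.
inline AdcResult simulate_adc(std::uint64_t a, std::uint64_t b, bool carry_flag) {
    const std::uint64_t partial = a + b;              // may wrap
    const bool carry1 = partial < a;
    const std::uint64_t sum = partial + (carry_flag ? 1u : 0u);  // may wrap
    const bool carry2 = sum < partial;
    return {sum, carry1 || carry2};                   // both cannot wrap at once
}

// Usage example: the carry-in alone can push the result past 2^64 - 1.
inline void simulate_adc_demo() {
    const AdcResult r = simulate_adc(UINT64_MAX, 0, true);
    assert(r.value == 0 && r.carry);
}
```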
The ADD and ADC instructions can be used to add big integers (with n "digits").
Use ADD for the least significant digit, then use ADC (n – 1) times to add the rest of the digits.
This is the “schoolbook addition algorithm”.
For example, adding 256-bit big integers with four 64-bit "digits":
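A minimal sketch (not the answer's original listing), assuming a 256-bit value is stored as four 64-bit digits with digit 0 least significant. The names U256, add256 and add256_demo are hypothetical. The first loop iteration runs with the carry cleared, which corresponds to ADD; the remaining three correspond to ADC.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit type: four 64-bit digits, digit 0 is least significant.
using U256 = std::array<std::uint64_t, 4>;

// Schoolbook addition. Returns the carry out of the most significant digit.
inline bool add256(const U256& a, const U256& b, U256& out) {
    bool carry = false;  // cleared for the first digit, so that step acts like ADD
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t partial = a[i] + b[i];
        const bool carry1 = partial < a[i];
        const std::uint64_t sum = partial + (carry ? 1u : 0u);
        const bool carry2 = sum < partial;
        out[i] = sum;
        carry = carry1 || carry2;  // feeds the next digit like ADC would
    }
    return carry;
}

// Usage example: the carry ripples through two full digits into the third.
inline void add256_demo() {
    const U256 a = {UINT64_MAX, UINT64_MAX, 0, 0};
    const U256 b = {1, 0, 0, 0};
    U256 r{};
    const bool carry = add256(a, b, r);
    assert(!carry && r[0] == 0 && r[1] == 0 && r[2] == 1 && r[3] == 0);
}
```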
Recent versions of the clang compiler can recognize big integer addition and use ADD/ADC to implement it.
There is a bug in this. Try this input:
The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}
Changing this line:
to this:
fixes it.
There are __builtin_uadd_overflow intrinsics for building addition chains in which the compiler makes optimal use of add/adc.
They are documented as far back as GCC 5:
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Integer-Overflow-Builtins.html
Both clang and GCC implement these builtins, and I verified that the generated code is optimal on both x86_64 and aarch64 targets.
This feature has now also been released in Visual Studio 2022 version 17.7 (e.g. _add_overflow_i8 ....)
This thread is quite old; I am providing a response to the original question, "How would one implement the behavior of this instruction in C++".
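A minimal sketch of such an addition chain, assuming GCC or Clang (the builtin is not available under that name in MSVC). It uses __builtin_uaddll_overflow, the unsigned long long member of the documented family; the helper name add_chain is hypothetical, and the quality of the generated code depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: adds two n-digit numbers (least significant digit first)
// and returns the carry out of the top digit. Assumes GCC or Clang.
inline bool add_chain(const unsigned long long* a, const unsigned long long* b,
                      unsigned long long* out, std::size_t n) {
    assert(a && b && out);
    bool carry = false;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned long long partial = 0;
        // Each builtin returns true when its addition wrapped.
        const bool c1 = __builtin_uaddll_overflow(a[i], b[i], &partial);
        const bool c2 = __builtin_uaddll_overflow(partial, carry ? 1ull : 0ull, &out[i]);
        carry = c1 || c2;  // at most one of the two additions can wrap
    }
    return carry;
}
```

For example, adding {~0ull, 0} and {1, 0} with n = 2 gives {0, 1} and returns false.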
This is my fastest code:
ASM: