In GCC-style extended inline asm, is it possible to output a "virtualized" boolean value, e.g. the carry flag?

Posted on 2024-08-22 02:43:15

If I have the following C++ code to compare two 128-bit unsigned integers, using inline x86-64 (AMD64) asm:

#include <stdint.h>

struct uint128_t {
    uint64_t lo, hi;
};
inline bool operator< (const uint128_t &a, const uint128_t &b)
{
    uint64_t temp;
    bool result;
    __asm__(
        "cmpq %3, %2;"
        "sbbq %4, %1;"
        "setc %0;"
        : // outputs:
        /*0*/"=r,1,2"(result),
        /*1*/"=r,r,r"(temp)
        : // inputs:
        /*2*/"r,r,r"(a.lo),
        /*3*/"emr,emr,emr"(b.lo),
        /*4*/"emr,emr,emr"(b.hi),
        "1"(a.hi));
    return result;
}

Then it will be inlined quite efficiently, but with one flaw. The return value is passed through the "interface" of a general-purpose register holding 0 or 1. This adds two or three unnecessary instructions and detracts from a compare operation that would otherwise be fully optimized. The generated code will look something like this:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    setc   al
    movzx  eax, al
    test   eax, eax
    jnz    is_lessthan

If I use "sbb %0,%0" with an "int" return value instead of "setc %0" with a "bool" return value, there are still two extra instructions:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    sbb    eax, eax
    test   eax, eax
    jnz    is_lessthan

What I want is this:

    mov    r10, [r14]
    mov    r11, [r14+8]
    cmp    r10, [r15]
    sbb    r11, [r15+8]
    jc     is_lessthan

GCC extended inline asm is wonderful, otherwise. But I want it to be just as good as an intrinsic function would be, in every way. I want to be able to directly return a boolean value in the form of the state of a CPU flag or flags, without having to "render" it into a general register.

Is this possible, or would GCC (and the Intel C++ compiler, which also allows this form of inline asm to be used) have to be modified or even refactored to make it possible?

Also, while I'm at it: is there any other way my formulation of the compare operator could be improved?

Comments (2)

趁微风不噪 2024-08-29 02:43:15

Here we are almost 7 years later, and YES, gcc finally added support for "outputting flags" (added in 6.1.0, released ~April 2016). The detailed docs are in the GCC manual's Extended Asm chapter, under "Flag Output Operands", but in short, it looks like this:

/* Test if bit 0 is set in 'value' */
char a;

asm("bt $0, %1"
    : "=@ccc" (a)
    : "r" (value) );

if (a)
   blah;

To understand =@ccc: The output constraint (which requires =) is of type @cc followed by the condition code to use (in this case c to reference the carry flag).
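
Applied to the 128-bit compare from the question, the same constraint might look like the sketch below. This is only an illustration under a few assumptions: the function names `less_than_flags` and `check_less_than_flags` are made up for this example, the asm is x86-64 AT&T syntax, and the code falls back to plain C++ when the compiler does not define `__GCC_ASM_FLAG_OUTPUTS__` (the macro GCC documents for detecting flag-output support).

```cpp
#include <cassert>
#include <cstdint>

struct uint128_t {
    uint64_t lo, hi;
};

// Hypothetical helper: returns a < b, taking the result from the carry flag.
inline bool less_than_flags(const uint128_t &a, const uint128_t &b)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && defined(__x86_64__)
    uint64_t temp = a.hi;   // scratch copy, overwritten by sbbq
    bool result;
    __asm__(
        "cmpq %[blo], %[alo]\n\t"   // CF = (a.lo < b.lo)
        "sbbq %[bhi], %[tmp]"       // CF = borrow out of a.hi - b.hi - CF
        : "=@ccc"(result), [tmp] "+r"(temp)
        : [alo] "r"(a.lo), [blo] "rm"(b.lo), [bhi] "rm"(b.hi));
    return result;
#else
    // Portable fallback when flag outputs are unavailable.
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
#endif
}

// Small self-check covering the borrow and equality cases.
inline void check_less_than_flags()
{
    assert(less_than_flags({0, 0}, {1, 0}));    // borrow with a.hi == 0
    assert(!less_than_flags({1, 0}, {0, 0}));
    assert(less_than_flags({5, 1}, {0, 2}));    // high word decides
    assert(!less_than_flags({7, 7}, {7, 7}));   // equal values
}
```

With a flag output the compiler is free to branch on the carry flag itself rather than materializing it with setc first. Whether it actually does depends on the compiler version and optimization level, so it is worth checking the generated code.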

OK, this may no longer be an issue for your specific case (since gcc now supports comparing 128-bit data types directly), but (currently) 1,326 people have viewed this question. Apparently there's some interest in this feature.
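
For reference, a minimal sketch of the direct 128-bit comparison mentioned above, assuming a 64-bit GCC or Clang target where the `unsigned __int128` extension is available. The names `u128`, `make_u128` and `less_than_native` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// unsigned __int128 is a compiler extension, not standard C++;
// __extension__ keeps -pedantic builds quiet about it.
__extension__ typedef unsigned __int128 u128;

// Hypothetical helper: build a 128-bit value from two 64-bit halves.
inline u128 make_u128(uint64_t hi, uint64_t lo)
{
    return (static_cast<u128>(hi) << 64) | lo;
}

// The comparison is ordinary C++; the compiler picks the instructions.
inline bool less_than_native(u128 a, u128 b)
{
    return a < b;
}
```

Because no inline asm is involved, the optimizer sees the whole comparison and can fold it into the surrounding branch.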

Now I personally favor the school of thought that says don't use inline asm at all. But if you must, yes you can (now) 'output' flags.

FWIW.

柠檬色的秋千 2024-08-29 02:43:15

I don't know a way to do this. You may or may not consider this an improvement:

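inline bool operator< (const uint128_t &a, const uint128_t &b)
{
    uint64_t temp = a.hi;
    __asm__(
        "cmpq %2, %1;"    // CF = (a.lo < b.lo)
        "sbbq $0, %0;"    // temp = a.hi - CF
        : // outputs:
        /*0*/"=r"(temp)
        : // inputs:
        /*1*/"r"(a.lo),
        /*2*/"mr"(b.lo),
        "0"(temp));

    // Caveat: if a.hi == 0 and the borrow is set, temp wraps around to
    // 2^64 - 1 and this comparison returns false even though a < b.
    return temp < b.hi;
}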

It produces something like:

    mov    rdx, [r14]
    mov    rax, [r14+8]
    cmp    rdx, [r15]
    sbb    rax, 0
    cmp    rax, [r15+8]
    jc     is_lessthan
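
One edge case is worth checking with this approach: when a.hi is 0 and the low-word compare borrows, the subtraction wraps around. The sketch below models the cmp / sbb $0 sequence in plain C++ and compares it with a straightforward reference. Both function names are made up for this illustration, and the model is an assumption about what the asm computes, not compiler output.

```cpp
#include <cassert>
#include <cstdint>

struct uint128_t {
    uint64_t lo, hi;
};

// Plain C++ model of: cmpq b.lo, a.lo ; sbbq $0, temp ; temp < b.hi
inline bool less_than_borrow_model(const uint128_t &a, const uint128_t &b)
{
    uint64_t borrow = (a.lo < b.lo) ? 1 : 0;
    uint64_t temp = a.hi - borrow;   // wraps when a.hi == 0 and borrow == 1
    return temp < b.hi;
}

// Reference: compare high words first, then low words.
inline bool less_than_reference(const uint128_t &a, const uint128_t &b)
{
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}
```

For a = {lo 0, hi 0} and b = {lo 1, hi 0} the reference reports a < b, while the model does not, because temp has wrapped to 2^64 - 1. For inputs where a.hi is nonzero the two agree.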