扫描二进制CPU功能用法

发布于 2025-02-03 18:15:37 字数 170 浏览 2 评论 0 原文

我正在调试一个应用程序,该应用程序可以在Intel CPU上正确运行,但不在另一个较新的AMD处理器上运行。我怀疑它可能已编译以使用某些特定于Intel的说明,从而导致撞车事故。但是,我正在寻找一种验证这一点的方法。我无法访问原始源代码。

是否有一个工具可以扫描二进制文件和列表,它可能会使用哪种CPU特定功能?

I am debugging an application that runs properly on an Intel CPU, but not on another, newer AMD processor. I suspect that it may have been compiled to use certain Intel-specific instructions, which leads to the crashes. However, I am looking for a way to verify this. I do not have access to the original source code.

Is there a tool that can scan a binary and list which CPU-specific features it may use?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

痴梦一场 2025-02-10 18:15:37

有两种好的方法:

  • 在调试器下运行,并查看导致
  • 在模拟器/模拟器下运行非法指导故障的说明,该指令可以向您显示指令组合,例如SDE。

但是,您的想法是静态扫描二进制文件,无法区分仅在检查 cpuid 后才调用的函数中的代码。


使用调试器查看故障指令

选择任何调试器。 GDB易于在任何Linux发行版上安装,也可能在Windows或Mac(或那里的LLDB)上安装。或选择其他任何调试器,例如使用GUID。

运行程序。故障后,使用调试器检查故障指令。

在Intel或AMD的X86 ASM参考手册中查找它,例如 https:// https:///www.felixcloutier.com/x86/ 是英特尔PDF的HTML刮擦。查看此指令的这种形式所需的ISA扩展。

例如,如果让编译器这样做,则此源可以编译以使用AVX-512指令,但首先只需要SSE2来编译。

#include <immintrin.h>
// stores to global vars typically aren't optimized out, even without volatile
int buf[16];
int main(int argc, char **argv)
{
        __m128i v = _mm_set1_epi32(argc);     // broadcast scalar to vector
        _mm_storeu_si128((__m128i*)buf, v);
}

(请在

。 AVX512 -O3 Ill.c 。
然后尝试运行它,例如我的Skylake-Client(非AVX512)GNU/Linux桌面。 (我还使用 strip A.Out 删除符号表(功能名称),例如仅二进制软件版本)。

$ ./a.out 
Illegal instruction (core dumped)
$ gdb a.out
...
(gdb) run
Starting program: /tmp/a.out 

Program received signal SIGILL, Illegal instruction.
0x0000555555555020 in ?? ()

(gdb) disas
No function contains program counter for selected frame.

(gdb) disas /r $pc,+20            # from current program counter to +20 bytes
Dump of assembler code from 0x555555555020 to 0x555555555034:
=> 0x0000555555555020:  62 f2 7d 08 7c c7       vpbroadcastd xmm0,edi
   0x0000555555555026:  c5 f9 7f 05 32 30 00 00 vmovdqa XMMWORD PTR [rip+0x3032],xmm0        # 0x555555558060
   0x000055555555502e:  31 c0   xor    eax,eax
   0x0000555555555030:  c3      ret    
   0x0000555555555031:  66 2e 0f 1f 84 00 00 00 00 00   cs nop WORD PTR [rax+rax*1+0x0]
End of assembler dump.

=&gt; 指示当前程序计数器(x86-64中的RIP,但是GDB便携式定义 $ PC 是任何ISA上的别名。

) > vpbroadcastd xmm0,edi 。 (GCC实施 _MM_SET1_EPI32(ARGC)我们告诉它AVX512可用时

。当然,实际上试图执行不支持的指令是造成崩溃的直接原因。 (也有可能成为间接原因,例如使用 lzcnt eax,ecx 的程序,但是将其运行为 bsr eax,ecx ,然后使用它,然后使用它 整数作为数组索引。

不同的

如果助记符以<<<代码> v ,您找不到条目,​​例如 vaddps ,这是因为该指令在AVX之前存在,并且在其Legacy-sse sse mnemonic下进行了记录,例如SSE1 addps 确实列出了同时 addps VADDPS 编码,包括允许ZMM寄存器的AVX-512编码,X/YMM16..31,以及像 VADDPS YMM0 {K3} {k3} {Z},YMM1,YMM1,YMM2 。那是AVX-512F+VL指令。

无论如何,回到我们的榜样。与故障指令相匹配的表条目如下。在编码操作数的MODR/M(/r )之前,请注意 7C OpCode字节。这是在4个字节前缀之前的,作为交叉检查,这确实是我们正在寻找的OpCode。

evex.128.66.0f38.w0 7c/r vpbroadcastd xmm1 {k1} {z} {z},r32

它需要“ avx512vl avx512f”。 {k1} {z} 是可选的掩码。 R32 是32位通用通用整数寄存器,例如 edi 在这种情况下。 XMM1 表示任何XMM寄存器都可以是该指令的第一个XMM操作数;在这种情况下,GCC选择了XMM0。

我的CPU根本没有AVX-512,因此错误。


SDE指令组合

这应该在Windows或其他任何操作系统上都可以很好地工作。

具有 -mix 选项,其输出包括按必需的ISA扩展名进行分类。参见我如何监视SIMD指令用法的数量< /a> re:使用它。

使用同一示例 A.Out 我与gdb一起使用:

运行/opt/sde-external-8.33.0-2019-02-02-07-lin/sde64 -.mix-./a./a 。 (有些动态链接器中有很多次。)IDK如果有一个选项可以忽略这一点,因为我希望它会在大型程序中变得非常肿。我认为,即使还有更多,它也可能只打印出前几个块。

然后,我们达到所需的部分:

...
# END_TOP_BLOCK_STATS
# EMIT_DYNAMIC_STATS FOR TID 0  OS-TID 1168465 EMIT #1
#
# $dynamic-counts
#
# TID 0
#       opcode                 count
#
*stack-read                                                         8806
*stack-write                                                        8314
*iprel-read                                                         1003
*iprel-write                                                         437

...

*isa-ext-AVX                                                           4
*isa-ext-AVX2                                                          5
*isa-ext-AVX512EVEX                                                    1
*isa-ext-BASE                                                     133338
*isa-ext-LONGMODE                                                    545
*isa-ext-SSE                                                          56
*isa-ext-SSE2                                                       2560
*isa-ext-XSAVE                                                         1
*isa-set-AVX                                                           4
*isa-set-AVX2                                                          5
*isa-set-AVX512F_128                                                   1
*isa-set-CMOV                                                        266
*isa-set-FAT_NOP                                                     891
*isa-set-I186                                                       2676
*isa-set-I386                                                       7626
*isa-set-I486REAL                                                     71
*isa-set-I86                                                      121192
*isa-set-LONGMODE                                                    545
*isa-set-PENTIUMREAL                                                   8
*isa-set-PPRO                                                        608
*isa-set-SSE                                                          56
*isa-set-SSE2                                                       2560
*isa-set-XSAVE                                                         1

ISA-SET-AVX512F_128 的1个计数是在我的CPU上错误的指令,该指令完全不支持AVX-512。 AVX512F_128是AVX512F(Foundation) + avx512vl (比其他允许其他vector length, 512位ZMM寄存器)。

(也被视为 isa-ext-avx512evex 。evex是AVX-512矢量指令的机器代码前缀。AVX-512掩码指令,例如 kandw k0,kandw k0,k1,k1,k2 vpermb 在Skylake-Server CPU上有任何支持AVX-512F

代码>使用vex编码,例如Avx1/avx2 Simd指令, 除了AVX-512以外,每个扩展程序都有一个完全独立的名称。


静态拆卸

can 大多数二进制文件;如果他们没有混淆,则拆卸应该找到所有可能执行的说明。 (以及使用新指令的高性能代码不太可能使用删除拆卸器的黑客,例如跳入直线拆卸的中间,将其视为另一种说明; x86机器代码是可变的字节流- 长度指令。)

但是,这并不能告诉您哪些说明实际上确实执行;有些可能是在检查CPUID后才调用的功能,以找出是否支持必要的扩展。

(尽管我从未寻找过一个工具,但我不知道通过ISA扩展名对其进行分类的工具;通常是要确保他们不使用AVX2指令的开发人员,而AVX2指令仅在AVX1上使用的CPUS使用构建时间检查,或通过在模拟器下或在真实的CPU下运行。)

There are two good approaches:

  • Run under a debugger and look at instruction that caused an illegal-instruction fault
  • Run under a simulator/emulator that can show you an instruction mix, like SDE.

But your idea, statically scanning the binary, can't distinguish code in functions that are only called after checking cpuid.


Using a debugger to look at the faulting instruction

Pick any debugger. GDB is easy to install on any Linux distro, and probably also on Windows or Mac (or lldb there). Or pick any other debugger, e.g. one with a GUID.

Run the program. Once it faults, use the debugger to examine the faulting instruction.

Look it up in Intel or AMD's x86 asm reference manual, e.g. https://www.felixcloutier.com/x86/ is an HTML scrape of Intel's PDFs. See which ISA extension this form of this instruction requires.

For example, this source can compile to use AVX-512 instructions if you let the compiler do so, but only needs SSE2 to compile in the first place.

#include <immintrin.h>
// stores to global vars typically aren't optimized out, even without volatile
int buf[16];
int main(int argc, char **argv)
{
        __m128i v = _mm_set1_epi32(argc);     // broadcast scalar to vector
        _mm_storeu_si128((__m128i*)buf, v);
}

(See it on Godbolt with different compile options.)

Build with gcc -march=skylake-avx512 -O3 ill.c.
Then try to run it, e.g. on my Skylake-client (non-AVX512) GNU/Linux desktop. (I also used strip a.out to remove the symbol table (function names), like a binary-only software release).

$ ./a.out 
Illegal instruction (core dumped)
$ gdb a.out
...
(gdb) run
Starting program: /tmp/a.out 

Program received signal SIGILL, Illegal instruction.
0x0000555555555020 in ?? ()

(gdb) disas
No function contains program counter for selected frame.

(gdb) disas /r $pc,+20            # from current program counter to +20 bytes
Dump of assembler code from 0x555555555020 to 0x555555555034:
=> 0x0000555555555020:  62 f2 7d 08 7c c7       vpbroadcastd xmm0,edi
   0x0000555555555026:  c5 f9 7f 05 32 30 00 00 vmovdqa XMMWORD PTR [rip+0x3032],xmm0        # 0x555555558060
   0x000055555555502e:  31 c0   xor    eax,eax
   0x0000555555555030:  c3      ret    
   0x0000555555555031:  66 2e 0f 1f 84 00 00 00 00 00   cs nop WORD PTR [rax+rax*1+0x0]
End of assembler dump.

The => indicates the current program counter (RIP in x86-64, but GDB portably defines $pc as an alias on any ISA.)

So we faulted on vpbroadcastd xmm0,edi. (The way GCC implemented _mm_set1_epi32(argc) when we told it AVX512 was available.)

That doesn't involve memory access, and the fault was illegal-instruction not segmentation-fault anyway, so we can be sure that actually trying to execute an unsupported instruction was the direct cause of the crash here. (It's also possible for it to be an indirect cause, e.g. a program using lzcnt eax, ecx but an old CPU running it as bsr eax, ecx, and then using that different integer as an array index. lzcnt/bsr is somewhat unlikely for your case since AMD has supported it for longer than Intel.)

So let's check on vpbroadcastd: there are multiple entries for vpbroadcast in Intel's manual:

If the mnemonic starts with v and you can't find an entry, e.g. vaddps, that's because the instruction existed before AVX, and is documented under its legacy-SSE mnemonic, like SSE1 addps which does list both addps and vaddps encodings, including the AVX-512 encodings that allow ZMM registers, x/ymm16..31, and masking like vaddps ymm0{k3}{z}, ymm1, ymm2. That's an AVX-512F+VL instruction.

Anyway, back to our example. The table entry that matches the faulting instruction was the following. Note the 7C opcode byte before the ModR/M (/r) that encodes the operands. That's present after the 4-byte EVEX prefix, as a cross-check that this is indeed the opcode we're looking for.

EVEX.128.66.0F38.W0 7C /r VPBROADCASTD xmm1 {k1}{z}, r32

It requires "AVX512VL AVX512F" according to the table. The {k1}{z} is optional masking. r32 is a 32-bit general-purpose integer register, like edi in this case. xmm1 means any XMM register can be the first xmm operand to this instruction; in this case GCC chose XMM0.

My CPU doesn't have AVX-512 at all, so it faulted.


SDE instruction mix

This should work equally well on Windows or any other OS.

Intel's SDE (Software Development Emulator) has a -mix option, whose output includes categorizing by required ISA extension. See How do I monitor the amount of SIMD instruction usage re: using it.

Using the same example a.out I used with GDB:

Running /opt/sde-external-8.33.0-2019-02-07-lin/sde64 -mix -- ./a.out created a file sde-mix-out.txt which contained a lot of stuff, including stats for how often different basic blocks were executed. (Some in the dynamic linker ran many times.) IDK if there's an option to omit that, because it would get pretty bloated for a large program, I expect. I think it might only print the top few blocks, even if there are many more.

Then we get to the part we want:

...
# END_TOP_BLOCK_STATS
# EMIT_DYNAMIC_STATS FOR TID 0  OS-TID 1168465 EMIT #1
#
# $dynamic-counts
#
# TID 0
#       opcode                 count
#
*stack-read                                                         8806
*stack-write                                                        8314
*iprel-read                                                         1003
*iprel-write                                                         437

...

*isa-ext-AVX                                                           4
*isa-ext-AVX2                                                          5
*isa-ext-AVX512EVEX                                                    1
*isa-ext-BASE                                                     133338
*isa-ext-LONGMODE                                                    545
*isa-ext-SSE                                                          56
*isa-ext-SSE2                                                       2560
*isa-ext-XSAVE                                                         1
*isa-set-AVX                                                           4
*isa-set-AVX2                                                          5
*isa-set-AVX512F_128                                                   1
*isa-set-CMOV                                                        266
*isa-set-FAT_NOP                                                     891
*isa-set-I186                                                       2676
*isa-set-I386                                                       7626
*isa-set-I486REAL                                                     71
*isa-set-I86                                                      121192
*isa-set-LONGMODE                                                    545
*isa-set-PENTIUMREAL                                                   8
*isa-set-PPRO                                                        608
*isa-set-SSE                                                          56
*isa-set-SSE2                                                       2560
*isa-set-XSAVE                                                         1

The 1 count for isa-set-AVX512F_128 is the instruction that would have faulted on my CPU, which doesn't support AVX-512 at all. AVX512F_128 is AVX512F (foundation) + AVX512VL (vector length, allowing vectors other than 512-bit ZMM registers).

(It was also counted as isa-ext-AVX512EVEX. EVEX is the machine-code prefix for AVX-512 vector instructions. AVX-512 mask instructions like kandw k0, k1, k2 use VEX encoding, like AVX1/AVX2 SIMD instructions. But this wouldn't distinguish an Ice Lake new instruction like vpermb faulting on a Skylake-server CPU that supports AVX-512F but not AVX512VBMI)

Everything other than AVX-512 is probably simpler, since there's a fully separate name for each extension.


Static disassembly

You can disassemble most binaries; if they're not obfuscated then disassembly should find all the instructions that might ever execute. (And high-performance code that uses new instructions is unlikely to be using hacks that throw off a disassembler, like jumping into the middle of what straight-line disassembly would see as a different instruction; x86 machine code is a byte-stream of variable-length instructions.)

But that doesn't tell you which instructions actually do execute; some might be in functions that are only called after checking CPUID to find out if the necessary extensions are supported.

(And I don't know of a tool to categorize them by ISA extension, although I've never looked for one; usually developers wanting to make sure they didn't use AVX2 instructions in code that will run on AVX1-only CPUs use build-time checks, or test by running under an emulator or on a real CPU.)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文