SSE instructions in a buffer

Posted on 2024-11-30 03:16:15

If I have an instruction buffer for x86 is there an easy way to check if an instruction is an SSE instruction without having to check if the opcode is within the ranges for the SSE instructions? By this I mean is there a common instruction prefix or processor state (such as a register) that can be checked?


Comments (2)

丶情人眼里出诗心の 2024-12-07 03:16:15

(Updated)

Depending on how you define easy, the answer is either yes or no :)

The instruction format is described in section 2 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Combined Volumes 2A and 2B: Instruction Set Reference, A-Z. One of the problematic parts is the prefixes. Some of these are mandatory for some SSE instructions (66 F2 F3), while they have a different meaning for other opcodes (operand size override, REPNZ and REPZ).

To see how the prefixes are used to distinguish between different instructions, consider these 4 forms of adding two xmm registers together (output obtained with objdump -D -b binary -m i386:x86-64:intel --insn-width=12):

0f 58 c0                                addps  xmm0,xmm0
66 0f 58 c0                             addpd  xmm0,xmm0
f3 0f 58 c0                             addss  xmm0,xmm0
f2 0f 58 c0                             addsd  xmm0,xmm0

The unprefixed form adds packed single-precision values (addps). 66 (normally the operand size override prefix) selects the packed double-precision version (addpd), F3 (repz) selects the scalar single-precision version (addss), and finally F2 (repnz) selects the scalar double-precision version (addsd).
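Read directly off the objdump listing, the mapping from mandatory prefix to instruction can be sketched as a small lookup. This is only an illustration: `add_variant` is a hypothetical helper, it recognizes nothing but the four byte patterns shown in the listing, and it ignores every other prefix.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: names the 0F 58 (ADDxx) variant selected by the
 * mandatory prefix. Handles only the four patterns from the listing and
 * returns NULL for anything else. */
const char *add_variant(const uint8_t *code, size_t length)
{
    if (length >= 2 && code[0] == 0x0F && code[1] == 0x58)
        return "addps";                 /* no prefix: packed single */
    if (length >= 3 && code[1] == 0x0F && code[2] == 0x58) {
        switch (code[0]) {
        case 0x66: return "addpd";      /* packed double */
        case 0xF3: return "addss";      /* scalar single */
        case 0xF2: return "addsd";      /* scalar double */
        }
    }
    return NULL;
}
```

Calling `add_variant` on the bytes `f2 0f 58 c0` yields "addsd", while a buffer such as `f3 a4` (rep movsb) yields NULL, which is the point: the same prefix byte means something different in front of a different opcode.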

Additionally, prefixes can sometimes be combined, and in 64-bit mode you also have to worry about the REX prefix (pg. 2-9). Here are different versions of roughly the same base instruction with different prefixes in 64-bit mode. I don't know if you care about AVX instructions, but I included some anyway as an example:

0f 51 ca                                sqrtps xmm1,xmm2
0f 51 0c 85 0a 00 00 00                 sqrtps xmm1,XMMWORD PTR [rax*4+0xa]
65 0f 51 0c 85 0a 00 00 00              sqrtps xmm1,XMMWORD PTR gs:[rax*4+0xa]
67 0f 51 0c 85 0a 00 00 00              sqrtps xmm1,XMMWORD PTR [eax*4+0xa]
65 67 0f 51 0c 85 0a 00 00 00           sqrtps xmm1,XMMWORD PTR gs:[eax*4+0xa]
f0 65 67 0f 51 0c 85 0a 00 00 00        lock sqrtps xmm1,XMMWORD PTR gs:[eax*4+0xa]
c5 fd 51 ca                             vsqrtpd ymm1,ymm2
c5 fc 51 0c 85 0a 00 00 00              vsqrtps ymm1,YMMWORD PTR [rax*4+0xa]
65 c5 fc 51 0c 85 0a 00 00 00           vsqrtps ymm1,YMMWORD PTR gs:[rax*4+0xa]
67 c5 fc 51 0c 85 0a 00 00 00           vsqrtps ymm1,YMMWORD PTR [eax*4+0xa]
65 67 c5 fc 51 0c 85 0a 00 00 00        vsqrtps ymm1,YMMWORD PTR gs:[eax*4+0xa]
f0 65 67 c5 fc 51 0c 85 0a 00 00 00     lock vsqrtps ymm1,YMMWORD PTR gs:[eax*4+0xa]

So as far as I can see you will always have to loop over all prefixes to determine if an instruction is an SSE instruction.
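For the AVX lines in the listing, the 66/F3/F2 byte does not appear as a separate prefix at all: it is folded into the `pp` field of the VEX prefix, so a prefix scan alone will not see it. A minimal sketch of pulling the fields out of the second byte of a two-byte VEX prefix (C5 xx) follows. It assumes 64-bit mode, where C5 always starts a VEX prefix, and the `vex2` struct and `decode_vex2` function are names invented for this illustration.

```c
#include <stdint.h>

/* Fields of the second byte of a two-byte VEX prefix (C5 xx). */
struct vex2 {
    unsigned r;     /* REX.R equivalent, after undoing the stored inversion */
    unsigned vvvv;  /* extra source register, after undoing the inversion */
    unsigned l;     /* vector length: 0 = 128-bit (xmm), 1 = 256-bit (ymm) */
    unsigned pp;    /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
};

struct vex2 decode_vex2(uint8_t byte)
{
    struct vex2 v;
    v.r    = ((byte >> 7) & 1) ^ 1;      /* bit 7 is stored inverted */
    v.vvvv = ((byte >> 3) & 0xF) ^ 0xF;  /* bits 6..3 are stored inverted */
    v.l    = (byte >> 2) & 1;            /* bit 2 */
    v.pp   = byte & 3;                   /* bits 1..0 */
    return v;
}
```

Applied to the listing, `fd` (vsqrtpd) decodes to `l = 1` and `pp = 1`, the implied 66 prefix, while `fc` (vsqrtps) decodes to `l = 1` and `pp = 0`.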

Update:
An additional complication is the existence of instructions that only differ in their ModRM encoding. Consider:

df 00                                   fild   WORD PTR [rax]   # Non-SSE instruction: DF /0
df 08                                   fisttp WORD PTR [rax]   # SSE instruction: DF /1

To find these and all the other ways they can be encoded it's easiest to use an opcode map.
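The /0 and /1 in the fild/fisttp example refer to the reg field of the ModRM byte, bits 5 to 3. As a small illustration (the `modrm` struct and `split_modrm` function are names made up here), the byte can be split like this:

```c
#include <stdint.h>

/* The three fields of a ModRM byte:
 * mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

struct modrm split_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3;
    m.reg = (byte >> 3) & 7;   /* the "/digit" in opcode notation */
    m.rm  = byte & 7;
    return m;
}
```

For `df 00` the ModRM byte 00 has reg = 0 (fild), and for `df 08` the byte 08 has reg = 1 (fisttp); the opcode byte DF is identical in both.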

Because I've been meaning to look at writing a disassembler anyway, I figured it would be a fun challenge to see what it takes. It should find most SSE instructions, though obviously I can't and won't guarantee it. I transformed the above opcode map into a series of tests that the code passes (tests.c - too big for inlining). Each test is a text string containing the hex digits of the instruction encoding; parsing stops at the first token that is not a hex number, and the last character of the string signifies whether it is an SSE instruction (Y) or not (N).

The code first scans all prefixes, then looks the opcode up in the opcode tables. Extra logic handles the nested tables needed for multi-byte opcodes and the need to match digits in the following ModRM byte.

ssedetect.c:

#include <stdio.h> 
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <ctype.h>

#include "inst_table.h"

enum { PREFIX_66=OP_66_SSE, PREFIX_F2=OP_F2_SSE, PREFIX_F3=OP_F3_SSE  };

static int check_prefixes(int prefixes, int op_type) {
    if (op_type & OP_ALWAYS_SSE) return 1;
    if ((op_type & OP_66_SSE) && (prefixes & PREFIX_66)) return 1;
    if ((op_type & OP_F2_SSE) && (prefixes & PREFIX_F2)) return 1;
    if ((op_type & OP_F3_SSE) && (prefixes & PREFIX_F3)) return 1;
    return 0;
}

int isInstructionSSE(const uint8_t* code, int length)
{
    int position = 0;

    // read prefixes
    int prefixes = 0;
    while (position < length) {
        uint8_t b = code[position];

        if (b == 0x66) {
            prefixes |= PREFIX_66;
            position++;
        } else if (b == 0xF2) { 
            prefixes |= PREFIX_F2;
            position++;
        } else if (b == 0xF3) { 
            prefixes |= PREFIX_F3; 
            position++;
        } else if (b >= 0x40 && b <= 0x4F) {
            // REX prefix (64-bit mode only; in 32-bit code 40-4F are INC/DEC)
            position++;
            break; // opcode must follow REX
        } else if (b == 0x2E || b == 0x3E || b == 0x26 || b == 0x36 || b == 0x64 || b == 0x65 || b == 0x67 || b == 0xF0) {
            // ignored prefix
            position++;    
        } else {
            break;
        }
    }

    // read opcode
    const uint16_t* op_table = op;
    int op_length = 0;
    while (position < length) {
        uint8_t b = code[position];
        uint16_t op_type = op_table[b];
        if (op_type & OP_EXTENDED) {
            op_length++;
            position++;
            // hackish
            if (op_length == 1 && b == 0x0F) op_table = op_0F;
            else if (op_length == 2 && b == 0x01) op_table = op_0F_01;
            else if (op_length == 2 && b == 0x38) op_table = op_0F_38;
            else if (op_length == 2 && b == 0x3A) op_table = op_0F_3A;
            else { printf("\n\n%2.2X\n",b); abort(); }
        } else if (op_type & OP_DIGIT) {
            break;
        } else {
            return check_prefixes(prefixes, op_type);
        }
    } 

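    // The loop above also ends when the buffer runs out (for example a
    // lone 0F escape byte), so there may be no opcode byte left to read.
    if (position >= length) {
        return 0;
    }

    // The opcode needs a /digit match: fetch the set of ModRM reg values
    // that make it an SSE instruction
    uint8_t match_digits = (op_table[code[position]] & OP_DIGIT_MASK) >> OP_DIGIT_SHIFT;

    // consume the opcode byte; the ModRM byte must follow
    op_length++;
    position++;
    if (position >= length) {
        return 0;
    }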

    uint8_t digit = (code[position]>>3)&7; // reg part of modrm

    return (match_digits & (1 << digit)) != 0;
}

static int read_code(const char* str, uint8_t** code, int* length)
{
    int size = 1000;
    *length = 0;
    *code = malloc(size);
    if (!*code) {
        printf("out of memory\n");
        return 0;
    }

    while (*str) {
        char* endptr;
        unsigned long val = strtoul(str, &endptr, 16);
        if (str == endptr) {
            break;
        } 

        if (val > 255) {
            printf("%lX is out of range\n", val);
            goto error;
        }

        (*code)[*length] = (uint8_t)val;

        if (++*length >= size) {
            printf("needs resize, not implemented\n");
            goto error;
        }

        str = endptr;
    }

    if (*length == 0) {
        printf("No instruction bytes found\n");
        goto error;
    }

    return 1;

error:
    free(*code);
    return 0;
}

static void test(const char* str)
{
    uint8_t* code;
    int length;
    if (!read_code(str, &code, &length)) {
        puts(str);
        exit(1);
    }
    char is_sse = isInstructionSSE(code, length) ? 'Y' : 'N';
    char should_be_sse = str[strlen(str)-1];
    free(code);
    if (should_be_sse != is_sse) {
        printf("(%c) %c %s\n", should_be_sse, is_sse, str);
        exit(1);
    }
}

int main(void)
{
#include "tests.c"
    test("48 ba 39 00 00 00 00 00 00 00           # movabs rdx,0x39 N");
    test("48 b8 00 00 00 00 00 00 00 00           # movabs rax,0x0 N");
    test("48 b9 14 00 00 00 00 00 00 00           # movabs rcx,0x14 N");
    test("48 6b c0 0a                             # imul   rax,rax,0xa N");
    test("48 83 ea 30                             # sub    rdx,0x30 N");
    test("48 01 d0                                # add    rax,rdx N");
    test("48 ff c9                                # dec    rcx N");
    test("75 f0                                   # jne    0x1e N");
    test("0f 51 ca                                # sqrtps xmm1,xmm2 Y");
    test("0f 51 0c 85 0a 00 00 00                 # sqrtps xmm1,XMMWORD PTR [rax*4+0xa] Y");
    test("65 0f 51 0c 85 0a 00 00 00              # sqrtps xmm1,XMMWORD PTR gs:[rax*4+0xa] Y");
    test("67 0f 51 0c 85 0a 00 00 00              # sqrtps xmm1,XMMWORD PTR [eax*4+0xa] Y");
    test("65 67 0f 51 0c 85 0a 00 00 00           # sqrtps xmm1,XMMWORD PTR gs:[eax*4+0xa] Y");
    test("f0 65 67 0f 51 0c 85 0a 00 00 00        # lock sqrtps xmm1,XMMWORD PTR gs:[eax*4+0xa] Y");
    test("f0 65 67 f3 43 0f 5c 8c 81 2a 2a 00 00  # lock subss xmm1, [gs:r8d*4+r9d+0x2A2A] Y");
    test("0f 58 c0                                # addps  xmm0,xmm0 Y");
    test("66 0f 58 c0                             # addpd  xmm0,xmm0 Y");
    test("f3 0f 58 c0                             # addss  xmm0,xmm0 Y");
    test("f2 0f 58 c0                             # addsd  xmm0,xmm0 Y");
    test("df 04 25 2c 00 00 00                    # fild   WORD PTR ds:0x2c N");
    test("df 0c 25 2c 00 00 00                    # fisttp WORD PTR ds:0x2c Y");
    test("67 0f ae 10                             # ldmxcsr DWORD PTR [eax] Y");
    test("67 0f ae 18                             # stmxcsr DWORD PTR [eax] Y");
    test("0f ae 00                                # fxsave [rax] N");
    test("0f ae e8                                # lfence Y");
    test("0f ae f0                                # mfence Y");
    test("0f ae f8                                # sfence Y");
    test("67 0f ae 38                             # clflush BYTE PTR [eax] Y");
    test("67 0f 18 00                             # prefetchnta BYTE PTR [eax] Y");
    test("0f 18 0b                                # prefetcht0 BYTE PTR [rbx] Y");
    test("67 0f 18 11                             # prefetcht1 BYTE PTR [ecx] Y");
    test("0f 18 1a                                # prefetcht2 BYTE PTR [rdx] Y");
    test("df 08                                   # fisttp WORD PTR [rax] Y");
    test("df 00                                   # fild   WORD PTR [rax] N");

    printf("All tests passed\n");
    return 0;
}

inst_table.h:

// Table Element format:
// Bit: 0 SSE instruction if 66 prefix
//      1 SSE instruction if F2 prefix
//      2 SSE instruction if F3 prefix
//      3 Extended table
//      4 Instruction is always SSE
//      5 SSE instruction if ModRM byte matches digit(s) 
//      6 -----
//      7 -----
//      8 SSE if ModRM has reg = 0
//      9 SSE if ModRM has reg = 1
//      .
//      . That is, it matches instructions of the form XX XX /digit
//      .
//      15 SSE if modRM has reg = 7 

#define OP_66_SSE      0x0001 // SSE if 66 prefix
#define OP_F2_SSE      0x0002 // SSE if F2 prefix
#define OP_F3_SSE      0x0004 // SSE if F3 prefix
#define OP_EXTENDED    0x0008 // continue with extended table
#define OP_ALWAYS_SSE  0x0010
#define OP_DIGIT       0x0020

#define OP_DIGIT_MASK  0xFF00
#define OP_DIGIT_SHIFT      8
#define OP_MATCH_DIGIT(d) (OP_DIGIT | (1 << ((d) + OP_DIGIT_SHIFT)))

static const uint16_t op[256] = {
    [0x0F] = OP_EXTENDED,
    [0x90] = OP_F3_SSE,
    [0xDB] = OP_MATCH_DIGIT(1), // DB /1: FISTTP
    [0xDD] = OP_MATCH_DIGIT(1), 
    [0xDF] = OP_MATCH_DIGIT(1),
};

static const uint16_t op_0F[256] = {
    [0x01] = OP_EXTENDED, 
    [0x10] = OP_ALWAYS_SSE, // 0F 10 MOVUPS, F3 0F 10 MOVSS ... 
    [0x11] = OP_ALWAYS_SSE,
    [0x12] = OP_ALWAYS_SSE,
    [0x13] = OP_ALWAYS_SSE,
    [0x14] = OP_ALWAYS_SSE,
    [0x15] = OP_ALWAYS_SSE,
    [0x16] = OP_ALWAYS_SSE,
    [0x17] = OP_ALWAYS_SSE,
    [0x18] = OP_MATCH_DIGIT(0)|OP_MATCH_DIGIT(1)|OP_MATCH_DIGIT(2)|OP_MATCH_DIGIT(3),
    [0x28] = OP_ALWAYS_SSE,
    [0x29] = OP_ALWAYS_SSE,
    [0x2A] = OP_ALWAYS_SSE,
    [0x2B] = OP_ALWAYS_SSE,
    [0x2C] = OP_ALWAYS_SSE,
    [0x2D] = OP_ALWAYS_SSE,
    [0x2E] = OP_ALWAYS_SSE,
    [0x2F] = OP_ALWAYS_SSE,
    [0x38] = OP_EXTENDED,
    [0x3A] = OP_EXTENDED,
    [0x50] = OP_ALWAYS_SSE,
    [0x51] = OP_ALWAYS_SSE,
    [0x52] = OP_ALWAYS_SSE,
    [0x53] = OP_ALWAYS_SSE,
    [0x54] = OP_ALWAYS_SSE,
    [0x55] = OP_ALWAYS_SSE,
    [0x56] = OP_ALWAYS_SSE,
    [0x57] = OP_ALWAYS_SSE,
    [0x58] = OP_ALWAYS_SSE,
    [0x59] = OP_ALWAYS_SSE,
    [0x5A] = OP_ALWAYS_SSE,
    [0x5B] = OP_ALWAYS_SSE,
    [0x5C] = OP_ALWAYS_SSE,
    [0x5D] = OP_ALWAYS_SSE,
    [0x5E] = OP_ALWAYS_SSE,
    [0x5F] = OP_ALWAYS_SSE,
    [0x60] = OP_66_SSE,
    [0x61] = OP_66_SSE,
    [0x62] = OP_66_SSE,
    [0x63] = OP_66_SSE,
    [0x64] = OP_66_SSE,
    [0x65] = OP_66_SSE,
    [0x66] = OP_66_SSE,
    [0x67] = OP_66_SSE,
    [0x68] = OP_66_SSE,
    [0x69] = OP_66_SSE,
    [0x6A] = OP_66_SSE,
    [0x6B] = OP_66_SSE,
    [0x6C] = OP_66_SSE,
    [0x6D] = OP_66_SSE,
    [0x6E] = OP_66_SSE,
    [0x6F] = OP_66_SSE | OP_F3_SSE,
    [0x70] = OP_ALWAYS_SSE,
    [0x71] = OP_66_SSE,
    [0x72] = OP_66_SSE,
    [0x73] = OP_66_SSE,
    [0x74] = OP_66_SSE,
    [0x75] = OP_66_SSE,
    [0x76] = OP_66_SSE,
    [0x77] = OP_66_SSE,
    [0x78] = OP_66_SSE,
    [0x79] = OP_66_SSE,
    [0x7A] = OP_66_SSE,
    [0x7B] = OP_66_SSE,
    [0x7C] = OP_66_SSE | OP_F2_SSE,
    [0x7D] = OP_66_SSE | OP_F2_SSE,
    [0x7E] = OP_66_SSE | OP_F3_SSE,
    [0x7F] = OP_66_SSE | OP_F3_SSE,
    [0xAE] = OP_MATCH_DIGIT(2)|OP_MATCH_DIGIT(3)|OP_MATCH_DIGIT(5)|OP_MATCH_DIGIT(6)|OP_MATCH_DIGIT(7),
    [0xC2] = OP_ALWAYS_SSE,
    [0xC3] = OP_ALWAYS_SSE,
    [0xC4] = OP_ALWAYS_SSE,
    [0xC5] = OP_ALWAYS_SSE,
    [0xC6] = OP_ALWAYS_SSE,
    [0xD0] = OP_66_SSE | OP_F2_SSE,
    [0xD1] = OP_66_SSE,
    [0xD2] = OP_66_SSE,
    [0xD3] = OP_66_SSE,
    [0xD4] = OP_ALWAYS_SSE,
    [0xD5] = OP_66_SSE,
    [0xD6] = OP_66_SSE | OP_F2_SSE | OP_F3_SSE,
    [0xD7] = OP_ALWAYS_SSE,
    [0xD8] = OP_66_SSE,
    [0xD9] = OP_66_SSE,
    [0xDA] = OP_ALWAYS_SSE,
    [0xDB] = OP_66_SSE,
    [0xDC] = OP_66_SSE,
    [0xDD] = OP_66_SSE,
    [0xDE] = OP_ALWAYS_SSE,
    [0xDF] = OP_66_SSE,
    [0xE0] = OP_ALWAYS_SSE,
    [0xE1] = OP_66_SSE,
    [0xE2] = OP_66_SSE,
    [0xE3] = OP_ALWAYS_SSE,
    [0xE4] = OP_ALWAYS_SSE,
    [0xE5] = OP_66_SSE,
    [0xE6] = OP_66_SSE | OP_F2_SSE | OP_F3_SSE,
    [0xE7] = OP_ALWAYS_SSE,
    [0xE8] = OP_66_SSE,
    [0xE9] = OP_66_SSE,
    [0xEA] = OP_ALWAYS_SSE,
    [0xEB] = OP_66_SSE,
    [0xEC] = OP_66_SSE,
    [0xED] = OP_66_SSE,
    [0xEE] = OP_ALWAYS_SSE,
    [0xEF] = OP_66_SSE,
    [0xF0] = OP_F2_SSE,
    [0xF1] = OP_66_SSE,
    [0xF2] = OP_66_SSE,
    [0xF3] = OP_66_SSE,
    [0xF4] = OP_ALWAYS_SSE,
    [0xF5] = OP_66_SSE,
    [0xF6] = OP_ALWAYS_SSE,
    [0xF7] = OP_ALWAYS_SSE,
    [0xF8] = OP_66_SSE,
    [0xF9] = OP_66_SSE,
    [0xFA] = OP_66_SSE,
    [0xFB] = OP_ALWAYS_SSE,
    [0xFC] = OP_66_SSE,
    [0xFD] = OP_66_SSE,
    [0xFE] = OP_66_SSE,
};

static const uint16_t op_0F_01[256] = {
    [0xC8] = OP_ALWAYS_SSE, // 0F 01 C8: MONITOR
    [0xC9] = OP_ALWAYS_SSE,
};


static const uint16_t op_0F_38[256] = {
    [0xF0] = OP_F2_SSE, // F2 0F 38 F0: CRC32
    [0xF1] = OP_F2_SSE,
};

static const uint16_t op_0F_3A[256] = {
    [0x08] = OP_66_SSE, // 66 0F 3A 08: ROUNDPS
    [0x09] = OP_66_SSE,
    [0x0A] = OP_66_SSE,
    [0x0B] = OP_66_SSE,
    [0x0C] = OP_66_SSE,
    [0x0D] = OP_66_SSE,
    [0x0E] = OP_66_SSE,
    [0x0F] = OP_ALWAYS_SSE,
    [0x14] = OP_66_SSE,
    [0x15] = OP_66_SSE,
    [0x16] = OP_66_SSE,
    [0x17] = OP_66_SSE,
    [0x20] = OP_66_SSE,
    [0x21] = OP_66_SSE,
    [0x22] = OP_66_SSE,
    [0x40] = OP_66_SSE,
    [0x41] = OP_66_SSE,
    [0x42] = OP_66_SSE,
    [0x60] = OP_66_SSE,
    [0x61] = OP_66_SSE,
    [0x62] = OP_66_SSE,
    [0x63] = OP_66_SSE,
};
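The table element format documented at the top of inst_table.h can be exercised on its own. The sketch below repeats the digit-related macros so that it compiles standalone; `entry_matches_modrm` is a hypothetical helper that mirrors the final check in isInstructionSSE, not a function from the original code.

```c
#include <stdint.h>

/* Same layout as inst_table.h: bit 5 marks a /digit entry and
 * bits 8..15 hold one flag per allowed ModRM reg value. */
#define OP_DIGIT        0x0020
#define OP_DIGIT_MASK   0xFF00
#define OP_DIGIT_SHIFT  8
#define OP_MATCH_DIGIT(d) (OP_DIGIT | (1 << ((d) + OP_DIGIT_SHIFT)))

/* Hypothetical helper: nonzero when the reg field of the ModRM byte is
 * one of the digits recorded in a table entry. */
int entry_matches_modrm(uint16_t entry, uint8_t modrm)
{
    unsigned allowed = (entry & OP_DIGIT_MASK) >> OP_DIGIT_SHIFT;
    unsigned digit = (modrm >> 3) & 7;   /* reg part of ModRM */
    return (entry & OP_DIGIT) != 0 && (allowed & (1u << digit)) != 0;
}
```

With this layout `OP_MATCH_DIGIT(1)` is 0x0220, so the DF entry matches ModRM byte 08 (fisttp) but not 00 (fild). Several digits can be OR-ed into one entry, which is how the 0F AE entry accepts /2, /3, /5, /6 and /7 while rejecting /0 (fxsave).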
月牙弯弯 2024-12-07 03:16:15

There's no unambiguous SSE prefix. Some SSE instructions start with 0F and some with F3 but not all 0F and F3 instructions are SSE instructions. You'll need a more comprehensive decoder to tell whether an instruction is SSE. Since x86 instructions are variable-length, you'd need that anyway.
