Automatic x86 instruction obfuscation

Posted 2024-12-12 10:12:01


I'm working on an x86 asm obfuscator that takes Intel-syntax code as a string and outputs an equivalent, obfuscated set of opcodes.

Here's an example:

mov eax, 0x5523
or eax, [ebx]
push eax
call someAPI

Becomes something like:

mov eax, 0xFFFFFFFF ; mov eax, 0x5523
and eax, 0x5523     ;
push [ebx]          ; or eax, [ebx]
or [esp], eax       ;
pop eax             ;
push 12345h         ; push eax
mov [esp], eax      ;
call getEIP         ; call someAPI
getEIP:             ;
add [esp], 9        ;
jmp someAPI         ;

This is just an example; I haven't checked whether it messes up the flags (it probably does).
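One cheap way to sanity-check hand-written substitutions like these is to run both versions on a toy machine model and compare registers and memory afterwards. The Python sketch below is a hypothetical mini-interpreter that covers only the mov/and/or/push/pop forms used above; it assumes 32-bit wraparound, writes 12345h as 0x12345, and deliberately ignores flags and the call/jmp part, so it checks values only:

```python
MASK = 0xFFFFFFFF  # registers and memory cells are 32 bits wide


def run(program, regs, mem):
    """Execute a tiny x86 subset; returns (registers, memory). Flags are not modelled."""
    regs, mem = dict(regs), dict(mem)

    def address(op):               # "[ebx]" -> current value of ebx
        return regs[op[1:-1]]

    def read(op):
        if op.startswith("["):
            return mem[address(op)]
        if op in regs:
            return regs[op]
        return int(op, 0) & MASK   # immediate such as 0x5523

    def write(op, value):
        if op.startswith("["):
            mem[address(op)] = value & MASK
        else:
            regs[op] = value & MASK

    for line in program:
        mnemonic, _, rest = line.partition(" ")
        ops = [o.strip() for o in rest.split(",")] if rest else []
        if mnemonic == "mov":
            write(ops[0], read(ops[1]))
        elif mnemonic == "and":
            write(ops[0], read(ops[0]) & read(ops[1]))
        elif mnemonic == "or":
            write(ops[0], read(ops[0]) | read(ops[1]))
        elif mnemonic == "push":
            value = read(ops[0])   # operand is read before esp moves
            regs["esp"] = (regs["esp"] - 4) & MASK
            mem[regs["esp"]] = value
        elif mnemonic == "pop":
            value = mem[regs["esp"]]
            regs["esp"] = (regs["esp"] + 4) & MASK
            write(ops[0], value)
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return regs, mem


original = ["mov eax, 0x5523", "or eax, [ebx]", "push eax"]
obfuscated = [
    "mov eax, 0xFFFFFFFF", "and eax, 0x5523",        # mov eax, 0x5523
    "push [ebx]", "or [esp], eax", "pop eax",        # or eax, [ebx]
    "push 0x12345", "mov [esp], eax",                # push eax
]
regs = {"eax": 0, "ebx": 0x1000, "esp": 0x2000}
mem = {0x1000: 0xA0A0}
assert run(original, regs, mem) == run(obfuscated, regs, mem)
```

A flag-aware version would need to model each instruction's flag effects as well, which is exactly where substitutions like and-for-mov stop being equivalent.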

Right now I have an XML document that lists instruction templates (e.g. push e*x), each with a list of replacement instructions that can be used for it.
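The XML schema isn't shown, so as a hypothetical stand-in the template table can be modelled as a mapping from a wildcard pattern such as push e*x to alternative replacement sequences, with {op} marking where the matched operand goes. A minimal sketch using shell-style matching from Python's standard fnmatch module (the lea-based replacement is an illustrative addition, not from the original table):

```python
import fnmatch
import random

# Hypothetical in-memory form of the XML template table: a wildcard pattern
# mapped to alternative replacement sequences. "{op}" is the matched operand.
# Note: "[" in a *pattern* starts a character class in fnmatch, so templates
# that mention memory operands would need a different matcher.
TEMPLATES = {
    "push e*x": [
        ["push 0x12345", "mov [esp], {op}"],
        ["lea esp, [esp-4]", "mov [esp], {op}"],   # lea leaves flags untouched
    ],
}


def substitute(instruction, rng):
    """Replace one instruction using the first matching template, else keep it."""
    for pattern, choices in TEMPLATES.items():
        if fnmatch.fnmatchcase(instruction, pattern):
            operand = instruction.split(" ", 1)[1]
            return [line.format(op=operand) for line in rng.choice(choices)]
    return [instruction]


rng = random.Random(0)
for insn in ["mov eax, 0x5523", "push eax", "call someAPI"]:
    print(insn, "->", substitute(insn, rng))
```

Instructions without a matching template pass through unchanged, so the table can grow one pattern at a time.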

What I'm looking for is a way to automatically generate opcode sequences that produce the same result as the input. I don't mind doing an educated brute force, but I'm not sure how to approach this.
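One form of educated brute force, borrowed from superoptimizers rather than described in this thread, is to enumerate every short sequence over a small instruction alphabet and keep those that agree with the target on a batch of random inputs. Random testing only filters candidates and does not prove equivalence, so survivors still need checking. A sketch on a hypothetical toy machine with registers a and b plus a scratch register t, searching for disguises of mov a, b that do not use mov:

```python
import itertools
import random

MASK = 0xFFFFFFFF
REGS = ("a", "b", "t")             # t is a scratch register whose value may change
BINARY = ("and", "or", "xor")      # "mov" is deliberately left out of the alphabet


def alphabet():
    """Every instruction of the toy ISA: binary ops on register pairs, plus NOT."""
    for op in BINARY:
        for dst in REGS:
            for src in REGS:
                yield (op, dst, src)
    for dst in REGS:
        yield ("not", dst, None)


def execute(seq, state):
    s = dict(state)
    for op, dst, src in seq:
        if op == "and":
            s[dst] &= s[src]
        elif op == "or":
            s[dst] |= s[src]
        elif op == "xor":
            s[dst] ^= s[src]
        else:                        # "not"
            s[dst] = ~s[dst] & MASK
    return s


def search(target, max_len, tests=16, seed=0):
    """Yield sequences that match `target` on a and b for all random test inputs."""
    rng = random.Random(seed)
    inputs = [{r: rng.getrandbits(32) for r in REGS} for _ in range(tests)]
    expected = [target(dict(s)) for s in inputs]
    instrs = list(alphabet())
    for length in range(1, max_len + 1):
        for seq in itertools.product(instrs, repeat=length):
            outputs = (execute(seq, s) for s in inputs)
            if all(o["a"] == e["a"] and o["b"] == e["b"]
                   for o, e in zip(outputs, expected)):
                yield seq


def mov_a_b(state):                  # the instruction we want to disguise
    state["a"] = state["b"]
    return state


for seq in search(mov_a_b, max_len=2):
    print(seq)
```

Even at length 2 this finds sequences such as xor a, a / or a, b and and a, b / or a, b; the search space grows exponentially with length, so in practice the alphabet has to be pruned.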


终止放荡 2024-12-19 10:12:02


What you need is an algebraic description of what the opcodes do, and a set of algebraic laws that allow you to determine equivalent operations.

Then, for each instruction, you look up its algebraic description. For example, the instruction

 XOR  eax,mem[ecx]

has the algebraic equivalent

 eax exclusive_or mem[ecx]

Next, you enumerate equivalent expressions using algebraic laws such as

 a exclusive_or b ==> (a and not b) or (b and not a)

which, applied to your XOR instruction, generates the equivalent algebraic statement

 eax exclusive_or mem[ecx] ==> (eax and not mem[ecx]) or (mem[ecx] and not eax)
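A minimal sketch of such algebraic descriptions, assuming a hypothetical encoding as nested Python tuples with operand names like mem[ecx] as leaves. Because and, or, not and xor act on each bit independently, checking all one-bit input combinations is enough to confirm a law like the XOR expansion for full 32-bit registers:

```python
from itertools import product

MASK = 0xFFFFFFFF


def describe(mnemonic, dst, src):
    """Algebraic description of the value a two-operand bitwise instruction writes."""
    return (mnemonic.lower(), dst, src)


def evaluate(expr, env, mask=MASK):
    """Evaluate an expression tree; leaves are looked up in env."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env, mask) for a in args]
    if op == "not":
        return ~vals[0] & mask
    if op == "and":
        return vals[0] & vals[1]
    if op == "or":
        return vals[0] | vals[1]
    if op == "xor":
        return vals[0] ^ vals[1]
    raise ValueError(f"unknown operator {op!r}")


def equivalent(e1, e2, leaves):
    """Exhaustive one-bit check; valid only for trees built from bitwise operators."""
    for bits in product((0, 1), repeat=len(leaves)):
        env = dict(zip(leaves, bits))
        if evaluate(e1, env, mask=1) != evaluate(e2, env, mask=1):
            return False
    return True


xor_insn = describe("XOR", "eax", "mem[ecx]")      # ('xor', 'eax', 'mem[ecx]')
expanded = ("or", ("and", "eax", ("not", "mem[ecx]")),
                  ("and", "mem[ecx]", ("not", "eax")))
assert equivalent(xor_insn, expanded, ["eax", "mem[ecx]"])
assert not equivalent(xor_insn, ("or", "eax", "mem[ecx]"), ["eax", "mem[ecx]"])

env = {"eax": 0x5523, "mem[ecx]": 0xFF00}
assert evaluate(expanded, env) == evaluate(xor_insn, env) == 0xAA23
```

The one-bit shortcut stops being valid as soon as arithmetic such as add or sub enters the trees, because carries couple neighbouring bits.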

You can apply further algebraic laws to this, for instance De Morgan's theorem:

 a or b ==> not (not a and not b)

to get

 not ((not (eax and not mem[ecx])) and (not (mem[ecx] and not eax)))
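Applying De Morgan's law mechanically is a bottom-up tree rewrite. A sketch using the same kind of hypothetical tuple encoding, which turns every or node into not (not x and not y) and then spot-checks the result against eax xor mem[ecx] on random 32-bit values:

```python
import random

MASK = 0xFFFFFFFF


def de_morgan(expr):
    """Rewrite every ("or", x, y) node, bottom-up, into not(not x and not y)."""
    if isinstance(expr, str):          # a leaf such as "eax" or "mem[ecx]"
        return expr
    op, *args = expr
    args = [de_morgan(a) for a in args]
    if op == "or":
        x, y = args
        return ("not", ("and", ("not", x), ("not", y)))
    return (op, *args)


def evaluate(expr, env):
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return ~vals[0] & MASK
    if op == "and":
        return vals[0] & vals[1]
    if op == "or":
        return vals[0] | vals[1]
    raise ValueError(f"unknown operator {op!r}")


xor_form = ("or", ("and", "eax", ("not", "mem[ecx]")),
                  ("and", "mem[ecx]", ("not", "eax")))
rewritten = de_morgan(xor_form)
assert rewritten == ("not", ("and", ("not", ("and", "eax", ("not", "mem[ecx]"))),
                                    ("not", ("and", "mem[ecx]", ("not", "eax")))))

rng = random.Random(0)
for _ in range(1000):
    env = {"eax": rng.getrandbits(32), "mem[ecx]": rng.getrandbits(32)}
    assert evaluate(rewritten, env) == env["eax"] ^ env["mem[ecx]"]
```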

At this point you have a specification of an algebraic computation that will do
the same thing as the original. There's your brute force.
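Turning that into an actual brute force means closing an expression under a set of laws. A breadth-first sketch, assuming the same hypothetical tuple encoding and only four laws (XOR expansion, commutativity, De Morgan, double-negation removal); every form it produces is checked against the original by evaluation:

```python
import random

MASK = 0xFFFFFFFF


def laws(e):
    """Equivalent forms of e obtained by applying one law at the root."""
    if isinstance(e, str):
        return []
    op, *a = e
    out = []
    if op == "xor":
        x, y = a
        out.append(("or", ("and", x, ("not", y)), ("and", y, ("not", x))))
    if op in ("and", "or"):
        x, y = a
        out.append((op, y, x))                                   # commutativity
        dual = "or" if op == "and" else "and"
        out.append(("not", (dual, ("not", x), ("not", y))))      # De Morgan
    if op == "not" and not isinstance(a[0], str) and a[0][0] == "not":
        out.append(a[0][1])                                      # not not x == x
    return out


def rewrites(e):
    """Every expression reachable from e by one law applied at any position."""
    yield from laws(e)
    if not isinstance(e, str):
        op, *a = e
        for i, child in enumerate(a):
            for new in rewrites(child):
                yield (op, *a[:i], new, *a[i + 1:])


def enumerate_equivalents(e, depth):
    """Breadth-first closure of e under the laws, up to `depth` rewrite steps."""
    seen, frontier = {e}, [e]
    for _ in range(depth):
        nxt = []
        for f in frontier:
            for n in rewrites(f):
                if n not in seen:
                    seen.add(n)
                    nxt.append(n)
        frontier = nxt
    return seen


def evaluate(e, env):
    if isinstance(e, str):
        return env[e]
    op, *args = e
    v = [evaluate(x, env) for x in args]
    return {"not": lambda: ~v[0] & MASK, "and": lambda: v[0] & v[1],
            "or": lambda: v[0] | v[1], "xor": lambda: v[0] ^ v[1]}[op]()


forms = enumerate_equivalents(("xor", "eax", "mem[ecx]"), depth=3)
rng = random.Random(1)
for _ in range(100):
    env = {"eax": rng.getrandbits(32), "mem[ecx]": rng.getrandbits(32)}
    assert all(evaluate(f, env) == env["eax"] ^ env["mem[ecx]"] for f in forms)
print(len(forms), "equivalent forms after 3 rewrite steps")
```

A depth limit is essential: laws like commutativity and De Morgan can be applied forever, so the set of forms grows quickly with each extra step.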

Now you have to "compile" this to machine instructions by matching what the instructions do against what this expression says. Like any compiler, you will likely want to optimize the generated code (there's no point in fetching mem[ecx] twice). (All of this is hard... it's a code generator!)
The resulting code sequence would be something like:

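mov ebx, [ecx]      ; ebx = mem[ecx], fetched only once
mov edx, ebx        ; edx = mem[ecx]
not edx             ; edx = not mem[ecx]
and edx, eax        ; edx = eax and not mem[ecx]
not eax             ; eax = not eax
and eax, ebx        ; eax = mem[ecx] and not eax
not eax             ; eax = not (mem[ecx] and not eax)
not edx             ; edx = not (eax and not mem[ecx])
and eax, edx        ; AND of the two negated terms
not eax             ; eax = eax xor mem[ecx]; ebx and edx are clobbered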
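Generated sequences like this are easy to get subtly wrong, so a register-level simulation is a cheap check. The sketch below encodes the De Morgan form of XOR eax, [ecx] as a hypothetical list of (mnemonic, destination, source) tuples, models memory as a dictionary, and compares the result with eax xor mem[ecx] on random inputs. It checks values only, not flags:

```python
import random

MASK = 0xFFFFFFFF

# (mnemonic, destination, source) for the De Morgan form of XOR eax, [ecx];
# ebx and edx are used as temporaries and end up clobbered.
SEQUENCE = [
    ("mov", "ebx", "[ecx]"),   # ebx = m
    ("mov", "edx", "ebx"),     # edx = m
    ("not", "edx", None),      # edx = not m
    ("and", "edx", "eax"),     # edx = eax and not m
    ("not", "eax", None),      # eax = not eax
    ("and", "eax", "ebx"),     # eax = m and not eax
    ("not", "eax", None),      # eax = not (m and not eax)
    ("not", "edx", None),      # edx = not (eax and not m)
    ("and", "eax", "edx"),     # AND of the two negated terms
    ("not", "eax", None),      # eax = eax xor m, by De Morgan
]


def run(seq, regs, mem):
    regs = dict(regs)

    def read(op):              # "[ecx]" reads memory at the address held in ecx
        return mem[regs[op[1:-1]]] if op.startswith("[") else regs[op]

    for op, dst, src in seq:
        if op == "mov":
            regs[dst] = read(src)
        elif op == "and":
            regs[dst] &= read(src)
        elif op == "not":
            regs[dst] = ~regs[dst] & MASK
        else:
            raise ValueError(op)
    return regs


rng = random.Random(42)
for _ in range(1000):
    regs = {r: rng.getrandbits(32) for r in ("eax", "ebx", "ecx", "edx")}
    mem = {regs["ecx"]: rng.getrandbits(32)}
    out = run(SEQUENCE, regs, mem)
    assert out["eax"] == regs["eax"] ^ mem[regs["ecx"]]
    assert out["ecx"] == regs["ecx"]
```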

This is a lot of machinery to build manually.

Another way to do this is to take advantage of a program transformation system that allows you to apply source-to-source transformations to code. Then you can encode "equivalences" as rewrites directly on the code.

One of these tools is our DMS Software Reengineering Toolkit.

DMS takes a language definition (essentially an EBNF) and automatically implements a parser, an AST builder, and a prettyprinter (an anti-parser, turning the AST back into valid source text).
[DMS doesn't presently have an EBNF for ASM86, but EBNFs for dozens of complex languages have been built for DMS, including several for miscellaneous non-x86 assemblers, so you'd have to define the ASM86 EBNF to DMS. This is pretty straightforward; DMS has a really strong parser generator.]

Using that, DMS will let you write source transformations directly on the code. You could write the following transformations, which implement the XOR equivalence and De Morgan's law directly:

  domain ASM86;

  rule obfuscate_XOR(r: register, m: memory_access):instruction:instruction
  =  " XOR \r, \m "
      rewrites to
     " MOV \free_register\(\),\m
       NOT \free_register\(\)
       AND \free_register\(\),\r
       NOT \r
       AND \r,\m
       OR \r,\free_register\(\)";

  rule obfuscate_OR(r1: register, r2: register):instruction:instruction
  =  " OR \r1, \r2"
      rewrites to
     " MOV \free_register\(\),\r2
       NOT \free_register\(\)
       NOT \r1
       AND \r1,\free_register\(\)
       NOT \r1";

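The rules rely on some additional magic in a meta-procedure called "free_register" that determines which registers are free at that point in the code (where the AST matched). (If you don't want to do that, use the top of the stack as a temporary everywhere, as you did in your example.) Note that obfuscate_XOR reads \m again after it has modified \r, so it is only valid when the address in \m does not involve \r (for example, not for XOR ecx,[ecx]).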
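Working out which registers are free is classic backward liveness analysis. A minimal sketch for straight-line code only (no branches, calls, or stack traffic), assuming hypothetical (mnemonic, destination, source) tuples and a caller-supplied set of registers that are live at the end of the block:

```python
REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}


def regs_in(operand):
    """General registers mentioned by an operand, e.g. "[ecx+4]" -> {"ecx"}."""
    if operand is None:
        return set()
    # A substring test is good enough here because no name in REGS contains another.
    return {r for r in REGS if r in operand}


def uses_defs(instr):
    """(registers read, registers written) for one (mnemonic, dst, src) tuple."""
    op, dst, src = instr
    uses = regs_in(src)
    if dst.startswith("["):                    # memory destination: address regs are read
        return uses | regs_in(dst), set()
    if op != "mov":                            # and/or/xor/not also read dst
        uses |= {dst}
    return uses, {dst}


def free_registers(block, live_out):
    """For each instruction, the registers a rewrite of that instruction may clobber:
    dead after the instruction and not mentioned by it."""
    live = set(live_out)
    free = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        uses, defs = uses_defs(block[i])
        free[i] = REGS - live - uses - defs    # here `live` is live-after(i)
        live = (live - defs) | uses            # now it is live-before(i)
    return free


block = [("mov", "ebx", "[ecx]"), ("xor", "eax", "ebx"), ("not", "eax", None)]
for instr, free in zip(block, free_registers(block, live_out={"eax", "ecx"})):
    print(instr, sorted(free))
```

In this example ebx becomes free at the last instruction because nothing reads it afterwards. Code with branches needs the usual iterative data-flow version over the control-flow graph.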

You'd need a bunch of rewrites to cover all the cases you want to obfuscate, across all the combinations of register and memory operands.
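Outside DMS, the same rules can be prototyped as plain Python functions and every operand combination tested mechanically, which helps with that combinatorics. A sketch, assuming hypothetical (mnemonic, destination, source) tuples, a scratch register passed in by hand in place of free_register, and [ecx] as the only memory operand; the OR rule is written directly from De Morgan's law, r1 or r2 = not (not r1 and not r2):

```python
import random
from itertools import product

MASK = 0xFFFFFFFF
REGS = ("eax", "ebx", "ecx", "edx", "esi", "edi")


def obfuscate_xor(r, m, free):
    """XOR r, m  ->  (r and not m) or (m and not r), using `free` as scratch."""
    return [("mov", free, m), ("not", free, None), ("and", free, r),
            ("not", r, None), ("and", r, m), ("or", r, free)]


def obfuscate_or(r1, r2, free):
    """OR r1, r2  ->  not (not r1 and not r2); r2 is left unchanged."""
    return [("mov", free, r2), ("not", free, None),
            ("not", r1, None), ("and", r1, free), ("not", r1, None)]


def run(seq, regs, mem):
    regs = dict(regs)

    def read(op):                      # "[ecx]" reads memory at the address in ecx
        return mem[regs[op[1:-1]]] if op.startswith("[") else regs[op]

    for op, dst, src in seq:
        if op == "mov":
            regs[dst] = read(src)
        elif op == "not":
            regs[dst] = ~regs[dst] & MASK
        elif op == "and":
            regs[dst] &= read(src)
        elif op == "or":
            regs[dst] |= read(src)
        elif op == "xor":
            regs[dst] ^= read(src)
        else:
            raise ValueError(op)
    return regs


def agrees(original, rewritten, free, trials=200, seed=7):
    """Random-test that both sequences leave every register except `free` equal."""
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {r: rng.getrandbits(32) for r in REGS}
        mem = {regs["ecx"]: rng.getrandbits(32)}
        a, b = run(original, regs, mem), run(rewritten, regs, mem)
        if any(a[r] != b[r] for r in REGS if r != free):
            return False
    return True


# Instantiate each rule for several operand combinations, with edi as scratch.
for r1, r2 in product(("eax", "ebx", "ecx", "edx"), repeat=2):
    assert agrees([("or", r1, r2)], obfuscate_or(r1, r2, "edi"), "edi")
for r in ("eax", "ebx", "edx", "esi"):        # ecx excluded: it forms the address
    assert agrees([("xor", r, "[ecx]")], obfuscate_xor(r, "[ecx]", "edi"), "edi")
```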

Then the transformation engine can be asked to apply these transformations at random, once or several times, at each point in the code to scramble it.
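A random driver can be as simple as walking the instruction list and rewriting each matching instruction with some probability, for several passes so that code produced by one rewrite can itself be rewritten. A sketch, assuming a hypothetical reserved scratch register (esi) and three illustrative rules; the closing loop checks by simulation that the scrambled code computes the same register values as the original:

```python
import random

MASK = 0xFFFFFFFF
SCRATCH = "esi"   # assumption: esi is reserved for the obfuscator and never used otherwise


def rule_or(r1, r2):    # r1 | r2 == not(not r1 and not r2)
    return [("mov", SCRATCH, r2), ("not", SCRATCH, None),
            ("not", r1, None), ("and", r1, SCRATCH), ("not", r1, None)]


def rule_and(r1, r2):   # r1 & r2 == not(not r1 or not r2)
    return [("mov", SCRATCH, r2), ("not", SCRATCH, None),
            ("not", r1, None), ("or", r1, SCRATCH), ("not", r1, None)]


def rule_not(r, _):     # not r == r xor 0xFFFFFFFF
    return [("xor", r, "0xFFFFFFFF")]


RULES = {"or": rule_or, "and": rule_and, "not": rule_not}


def scramble(code, rng, passes=3, p=0.5):
    """Each pass rewrites every instruction that has a rule with probability p,
    so code produced by one pass can be rewritten again by the next."""
    for _ in range(passes):
        out = []
        for op, dst, src in code:
            rule = RULES.get(op)
            if rule is not None and rng.random() < p:
                out.extend(rule(dst, src))
            else:
                out.append((op, dst, src))
        code = out
    return code


def run(code, regs):
    regs = dict(regs)

    def val(x):                       # register name or integer literal
        return regs[x] if x in regs else int(x, 0)

    for op, dst, src in code:
        if op == "mov":
            regs[dst] = val(src)
        elif op == "not":
            regs[dst] = ~regs[dst] & MASK
        elif op == "and":
            regs[dst] &= val(src)
        elif op == "or":
            regs[dst] |= val(src)
        elif op == "xor":
            regs[dst] ^= val(src)
    return regs


program = [("or", "eax", "ebx"), ("and", "eax", "ecx"), ("not", "edx", None)]
scrambled = scramble(program, random.Random(3))
print(len(program), "->", len(scrambled), "instructions")

rng = random.Random(99)
for _ in range(100):
    regs = {r: rng.getrandbits(32) for r in ("eax", "ebx", "ecx", "edx", SCRATCH)}
    before, after = run(program, regs), run(scrambled, regs)
    assert all(before[r] == after[r] for r in ("eax", "ebx", "ecx", "edx"))
```

Every rule copies its source operand into the scratch register before clobbering anything, which is why a rule may safely be applied to an instruction whose source is the scratch register itself.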

You can see a fully worked example of some algebraic transforms being applied with DMS.
