Automatic x86 instruction obfuscation
I'm working on an x86 asm obfuscator that takes Intel-syntax code as a string and outputs an equivalent set of obfuscated opcodes.
Here's an example:
mov eax, 0x5523
or eax, [ebx]
push eax
call someAPI
Becomes something like:
mov eax, 0xFFFFFFFF ; mov eax, 0x5523
and eax, 0x5523 ;
push dword ptr [ebx] ; or eax, [ebx]
or [esp], eax ;
pop eax ;
push 12345h ; push eax
mov [esp], eax ;
call getEIP ; call someAPI
getEIP: ;
add dword ptr [esp], 9 ;
jmp someAPI ;
This is just an example; I haven't checked whether this screws up the flags (it probably does).
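One way to see which rewrites disturb the flags is to model the register value together with the arithmetic flags that each sequence leaves behind. The minimal Python sketch below does this for the first two rewrites in the example. It is an illustration, not part of any existing tool, and all function names are hypothetical. It models only CF, OF, ZF, SF and PF. AF is left out because it is undefined after AND/OR. Memory and the stack are reduced to plain Python values.

```python
MASK = 0xFFFFFFFF
FLAG_NAMES = ("CF", "OF", "ZF", "SF", "PF")

def logic_flags(result):
    """EFLAGS bits written by AND/OR/XOR: CF=OF=0, ZF/SF/PF taken from the result."""
    return {
        "CF": 0,
        "OF": 0,
        "ZF": int(result == 0),
        "SF": (result >> 31) & 1,
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of low byte
    }

# Each function maps (eax, mem, flags) -> (eax, flags); mem is the dword at [ebx].

def mov_original(eax, mem, flags):
    return 0x5523, dict(flags)            # mov eax, 0x5523     : flags untouched

def mov_obfuscated(eax, mem, flags):
    eax = 0xFFFFFFFF                      # mov eax, 0xFFFFFFFF : flags untouched
    eax &= 0x5523                         # and eax, 0x5523     : writes flags
    return eax, logic_flags(eax)

def or_original(eax, mem, flags):
    eax = (eax | mem) & MASK              # or eax, [ebx]       : writes flags
    return eax, logic_flags(eax)

def or_obfuscated(eax, mem, flags):
    stack = [mem]                         # push dword ptr [ebx] : flags untouched
    stack[-1] |= eax                      # or [esp], eax        : writes flags
    flags = logic_flags(stack[-1])
    eax = stack.pop()                     # pop eax              : flags untouched
    return eax, flags

def clobbered_flags(original, obfuscated, eax, mem, flags):
    """Assert both versions give the same eax; return the names of flags that differ."""
    v1, f1 = original(eax, mem, flags)
    v2, f2 = obfuscated(eax, mem, flags)
    assert v1 == v2, "rewrite changed the register value"
    return {name for name in FLAG_NAMES if f1[name] != f2[name]}

ALL_SET = dict.fromkeys(FLAG_NAMES, 1)
print(sorted(clobbered_flags(mov_original, mov_obfuscated, 0, 0, ALL_SET)))
# ['CF', 'OF', 'PF', 'SF', 'ZF']
print(sorted(clobbered_flags(or_original, or_obfuscated, 0x1234, 0x8000, ALL_SET)))
# []
```

With all flags initially set, the MOV rewrite changes every modeled flag. MOV leaves EFLAGS alone, while AND rewrites them. The OR rewrite leaves the flags unchanged, because or [esp], eax computes the same value, and therefore the same flags, as or eax, [ebx]. The ADD in the CALL rewrite modifies the flags as well.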
Right now I have an XML document that lists instruction templates (e.g. push e*x) and a list of replacement instructions that can be used.
What I'm looking for is a way to automatically generate opcode sequences that produce the same result as an input. I don't mind doing an educated bruteforce, but I'm not sure how I'd approach this.
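The template-driven substitution described above can be sketched in a few lines of Python. Several parts of this sketch are assumptions:

- The dictionary is a hypothetical in-memory stand-in for the XML document.
- The regular expressions are one possible reading of templates such as push e*x.
- The replacements come from the example above, plus one extra push variant built on lea.

The sketch performs a single substitution pass. It does not search for new equivalences.

```python
import random
import re

# Hypothetical stand-in for the XML template document. Each key is a regex for one
# Intel-syntax instruction; each value lists equivalent replacement sequences.
# "{0}", "{1}" are filled from the regex groups, "{label}" with a fresh label name.
TEMPLATES = {
    r"mov (e[a-d]x), (0x[0-9a-fA-F]+)": [
        ["mov {0}, 0xFFFFFFFF", "and {0}, {1}"],        # and writes EFLAGS, mov does not
    ],
    r"push (e[a-d]x)": [                                 # the "push e*x" template
        ["push 12345h", "mov [esp], {0}"],
        ["lea esp, [esp-4]", "mov [esp], {0}"],          # lea leaves EFLAGS alone
    ],
    r"call (\w+)": [
        # 9 = 4-byte "add dword ptr [esp], 9" + 5-byte "jmp rel32" (assumed encodings)
        ["call {label}", "{label}:", "add dword ptr [esp], 9", "jmp {0}"],
    ],
}

def obfuscate(lines, rng):
    """Replace each instruction that matches a template with a randomly chosen alternative."""
    out = []
    counter = 0
    for line in lines:
        insn = line.strip()
        for pattern, alternatives in TEMPLATES.items():
            match = re.fullmatch(pattern, insn)
            if match:
                counter += 1
                label = f"obf_{counter}"          # unique within one obfuscate() call
                chosen = rng.choice(alternatives)
                out.extend(s.format(*match.groups(), label=label) for s in chosen)
                break
        else:
            out.append(insn)                      # no template matched: keep as-is
    return out

program = ["mov eax, 0x5523", "or eax, [ebx]", "push eax", "call someAPI"]
for insn in obfuscate(program, random.Random(42)):
    print(insn)
```

Instructions without a template, such as or eax, [ebx] here, pass through unchanged. The constant 9 only works if the assembler really emits a 4-byte add and a 5-byte jmp rel32.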
Answers (2)
What you need is an algebraic description of what the opcodes do, and a set of algebraic laws that allow you to determine equivalent operations.
Then, for each instruction, you look up its algebraic description. For the sake of an example, take
xor eax, [ecx]
whose algebraic equivalent is
eax := eax xor mem[ecx]
Then enumerate algebraic equivalences using laws such as
a xor b ==> (a and not b) or (not a and b)
to generate an equivalent algebraic statement for the XOR instruction:
eax := (eax and not mem[ecx]) or (not eax and mem[ecx])
You may apply more algebraic laws to this, for instance De Morgan's theorem:
a or b ==> not (not a and not b)
to get
eax := not (not (eax and not mem[ecx]) and not (not eax and mem[ecx]))
At this point you have a specification of an algebraic computation that will do the same thing as the original. There's your brute force.
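The enumeration step can be prototyped directly on expression trees. This is a hypothetical Python sketch, not code from any existing tool. Expressions are nested tuples, each law is a function that rewrites a matching node, and equivalence is checked by evaluating both sides. Because the expressions use only bitwise AND, OR, XOR and NOT, every bit position computes the same Boolean function. Trying each operand as all-zeros and as all-ones therefore covers every case, so the check is exhaustive rather than a random sample.

```python
import itertools

MASK = 0xFFFFFFFF  # 32-bit operands

# An expression is an operand name such as "eax" or "mem[ecx]", or a tuple
# ("not", x), ("and", x, y), ("or", x, y) or ("xor", x, y).

def xor_law(e):
    """a xor b  ==>  (a and not b) or (not a and b)"""
    if isinstance(e, tuple) and e[0] == "xor":
        _, a, b = e
        return ("or", ("and", a, ("not", b)), ("and", ("not", a), b))
    return None

def de_morgan_or(e):
    """a or b  ==>  not (not a and not b)"""
    if isinstance(e, tuple) and e[0] == "or":
        _, a, b = e
        return ("not", ("and", ("not", a), ("not", b)))
    return None

def rewrite_everywhere(e, law):
    """Apply `law` bottom-up at every node it matches."""
    if isinstance(e, tuple):
        e = (e[0],) + tuple(rewrite_everywhere(arg, law) for arg in e[1:])
    rewritten = law(e)
    return e if rewritten is None else rewritten

def evaluate(e, env):
    """Evaluate an expression with 32-bit unsigned semantics."""
    if isinstance(e, str):
        return env[e]
    op, *args = e
    vals = [evaluate(arg, env) for arg in args]
    if op == "not":
        return ~vals[0] & MASK
    if op == "and":
        return vals[0] & vals[1]
    if op == "or":
        return vals[0] | vals[1]
    if op == "xor":
        return vals[0] ^ vals[1]
    raise ValueError(f"unknown operator {op!r}")

def equivalent(e1, e2, names):
    """Exhaustive for bitwise-only expressions: try every operand as 0 or all-ones."""
    for values in itertools.product((0, MASK), repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(e1, env) != evaluate(e2, env):
            return False
    return True

def show(e):
    """Render an expression in the infix notation used in the answer."""
    if isinstance(e, str):
        return e
    if e[0] == "not":
        return f"not {show(e[1])}"
    return f"({show(e[1])} {e[0]} {show(e[2])})"

original = ("xor", "eax", "mem[ecx]")
step1 = rewrite_everywhere(original, xor_law)
step2 = rewrite_everywhere(step1, de_morgan_or)
print(show(step2))  # not (not (eax and not mem[ecx]) and not (not eax and mem[ecx]))
print(equivalent(original, step2, ["eax", "mem[ecx]"]))  # True
```

A brute-force search would apply every law at every node repeatedly and keep the variants it likes. The equivalent() check guards against a mistyped law.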
Now you have to "compile" this to machine instructions by matching what instructions do with what this says. Like any compiler, you will likely want to optimize the generated code (there is no point in fetching mem[ecx] twice). (All of this is hard... it's a code generator!)
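The resulting code sequence would fetch mem[ecx] into a spare register once and then evaluate that specification using only MOV, NOT and AND instructions.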
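One possible hand-compiled sequence of that kind is shown below as a Python emulation rather than raw assembly, so that it can be checked. Each statement mirrors the x86 instruction in its comment. The choice of ebx and edx as free temporaries is an assumption for illustration. The emulation covers register values only, not EFLAGS.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit registers

def xor_reference(eax, mem):
    """Semantics of the original instruction: xor eax, [ecx] (mem = dword at [ecx])."""
    return eax ^ mem

def xor_obfuscated(eax, mem):
    """Computes not (not (eax and not m) and not (not eax and m)).
    Assumes ebx and edx are free; flags are not modeled."""
    edx = mem              # mov edx, [ecx]  ; fetch mem[ecx] only once
    ebx = edx              # mov ebx, edx
    ebx = ~ebx & MASK      # not ebx         ; ebx = not m
    ebx &= eax             # and ebx, eax    ; ebx = eax and not m
    eax = ~eax & MASK      # not eax         ; eax = not eax
    eax &= edx             # and eax, edx    ; eax = not eax and m
    ebx = ~ebx & MASK      # not ebx         ; ebx = not (eax and not m)
    eax = ~eax & MASK      # not eax         ; eax = not (not eax and m)
    eax &= ebx             # and eax, ebx
    eax = ~eax & MASK      # not eax         ; De Morgan: (A) or (B)
    return eax

def spot_check(trials=10_000, seed=0):
    """Compare both versions on random 32-bit inputs; True if they all agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        eax, mem = rng.getrandbits(32), rng.getrandbits(32)
        if xor_obfuscated(eax, mem) != xor_reference(eax, mem):
            return False
    return True

print(spot_check())  # True
```

A real code generator would also have to confirm that ebx and edx are actually dead at that point. Otherwise it must save and restore them, or use the stack instead.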
This is a lot of machinery to build manually.
Another way to do this is to take advantage of a program transformation system that allows you to apply source-to-source transformations to code. Then you can encode "equivalences" as rewrites directly on the code.
One of these tools is our DMS Software Reengineering Toolkit.
DMS takes a language definition (essentially an EBNF) and automatically implements a parser, an AST builder, and a prettyprinter (an anti-parser that turns the AST back into valid source text).
[DMS doesn't presently have an EBNF for ASM86, but dozens of EBNFs for various complex languages have been built for DMS, including several for miscellaneous non-x86 assemblers. So you'd have to define the ASM86 EBNF to DMS. This is pretty straightforward; DMS has a really strong parser generator.]
Using that, DMS will let you write source transformations directly on the code. You could write transformations that implement the XOR equivalence and De Morgan's law directly, with some additional magic in a meta-procedure called "free_register" that determines which registers are free at that point in the code (where the AST matched). (If you don't want to do that, use the top of the stack as a temporary everywhere, as you did in your example.)
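DMS rules are written in DMS's own rule notation, which is not reproduced here. As a rough plain-Python analogue (not DMS's API), the sketch below encodes the XOR expansion as one rewrite over instruction text. It assumes the following:

- free_register is a hypothetical helper. It takes the set of live registers from the caller instead of computing liveness itself.
- When no register is free, the rule falls back to using the top of the stack as the temporary.

The rule refuses to fire when the memory operand's address depends on a register that the sequence modifies.

```python
import re

GPRS = ["eax", "ebx", "ecx", "edx", "esi", "edi"]
REG_RE = r"\be(?:[a-d]x|si|di|sp|bp)\b"   # 32-bit register names inside an address

def free_register(live):
    """Hypothetical stand-in for a "free_register" meta-procedure: return a
    general-purpose register not in `live`, or None. A real tool would derive
    `live` from data-flow analysis; here the caller supplies it."""
    for reg in GPRS:
        if reg not in live:
            return reg
    return None

def rewrite_xor_mem(insn, live):
    """xor R, [M]  ==>  R := (R and not m) or (not R and m), where m = [M].
    Ends with OR, which sets CF, OF, ZF, SF and PF the same way XOR does."""
    match = re.fullmatch(r"xor (e[a-d]x|esi|edi), \[(.+)\]", insn)
    if not match:
        return None
    reg, addr = match.groups()
    in_addr = set(re.findall(REG_RE, addr))
    if reg in in_addr:        # "not R" would change the address before [M] is re-read
        return None
    tmp = free_register(set(live) | in_addr | {reg})
    if tmp is not None:
        return [f"mov {tmp}, [{addr}]",     # tmp = m
                f"not {tmp}",               # tmp = not m
                f"and {tmp}, {reg}",        # tmp = R and not m
                f"not {reg}",               # R = not R
                f"and {reg}, [{addr}]",     # R = not R and m
                f"or {reg}, {tmp}"]         # R = old R xor m
    if "esp" in in_addr:      # pushing would shift an esp-relative address
        return None
    return [f"push dword ptr [{addr}]",     # top of stack is the temporary
            "not dword ptr [esp]",
            f"and [esp], {reg}",
            f"not {reg}",
            f"and {reg}, [{addr}]",
            f"or {reg}, [esp]",
            "lea esp, [esp+4]"]             # drop the temporary without touching flags

for line in rewrite_xor_mem("xor eax, [ecx]", live={"eax", "ecx"}):
    print(line)
```

Supplying the live set by hand is the main simplification here. It stands in for exactly the analysis that the free_register meta-procedure would have to perform.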
You'd need a bunch of rewrites to cover all the cases that you want to obfuscate, along with their combinations of register and memory operands.
Then the transformation engine can be asked to apply these transformations randomly, once or more than once, at each point in the code to scramble it.
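Random, repeated application can be sketched the same way. In this hypothetical Python sketch, each rule maps one instruction to a replacement list, or to None when it does not match. At every point, the engine applies a random applicable rule with probability p and then recurses into the output. A single point may therefore be rewritten several times, up to max_depth. The two rules are illustrative choices, not taken from the answer:

- splitting an immediate with lea
- expanding push into lea plus mov

Both happen to preserve the flags, because mov and lea do not modify EFLAGS.

```python
import random
import re

MASK = 0xFFFFFFFF
REG = r"(e[a-d]x|esi|edi)"

def split_immediate(insn, rng):
    """mov R, imm  ==>  mov R, imm-k ; lea R, [R+k]  (arithmetic wraps at 32 bits)."""
    match = re.fullmatch(r"mov " + REG + r", (0x[0-9a-fA-F]+)", insn)
    if not match:
        return None
    reg, imm = match.group(1), int(match.group(2), 16)
    k = rng.randrange(1, 0x10000)
    return [f"mov {reg}, {hex((imm - k) & MASK)}", f"lea {reg}, [{reg}+{hex(k)}]"]

def expand_push(insn, rng):
    """push R  ==>  lea esp, [esp-4] ; mov [esp], R."""
    match = re.fullmatch(r"push " + REG, insn)
    if not match:
        return None
    return ["lea esp, [esp-4]", f"mov [esp], {match.group(1)}"]

RULES = [split_immediate, expand_push]

def scramble_one(insn, rng, p, depth):
    """Maybe rewrite one instruction, then recurse into what the rule produced."""
    if depth == 0 or rng.random() >= p:
        return [insn]
    options = [out for out in (rule(insn, rng) for rule in RULES) if out is not None]
    if not options:
        return [insn]
    result = []
    for new_insn in rng.choice(options):
        result.extend(scramble_one(new_insn, rng, p, depth - 1))
    return result

def scramble(code, rng, p=0.7, max_depth=3):
    """Visit every instruction; each may be rewritten zero or more times."""
    result = []
    for insn in code:
        result.extend(scramble_one(insn, rng, p, max_depth))
    return result

for line in scramble(["mov eax, 0x5523", "push eax"], random.Random(7)):
    print(line)
```

Seeding the generator makes a given scramble reproducible. Changing the seed, p or max_depth yields different but equivalent code.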
You can see a fully worked example of some algebraic transforms being applied with DMS.
Take a look at the Obfusion project. It can obfuscate x86 shellcode pretty well. However, it does not seem to support 64-bit. Most of the code, algorithms and ideas from this project can probably be applied to your needs, though. Another very nice project to look into is ADVobfuscator, but it applies to C/C++ source-code obfuscation via macros. Another approach could be implementing transformations on top of the internal instruction representation of a disassembler engine such as Zydis. And don't forget about LLVM-obfuscator, which is a C/C++ compiler with obfuscation flags.