Reverse engineering C programs
Every C program is converted to machine code, and it is this binary that gets distributed. Since the instruction set of a computer is well known, is it possible to get back the original C program?
You can never get back the exact same source, since no metadata about it is saved with the compiled code.
But you can re-create code from the assembly code.
Check out this book if you are interested in these things: Reversing: Secrets of Reverse Engineering.
Edit
Some Compilers 101 here: if you had to describe a compiler with a less technical word than "compiler", what would it be?
Answer: Translator
A compiler translates the syntax/phrases you have written into another language: a C compiler translates to assembly or even machine code, C# code is translated to IL, and so forth.
The executable you have is just a translation of your original text/syntax, and if you want to "reverse it", i.e. "translate it back", you will most likely not get the same structure you started with.
A more real-life example: if you translate from English to German and then from German back to English, the sentence structure will most likely be different and other words might be used, but the meaning and the context will most likely not have changed.
The same goes for a compiler/translator when you go from C to ASM: the logic is the same, it's just a different way of reading it (and of course it's optimized).
It depends on what you mean by the original C program. Things like local variable names, comments, etc. are not included in the binary, so there's no way to get the exact same source code as the one used to produce the binary. Tools such as IDA Pro might help you disassemble a binary.
I would guesstimate the conversion rate of a really skilled hacker at about 1 kilobyte of machine code per day. At common Western salaries, that puts the price of, say, a 100 KB executable at about $25,000. After spending that much money, all that's gained is a chunk of C code that does exactly what yours does, minus the benefit of comments and whatnot. It is in no way competitive with your version; you'll be able to deliver updates and improvements much faster. Reverse engineering those updates is a non-trivial effort as well.
If that price tag doesn't impress you, you can arbitrarily raise the conversion cost by adding more code. Just keep in mind that skilled hackers who can tackle large programs like this have something much better to do: they write their own code.
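The arithmetic behind that estimate, worked through under the answer's own assumptions (1 KB of machine code per day, and $25,000 for 100 KB, which implies roughly $250 per day of a skilled reverser's time). Here S is the size of the machine code; the last line is only the same model applied to a larger binary, not a separate measurement.

```latex
\begin{aligned}
t &= \frac{100\ \text{KB}}{1\ \text{KB/day}} = 100\ \text{days} \\
r &= \frac{\$25{,}000}{100\ \text{days}} = \$250\ \text{per day} \\
C(S) &= \frac{S}{1\ \text{KB/day}} \times \$250\ \text{per day}, \qquad C(200\ \text{KB}) = \$50{,}000
\end{aligned}
```

Because the cost is linear in S under this model, doubling the amount of code doubles the estimate, which is the point of the second paragraph.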
One of the best works on this topic that I know about is:
Pigs from sausages? Reengineering from assembler to C via FermaT.
The claim is you get back a reasonable C program, even if the original asm code was not written in C! Lots of caveats apply.
The Hex-Rays decompiler (an extension to IDA Pro) can do exactly that. It's still fairly recent and upcoming but shows great promise. It takes a little getting used to but can potentially speed up the reversing process. It's not a "silver bullet" (no C decompiler is), but it's a great asset.
The common name for this procedure is "turning hamburger back into cows." It's possible to reverse engineer binary code into a functionally equivalent C program, but whether that C code bears a close resemblance to the original is an open question.
Working on tools that do this is a research activity. That is, it is possible to get something in the easy cases (you won't recover local variable names unless debug symbols are present, for instance). It's nearly impossible in practice for large programs, or if the programmer has decided to make it difficult.
There is no 1:1 mapping between a C program and the ASM/machine code it will produce: one C program can compile to different results on different compilers or with different settings, and sometimes two different bits of C can produce the same machine code.
You definitely can generate C code from a compiled EXE. You just can't know how similar in structure it will be to the original code: apart from variable/function names being lost, I assume it won't know how the code was originally split across many files.
You can try hex-rays.com; it has a really nice decompiler that can decompile assembly code into C with 99% accuracy.