Reverse engineering "compiled" Perl vs. C?
I have a client who claims that compiled C is harder to reverse engineer than pseudo-"compiled" Perl byte-code or the like. Does anyone have a way to prove or disprove this?
I don't know too much about Perl, but I'll give some examples of why reversing code compiled to assembly is so ugly.
The ugliest thing about reverse engineering C code is that compilation removes all type information. This total lack of names and types is the worst part, IMO.
In a dynamically typed language the compiler needs to preserve much more of that information. In particular it has to keep the names of fields, methods and so on, since these are usually strings for which it is impossible to find every use.
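As a rough illustration of the lost names and types, here is a minimal C sketch. The `account` struct and both accessor functions are invented for this example; the second accessor only approximates what a disassembler shows you, namely an untyped pointer plus a constant offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, invented for this illustration. */
struct account {
    int    id;
    double balance;
};

/* What the author wrote: a named field on a named type. */
double get_balance(const struct account *a)
{
    return a->balance;
}

/* Roughly what is left after compilation: an untyped pointer plus a
 * constant byte offset. The names "account" and "balance" are gone. */
double get_balance_as_compiled(const void *p)
{
    double value;
    memcpy(&value,
           (const unsigned char *)p + offsetof(struct account, balance),
           sizeof value);
    return value;
}

/* Both views read exactly the same bytes. */
void check_views_agree(void)
{
    struct account a = { 42, 1234.5 };
    assert(get_balance(&a) == get_balance_as_compiled(&a));
}
```

Calling `check_views_agree()` from a test program confirms that both functions return the same value. The reverse engineer only ever gets to see something like the second one and has to guess the struct layout from it.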
There is plenty of other ugly stuff, such as whole-program optimization using different registers to pass parameters every time. Functions get inlined, so what was once a simple function appears in many places, often in slightly different form due to optimizations.
The same registers and stack bytes get reused for different content inside a function. This gets especially ugly with arrays on the stack, since you have no way to know how big the array is or where it ends.
Then there are micro-optimizations, which can get annoying. For example, I once spent more than 15 minutes reversing a simple function that was originally similar to
return x/1600
because the compiler decided that divisions are slow and rewrote that division by a constant into several multiplications, additions and bitwise operations.
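To give an idea of what such a rewritten division can look like, here is a hand-derived sketch for unsigned 32-bit values. It assumes `x` is a `uint32_t`; the constant and shift amounts are worked out by hand (1600 = 64 * 25, and 0x51EB851F = ceil(2^35 / 25)) and are not copied from any particular compiler's output, which will differ by compiler, target and signedness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The source-level version. */
uint32_t div1600_plain(uint32_t x)
{
    return x / 1600u;
}

/* Equivalent without a divide instruction.
 * 1600 = 64 * 25, so shift right by 6 first, then divide by 25 by
 * multiplying with ceil(2^35 / 25) = 0x51EB851F and shifting right by 35. */
uint32_t div1600_magic(uint32_t x)
{
    uint64_t product = (uint64_t)(x >> 6) * 0x51EB851Fu;
    return (uint32_t)(product >> 35);
}

/* Spot-check both versions on boundary values and a coarse sweep. */
void check_div1600(void)
{
    const uint32_t samples[] = { 0u, 1u, 1599u, 1600u, 1601u, 3199u, 3200u,
                                 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(div1600_plain(samples[i]) == div1600_magic(samples[i]));

    /* The step is chosen so that x never wraps around before the loop ends. */
    for (uint32_t x = 0; x < 0xFFFF0000u; x += 65521u)
        assert(div1600_plain(x) == div1600_magic(x));
}
```

Seen only as a shift, a 64-bit multiply by a strange constant and another shift, it is far from obvious that the original line was a plain division by 1600.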
Perl is really easy to reverse engineer. The tool of choice is vi, vim, emacs or notepad.
That does raise the question of why they're worried about reverse engineering. Turning machine code back into something resembling the original source code is normally more difficult than doing the same for byte-code, but for most nefarious activities that's irrelevant. If someone wants to copy your secrets or break your security, they can do enough without turning it back into a perfect representation of your original source code.
Reverse engineering code for a virtual machine is usually easier. A virtual machine is typically designed to be an easy target for the language, which means it typically represents the constructs of that language reasonably easily and directly.
If, however, you're dealing with a VM that wasn't designed for that particular language (e.g., Perl compiled to the JVM), that would frequently put you back much closer to working with code generated for real hardware, i.e., you have to do whatever's necessary to target a pre-defined architecture instead of designing the target to fit the source.
OK, there has been sufficient debate on this over the years, and mostly the results are never conclusive, mainly because it doesn't matter.
For a motivated reverse engineer, both will be the same.
If you are using pseudo-exe makers like perl2exe, then that will be easier to "decompile" than compiled C, as perl2exe does not compile the Perl at all; it's just a bit "hidden" (see http://www.net-security.org/vuln.php?id=2464 ; this is really old, but the concept is probably still the same. I haven't researched it so I don't know for sure, but I hope you get my point).
I would advise looking at the language that is best for the job, so that maintenance and development of the actual product can be done sensibly and sustainably.
Remember, you _cannot_ stop a motivated adversary; you need to make it more expensive to reverse than to write it themselves.
These 4 should make it difficult (but again, not impossible)...
[1] Insert noise code (random places, random code) that does pointless maths and complex data structure interaction. If done properly, this will be a great headache if the purpose is to reverse the code rather than the functionality.
[2] Chain a few (different) code obfuscators on the source code as part of the build process.
[3] Apply a software protection dongle, which will prevent code execution if the hardware is not present. This means physical access to the dongle's data is required before the rest of the reversing can take place: http://en.wikipedia.org/wiki/Software_protection_dongle
[4] There are always protectors (e.g. Themida, http://www.oreans.com/themida.php) that can protect a .exe after it has been built, regardless of how it was compiled.
... That should give the reverser enough of a headache.
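For point [1], a very small sketch of what "noise code" might look like in C. Everything here (the function names, the pricing formula, the junk arithmetic) is made up for illustration; real obfuscators generate this kind of thing automatically and in far greater volume.

```c
#include <assert.h>
#include <stdint.h>

/* Volatile sink so the compiler cannot simply discard the junk work. */
volatile uint32_t noise_sink;

/* The real logic, as you would normally write it. */
uint32_t price_plain(uint32_t qty)
{
    return qty * 7u + 3u;
}

/* Same result, buried in pointless arithmetic and a fake branch. */
uint32_t price_noisy(uint32_t qty)
{
    /* Junk computation that has no effect on the result. */
    uint32_t junk = qty ^ 0x5A5A5A5Au;
    for (uint32_t i = 0; i < 4u; i++)
        junk = junk * 33u + i;
    noise_sink = junk;

    /* Opaque predicate: qty*qty + qty == qty*(qty+1) is always even,
     * also under unsigned wraparound, so this branch is always taken. */
    if (((qty * qty + qty) & 1u) == 0u)
        return qty * 7u + 3u;

    return junk; /* dead branch that only exists to mislead the reader */
}

/* The noisy version must behave exactly like the plain one. */
void check_noise(void)
{
    for (uint32_t qty = 0; qty < 100000u; qty += 7u)
        assert(price_plain(qty) == price_noisy(qty));
    assert(price_plain(0xFFFFFFFFu) == price_noisy(0xFFFFFFFFu));
}
```

The reverser now has to work out that the loop and the second return are irrelevant before the one-line formula becomes visible. A decent optimizer or a patient human will still get there, which is why this only raises the cost.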
But remember that all this will also cost money, so you should always weigh up what it is that you are trying to achieve and then look at your options.
In short: both methods are equally insecure, unless you are using a non-compiling perl-to-exe maker, in which case the natively compiled EXE wins.
I hope this helps.
C is harder to decompile than byte-compiled Perl code. Any Perl code that's been byte-compiled can be decompiled. Byte-compiled code is not machine code like in compiled C programs. Some others suggested using code obfuscation techniques. Those are just tricks to make code harder to read and won't affect the difficulty of decompiling the Perl source. The decompiled source may be harder to read, but there are many Perl de-obfuscation tools available, and even a Perl module:
http://metacpan.org/pod/B::Deobfuscate
Perl packing programs like Par, PerlAPP or Perl2exe won't offer source code protection either. At some point the source has to be extracted so Perl can execute the script. Even packers like PerlAPP and Perl2exe, which attempt some encryption techniques on the source, can be defeated with a debugger:
http://www.perlmonks.org/?displaytype=print;node_id=779752;replies=1
It'll stop someone from casually browsing your Perl code, but even the packer has to unpack the script before it can be run. Anyone who's determined can get the source code.
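A toy sketch of why that is, written in C. This is not the actual scheme used by PerlAPP, Perl2exe or any other packer; the XOR key, the function names and the embedded script are all invented. The point is only that whatever hides the script at build time has to be undone in memory at run time, where a debugger can see the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-byte key; a real packer would use something stronger,
 * but the key still has to ship inside the executable. */
#define TOY_KEY 0x5Au

/* XOR is its own inverse, so the same routine hides and recovers. */
void toy_xor(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= TOY_KEY;
}

/* Simulates build time (hide) followed by run time (recover). */
void check_toy_packer(void)
{
    const char script[] = "print \"hello\\n\";";
    unsigned char packed[sizeof script];

    memcpy(packed, script, sizeof script);
    toy_xor(packed, sizeof packed);               /* build time: hide */
    assert(memcmp(packed, script, sizeof script) != 0);

    toy_xor(packed, sizeof packed);               /* run time: recover */
    assert(memcmp(packed, script, sizeof script) == 0);
    /* At this point the plain script sits in memory, ready to be handed
     * to the interpreter, and equally ready to be dumped by a debugger. */
}
```

Stronger encryption changes how hard it is to find the key, not the fact that the plain script has to exist in memory before the interpreter can run it.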
Decompiling C is a different beast altogether. Once it's compiled, it's machine code. With most C decompilers you end up with assembly code; some of the commercial C decompilers will take the assembly and try to generate equivalent C code but, unless it's a really simple program, they are seldom able to recreate the original code.