Why do people disassemble .NET (CLR) binaries?
I'm somewhat new to .NET but not new to programming, and I'm somewhat puzzled at the trend and excitement about disassembling compiled .NET code. It seems pointless.
The high-level ease of use of .NET is the reason I use it. I've written C and real (hardware processor) assembly in environments with limited resources. That was the reason to spend the effort on so many meticulous details: efficiency. Up in .NET land, it kind of defeats the purpose of having a high-level object-oriented language if you waste time diving down into the most cryptic details of the implementation. In the course of working with .NET, I have debugged the usual performance issues and odd race conditions, and I've done it all by reading my own source code, never once giving any thought to what intermediate language the compiler is generating. For example, it's pretty obvious that a for(;;) loop is going to be faster than a foreach() on an array, considering that foreach() is going to use an enumeration object with a method call to advance to each next item instead of a simple increment of a variable, and this is easy to prove with a tight loop run a few million times (no disassembly required).
What really makes disassembling IL silly is the fact that it's not real machine code. It's virtual machine code. I've heard some people actually like to move instructions around to optimize it. Are you kidding me? Just-in-time compiled virtual machine code can't even do a simple tight for(;;) loop at the speed of natively compiled code. If you want to squeeze every last cycle out of your processor, then use C/C++ and spend time learning real assembly. That way the time you spend understanding lots of low-level details will actually be worthwhile.
So, other than having too much time on their hands, why do people disassemble .NET (CLR) binaries?
Comments (13)
Understanding what compilers for various high-level languages are actually doing with your sources is an important skill to acquire as you move towards mastery of a certain environment, just like, say, understanding how DB engines will plan to execute the various kinds of SQL queries you can toss at them. To use a given level of abstraction masterfully, familiarity with (at least) the level below it is well worth acquiring; see e.g. some notes on my talk on the subject of abstraction and the slides for that talk, as well as Joel Spolsky's "law of leaky abstractions" that I refer to in the talk.
I've used it when the source code has been lost or what's in version control in a particular tagged release doesn't appear to correspond to the shipped binary.
After just completing a four-day course in secure software development, I would say that many people decompile binaries to find vulnerabilities in them. Knowing the source of a client application could help in planning an attack on a server.
Of course, for little utilities and such, there wouldn't be any such issues.
If I remember correctly, there is an app out there that obfuscates your .NET binaries. I believe it was called Dotfuscator.
To understand how to use a poorly documented interface.
(Sadly, it's much too frequent for .NET-based tools such as BizTalk or WCF to have only generic generated documentation, so disassembling to C# is sometimes necessary to see what a method is doing and in which context to use it.)
Each .NET language implements its own subset of CLR functionality. Knowing that the CLR is capable of things that the language you're currently using isn't can let you make an informed decision on whether to change languages or emit IL or find another way.
Your assumption that the only reason people do things like this is because they have too much time is insulting and uneducated.
To locate library bugs and figure out how to work around them.
For example: without reflection you cannot remote an exception and rethrow it without slaughtering its backtrace. However the framework can do it.
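For readers on later framework versions: .NET Framework 4.5 and newer expose a public API for this case, System.Runtime.ExceptionServices.ExceptionDispatchInfo, which postdates this answer. The following is only a minimal sketch of capturing an exception in one place and rethrowing it elsewhere with its original stack trace kept; the class and method names (RethrowSketch, Fail, CaptureFailure) are hypothetical and not taken from the answer.

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class RethrowSketch
{
    // Hypothetical failing operation, used only for illustration.
    public static void Fail()
    {
        throw new InvalidOperationException("boom");
    }

    // Capture the exception together with its original stack trace.
    public static ExceptionDispatchInfo CaptureFailure()
    {
        try
        {
            Fail();
        }
        catch (Exception ex)
        {
            return ExceptionDispatchInfo.Capture(ex);
        }
        return null; // Not reached in this sketch, because Fail always throws.
    }

    public static void Main()
    {
        ExceptionDispatchInfo captured = CaptureFailure();
        try
        {
            // Rethrows the same exception object; the captured trace is
            // kept and the new throw site is appended to it.
            captured.Throw();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.StackTrace);
        }
    }
}
```

By contrast, a plain "throw ex;" in a catch block resets the stack trace to the rethrow site, which is the loss the answer describes.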
From your question, it looks like you do not know that Reflector decompiles CLR assemblies back to C# or VB, so you pretty much see the original code, not IL!
Actually, a foreach over an int[] gets compiled into a for statement. If we cast it to an enumerable, you are right, it uses an Enumerator. HOWEVER, that strangely makes it FASTER since there is no incrementing the temp int. To prove this, we use benchmarking coupled with the decompiler for added understanding...
So I think by asking this question, you really answered it yourself.
If this benchmark differs from yours, please let me know how. I tried it with object arrays, nulls, etc, etc...
code:
results:
decompiled (note that the empty foreach had to add a variable assignment... something our empty for loop didn't but obviously needed):
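The listings that belonged to this answer are not reproduced here, so what follows is not the answerer's benchmark or decompiler output. It is only a minimal sketch of the two cases the answer distinguishes, with hypothetical names (ForeachSketch, SumFor, SumForeach, SumEnumerable): a foreach over a variable typed as int[], which the C# compiler lowers to an index-based loop, and a foreach over the same data typed as IEnumerable<int>, which goes through an enumerator. It makes no timing claims.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachSketch
{
    // Hand-written indexed loop.
    public static long SumFor(int[] data)
    {
        long total = 0;
        for (int i = 0; i < data.Length; i++)
        {
            total += data[i];
        }
        return total;
    }

    // foreach over an array type: the C# compiler emits an indexed loop
    // here, so no enumerator object is involved.
    public static long SumForeach(int[] data)
    {
        long total = 0;
        foreach (int value in data)
        {
            total += value;
        }
        return total;
    }

    // foreach over the IEnumerable<int> interface: this goes through
    // GetEnumerator(), MoveNext() and Current.
    public static long SumEnumerable(IEnumerable<int> data)
    {
        long total = 0;
        foreach (int value in data)
        {
            total += value;
        }
        return total;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(SumFor(data));
        Console.WriteLine(SumForeach(data));
        Console.WriteLine(SumEnumerable(data));
    }
}
```

Opening the compiled assembly in a decompiler or IL viewer is one way to confirm which form the compiler chose for each method; timing the three variants is a separate exercise.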
To learn.
Articles are nice, but they do not present production code. Without .NET Reflector, it would have taken me a couple of weeks to figure out how Microsoft implemented events in the FileSystemWatcher component. Instead, it only took a few hours, and I was able to finish my FileSystemSearcher component.
I myself often wonder this... :)
Sometimes there is a need to understand how a specific library method works or why exactly it works this way. There may be a situation where the documentation on the function is vague or there is some odd behavior that needs investigation. In this case some people disassemble libraries to see what calls are made inside certain methods.
As for optimization, I have never heard of this. I think it is ultimately stupid to try to optimize MSIL, since it will then be fed to a translator that generates the real machine code with pretty good efficiency, and your "optimizations" could get lost anyway.
To understand how the underlying system is implemented, to see what a piece of high-level code is equivalent to in IL, to circumvent licensing...
I have used it in the following, and more, cases:
Something that folks haven't mentioned is that Reflector comes in super useful if you use a compile-time weaving AOP framework like PostSharp.