How effective is obfuscation?
A different question, i.e. Best .NET obfuscation tools/strategy, asks whether obfuscation is easy to implement using tools.
My question though is, is obfuscation effective? In a comment replying to this answer, someone said that "if you're worried about source theft ... obfuscation is almost trivial to a real cracker".
I've looked at the output from the Community Edition of Dotfuscator: and it looks obfuscated to me! I wouldn't want to maintain that!
I understand that simply 'cracking' obfuscated software might be relatively easy: because you only need to find whichever location in the software implements whatever it is you want to crack (typically the license protection), and add a jump to skip that.
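To make the "add a jump to skip it" idea concrete, here is a minimal sketch in Python using a hypothetical toy instruction set. The opcodes `CHECK`, `JZ`, `JMP`, `PRINT` and `HALT` are invented for illustration and are not real .NET IL or x86. However the surrounding code is obfuscated, the license decision ends up as one conditional branch, and overwriting that single instruction changes what the program does.

```python
# Toy "machine code": a list of (opcode, argument) pairs.
# The instruction set is hypothetical and only illustrates the idea.
PROGRAM = [
    ("CHECK", None),             # 0: set the flag from the license check
    ("JZ", 4),                   # 1: if the check failed, jump to the trial path
    ("PRINT", "full version"),   # 2
    ("HALT", None),              # 3
    ("PRINT", "trial expired"),  # 4
    ("HALT", None),              # 5
]


def run(program, license_ok):
    """Interpret the program and return the lines it printed."""
    out = []
    pc = 0
    flag = False
    while pc < len(program):
        op, arg = program[pc]
        if op == "CHECK":
            flag = license_ok
        elif op == "JZ" and not flag:
            pc = arg
            continue
        elif op == "JMP":
            pc = arg
            continue
        elif op == "PRINT":
            out.append(arg)
        elif op == "HALT":
            break
        pc += 1
    return out


def patch(program, index, instruction):
    """Return a copy of the program with one instruction overwritten."""
    patched = list(program)
    patched[index] = instruction
    return patched


# The crack: replace the conditional jump with an unconditional jump to the full path.
CRACKED = patch(PROGRAM, 1, ("JMP", 2))
```

`run(PROGRAM, False)` returns `["trial expired"]`, while `run(CRACKED, False)` returns `["full version"]`: the patch touched one instruction and required no understanding of the rest of the program.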
If the worry is more than just cracking by an end-user or a 'pirate' though: if the worry is "source theft" i.e. if you're a software vendor, and your worry is another vendor (a potential competitor) reverse-engineering your source, which they could then use in or add to their own product ... to what extent is simple obfuscation an adequate or inadequate protection against that risk?
1st edit:
The code in question is about 20 KLOC which runs on end-user machines (a user control, not a remote service).
If obfuscation really is "almost trivial to a real cracker", I'd like some insight into why it's ineffective (and not just "how much" it's not effective).
2nd edit:
I'm not worried about someone's reversing the algorithm: more worried about their repurposing the actual implementation of the algorithm (i.e. the source code) into their own product.
Given that 20 KLOC is several months' work to develop, would it take more or less than this (several months) to deobfuscate it all?
Is it even necessary to deobfuscate something in order to 'steal' it: or might a sane competitor simply incorporate it wholesale into their product while still obfuscated, accept that as-is it's a maintenance nightmare, and hope that it needs little maintenance? If this scenario is a possibility then is obfuscated .Net code any more vulnerable to this than compiled machine code is?
Is most of the obfuscation "arms race" aimed at preventing people from even 'cracking' something (e.g. finding and deleting the code fragment which implements licensing protection/enforcement), rather than at preventing 'source theft'?
Answers (9)
I've discussed why I don't think Obfuscation is an effective means of protection against cracking here:
Protect .NET Code from reverse engineering
However, your question is specifically about source theft, which is an interesting topic. In Eldad Eilam's book "Reversing: Secrets of Reverse Engineering", the author discusses source theft as one reason behind reverse engineering in the first two chapters.
Basically, what it comes down to is the only chance you have of being targeted for source theft is if you have some very specific, hard to engineer, algorithm related to your domain that gives you a leg up on your competition. This is just about the only time it would be cost-effective to attempt to reverse engineer a small portion of your application.
So, unless you have some top-secret algorithm you don't want your competition to have, you don't need to worry about source theft. The cost involved with reversing any significant amount of source-code out of your application quickly exceeds the cost of re-writing it from scratch.
Even if you do have some algorithm you don't want them to have, there isn't much you can do to stop determined and skilled individuals from getting it anyway (if the application is executing on their machine).
Some common anti-reversing measures are:
However, packers can be unpacked, and obfuscation doesn't really hinder those who want to see what your application is doing. If the program runs on the user's machine, then it is vulnerable.
Eventually its code must be executed as machine code, and it is normally a matter of firing up a debugger, setting a few breakpoints, monitoring the instructions executed during the relevant action, and spending some time poring over that data.
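A debugger's breakpoints can be imitated in Python with the standard `sys.settrace` hook. In this minimal sketch, the functions `a` and `_d` are hypothetical stand-ins for obfuscated code: their names say nothing, yet tracing a single run reveals the call order, the arguments and every return value. This is a Python analogy, not a .NET or machine-code debugger.

```python
import sys


# Hypothetical "obfuscated" code: the names reveal nothing about the purpose.
def a(b, c):
    return _d(b) + c


def _d(e):
    return e * 2


def trace_calls(func, *args):
    """Run func(*args), recording each Python-level call and return value."""
    events = []

    def tracer(frame, event, arg):
        if event == "call":
            events.append(("call", frame.f_code.co_name, dict(frame.f_locals)))
        elif event == "return":
            events.append(("return", frame.f_code.co_name, arg))
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, events
```

`trace_calls(a, 3, 4)` returns `10` together with the observed behavior: `a` called with `b=3, c=4`, `_d` called with `e=3` and returning `6`, then `a` returning `10`.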
You mentioned that it took you several months to write ~20kLOC for your application. It would take almost an order of magnitude longer to reverse those equivalent 20kLOC from your application into workable source if you took the bare minimum precautions.
This is why it is only cost-effective to reverse small, industry-specific algorithms from your application. Anything else isn't worth it.
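The cost argument above can be written as a rough model. Assume, as the answer does, that reversing costs about an order of magnitude more per line than writing. Let $w$ be the time to write one line, $N$ the size of the application, $a$ the size of the key algorithm, and $D$ the time a competitor would need to design and implement an equivalent algorithm from scratch (these symbols are introduced here for illustration only).

```latex
\begin{aligned}
C_{\text{rewrite}}(N) &\approx w\,N, \qquad C_{\text{reverse}}(N) \approx 10\,w\,N > C_{\text{rewrite}}(N) \\
\text{reversing only the key algorithm pays off when}\quad 10\,w\,a &< D, \qquad a \ll N
\end{aligned}
```

Under this model, reversing the whole application never beats rewriting it, and for the 20 KLOC in the question the factor of 10 turns several months of writing into something on the order of years of reversing. Only a small $a$ paired with a large $D$, the "top-secret algorithm" case, makes reversing worthwhile.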
Take the following fictionalized example: let's say I just developed a brand-new application to compete with iTunes, with a ton of bells and whistles. Let's say it took several hundred thousand LOC and 2 years to develop. One key feature I have is a new way of serving up music to you based on your music-listening taste.
Apple (being the pirates they are) gets wind of this and decides they really like your music-suggest feature, so they decide to reverse it. They will then home in on only that algorithm, and the reverse engineers will eventually come up with a workable algorithm that serves up equivalent suggestions given the same data. Then they implement said algorithm in their own application, call it "Genius" and make their next 10 trillion dollars.
That is how source theft goes down.
No one would sit there and reverse all 100k LOC to steal significant chunks of your compiled application. It would simply be too costly and too time-consuming. About 90% of the time they would be reversing boring, non-secret code that simply handled button presses or user input. Instead, they could hire their own developers to rewrite most of it from scratch for less money, and reverse only the important algorithms that are difficult to engineer and that give you an edge (i.e., the music-suggest feature).
Obfuscation is a form of security through obscurity, and while it provides some protection, the security is obviously quite limited.
For the purposes you describe, obscurity can certainly help, and in many cases, is an adequate protection against the risk of code theft. However, there is certainly still a risk that the code will be "unobfuscated" given sufficient time and effort. Unobfuscating the entire codebase would be effectively impossible, but if an interested party only wishes to determine how you did some certain part of your implementation, the risks are higher.
In the end, only you can determine whether the risk is worth it for you or your business. However, in many cases, this is the only option you have if you wish to sell your product to customers to use in their own environments.
Regarding the "why it's ineffective": the reason is that a cracker can use a debugger to see where your code is running, regardless of what obfuscation technique is used. They can then use this to work around any protection mechanisms you've put in place, such as a serial number or "phone home" system.
I don't believe the comment was really referencing "code theft" in the sense that your code is going to be stolen and used in another project. Because they used the word "cracker," I believe they were talking about "theft" in terms of software piracy. Crackers specialize in working around protection mechanisms; they're not interested in using your source code for some other purpose.
Most people tend to write what appears to be obfuscated code and that hasn't stopped the crackers so what's the difference?
EDIT:
OK, serious time. If you really want to make something that's hard to break, look into polymorphic coding (not to be confused with polymorphism). Self-mutating code is a serious pain to break and will keep them guessing.
http://en.wikipedia.org/wiki/Polymorphic_code
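Real polymorphic engines rewrite machine code at run time; the following Python sketch only illustrates the core idea at source level, under that simplifying assumption. The hypothetical helper `make_variant` emits a randomly chosen, syntactically different but equivalent implementation of `x + k` for integer `x`, with a random name and optional dead code, so no two copies need look alike.

```python
import random


def make_variant(k, rng):
    """Generate (source, function) for a random but equivalent form of x + k.

    Assumes integer x (the XOR form relies on it). The helper and its
    transformations are illustrative, not taken from any real engine.
    """
    split = rng.randint(-100, 100)
    forms = [
        f"return x + {k}",
        f"return (x + {split}) + ({k} - {split})",
        f"return x - ({-k})",
        f"y = x ^ 0\n    return y + {k}",
    ]
    # Optional dead code changes the text without changing the behavior.
    junk = f"_unused = {rng.randint(0, 999)} * 0\n    " if rng.random() < 0.5 else ""
    name = "f_" + "".join(rng.choice("abcdef") for _ in range(6))
    source = f"def {name}(x):\n    {junk}{rng.choice(forms)}\n"
    namespace = {}
    exec(source, namespace)
    return source, namespace[name]
```

Calling `make_variant(7, random.Random(seed))` with different seeds yields differently named and shaped functions, every one of which returns `x + 7`.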
In the end, nothing is impossible to reverse engineer.
You are worried about people stealing the specific algorithms used in your product. Either you are Fair Isaac or you need to differentiate yourself using more than the way you x++;. If you solved some problem in code that cannot be solved by someone else puzzling over it for a few hours, you should have a PhD in computer science and/or patents to protect your invention. 99% of software products are not successful or special because of the algorithms. They are successful because their authors did the heavy lifting to put together well-known and easily understood concepts into a product that does what their customers need and sell it for cheaper than it would cost to pay others to re-do the same.
Look at it this way: the WMD editor that you typed your question into was reverse engineered by the SO team in order to fix some bugs and make some enhancements. That code was obfuscated. You are never going to stop intelligent, motivated people from hacking your code; the best you can hope for is to keep the honest people honest and make it somewhat hard to break.
If you've ever seen the output from a disassembler, you'd realize why obfuscation will always fail.
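A small illustration using Python's standard `dis` module as a stand-in for an IL disassembler such as ildasm. The two hypothetical functions below differ only in their identifiers, the kind of change a renaming obfuscator makes, and in CPython they compile to the same bytecode, so the disassembly shows the same operations and constants for both.

```python
import dis


# Readable original.
def total_price(quantity):
    return quantity * 1337 + 4242


# The same logic after a hypothetical renaming obfuscator.
def a(b):
    return b * 1337 + 4242


# The disassembly of the renamed version still shows the multiply, the add,
# and both constants.
dis.dis(a)

# Identical instruction bytes: renaming did not change what the code does.
SAME_BYTECODE = a.__code__.co_code == total_price.__code__.co_code
```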
I tend to think that obfuscation is really not very effective if you want to protect your source. For the real expert in the field (I don't mean a software expert here or a cracker, I mean the expert in the field of the functionality of the code), usually he or she doesn't need to see the code, just how it reacts to special inputs, edge cases, etc., to get an idea of how to implement a copy, or code equivalent to that protected functionality. Thus, obfuscation is not very helpful in protecting your know-how.
If you have IP in code which must be protected at all costs, then you should make your software's functionality available as a service, on a secured remote server.
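A minimal sketch of that architecture using only Python's standard library: the hypothetical `secret_score` function lives on the server, and the client only ever sends inputs and receives a result. This demo runs the server on localhost over plain HTTP; a real deployment would also need at least TLS and authentication, neither of which is shown here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def secret_score(values):
    """Hypothetical proprietary algorithm; it exists only on the server."""
    return sum(v * v for v in values) % 97


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"values": [...]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": secret_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for the demo


def start_server():
    """Start the service on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_score(url, values):
    """Client side: sends only inputs and receives only the result."""
    request = Request(
        url,
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    opener = build_opener(ProxyHandler({}))  # talk to localhost directly
    with opener.open(request) as response:
        return json.loads(response.read())["score"]
```

After `server = start_server()`, calling `remote_score(f"http://127.0.0.1:{server.server_address[1]}/score", [1, 2, 3])` returns `14`, yet the client machine never holds the code of `secret_score`.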
Good obfuscation will protect you up to a point, but it's all about the amount of effort required to break it against the 'reward' of having the code. If you are talking about stopping your average business user, then a commercial obfuscator should be sufficient.
Short answer is yes and no; it depends entirely on what you are trying to prevent. Section twelve of Secure Programming Cookbook has some interesting comments on this on page 653 (which is conveniently unavailable in the Google Books preview). It classifies anti-tampering into four categories: zero day (slowing down an attacker so it takes them a long time to accomplish what they want), protection of a proprietary algorithm to prevent reverse engineering, "because I can" attacks, and I can't remember the fourth one. You have to ask what you are trying to prevent; if you are really concerned about an individual getting a look at your source code, then obfuscation has some value. Used on its own, it's usually just an annoyance to someone attempting to mess with your application, and like any good security measure it works best when used in combination with other anti-tampering techniques.
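As one example of an anti-tampering technique that can complement obfuscation, here is a minimal Python sketch of a self-integrity check. It hashes the bytecode and constants of a hypothetical `check_license` function and compares the digest with one recorded earlier, so a patched check is detected. On its own this is easy to defeat (an attacker can patch the comparison too), which is why such measures are combined.

```python
import hashlib


def check_license(key):
    # Hypothetical protection routine whose integrity we want to verify.
    return key == "ABC-123"


def fingerprint(func):
    """SHA-256 digest of a function's bytecode and constants."""
    code = func.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode())
    return digest.hexdigest()


# In a real build this value would be recorded ahead of time;
# here it is simply taken at import time.
EXPECTED = fingerprint(check_license)


def verify_integrity(func, expected):
    """Return True if the function still matches its recorded fingerprint."""
    return fingerprint(func) == expected


def simulate_patch(func):
    """Mimic a cracker's patch by making the check always succeed."""
    def always_true(key):
        return True
    func.__code__ = always_true.__code__
```

Before `simulate_patch(check_license)` runs, `verify_integrity(check_license, EXPECTED)` is `True`; afterwards the check accepts any key, and the integrity test reports `False`.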