How to obfuscate string constants?
We have an application which contains sensitive information and I'm trying my best to secure it. The sensitive information includes:
- The main algorithm
- The keys for an encryption/decryption algorithm
I've been looking at obfuscating the code, but it doesn't seem to help much, as I can still decompile it. However, my biggest concern is that the keys used for encryption of serial numbers etc. are clearly visible when you decompile the code, even when it's obfuscated.
Can anyone suggest how I can secure these strings?
I realise one of the methods might be to remove any decryption from the application itself. While this may be possible in part, there are some features which have to use encryption/decryption, mainly to save a config file and to pass an 'authorisation' token to a DLL to perform a calculation.
Comments (7)
There are ways to do what you want, but it isn't cheap and it isn't easy.
Is it worth it?
When looking at whether to protect software, we first have to answer a number of questions:
If these produce a significant economic imperative to protect your algorithm/data, then you should look into doing it. For instance, if the value of the service and the cost to customers are both high, but the cost of reverse engineering your code is much lower than the cost of developing it themselves, then people may attempt it.
So, this leads on to your question
Discouragement
Obfuscation
The option you suggest, obfuscating the code, messes with the economics above: it tries to significantly increase the cost to them (5 above) without increasing the cost to you (6) very much. The Center for Encrypted Functionalities has done some interesting research on this. The problem is that, as with DVD encryption, it is doomed to failure: if there is enough of a differential between 3, 4 and 5, then eventually someone will do it.
Detection
Another option might be a form of steganography, which allows you to identify who decrypted your data and started distributing it. For instance, if you have 100 different float values as part of your data, and a 1-bit error in the LSB of each of those values wouldn't cause a problem for your application, encode a unique (to each customer) identifier into those bits. The problem is that if someone has access to multiple copies of your application data, it would be obvious that they differ, making it easier to identify the hidden message.
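As a rough illustration of the per-customer marking described above, here is a minimal Python sketch (the application in question appears to be .NET, but the idea is language-independent). The function names and the 16-bit identifier width are assumptions made for the example, and it assumes every marked value is a finite double that tolerates a change in its last mantissa bit.

```python
import struct


def _set_lsb(value: float, bit: int) -> float:
    """Return `value` with the lowest mantissa bit forced to `bit`."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    raw = (raw & ~1) | (bit & 1)
    (marked,) = struct.unpack("<d", struct.pack("<Q", raw))
    return marked


def _get_lsb(value: float) -> int:
    """Read back the lowest mantissa bit of `value`."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    return raw & 1


def embed_customer_id(values, customer_id: int, bits: int = 16):
    """Hide `customer_id` in the LSBs of the first `bits` values."""
    if len(values) < bits:
        raise ValueError("not enough values to carry the identifier")
    marked = list(values)
    for i in range(bits):
        marked[i] = _set_lsb(marked[i], (customer_id >> i) & 1)
    return marked


def extract_customer_id(values, bits: int = 16) -> int:
    """Recover the identifier from a (possibly leaked) copy of the data."""
    return sum(_get_lsb(values[i]) << i for i in range(bits))
```

Each marked value changes by at most one unit in the last place, which is why this only works for data where that much error is harmless. As noted above, comparing two customers' copies would reveal which values differ.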
Protection
SaaS - Software as a Service
A more secure option might be to provide the critical part of your software as a service, rather than include it in your application.
Conceptually, your application would collect all of the data required to run your algorithm and package it up as a request to a server (controlled by you) in the cloud. Your service would then calculate the results and pass them back to the client, which would display them.
This keeps all of your proprietary, confidential data and algorithms within a domain that you control completely, and removes any possibility of a client extracting either.
The obvious downside is that clients are tied into your service provision and are at the mercy of your servers and their internet connection. Unfortunately many people object to SaaS for exactly these reasons. On the plus side, they are always up to date with bug fixes, and your compute cluster is likely to be higher performance than the PC they are running the user interface on.
This would be a huge step to take though, and could have a huge cost (6 above), but it is one of the few ways to keep your algorithm and data completely secure.
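On the client side, the round trip described above might look something like the following Python sketch. The service URL, the JSON payload shape and the bearer token are all hypothetical placeholders chosen for illustration; nothing here reflects a real service.

```python
import json
import urllib.request

# Hypothetical endpoint; the real algorithm lives only on this server.
SERVICE_URL = "https://compute.example.com/v1/calculate"


def build_request(inputs: dict, auth_token: str) -> urllib.request.Request:
    """Package the algorithm's inputs as a JSON POST request."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed per-customer credential, so the server knows who is asking.
            "Authorization": "Bearer " + auth_token,
        },
        method="POST",
    )


def calculate_remotely(inputs: dict, auth_token: str, timeout: float = 10.0):
    """Send the inputs to the service and return its decoded JSON result."""
    request = build_request(inputs, auth_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The client only ever sees inputs and results, so there is no key or algorithm in the shipped binary to decompile.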
Software Protection Dongles
Although traditional Software Protection Dongles would protect from software piracy, they wouldn't protect against algorithms and data in your code being extracted.
Newer Code Porting dongles (such as SenseLock†) appear to be able to do what you want though. With these devices, you take code out of your application and port it to the secure dongle processor. As with SaaS, your application would bundle up the data, pass it to the dongle (probably a USB device attached to your computer) and read back the results.
Unlike SaaS, data bandwidth would be unlikely to be an issue, but the performance of your application may be limited by the performance of your software protection dongle.
† This was the first example I could find with a Google search.
Trusted platform
Another option, which may become viable in the future, is to use a Trusted Platform Module and Trusted Execution Technology to secure critical areas of the code. Whenever a customer installs your software, they would provide you with a fingerprint of their hardware and you would provide them with an unlock key for that specific system.
This key would then allow the code to be decrypted and executed within the trusted environment, where the encrypted code and data would be inaccessible outside of the trusted platform. If anything at all about the trusted environment changed, it would invalidate the key and that functionality would be lost.
For the customer this has the advantage that their data stays local and they don't need to buy a new dongle to improve performance. However, it has the potential to create an ongoing support requirement, and your customers are likely to become frustrated with the hoops they have to jump through to use software they have bought and paid for, losing you goodwill.
Conclusion
What you want to do is not simple or cheap. It could require a big investment in software, infrastructure or both. You need to know that it is worth the investment before you start along this road.
All efforts will be futile if someone is motivated enough to break it. No one has managed to figure this out yet, even the biggest software companies.
I'm not saying this as a scathing criticism; you just need to be aware that what you're trying to achieve is currently assumed to be impossible.
Obfuscation is security through obscurity. It does have some benefit, as it will deter the most incompetent of hacker attempts, but largely it is wasted effort that could perhaps be better spent in other areas of development.
In answer to your original question, you are going to run into problems with intelligent compilers: they might automatically piece the string back together in the compiled application, removing some of your obfuscation efforts as a compilation optimisation. It would be hard to maintain as well, so I would reconsider your risk analysis model and perhaps resign yourself to the fact that it can be cracked and, if it has any value, probably will be.
I recently read a very simple solution to the OP's problem.
Simply declare your constants as readonly string, not const string. It's that simple. Apparently const variables get written to a stack area in the binary as plain text, whereas readonly strings get added to the constructor and written as a byte array instead of text.
I.e. If you search for it, you won't find it.
That was the question, right?
Using a custom algorithm (security through obscurity?), combined with storing the key inside the application, is simply not secure.
If you are storing some kind of a password, then you can use a one-way hashing function to ensure that decrypted data is unavailable anywhere in your code.
If you need to use a symmetric encryption algorithm, use a well known and tested one, like AES-256. But the key obviously cannot be stored inside your code.
[Edit]
Since you mentioned encryption of serial numbers, I believe a one-way hashing function (like SHA-256) would really suit your needs better.
The idea is to hash your serial numbers during build time into their hashed representations, which cannot be reversed (SHA-256 is considered to be a pretty safe algorithm, compared to, say, MD5). During run time, you only need to apply the same hash function to the user input, and compare hashed values only. This way none of the actual serial numbers are available to the attacker.
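The build-time/run-time split described above can be sketched in a few lines of Python using the standard `hashlib` module. The function names and the normalisation step (trimming whitespace and upper-casing) are assumptions for the example, not something the original post specifies.

```python
import hashlib


def hash_serial(serial: str) -> str:
    """SHA-256 hex digest of a serial number after simple normalisation."""
    normalized = serial.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_hash_list(serials) -> frozenset:
    """Build step: runs on your machine; only the digests are shipped."""
    return frozenset(hash_serial(s) for s in serials)


def is_valid_serial(user_input: str, shipped_hashes) -> bool:
    """Run time: hash what the user typed and compare digests only."""
    return hash_serial(user_input) in shipped_hashes
```

A decompiled application would then contain only the digests. Note that this protects the serial numbers themselves only if they are long and random enough that an attacker cannot simply hash every candidate.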
@Tom Gullen has given a proper answer.
I merely have some suggestions on how you can make it harder for users to access your keys and algorithm.
As for the algorithm: Do not compile your algorithm at compile time, but at runtime. To be able to do this you need to specify an interface which contains the methods for the algorithm. The interface is used to run it. Then add the source code for the algorithm as an encrypted string (embedded resource). Decrypt it at runtime and use CodeDom to compile it into a .NET class.
Keys: The usual way is to store spread parts of your key in different places in the application. Store each part as byte[] instead of string to make it a bit harder to find them.
If all your users have an internet connection: Fetch the algorithm source code and the keys using SSL instead.
Note that everything will be pieced together at runtime, so anyone with a bit more knowledge can inspect/debug your application to find everything.
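To make the key-splitting idea concrete, here is a small Python sketch. It uses XOR shares rather than plain slices of the key, which is a variation I am substituting so that no stored part contains a recognisable fragment of the key; the function names are hypothetical. In a .NET application each share would be a separate byte[] stored in a different place.

```python
import secrets
from functools import reduce
from typing import List, Sequence


def _xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, parts: int = 3) -> List[bytes]:
    """Build step: split `key` into `parts` shares that XOR back to it."""
    if parts < 2:
        raise ValueError("need at least two parts")
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The last share is the key XORed with every random share.
    last = reduce(_xor, shares, key)
    return shares + [last]


def join_key(shares: Sequence[bytes]) -> bytes:
    """Run time: XOR all shares together to recover the key."""
    return reduce(_xor, shares)
```

As the note above says, the full key still exists in memory once `join_key` has run, so this only raises the effort needed; it does not stop someone with a debugger.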
I don't think you can easily obfuscate string constants, so if possible, don't use them :) You can use assembly resources instead; those you can encrypt however you want.
It depends on what you're trying to do, but can you use asymmetric encryption? That way you only need to store public keys, with no need to obfuscate them.