Neural network for file decryption - possible?
I have already worked with Neural Networks before and know most of the basics about them. I especially have experience with regular Multi-Layer Perceptrons. I was now asked by someone whether the following is possible, and I somehow feel challenged to master the problem :)
The Situation
Let's assume I have a program that can encrypt and decrypt regular ASCII-coded files. I have no idea at all about the specific encryption method or the key used. All I know is that the program can reverse the encryption and thus read the original content.
What do I want?
Now my question is: Do you think it is possible to train (some kind of) neural network that replicates the exact decryption algorithm with acceptable effort?
My ideas and work so far
I don't have much experience with encryption. Someone suggested simply assuming AES encryption, so I could write a little program to batch-encrypt ASCII-coded files. This would cover the gathering of learning data for supervised learning. Using the encrypted files as input for the neural network and the original files as training targets, I could train any net. But now I am stuck: how would you suggest feeding the input and output data to the neural network? So how many input and output neurons would you use?
Since I have no idea what the encrypted files would look like, it might be best to pass the data in binary form. But I can't just use thousands of input and output neurons and pass all the bits at the same time. Maybe a recurrent network that is fed one bit after another? That doesn't sound very effective either.
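For example, the data-gathering step could look roughly like this (a minimal sketch assuming the PyCryptodome library for AES and one fixed key; the names `make_pair`, `bytes_to_bits`, and `random_ascii_block` are just made up for illustration). With one 16-byte AES block per sample, this would mean 128 input and 128 output neurons:

```python
# Sketch: build (ciphertext bits, plaintext bits) training pairs with AES.
# Assumes PyCryptodome (pip install pycryptodome); names are illustrative.
import os
import numpy as np
from Crypto.Cipher import AES

KEY = os.urandom(16)          # one fixed 128-bit key for the whole data set
BLOCK_BITS = 128              # AES works on 16-byte (128-bit) blocks

def bytes_to_bits(data: bytes) -> np.ndarray:
    """Turn a byte string into a flat 0/1 vector, one entry per bit."""
    return np.unpackbits(np.frombuffer(data, dtype=np.uint8)).astype(np.float32)

def make_pair(plaintext_block: bytes):
    """Encrypt one 16-byte ASCII block; return (input bits, target bits)."""
    assert len(plaintext_block) == 16
    cipher = AES.new(KEY, AES.MODE_ECB)      # ECB: one block in, one block out
    ciphertext_block = cipher.encrypt(plaintext_block)
    # Network input = ciphertext bits, training target = plaintext bits.
    return bytes_to_bits(ciphertext_block), bytes_to_bits(plaintext_block)

def random_ascii_block() -> bytes:
    """A random block of printable ASCII, standing in for real file content."""
    return bytes(np.random.randint(32, 127, size=16, dtype=np.uint8))

X, Y = zip(*(make_pair(random_ascii_block()) for _ in range(10000)))
X, Y = np.stack(X), np.stack(Y)
print(X.shape, Y.shape)       # (10000, 128) (10000, 128)
```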
Another problem is that you can't decrypt partially - meaning you can't be roughly correct. You either get it right or you don't. To put it in other words, in the end the net error has to be zero. From what I have experienced so far with ANNs, this is nearly impossible to achieve for big networks. So is this problem solvable?
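Just to put rough numbers on the "error has to be zero" point (a back-of-the-envelope sketch with made-up figures, assuming each output bit is predicted independently):

```python
# Why "roughly correct" is not good enough: even an optimistic per-bit
# accuracy makes a fully correct file astronomically unlikely.
per_bit_accuracy = 0.99        # assumed, optimistic per-bit accuracy
file_size_bits = 1024 * 8      # a tiny 1 KiB file

p_whole_file_correct = per_bit_accuracy ** file_size_bits
print(p_whole_file_correct)    # ~1.8e-36, i.e. effectively never
```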
Comments (3)
That's exactly the problem. Neural networks can approximate continuous functions, meaning that a small change in the input values causes only a small change in the output values, while encryption functions/algorithms are designed to be as non-continuous as possible.
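To illustrate that non-continuity (a minimal sketch assuming PyCryptodome; key and plaintext are arbitrary): flipping a single plaintext bit changes roughly half of the AES ciphertext bits, which is exactly the avalanche behaviour that makes the mapping so hard to approximate smoothly.

```python
# Sketch of the avalanche effect: flip one plaintext bit, count changed
# ciphertext bits under AES. Assumes PyCryptodome; values are arbitrary.
import os
import numpy as np
from Crypto.Cipher import AES

key = os.urandom(16)
plaintext = b"sixteen byte msg"                            # exactly one 16-byte block
flipped = bytes([plaintext[0] ^ 0x01]) + plaintext[1:]     # flip the lowest bit of byte 0

c1 = np.unpackbits(np.frombuffer(AES.new(key, AES.MODE_ECB).encrypt(plaintext), dtype=np.uint8))
c2 = np.unpackbits(np.frombuffer(AES.new(key, AES.MODE_ECB).encrypt(flipped), dtype=np.uint8))

print(int(np.sum(c1 != c2)), "of 128 ciphertext bits changed")   # typically around 64
```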
I think if that worked, people would be doing it. As far as I know, they aren't.
Seriously, if you could just throw a lot of plaintext/ciphertext pairs at a neural network and construct a decrypter, that would be a very effective known-plaintext or chosen-plaintext attack. Yet the attacks of that kind that we have against current ciphers are not very effective at all. That means that either the entire open cryptographic community has missed the idea, or it doesn't work. I realise that this is far from a conclusive argument (it's effectively an argument from authority), but I would suggest it's indicative that this approach won't work.
Say you have two keys A and B that translate ciphertext K into Pa and Pb respectively. Pa and Pb are both "correct" decryptions of ciphertext K. So if your neural network has only K as input, it has no means of actually predicting the correct answer. Most approaches to breaking encryption involve looking at the result to see if it looks like what you're after. For example, readable text is more likely to be the plaintext than apparently random junk. A neural network would need to be good at guessing whether it got the right answer according to what the user would expect the contents to be, and that could never be 100% correct.
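A toy illustration of that ambiguity with a one-byte XOR "cipher" (purely illustrative, nothing to do with AES): the same ciphertext byte decrypts to two different, equally "correct" plaintext characters depending on which key you assume.

```python
# One ciphertext byte, two candidate keys, two plausible plaintexts.
ciphertext = 0x5E
key_a, key_b = 0x1B, 0x2C

print(chr(ciphertext ^ key_a))   # 'E'
print(chr(ciphertext ^ key_b))   # 'r'
```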
However, neural networks can in theory learn any function. So if you have enough ciphertext/plaintext pairs for a particular encryption key, then a sufficiently complex neural network can learn to be exactly the decryption algorithm for that particular key.
Also, regarding the continuous vs. discrete problem, this is basically solved. The outputs go through something like the sigmoid function, so you just have to pick a threshold for 1 vs. 0; 0.5 could work. With enough training you could in theory get the 1-vs-0 answer correct 100% of the time.
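That thresholding step can be as simple as the following sketch (plain NumPy; the output values are made up):

```python
import numpy as np

# Hypothetical sigmoid outputs of the network for 8 output bits.
outputs = np.array([0.93, 0.12, 0.51, 0.07, 0.88, 0.49, 0.99, 0.30])

# Threshold at 0.5 to obtain hard 0/1 bits, then pack them back into a byte.
bits = (outputs >= 0.5).astype(np.uint8)
print(bits)                       # [1 0 1 0 1 0 1 0]
print(np.packbits(bits))          # [170], i.e. the byte 0xAA
```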
The above assumes that you have one network big enough to process the entire file at once. For arbitrarily sized ciphertext, you would probably need to do blocks at a time with an RNN, but I don't know if that still has the same "compute any function" properties as for a traditional network.
None of this is to say that such a solution is practically doable.