Output guarantees of compression/encryption algorithms
My question concerns compression/encryption algorithms in general, and to me it sounds like a complete newbie one. I understand that "in general" "it all depends", but suppose we're talking about algorithms that all have reference implementations/published specs and are thoroughly standard. To be more specific, I'm using the .NET implementations of AES-256 and GZip/Deflate.
So here goes. Can it be assumed that, given exactly the same input, both types of algorithm will produce exactly the same output?
For example, will the output of aes(gzip("hello"), key, initVector)
on .NET be identical to that on a Mac or Linux?
Comments (2)
AES is rigorously defined, so given the same input, the same algorithm parameters (mode, IV, padding) and the same key, you will get the same output.
The same cannot be said for zip.
The problem is not the standard. There IS a defined standard: the Deflate stream is IETF RFC 1951 and the gzip stream is IETF RFC 1952, so anyone can produce a compatible zip compressor/decoder starting from these definitions.
But zip belongs to the large family of LZ compressors, whose formats, by construction, do not assign a unique encoding to each input (decoding is many-to-one). This means that, for a single source, there are many ways to describe the same input, all of them valid although different.
An example.
Let's say my input is: ABCABCABC
Valid outputs can be:
9 literals
3 literals followed by one 6-byte copy starting at offset -3 (the copy overlaps its own output, which LZ allows)
3 literals followed by two 3-byte copies, each starting at offset -3
6 literals followed by one 3-byte copy starting at offset -6
etc.
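The parses listed above can be checked with a toy decoder. This is only a sketch in Python: the token layout (`"lit"` / `"copy"` tuples) is a made-up illustration, not the actual Deflate bit format, but the back-reference semantics (distance and length, with overlapping copies allowed) follow the LZ77 idea described here.

```python
def decode(tokens):
    """Expand a list of LZ77-style tokens into bytes.

    A token is either ("lit", b"...") for literal bytes or
    ("copy", distance, length) for a back-reference.
    (Hypothetical token layout, chosen for illustration only.)
    """
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out += token[1]
        else:
            _, distance, length = token
            # Copy byte by byte so a copy may overlap the bytes it is producing.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)


# The four parses of ABCABCABC from the list above.
PARSES = [
    [("lit", b"ABCABCABC")],
    [("lit", b"ABC"), ("copy", 3, 6)],
    [("lit", b"ABC"), ("copy", 3, 3), ("copy", 3, 3)],
    [("lit", b"ABCABC"), ("copy", 6, 3)],
]

decoded = [decode(parse) for parse in PARSES]
for parse, result in zip(PARSES, decoded):
    print(result, "<-", parse)
```

Four different token sequences go in and the same bytes come out, which is why a decoder cannot tell you which compressor produced a stream.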
All these outputs are valid and describe (regenerate) the same input. Obviously, some of them are more efficient (compress more) than others. But that's where implementations may differ: some will be more powerful than others. For example, kzip and 7zip are known to generate smaller Deflate output than gzip. Even gzip has several compression options that generate different compressed streams from the same input.
Now, if you want to consistently get exactly the same binary output, you need more than "zip": you need to enforce a precise zip implementation and precise compression parameters. Then you can be sure that you always generate the same binary.
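As a rough illustration of the point about parameters, here is a small Python sketch using the standard `zlib` module. Note the assumptions: `zlib.compress` emits a zlib-wrapped Deflate stream rather than a gzip file, and the sample input is arbitrary. It only shows that one library with two settings already yields two different streams for the same input, while repeating the same setting reproduces the same bytes.

```python
import zlib

data = b"ABC" * 1000  # arbitrary, highly repetitive sample input

stored = zlib.compress(data, 0)  # level 0: data is stored without compression
best = zlib.compress(data, 9)    # level 9: strongest compression zlib offers

# Both streams decode back to the same input...
same_input = zlib.decompress(stored) == zlib.decompress(best) == data
# ...yet the compressed bytes themselves are different.
streams_differ = stored != best

# Same library, same parameters: the output repeats byte for byte.
repeatable = zlib.compress(data, 9) == best

print(len(data), len(stored), len(best), same_input, streams_differ, repeatable)
```

This says nothing about other implementations or library versions. Comparing, for instance, .NET's output with this one would need to be tested rather than assumed.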
AES is defined by a standard, so any conforming implementation will indeed produce the same output. GZip is a program, so it is possible that different versions of the program will produce different outputs. I would expect a later version to be able to reinflate the output from an earlier version, but the reverse may not be possible.
As others have said, if you are going to compress, then compress the plaintext, not the ciphertext from AES. Ciphertext won't compress well, as it is designed to appear random.
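To get a feel for the compress-before-encrypt advice, here is a minimal Python sketch. It does not run AES: random bytes from `os.urandom` are used as a stand-in for ciphertext, on the assumption that good ciphertext is indistinguishable from random data. The sample plaintext is arbitrary.

```python
import os
import zlib

plaintext = b"hello, this is some fairly repetitive plaintext. " * 100

# Stand-in for AES ciphertext (an assumption): random bytes of the same length.
ciphertext_like = os.urandom(len(plaintext))

compressed_plain = zlib.compress(plaintext)
compressed_random = zlib.compress(ciphertext_like)

print("input size:           ", len(plaintext))
print("compressed plaintext: ", len(compressed_plain))
print("compressed random:    ", len(compressed_random))
```

The plaintext should shrink substantially, while the random stand-in should come out at roughly its original size or slightly larger.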