Hashing algorithm for dynamically growing/streaming data?
Are there any algorithms that let you continue hashing from a known hash digest? For example, a client uploads one chunk of a file to ServerA, and I can compute the md5 sum of the uploaded content. The client then uploads the rest of the file to ServerB. Can I transfer the md5 internal state to ServerB and finish the hashing there?
Years ago I found a cool black-magic hack based on md5 on comp.lang.python, but it uses ctypes against a specific version of md5.so or _md5.dll, so it is not portable across Python interpreter versions or to other programming languages. Besides, the md5 module has been deprecated in the Python standard library since 2.5, so I need a more general solution.
What's more, can the state of the hashing be stored in the hex digest itself? (That way I could continue hashing a stream of data from an existing hash digest, without a dirty internal hack.)
Comments (4)
Not from the known digest, but from the known state. You can use a pure Python MD5 implementation and save its state. Here is an example using _md5.py from PyPy:
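To make the "known state" idea concrete, here is a minimal sketch of a pure Python MD5 whose state can be saved and restored. It does not use PyPy's _md5.py API: the class `ResumableMD5` and its `dump`/`load` methods are hypothetical names written for this illustration, following the algorithm in RFC 1321. The state that has to travel from ServerA to ServerB is the four 32-bit chaining words, the total byte count, and any bytes not yet forming a full 64-byte block.

```python
import hashlib
import json
import math
import struct

# Constants from RFC 1321: per-step rotation amounts and sine-derived addends.
_SHIFTS = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
           [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
_K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
_MASK = 0xFFFFFFFF


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= _MASK
    return ((x << n) | (x >> (32 - n))) & _MASK


def _compress(state, block):
    """Mix one 64-byte block into the four chaining words."""
    a, b, c, d = state
    m = struct.unpack("<16I", block)
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | ~d), (7 * i) % 16
        f = (f + a + _K[i] + m[g]) & _MASK
        a, d, c, b = d, c, b, (b + _rotl(f, _SHIFTS[i])) & _MASK
    return tuple((x + y) & _MASK for x, y in zip(state, (a, b, c, d)))


class ResumableMD5:
    """Hypothetical helper: MD5 with an explicit, serializable state."""

    def __init__(self, state=_INIT, length=0, tail=b""):
        self.state, self.length, self.tail = tuple(state), length, tail

    def update(self, data):
        self.length += len(data)
        buf = self.tail + data
        full = len(buf) - len(buf) % 64
        for off in range(0, full, 64):
            self.state = _compress(self.state, buf[off:off + 64])
        self.tail = buf[full:]  # leftover bytes wait for the next update

    def hexdigest(self):
        # Pad a copy of the state so the object can still be updated later.
        pad = (b"\x80" + b"\x00" * ((55 - self.length) % 64) +
               struct.pack("<Q", (self.length * 8) & 0xFFFFFFFFFFFFFFFF))
        state, buf = self.state, self.tail + pad
        for off in range(0, len(buf), 64):
            state = _compress(state, buf[off:off + 64])
        return struct.pack("<4I", *state).hex()

    def dump(self):
        """Return a JSON-friendly snapshot of the internal state."""
        return {"state": list(self.state), "length": self.length,
                "tail": self.tail.hex()}

    @classmethod
    def load(cls, saved):
        return cls(saved["state"], saved["length"], bytes.fromhex(saved["tail"]))


# ServerA hashes the first chunk and ships the snapshot as JSON.
first, rest = b"chunk received by ServerA, ", b"chunk received by ServerB"
on_a = ResumableMD5()
on_a.update(first)
wire = json.dumps(on_a.dump())

# ServerB restores the snapshot and finishes the hash.
on_b = ResumableMD5.load(json.loads(wire))
on_b.update(rest)
digest_on_b = on_b.hexdigest()
```

The snapshot is larger than a hex digest because it carries the byte count and the unprocessed tail in addition to the chaining words, which is why the digest alone is not enough to resume in general.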
As e.dan noted, you can also use almost any checksumming algorithm (CRC, Adler, Fletcher), but these only protect against random errors, not against intentional data modification.
EDIT: of course, you can also re-implement the serialization method from the thread you referenced using ctypes in a more portable way (without magic constants). I believe this should be version/architecture independent (tested on Python 2.4-2.7, on both i386 and x86_64):
It is not Python 3 compatible, since Python 3 does not have an _md5/md5 module.
Unfortunately, hashlib's openssl_md5 implementation is not suitable for such hacks, since the OpenSSL EVP API does not provide any calls/methods to reliably serialize EVP_MD_CTX objects.
This is theoretically possible (the md5 computation so far should contain all the state you need to continue), but it looks like the normal APIs don't provide what you need. If you can make do with a CRC instead, this will probably be a lot easier, since CRCs are more commonly used for "streaming" cases like yours. See here:
binascii.crc32(data[, crc])
crc32() accepts an optional crc input, which is the checksum to continue from. Hope that helps.
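As a small sketch of that continuation, the running CRC from the first chunk is passed as the second argument when checksumming the next chunk. The chunk contents and variable names here are made up for illustration. The running value is a single 32-bit integer, so it is easy to hand from one server to another.

```python
import binascii

# Hypothetical chunks: the first arrives at ServerA, the second at ServerB.
chunk_a = b"first part of the upload, "
chunk_b = b"second part of the upload"

# ServerA computes a running checksum and passes the integer along.
running_crc = binascii.crc32(chunk_a)

# ServerB continues from that value instead of starting over.
final_crc = binascii.crc32(chunk_b, running_crc)

# Same result as checksumming the whole payload in one call.
whole_crc = binascii.crc32(chunk_a + chunk_b)
print(final_crc == whole_crc)
```

As noted earlier in the thread, a CRC only guards against accidental corruption, so this is a substitute for md5 only when tamper resistance is not required.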
I was facing this problem too, and found no existing solution, so I wrote a library that uses ctypes to deconstruct the OpenSSL data structure holding the hasher state: https://github.com/kislyuk/rehash. Example:
Hi, for those coming here late, as I did, with Python 3 (3.11 in my case):
hashlib has an update function.
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
37268335dd6931045bdcdf92623ff819a64244b53d0e746d438797349d4da578
37268335dd6931045bdcdf92623ff819a64244b53d0e746d438797349d4da578
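A sketch of the kind of code that would print three digests like these, assuming SHA-256 and the input `test` fed twice (the first digest above is the SHA-256 of `test`; the original snippet is not shown, so the rest is a guess). Note that update() continues hashing inside one process only. It does not export the hasher's state, so on its own it does not cover the ServerA to ServerB handoff from the question.

```python
import hashlib

hasher = hashlib.sha256()

# Feed the first piece and look at the digest so far.
hasher.update(b"test")
after_first = hasher.hexdigest()
print(after_first)

# hexdigest() does not finalize the object, so more data can follow.
hasher.update(b"test")
after_second = hasher.hexdigest()
print(after_second)

# One-shot hash of the concatenated input, for comparison.
one_shot = hashlib.sha256(b"testtest").hexdigest()
print(one_shot)
```

The second and third lines printed are identical, because hashing in pieces with update() gives the same result as hashing the concatenation in one call.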