I'd like to create a hashlib instance, update() it, then persist its state in some way. Later, I'd like to recreate the object using this state data and continue to update() it. Finally, I'd like to get the hexdigest() of the total cumulative run of data. State persistence has to survive across multiple runs.
Example:
import hashlib
m = hashlib.sha1()
m.update(b'one')
m.update(b'two')
# somehow, persist the state of m here
# later, possibly in another process
# recreate m from the persisted state
m.update(b'three')
m.update(b'four')
print(m.hexdigest())
# at this point, m.hexdigest() should be equal to hashlib.sha1(b'onetwothreefour').hexdigest()
EDIT:
I did not find a good way to do this with Python in 2010 and ended up writing a small helper app in C to accomplish this. However, there are some great answers below that were not available or known to me at the time.
You can do it this way using ctypes; no helper app in C is needed:
rehash.py
resumable_SHA-256.py
demo
output
Note: I would like to thank PM2Ring for his wonderful code.
hashlib.sha1 is a wrapper around a C library, so you won't be able to pickle it.
It would need to implement the __getstate__ and __setstate__ methods for Python to access its internal state.
You could use a pure Python implementation of SHA-1 if it is fast enough for your requirements.
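As a rough illustration of the pure-Python route, here is a minimal sketch of SHA-1 whose whole state is ordinary Python data. The class name ResumableSHA1 and its internal attribute names are made up for this example and do not come from any library; it is far slower than hashlib and is only meant to show that the state (five 32-bit words, the unprocessed tail bytes, and the total length) can be saved and restored.

```python
import hashlib
import pickle
import struct


def _rotl(x, n):
    """Rotate a 32-bit integer left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def _compress(block, h):
    """Run the SHA-1 compression function on one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = h
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rotl(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rotl(b, 30), c, d
    return [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]


class ResumableSHA1:
    """Hypothetical pure-Python SHA-1 with a saveable state."""

    def __init__(self, data=b""):
        self._h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
        self._buffer = b""  # bytes that do not yet fill a 64-byte block
        self._length = 0    # total number of bytes fed so far
        self.update(data)

    def update(self, data):
        self._length += len(data)
        buf = self._buffer + data
        full = len(buf) - len(buf) % 64
        for i in range(0, full, 64):
            self._h = _compress(buf[i:i + 64], self._h)
        self._buffer = buf[full:]

    def hexdigest(self):
        # Pad a copy of the state so more update() calls remain possible.
        pad = b"\x80" + b"\x00" * ((55 - self._length) % 64)
        tail = self._buffer + pad + struct.pack(">Q", self._length * 8)
        h = self._h
        for i in range(0, len(tail), 64):
            h = _compress(tail[i:i + 64], h)
        return struct.pack(">5I", *h).hex()

    def __getstate__(self):
        return {"h": self._h, "buffer": self._buffer, "length": self._length}

    def __setstate__(self, state):
        self._h = list(state["h"])
        self._buffer = state["buffer"]
        self._length = state["length"]


m = ResumableSHA1()
m.update(b"one")
m.update(b"two")
# The state is a plain dict, so it can be pickled and written to a file.
saved = pickle.dumps(m.__getstate__())

# Later, possibly in another process:
m2 = ResumableSHA1()
m2.__setstate__(pickle.loads(saved))
m2.update(b"three")
m2.update(b"four")
print(m2.hexdigest() == hashlib.sha1(b"onetwothreefour").hexdigest())
```

Because the class defines __getstate__ and __setstate__, pickling the object itself should also work wherever the class is importable; the demo pickles only the state dict to keep that dependency out of the picture.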
I was facing this problem too, and found no existing solution, so I ended up writing a library that does something very similar to what Devesh Saini described: https://github.com/kislyuk/rehash. Example:
Hash algorithm for dynamic growing/streaming data?
You can easily build a wrapper object around the hash object which can transparently persist the data.
The obvious drawback is that it needs to retain the hashed data in full in order to restore the state - so depending on the data size you are dealing with, this may not suit your needs. But it should work fine up to some tens of MB.
Unfortunately, hashlib does not expose the hash algorithms as proper classes; it instead provides factory functions that construct the hash objects, so we can't properly subclass them without loading reserved symbols, a situation I'd rather avoid. That only means you have to build your wrapper class from scratch, which is not much of an overhead in Python anyway.
Here is some sample code that might even meet your needs:
You can access the "data" member itself to get and set the state directly, or you can use Python's pickling functions:
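The answer's original code did not survive on this page, so what follows is only a sketch of the kind of wrapper described above, not the author's implementation. The class name PersistentHash and its constructor parameters are assumptions chosen for illustration; the only hashlib call relied on is hashlib.new(name, data). The wrapper keeps every byte it has been fed in its "data" member and rebuilds the underlying hash object from that data when its state is restored.

```python
import hashlib
import pickle


class PersistentHash:
    """Hypothetical wrapper that remembers all data fed to a hashlib object."""

    def __init__(self, algorithm="sha1", data=b""):
        self.algorithm = algorithm
        self.data = bytearray()          # full copy of everything hashed so far
        self._hash = hashlib.new(algorithm)
        self.update(data)

    def update(self, chunk):
        self.data.extend(chunk)
        self._hash.update(chunk)

    def digest(self):
        return self._hash.digest()

    def hexdigest(self):
        return self._hash.hexdigest()

    def __getstate__(self):
        # The C-level hash object cannot be pickled, so only the
        # algorithm name and the retained data are saved.
        return {"algorithm": self.algorithm, "data": bytes(self.data)}

    def __setstate__(self, state):
        self.algorithm = state["algorithm"]
        self.data = bytearray(state["data"])
        # Re-hash the retained data to get back to the same internal state.
        self._hash = hashlib.new(self.algorithm, bytes(self.data))


m = PersistentHash("sha1")
m.update(b"one")
m.update(b"two")
# The state is a plain dict, so it can be pickled and written to a file.
saved = pickle.dumps(m.__getstate__())

# Later, possibly in another process:
m2 = PersistentHash("sha1")
m2.__setstate__(pickle.loads(saved))
m2.update(b"three")
m2.update(b"four")
print(m2.hexdigest() == hashlib.sha1(b"onetwothreefour").hexdigest())
```

Restoring costs one full pass over the retained data, which is the trade-off mentioned above: memory and restore time grow with the total input size.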