Python, checksum of a dict
I'm thinking of creating a checksum of a dict to know whether it has been modified.
For the moment I have this:
>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'
Perhaps a better solution exists?
Note: I want to create a unique id for a dict in order to build a good ETag.
EDIT: I can have abstract data in the dict.
Comments (6)
Something like this:
Take the hash of each (key, value) tuple in the dict and XOR them all together.
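A minimal sketch of that XOR idea, assuming every key and value is hashable. The function name xor_checksum is made up for illustration. Because XOR is commutative, the result does not depend on the order in which the items are visited.

```python
from functools import reduce
from operator import xor


def xor_checksum(d):
    """XOR the built-in hash of every (key, value) pair together.

    Assumes all keys and values are hashable; an unhashable value
    (for example a list) makes hash() raise TypeError.
    """
    return reduce(xor, (hash(item) for item in d.items()), 0)


a = {'k': 'v', 'k2': 'v2'}
b = {'k2': 'v2', 'k': 'v'}  # same content, different insertion order
print(xor_checksum(a) == xor_checksum(b))
```

Note that hash() returns a plain integer that is only meaningful inside one interpreter session, so it is probably not what you want to send out as an ETag without further processing.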
@katrielalex
If the dict contains unhashable items you could do this:
or maybe even better
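One possible way to handle unhashable values, offered as an assumption rather than as the answer's own code: hash a string representation of the items instead of the items themselves. The name repr_checksum is hypothetical, and the items are sorted by the repr of their key so that insertion order does not matter.

```python
def repr_checksum(d):
    """Hash a textual representation of the dict's items.

    Values may be unhashable (lists, nested dicts) because only their
    repr() is hashed. Items are sorted by the repr of the key so that
    insertion order of the top-level dict does not affect the result.
    Caveat: the repr of a nested dict still follows its own insertion order.
    """
    items = sorted(d.items(), key=lambda kv: repr(kv[0]))
    return hash(repr(items))


a = {'k': [1, 2, 3], 'k2': {'x': 1}}
b = {'k2': {'x': 1}, 'k': [1, 2, 3]}
print(repr_checksum(a) == repr_checksum(b))
```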
In Python 3, the built-in hash function is seeded with a random number (for str and bytes objects) that is different for each Python session. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:
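A sketch of what an Adler-32 based checksum might look like. The function name adler_checksum and the choice to feed repr() of each key and value into zlib.adler32 are assumptions made for illustration. Each pair gets its own running checksum, and the per-pair results are XORed so that item order does not matter.

```python
import zlib


def adler_checksum(d):
    """Session-independent checksum of a dict built on zlib.adler32.

    Assumes repr() of every key and value is stable between runs,
    which holds for str, int and similar built-in types.
    """
    checksum = 0
    for key, value in d.items():
        # Chain key and value through one running Adler-32 value
        # (1 is the standard Adler-32 starting value).
        pair_sum = zlib.adler32(repr(key).encode('utf-8'), 1)
        pair_sum = zlib.adler32(repr(value).encode('utf-8'), pair_sum)
        checksum ^= pair_sum  # XOR keeps the result order-independent
    return checksum


print(adler_checksum({'key1': 'value1', 'key2': 'value2'}))
```

Adler-32 is only 32 bits wide and is not collision resistant, so this detects accidental changes rather than guaranteeing uniqueness.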
I would recommend an approach very similar to the one you propose, but with some extra guarantees:
sort_keys=True: keeps the same hash if the order of your keys changes
ensure_ascii=True: in case you have some non-ASCII characters, makes sure the representation does not change
We use this for our ETag.
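sort_keys and ensure_ascii are keyword arguments of json.dumps, so one way to read this recommendation is the sketch below. The function name json_etag and the use of MD5 (borrowed from the question) are assumptions. It only works when the dict is JSON-serializable.

```python
import hashlib
import json


def json_etag(d):
    """MD5 hex digest of a canonical JSON dump of the dict.

    sort_keys=True makes the output independent of key order;
    ensure_ascii=True escapes non-ASCII characters so the bytes
    do not depend on any encoding choice.
    Raises TypeError if the dict holds values json cannot serialize.
    """
    payload = json.dumps(d, sort_keys=True, ensure_ascii=True)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


print(json_etag({'k': 'v', 'k2': 'v2'}) == json_etag({'k2': 'v2', 'k': 'v'}))
```

For objects that json does not know how to serialize, json.dumps accepts a default= callable, at the cost of having to define a stable representation yourself.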
I don't know whether pickle guarantees that the dictionary is serialized the same way every time.
If you only have dictionaries, I would go for a combination of calls to keys() and sorted(), build a string based on the sorted key/value pairs, and compute the checksum on that.
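The idea of sorting the keys, building a string from the key/value pairs, and checksumming that string could be sketched as follows. The function name, the separator characters, and the use of MD5 are illustrative assumptions. It also assumes the keys can be ordered against each other and that repr() of each value is stable.

```python
import hashlib


def sorted_items_checksum(d):
    """Checksum a flat dict via a string built from its sorted keys.

    Assumes keys are mutually orderable (e.g. all strings) and that
    repr() of every value is deterministic.
    """
    parts = []
    for key in sorted(d.keys()):
        parts.append(f"{key!r}:{d[key]!r}")
    payload = ";".join(parts)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


print(sorted_items_checksum({'k': 'v', 'k2': 'v2'}))
```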
I think you may not realise some of the subtleties that go into this. The first problem is that the order in which items appear in a dict is not part of its value: two equal dicts can list their items in different orders. This means that simply asking for the str of a dict doesn't work, because two equal dicts could produce different strings, and these will hash to different values.
If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply hash a tuple of the sorted items.
Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of the order in which the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have equal items that are treated as different.
The moral of the story is that hashing dicts isn't supported in Python for a reason.
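A rough sketch of hashing a tuple of sorted items, with the recursion for nested dicts included to show where the complexity comes from. The helper names freeze and dict_hash are made up, and the sketch assumes keys within one dict can be ordered against each other and that leaf values are hashable.

```python
def freeze(obj):
    """Recursively convert dicts and lists into nested tuples.

    Dicts become tuples of (key, frozen value) pairs sorted by key,
    so the result does not depend on insertion order.
    """
    if isinstance(obj, dict):
        return tuple(sorted((k, freeze(v)) for k, v in obj.items()))
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(x) for x in obj)
    return obj  # assumed hashable; a broken __hash__ breaks everything


def dict_hash(d):
    """Hash the frozen form; only stable within one Python session."""
    return hash(freeze(d))


a = {'a': 1, 'b': {'x': [1, 2]}}
b = {'b': {'x': [1, 2]}, 'a': 1}
print(dict_hash(a) == dict_hash(b))
```

Even this small version already conflates a list with a tuple of the same elements, which hints at how many decisions a general solution would have to make.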
As you said, you want to generate an ETag based on the dictionary content, so an OrderedDict, which preserves the order of the dictionary, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.
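A small sketch of that approach, assuming the ETag is an MD5 digest over the repr of each key and value. The function name etag_from_ordered and the separators are hypothetical. Keep in mind that an OrderedDict remembers insertion order, so the same content inserted in a different order yields a different ETag.

```python
import hashlib
from collections import OrderedDict


def etag_from_ordered(od):
    """Build an ETag by feeding key/value pairs to MD5 in iteration order."""
    digest = hashlib.md5()
    for key, value in od.items():
        digest.update(repr(key).encode('utf-8'))
        digest.update(b'=')
        digest.update(repr(value).encode('utf-8'))
        digest.update(b';')
    return digest.hexdigest()


od = OrderedDict([('k', 'v'), ('k2', 'v2')])
print(etag_from_ordered(od))
```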