Python: checksum of a dict

Posted on 2024-11-27 22:06:01


I'm thinking of creating a checksum of a dict to know whether it was modified or not.
For the moment I have this:

>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'

Perhaps a better solution exists?

Note: I want to create a unique ID for a dict so I can build a good ETag.

EDIT: I can have abstract data in the dict.
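A minimal sketch of why the pickle approach can be fragile, assuming Python 3.7+ (where dicts keep insertion order): two dicts that compare equal but were built in a different order pickle to different bytes, so their MD5 digests differ. The helper name `pickle_md5` is hypothetical.

```python
import hashlib
import pickle


def pickle_md5(d):
    """Hypothetical helper: the pickle + MD5 approach wrapped in a function."""
    return hashlib.md5(pickle.dumps(d)).hexdigest()


d1 = {'k': 'v', 'k2': 'v2'}
d2 = {'k2': 'v2', 'k': 'v'}  # same contents, different insertion order

print(d1 == d2)                          # the dicts compare equal
print(pickle_md5(d1) == pickle_md5(d2))  # but pickle writes items in insertion order
```

So an ETag built this way can change even though the data did not.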

梦情居士 2024-12-04 22:06:01

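Something like this (in Python 3, reduce has to be imported from functools):

from functools import reduce

reduce(lambda x, y: x ^ y, [hash(item) for item in d.items()], 0)

Take the hash of each (key, value) tuple in the dict and XOR them all together. The initial value 0 keeps reduce from raising a TypeError when the dict is empty.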
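A runnable sketch of the XOR approach, assuming every key and value is hashable. Because XOR is commutative, the result does not depend on insertion order; in Python 3 it is only stable within one process, since str hashes are randomized per process. The name `xor_hash` is hypothetical.

```python
import operator
from functools import reduce


def xor_hash(d):
    """Hypothetical helper: XOR together the hash of every (key, value) pair."""
    # 0 is the identity element for XOR and lets reduce() handle an empty dict
    return reduce(operator.xor, (hash(item) for item in d.items()), 0)


d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(xor_hash(d1) == xor_hash(d2))  # insertion order does not matter
print(xor_hash({}))                  # empty dict gives the initial value
```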

@katrielalex
If the dict contains unhashable items you could do this:

hash(str(d))

or maybe even better

hash(repr(d))
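A small illustration, assuming a dict with a list value: hashing the (key, value) tuples fails because lists are unhashable, while hash(repr(d)) works. The repr still follows insertion order, so equal dicts built in a different order produce different strings.

```python
d1 = {'a': [1, 2], 'b': 3}
d2 = {'b': 3, 'a': [1, 2]}

try:
    hash(('a', [1, 2]))  # a tuple containing a list is unhashable
except TypeError as exc:
    print('unhashable:', exc)

print(hash(repr(d1)))        # works, but is only stable within one process
print(repr(d1) == repr(d2))  # repr follows insertion order, so this is False
```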

撩起发的微风 2024-12-04 22:06:01

In Python 3, the hashes of str and bytes objects are salted with a random value that differs for each Python process (unless PYTHONHASHSEED is set). If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:

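import zlib

d = {'key1': 'value1', 'key2': 'value2'}
checksum = 0
for item in d.items():
    # Chain adler32 over repr(key) and then repr(value); 1 is adler32's standard start value
    c1 = 1
    for t in item:
        c1 = zlib.adler32(bytes(repr(t), 'utf-8'), c1)
    # XOR is commutative, so the dict's insertion order does not affect the result
    checksum = checksum ^ c1

print(checksum)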
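To see the difference in practice, a sketch that runs a child interpreter twice with different PYTHONHASHSEED values (an environment variable CPython honours): the built-in hash('key1') changes between runs, while zlib.adler32 of the same bytes does not. The helper `run_child` is hypothetical.

```python
import os
import subprocess
import sys

# Code run in each child interpreter: the built-in str hash and an adler32 checksum
CHILD_CODE = "import zlib; print(hash('key1'), zlib.adler32(b'key1'))"


def run_child(seed):
    """Hypothetical helper: run CHILD_CODE with a fixed hash seed, return both numbers."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, '-c', CHILD_CODE],
                            env=env, capture_output=True, text=True, check=True)
    builtin_hash, adler = map(int, result.stdout.split())
    return builtin_hash, adler


hash1, adler1 = run_child(1)
hash2, adler2 = run_child(2)
print(hash1 != hash2)    # the built-in hash depends on the seed
print(adler1 == adler2)  # adler32 is unseeded, so both runs agree
```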

初雪 2024-12-04 22:06:01

I would recommend an approach very similar to the one you propose, but with some extra guarantees:

import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()

sort_keys=True: keep the same hash if the order of your keys changes
ensure_ascii=True: escape any non-ASCII characters so the representation does not change (this is already the json.dumps default, but stating it makes the intent explicit)

We use this for our ETag.
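A minimal sketch of this approach packaged as an ETag helper. The name `dict_etag` is hypothetical; `default=repr` is an assumption for values json cannot serialize natively (only sensible if those objects have a stable repr), and the quotes around the digest follow the HTTP rule that an ETag value is a quoted string. sort_keys sorts keys at every nesting level, so nested dicts are covered too.

```python
import hashlib
import json


def dict_etag(d):
    """Hypothetical helper: build a quoted ETag from a canonical JSON dump of d."""
    canonical = json.dumps(
        d,
        sort_keys=True,         # same output regardless of key insertion order
        ensure_ascii=True,      # escape non-ASCII characters
        separators=(',', ':'),  # pin the whitespace explicitly
        default=repr,           # assumption: fall back to repr() for non-JSON types
    )
    digest = hashlib.md5(canonical.encode('utf-8')).hexdigest()
    return '"' + digest + '"'  # ETag values are sent as quoted strings


d1 = {'k': 'v', 'nested': {'x': 1, 'y': 2}}
d2 = {'nested': {'y': 2, 'x': 1}, 'k': 'v'}
print(dict_etag(d1) == dict_etag(d2))
```

One caveat of the JSON route: json converts non-string keys to strings, so {1: 'a'} and {'1': 'a'} get the same ETag.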

花落人断肠 2024-12-04 22:06:01

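I don't know whether pickle guarantees that it serializes the dict the same way every time.

If you only have dictionaries, I would go for a combination of calls to keys() and sorted(), build a string from the sorted key/value pairs, and compute the checksum of that.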
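A sketch of that suggestion, assuming the keys are mutually sortable and the values have a deterministic repr (true for str, int and similar built-ins). Only top-level keys are sorted, so nested dicts would still depend on insertion order. The function name and the choice of SHA-256 are assumptions; using repr() keeps 1 and '1' distinct.

```python
import hashlib


def sorted_items_checksum(d):
    """Hypothetical helper: checksum the dict's (key, value) pairs sorted by key."""
    # Keys are unique, so sorting the pairs only ever compares keys
    canonical = repr(sorted(d.items()))
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


d1 = {'b': 2, 'a': 1}
d2 = {'a': 1, 'b': 2}
print(sorted_items_checksum(d1) == sorted_items_checksum(d2))
print(sorted_items_checksum({1: 'a'}) == sorted_items_checksum({'1': 'a'}))
```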

俏︾媚 2024-12-04 22:06:01

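I think you may not realise some of the subtleties that go into this. The first problem is that you cannot rely on the order in which items appear in a dict: before Python 3.7 it was not guaranteed by the language, and since 3.7 it is insertion order, so two equal dicts built in different orders still list their items differently. This means that simply asking for the str of a dict doesn't work, because you could have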

str(d1) == "{'a': 1, 'b': 2}"
str(d2) == "{'b': 2, 'a': 1}"

and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does or simply

hash(tuple(sorted(hash(x) for x in d.items())))

Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
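One way to do that recursion, as a sketch: apply the sorted-hash idea at every dict level and hash sequences in order. The helper `nested_hash` is hypothetical, it assumes all keys and non-container values are hashable, it is only stable within one process, and it does not mix in type information (so [1, 2] and (1, 2) hash alike).

```python
def nested_hash(obj):
    """Hypothetical helper: recursive version of hash(tuple(sorted(...)))."""
    if isinstance(obj, dict):
        # Hash each (key, value) pair, then sort so insertion order is irrelevant
        return hash(tuple(sorted(hash((key, nested_hash(value)))
                                 for key, value in obj.items())))
    if isinstance(obj, (list, tuple)):
        # Sequences keep their order, so no sorting here
        return hash(tuple(nested_hash(item) for item in obj))
    return hash(obj)  # assumed to be hashable already


d1 = {'a': {'x': [1, 2], 'y': 3}, 'b': 4}
d2 = {'b': 4, 'a': {'y': 3, 'x': [1, 2]}}
print(nested_hash(d1) == nested_hash(d2))
```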

BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then two equal items could get different hashes.
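A sketch of the kind of object that breaks such schemes: the hypothetical class `Point` defines value equality but hashes by id(), violating the rule that objects which compare equal must have the same hash. Two dicts that compare equal then get different checksums from the sorted-hash approach.

```python
class Point:
    """Hypothetical class: equal by value, but hashed by identity (a broken __hash__)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return id(self)  # breaks "a == b implies hash(a) == hash(b)"


def sorted_hash(d):
    # The approach from this answer: sort the item hashes, then hash the tuple
    return hash(tuple(sorted(hash(x) for x in d.items())))


d1 = {'p': Point(1, 2)}
d2 = {'p': Point(1, 2)}
print(d1 == d2)                            # the dicts compare equal...
print(sorted_hash(d1) == sorted_hash(d2))  # ...but their checksums differ
```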

The moral of the story is that hashing dicts isn't supported in Python for a reason.
