How can I use an MD5 hash (or other binary data) as a key name?

Posted on 2024-10-08 21:43:06

I've been trying to use an MD5 hash as a key name on App Engine, but the code I wrote raises a UnicodeDecodeError:

from google.appengine.ext import db
import hashlib
key = db.Key.from_path('Post', hashlib.md5('thecakeisalie').digest())

I don't want to use hexdigest() as that is not only a kludge, but an inferior one too (base64 would do a better job).
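
The error can be reproduced without App Engine. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2), assuming the datastore treats a byte-string key name as ASCII text; `is_ascii` is a hypothetical helper for illustration, not an App Engine API.

```python
import hashlib

digest = hashlib.md5(b"thecakeisalie").digest()  # 16 raw bytes, any value 0x00-0xFF


def is_ascii(data: bytes) -> bool:
    """Return True if every byte in `data` can be decoded as ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        # Any byte >= 0x80 triggers the same error class the question reports.
        return False


print(len(digest), is_ascii(digest))
```

Any encoding that maps the digest into the ASCII range (hex, base64) sidesteps the failure.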

Comments (5)

锦欢 2024-10-15 21:43:06

The App Engine Python docs say:

A key_name is stored as a Unicode
string (with str values converted as
ASCII text).

The key name has to be a string that can be converted to Unicode. You need to change the digest() call to hexdigest(), i.e.:

k = hashlib.md5('thecakeisalie').hexdigest()
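
As a rough illustration of why hexdigest() is safe here, the sketch below (Python 3 syntax; `hex_key_name` is a hypothetical helper, and encoding the input as UTF-8 is an assumption) checks that the result uses only the characters 0-9 and a-f.

```python
import hashlib


def hex_key_name(text: str) -> str:
    """Return the MD5 of `text` as a 32-character lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


name = hex_key_name("thecakeisalie")

# Every character is an ASCII hex digit, so ASCII conversion cannot fail.
only_hex_digits = set(name) <= set("0123456789abcdef")
print(len(name), only_hex_digits)
```

The resulting string could then be passed as the name argument to db.Key.from_path in place of the raw digest.
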
一身骄傲 2024-10-15 21:43:06

Decode the byte string with iso-8859-1:

>>> hashlib.md5('thecakeisalie').digest().decode("iso-8859-1")
u"'\xfc\xce\x84h\xa9\x1e\x8a\x12;\xa5\xb1K\xea\xef\xd6"

This is basically a no-op conversion: it creates a unicode object of the same length as the original byte string, and you can convert it back with .encode("iso-8859-1") if you wish.
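
The round trip can be checked directly. This is a small sketch in Python 3 syntax (where the Python 2 `str`/`unicode` pair becomes `bytes`/`str`); the variable names are illustrative only.

```python
import hashlib

digest = hashlib.md5(b"thecakeisalie").digest()

# ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same number,
# so decoding never fails and encoding restores the original bytes.
as_text = digest.decode("iso-8859-1")
restored = as_text.encode("iso-8859-1")

# The same holds for every possible byte value, not just this digest.
every_byte = bytes(range(256))
round_trips = every_byte.decode("iso-8859-1").encode("iso-8859-1") == every_byte

print(len(as_text), restored == digest, round_trips)
```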

听风吹 2024-10-15 21:43:06

Let's think about data sizes. The optimal solution here is 16 bytes:

>>> hashlib.md5('thecakeisalie').digest() 
"'\xfc\xce\x84h\xa9\x1e\x8a\x12;\xa5\xb1K\xea\xef\xd6"

>>> len(hashlib.md5('thecakeisalie').digest())
16

The first thing you thought of was hexdigest, but at 32 characters it is twice that size:

>>> hashlib.md5('thecakeisalie').hexdigest() 
'27fcce8468a91e8a123ba5b14beaefd6'

>>> len(hashlib.md5('thecakeisalie').hexdigest())
32

The raw digest itself is not ASCII-encodable, though, so some encoding is needed. A simple option is the Python representation:

>>> repr(hashlib.md5('thecakeisalie').digest())
'"\'\\xfc\\xce\\x84h\\xa9\\x1e\\x8a\\x12;\\xa5\\xb1K\\xea\\xef\\xd6"'

>>> len(repr(hashlib.md5('thecakeisalie').digest()))
54

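We can get rid of a bunch of that by removing the "\x" escapes and the surrounding quotes (although the result can no longer be decoded unambiguously, because literal characters and hex digits run together):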

>>> repr(hashlib.md5('thecakeisalie').digest())[1:-1].replace('\\x','')
"'fcce84ha91e8a12;a5b1Keaefd6"

>>> len(repr(hashlib.md5('thecakeisalie').digest())[1:-1].replace('\\x',''))
28

That's pretty good, but base64 does a little better:

>>> import base64
>>> base64.b64encode(hashlib.md5('thecakeisalie').digest())
'J/zOhGipHooSO6WxS+rv1g=='
>>> len(base64.b64encode(hashlib.md5('thecakeisalie').digest()))
24

Overall, base64 is the most space-efficient of these, but I'd just go with hexdigest, as it's likely to be the best optimized (most time-efficient).


Gnibbler's answer (decoding with iso-8859-1) gives a length of 16!

>>> hashlib.md5('thecakeisalie').digest().decode("iso-8859-1")
u"'\xfc\xce\x84h\xa9\x1e\x8a\x12;\xa5\xb1K\xea\xef\xd6"
>>> len(hashlib.md5('thecakeisalie').digest().decode("iso-8859-1"))
16
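
The comparison above can be collected in one place. The sketch below is in Python 3 syntax and leaves out the repr-based variant; `encoded_sizes` is a hypothetical helper, and the sizes are character counts before any UTF-8 conversion by the datastore.

```python
import base64
import hashlib


def encoded_sizes(digest: bytes) -> dict:
    """Return the length of `digest` under each encoding discussed above."""
    return {
        "raw bytes": len(digest),
        "hexdigest": len(digest.hex()),
        "base64": len(base64.b64encode(digest)),
        "iso-8859-1 characters": len(digest.decode("iso-8859-1")),
    }


sizes = encoded_sizes(hashlib.md5(b"thecakeisalie").digest())
for label, size in sizes.items():
    print(f"{label}: {size}")
```
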
倾城月光淡如水﹏ 2024-10-15 21:43:06

I find using a base64 encoding of the binary data a reasonable solution. Based on your code you could do something like:

import hashlib
import base64
print base64.b64encode(hashlib.md5('thecakeisalie').digest())
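
A possible variation, assuming you would rather avoid the "/" and "+" characters that standard base64 can produce: the URL-safe alphabet with the padding stripped gives 22 characters for a 16-byte digest. This is a sketch in Python 3 syntax; `to_key_name` and `from_key_name` are hypothetical helper names.

```python
import base64
import hashlib


def to_key_name(digest: bytes) -> str:
    """Encode bytes as URL-safe base64 text without the trailing '=' padding."""
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def from_key_name(name: str) -> bytes:
    """Restore the padding that to_key_name removed, then decode."""
    padding = "=" * (-len(name) % 4)
    return base64.urlsafe_b64decode(name + padding)


digest = hashlib.md5(b"thecakeisalie").digest()
name = to_key_name(digest)
print(len(name), from_key_name(name) == digest)
```
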
榕城若虚 2024-10-15 21:43:06

An entity key in App Engine can have either an ID (a 64-bit, i.e. 8-byte, integer) or a name (a UTF-8 encoded string of up to 500 bytes).

An MD5 digest is 16 bytes of binary data: too large for an integer ID and (very likely) not valid UTF-8, so some form of encoding must be used.

If hexdigest() is too verbose at 32 bytes, then try base64 at 24 bytes.

Whatever encoding scheme you use will ultimately be converted to UTF-8 by the datastore, so the following, which at first looks like an optimal encoding...

>>> u = hashlib.md5('thecakeisalie').digest().decode("iso-8859-1")
>>> len(u)
16

...when encoded into its final representation, is two bytes longer than the base64 encoding:

>>> s = u.encode('utf-8')
>>> len(s)
26
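
The 26 comes from the fact that UTF-8 needs two bytes for every code point from U+0080 to U+00FF. The sketch below (Python 3 syntax) uses the hexdigest quoted earlier in the thread as sample data; `utf8_size_of_latin1` is a hypothetical helper.

```python
# Hexdigest value copied from the thread's own output.
SAMPLE_HEX = "27fcce8468a91e8a123ba5b14beaefd6"
digest = bytes.fromhex(SAMPLE_HEX)


def utf8_size_of_latin1(data: bytes) -> int:
    """Size in bytes after decoding as ISO-8859-1 and re-encoding as UTF-8."""
    return len(data.decode("iso-8859-1").encode("utf-8"))


# Bytes below 0x80 stay one byte in UTF-8; bytes at or above 0x80 become two.
high_bytes = sum(1 for b in digest if b >= 0x80)
utf8_size = utf8_size_of_latin1(digest)

print(len(digest), high_bytes, utf8_size)
```

For a digest with random-looking bytes, about half are at or above 0x80, so the UTF-8 form is typically close to 24 bytes and offers no real saving over base64.
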