How do I use an MD5 hash (or other binary data) as a key name?
I've been trying to use an MD5 hash as a key name on App Engine, but the code I wrote raises a UnicodeDecodeError:
from google.appengine.ext import db
import hashlib
key = db.Key.from_path('Post', hashlib.md5('thecakeisalie').digest())
I don't want to use hexdigest(), as that is not only a kludge, but an inferior one too (base64 would do a better job).
Comments (5)
The App Engine Python docs say:
The key has to be a unicode-encodable string. You need to change the digest() call to hexdigest(), i.e.:
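The snippet that followed this answer did not survive on this page. A minimal sketch of the suggested change might look like the following. It is written for Python 3, where md5() takes bytes (the Python 2 runtime in the question accepts a plain str literal), and the db.Key.from_path call is left as a comment so the example runs without the App Engine SDK.

```python
import hashlib

# hexdigest() returns 32 lowercase hex characters, all plain ASCII,
# so the result can be used wherever a text key name is expected.
key_name = hashlib.md5(b'thecakeisalie').hexdigest()

# With the App Engine SDK available, the key from the question would become:
# key = db.Key.from_path('Post', key_name)
print(len(key_name))
```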
Decode the bytestring with iso-8859-1.
This is basically a "NOP" conversion. It creates a unicode object that is the same length as the initial string, and it can be converted back to a byte string by calling .encode("iso-8859-1") on it if you wish.
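As a rough illustration of that round trip, here is a small sketch in Python 3 syntax. The variable names are only for illustration, and whether the datastore accepts the resulting string as a key name is not checked here.

```python
import hashlib

digest = hashlib.md5(b'thecakeisalie').digest()  # 16 raw bytes

# ISO-8859-1 maps every byte value 0-255 to the code point with the same
# number, so decoding cannot fail and keeps the length unchanged.
name = digest.decode('iso-8859-1')

# Encoding with the same codec restores the original bytes exactly.
restored = name.encode('iso-8859-1')
print(len(name), restored == digest)
```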
Let's think about data sizes. The optimal solution here is 16 bytes:
The first thing you thought of was hexdigest, but it's not very close to 16 bytes:
But this won't give you ascii-encodable bytes, so we have to do something else. The simple thing to do is use the python representation:
We can get rid of a bunch of that by removing the "\x" escapes and the surrounding quotes:
That's pretty good, but base64 does a little better:
Overall, base64 is the most space-efficient, but I'd just go with hexdigest as it's likely to be the most optimized (time-efficient).
Gnibbler's answer gives a length of 16!
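The interpreter output that went with this comparison is missing here, so the sketch below recomputes the sizes that are fixed for any MD5 digest. It uses Python 3 syntax. The repr-based variants are left out because their length depends on which bytes of the particular digest happen to be printable.

```python
import base64
import hashlib

digest = hashlib.md5(b'thecakeisalie').digest()

# Length of each representation of the same 128-bit hash.
sizes = {
    'digest': len(digest),
    'hexdigest': len(hashlib.md5(b'thecakeisalie').hexdigest()),
    'base64': len(base64.b64encode(digest)),
}
for label, size in sizes.items():
    print(label, size)
```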
I find using a base64 encoding of the binary data a reasonable solution. Based on your code you could do something like:
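The code for this answer is not preserved on this page. One way it could look is sketched below in Python 3 syntax. md5_key_name is a hypothetical helper name, not something from the App Engine SDK, and the from_path call is shown only as a comment.

```python
import base64
import hashlib

def md5_key_name(data):
    """Return the MD5 digest of `data` (bytes) as a base64 ASCII string."""
    digest = hashlib.md5(data).digest()
    # b64encode returns bytes; decode to text so it can serve as a key name.
    return base64.b64encode(digest).decode('ascii')

key_name = md5_key_name(b'thecakeisalie')

# With the App Engine SDK available:
# key = db.Key.from_path('Post', key_name)
print(len(key_name))
```

The original digest can be recovered from the key name with base64.b64decode(key_name).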
An entity key in App Engine can have either an ID (a 4 byte integer), or a name (500 byte UTF-8 encoded string).
An MD5 digest is 16 bytes of binary data: too large for an integer, (likely to be) invalid UTF-8. Some form of encoding must be used.
If hexdigest() is too verbose at 32 bytes then try base64 at 24 bytes.
Whatever encoding scheme you use will ultimately be converted to UTF-8 by the datastore, so the following, which at first looks like an optimal encoding...
...when encoded into its final representation, is two bytes longer than the base64 encoding:
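Both snippets from this answer are missing here. Assuming the "optimal-looking" encoding was the iso-8859-1 decode suggested in another answer (an assumption, since the original code is gone), the size comparison can be reproduced with a sketch like this one, in Python 3 syntax:

```python
import base64
import hashlib

digest = hashlib.md5(b'thecakeisalie').digest()

# 16 characters as a unicode string...
latin1_name = digest.decode('iso-8859-1')

# ...but every byte >= 0x80 becomes a two-byte sequence in UTF-8.
utf8_size = len(latin1_name.encode('utf-8'))
high_bytes = sum(1 for b in digest if b >= 0x80)

base64_size = len(base64.b64encode(digest))
print(utf8_size, base64_size)
```

The UTF-8 size is always 16 plus the number of high bytes in the digest, so it varies from hash to hash, while the base64 size stays at 24.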