Python: Concatenating bytes with a string
I'm working on a Python 2.6 project that is also being prepared for Python 3 support. Specifically, I'm working on a DIGEST-MD5 algorithm.
In python 2.6 without running this import:
from __future__ import unicode_literals
I am able to write a piece of code such as this:
a1 = hashlib.md5("%s:%s:%s" % (self.username, self.domain, self.password)).digest()
a1 = "%s:%s:%s" %(a1, challenge["nonce"], cnonce )
Without any issues, my authentication works fine. When I try the same line of code with the unicode_literals imported I get an exception:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa8 in position 0: unexpected code byte
I'm relatively new to Python, so I'm a bit stuck figuring this out. If I replace the %s in the format string with %r, I am able to concatenate the string, but the authentication doesn't work. The DIGEST-MD5 spec that I read says that the 16-octet binary digest must be appended to these other strings.
Any thoughts?
Comments (2)
The reason for the behaviour you observed is that
from __future__ import unicode_literals
switches the way Python treats string literals. With the unicode_literals future import, literals without the u prefix are unicode strings, stored internally as UCS-2 or UCS-4 (depending on the compiler flag used when building Python). Literals with the b prefix are literals for the data type bytes, which in Python 2 is the same type as the pre-3.x non-unicode str.
In either version of Python, byte-strings and unicode-strings must be converted before they can be combined. Python 2 performs this conversion implicitly, using the interpreter's default encoding; in your case this is UTF-8. Without setting anything, it should be ascii, which rejects all bytes above \x7f.
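To illustrate why that implicit conversion fails, here is a small sketch that performs the decoding step explicitly. The helper name can_decode is made up for this example; the byte 0xa8 is the one reported in the error message from the question.

```python
def can_decode(data, codec):
    """Return True if the byte-string `data` is valid in `codec`."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


first_digest_byte = b"\xa8"  # byte named in the question's traceback

# 0xa8 is outside the ASCII range and is not a valid start byte in UTF-8,
# so both codecs reject it. ISO-8859-1 accepts any single byte.
for codec in ("ascii", "utf-8", "iso-8859-1"):
    print("%s: %s" % (codec, can_decode(first_digest_byte, codec)))
```

An MD5 digest is 16 arbitrary bytes, so it will only rarely happen to be valid ASCII or UTF-8.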
The message digest returned by hashlib.md5(...).digest() is a byte-string, and I suppose you want the result of the whole operation to be a byte-string as well. If you want that, convert the nonce and cnonce strings to byte-strings.
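A minimal sketch of that approach, assuming UTF-8 is an acceptable encoding for the text values. The helper names to_bytes and compute_a1 and the sample values are hypothetical, not taken from the question's code.

```python
import hashlib


def to_bytes(value, encoding="utf-8"):
    """Return `value` as a byte-string, encoding it only if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def compute_a1(username, realm, password, nonce, cnonce):
    # Hash the credentials; digest() returns 16 raw bytes.
    credentials = b":".join([to_bytes(username), to_bytes(realm), to_bytes(password)])
    digest = hashlib.md5(credentials).digest()
    # Join byte-strings only, so the digest is never implicitly decoded.
    return b":".join([digest, to_bytes(nonce), to_bytes(cnonce)])


# Hypothetical sample values; text and byte inputs can be mixed.
a1 = compute_a1(u"user", u"example.org", u"secret", u"abc", b"def")
```

Joining with b":".join avoids % formatting entirely, which keeps the sketch independent of whether the interpreter supports formatting on byte-strings.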
Alternatively, you can convert the byte-string returned by digest() to a unicode string (not recommended). As the first 256 Unicode code points are identical to ISO-8859-1, decoding the digest as ISO-8859-1 might serve your needs.
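A sketch of the ISO-8859-1 route, with hypothetical sample values. It assumes the nonce and cnonce contain only characters that ISO-8859-1 can represent, otherwise the final encode step raises UnicodeEncodeError.

```python
import hashlib

digest = hashlib.md5(b"user:example.org:secret").digest()

# ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same
# value, so this decode cannot fail and loses no information.
digest_text = digest.decode("iso-8859-1")
a1_text = u"%s:%s:%s" % (digest_text, u"abc", u"def")

# Encode with the same codec to get the original bytes back before
# hashing the result or sending it over the wire.
a1 = a1_text.encode("iso-8859-1")
```

The round trip only works if the same codec is used in both directions, which is one reason this variant is easy to get wrong.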
The problem is that "%s:%s:%s" became a unicode string once you imported unicode_literals.
The output of the hash is a "regular" string. Python tried to decode the regular string into a unicode string and failed, as expected: the hash output is supposed to look like noise.
Change your code to this:
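A minimal sketch of such a change, assuming the fix is to mark the format strings as byte-string literals with the b prefix and that all inputs are already byte-strings. The sample values are hypothetical.

```python
import hashlib

# Hypothetical stand-ins for self.username, self.domain, self.password
username, domain, password = b"user", b"example.org", b"secret"
challenge = {"nonce": b"abc"}
cnonce = b"def"

# The b prefix keeps the format strings as byte-strings even when
# unicode_literals is imported, so nothing is implicitly decoded.
a1 = hashlib.md5(b"%s:%s:%s" % (username, domain, password)).digest()
a1 = b"%s:%s:%s" % (a1, challenge["nonce"], cnonce)
```

Note that % formatting on byte-strings works in Python 2.6+ and in Python 3.5+, but not in Python 3.0 to 3.4, so code that must run there would need concatenation or b":".join instead.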
I'm assuming cnonce and challenge["nonce"] are regular strings. To have more control over their conversion to byte-strings (if needed), encode them explicitly.
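One way this could look, assuming the two values arrive as unicode text and should contain ASCII only. The codec choice and the sample values are assumptions for illustration.

```python
import hashlib

# Hypothetical text values, as they might come from a parsed challenge
challenge = {"nonce": u"abc"}
cnonce = u"def"
digest = hashlib.md5(b"user:example.org:secret").digest()

# Encoding explicitly with a strict codec means unexpected characters
# raise UnicodeEncodeError here instead of producing a wrong hash later.
nonce_bytes = challenge["nonce"].encode("ascii")
cnonce_bytes = cnonce.encode("ascii")

a1 = digest + b":" + nonce_bytes + b":" + cnonce_bytes
```

Choosing the codec yourself also makes the behaviour independent of the interpreter's default encoding.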