Python: concatenating bytes with strings

Posted on 2024-09-08 00:21:34


I'm working on a Python 2.6 project that is also being ported to Python 3. Specifically, I'm implementing the DIGEST-MD5 algorithm.

In Python 2.6, without this import:

from __future__ import unicode_literals

I can write code like this:

a1 = hashlib.md5("%s:%s:%s" % (self.username, self.domain, self.password)).digest() 
a1 = "%s:%s:%s" %(a1, challenge["nonce"], cnonce )

This works without any issues and authentication succeeds. When I run the same code with unicode_literals imported, I get an exception:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa8 in position 0: unexpected code byte

I'm relatively new to Python, so I'm a bit stuck. If I replace %s with %r in the format string, the concatenation succeeds, but authentication fails. The DIGEST-MD5 spec I read says that the 16-octet binary digest must be concatenated with these other strings.

Any thoughts?


尐偏执 2024-09-15 00:21:34


The reason for the behaviour you observed is that from __future__ import unicode_literals changes how Python interprets string literals:

In the 2.x series, string literals without the u prefix are byte strings: sequences of bytes, each in the range \x00-\xff (inclusive). Literals with the u prefix are unicode strings, stored internally as UCS-2 or UCS-4 depending on how Python was compiled.
In Python 3.x, as well as in Python 2 with the unicode_literals future import, literals without a prefix are unicode strings (stored as UCS-2 or UCS-4 depending on a compile-time flag before Python 3.3; since 3.3 CPython chooses a compact representation per string). Literals with the b prefix produce byte strings: the bytes type in 3.x, which is quite similar to pre-3.x non-unicode strings, and plain str in 2.6 and 2.7.
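A small Python 3 illustration of the two kinds of literal, and of why switching %s to %r only hides the problem: formatting a byte string into a text template inserts its repr (b'...'), not its raw bytes. The values "abc" and "nonce" are arbitrary examples.

```python
# Python 3: unprefixed literals are text (str), b-prefixed literals are bytes.
text = "abc"
data = b"abc"
print(type(text).__name__)
print(type(data).__name__)

# Formatting bytes into a text template does not raise in Python 3;
# it silently inserts the repr of the bytes, so the result "looks" fine
# but no longer contains the original raw bytes.
print("%s:%s" % (data, "nonce"))
print("%r:%s" % (data, "nonce"))
```

This is the same failure mode as %r in the question: the formatted string contains an escaped textual representation of the digest, so the server computes a different hash.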

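In Python 2, combining a byte string with a unicode string makes Python implicitly decode the byte string using the default encoding (sys.getdefaultencoding()). Out of the box that is ascii, which rejects every byte above \x7f; the 'utf8' in your error message shows that on your system it has been set to UTF-8. A binary digest is generally not valid text in either encoding, hence the UnicodeDecodeError. Python 3 performs no implicit conversion at all, so byte strings and unicode strings must always be converted explicitly.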
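A minimal Python 3 sketch of both failure modes. It uses the MD5 digest of the empty input only because its bytes are well known (d41d8cd98f00b204e9800998ecf8427e); any real digest is likely to behave the same way, though a particular one could in principle happen to be valid UTF-8.

```python
import hashlib

# Raw 16-byte MD5 digest of the empty input.
digest = hashlib.md5(b"").digest()

# The first byte 0xd4 starts a two-byte UTF-8 sequence, but the next byte
# 0x1d is not a continuation byte, so decoding the digest as text fails,
# just like the implicit conversion in the question.
try:
    digest.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Python 3 never converts implicitly: mixing str and bytes is a TypeError.
try:
    ":" + digest
except TypeError as exc:
    print("concat failed:", exc)
```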

The message digest returned by hashlib.md5(...).digest() is a byte string, and I suppose you want the result of the whole operation to be a byte string as well. If so, encode the nonce and cnonce strings to byte strings (and, for Python 3, the string you hash too):

a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
# note that UTF-8 may not be the encoding required by your counterpart, please check
# b":".join() works on Python 2.6 and all 3.x versions (bytes % formatting needs 3.5+)
a1 = b":".join([a1, challenge["nonce"].encode("UTF-8"), cnonce.encode("UTF-8")])
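The byte-string approach can be sketched as a self-contained Python 3 function. The name digest_md5_a1 and the sample credentials are hypothetical, the optional authorization-identity part of A1 is left out, and UTF-8 is only an assumption (as noted above, check what your server expects).

```python
import hashlib


def digest_md5_a1(username, realm, password, nonce, cnonce, encoding="utf-8"):
    """Build the A1 byte string for DIGEST-MD5 (hypothetical helper).

    All arguments are text (str); `encoding` is an assumption and must
    match what the server expects.
    """
    # md5() only accepts bytes-like input, so encode the credentials first.
    credentials = ":".join([username, realm, password]).encode(encoding)
    inner = hashlib.md5(credentials).digest()  # 16 raw bytes, not text

    # Keep everything as bytes: encode the text parts, join with b":".
    return b":".join([inner, nonce.encode(encoding), cnonce.encode(encoding)])


# Hypothetical example values.
a1 = digest_md5_a1("alice", "example.com", "secret",
                   "abc123nonce", "xyz789cnonce")
print(len(a1))
print(hashlib.md5(a1).hexdigest())
```

Because the digest never passes through a text type, no implicit or accidental decoding can happen on either Python version.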

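Alternatively, you can convert the byte string returned by digest() to a unicode string (not recommended). Because the first 256 Unicode code points are identical to ISO-8859-1, decoding with ISO-8859-1 maps every byte to the code point with the same value, so no information is lost. This might serve your needs, as long as you encode the result back with ISO-8859-1 before hashing it again: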

a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
a1 = "%s:%s:%s" % (a1.decode("ISO-8859-1"), challenge["nonce"], cnonce)
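A short Python 3 sketch of the ISO-8859-1 round trip, with hypothetical credentials and nonces. It assumes the nonce values contain only ISO-8859-1 characters; anything outside that range would make the final encode() fail.

```python
import hashlib

# Hypothetical credentials, for illustration only.
digest = hashlib.md5(b"alice:example.com:secret").digest()

# ISO-8859-1 maps byte 0xNN to code point U+00NN, so any byte sequence decodes.
as_text = digest.decode("iso-8859-1")
assert [ord(ch) for ch in as_text] == list(digest)

# Build A1 as text, then encode it back with the same codec before hashing.
a1_text = ":".join([as_text, "abc123nonce", "xyz789cnonce"])
a1_bytes = a1_text.encode("iso-8859-1")
print(hashlib.md5(a1_bytes).hexdigest())
```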


缪败 2024-09-15 00:21:34


The problem is that "%s:%s:%s" becomes a unicode string once you import unicode_literals, while the output of the hash is a "regular" (byte) string.
Python tries to decode the byte string into a unicode string and fails, as expected: the hash output is essentially random bytes, not valid text.
Change your code to this:

a1 = a1 + str(':') + str(challenge["nonce"]) + str(':') + str(cnonce)

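This assumes cnonce and challenge["nonce"] are regular (byte) strings, and it only works on Python 2: on Python 3, str() always produces text, which cannot be added to bytes. For more control over how the values are converted to bytes, and to stay compatible with Python 3, encode them explicitly and use a b'' literal for the separator:

a1 += b':' + challenge["nonce"].encode('UTF-8') + b':' + cnonce.encode('UTF-8')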


About the author: 独自←快乐
