Python: concatenating bytes with strings

Posted 2024-09-08 00:21:34


I'm working on a Python 2.6 project that will also need to support Python 3 in the future. Specifically, I'm implementing the DIGEST-MD5 algorithm.

In Python 2.6, without this import:

from __future__ import unicode_literals

I am able to write a piece of code such as this:

a1 = hashlib.md5("%s:%s:%s" % (self.username, self.domain, self.password)).digest() 
a1 = "%s:%s:%s" %(a1, challenge["nonce"], cnonce )

This runs without any issues, and my authentication works. When I run the same lines with unicode_literals imported, I get an exception:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa8 in position 0: unexpected code byte

I'm relatively new to Python, so I'm a bit stuck. If I replace %s with %r in the format string, the concatenation succeeds, but authentication fails. The DIGEST-MD5 spec I read says that the 16-octet binary digest must be combined with these other strings.

Any thoughts?


Answers (2)

尐偏执 2024-09-15 00:21:34


The reason for the behavior you observed is that from __future__ import unicode_literals changes how Python treats string literals:

  • In the 2.x series, literals without the u prefix are byte strings (type str); each element is a byte in the range \x00-\xff (inclusive). Literals with the u prefix are unicode strings, stored internally as UCS-2 or UCS-4 depending on how Python was built.
  • In Python 3.x, and in Python 2.x with the unicode_literals future import, literals without a prefix are unicode strings. Python 3.0 to 3.2 store them as UCS-2 or UCS-4 depending on a compile-time flag; Python 3.3 and later pick the storage per string (PEP 393). Literals with the b prefix are of type bytes, which behaves much like the pre-3.x non-unicode str.
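A minimal Python 3 illustration of that split (assuming Python 3, where every unprefixed literal is text): bytes and text no longer mix implicitly. Concatenation raises TypeError, while %s formatting silently inserts the repr of the bytes object, quotes and escapes included, which is the same kind of corruption the question ran into with %r. The two-byte value is only a stand-in for a real digest.

```python
digest = b"\xa8\x10"  # stand-in for the first bytes of an MD5 digest

# Concatenating bytes and text fails loudly in Python 3.
try:
    combined = digest + ":nonce"
except TypeError:
    combined = None

# %-formatting does not fail: it inserts repr(digest) instead of the raw bytes.
formatted = "%s:nonce" % digest

print(combined)   # None
print(formatted)  # b'\xa8\x10':nonce
```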

In either version of Python, byte strings and unicode strings must be converted before they can be combined. Python 2 does this implicitly using its default encoding (sys.getdefaultencoding()); in your case this is UTF-8. Out of the box it is ascii, which rejects every byte above \x7f. Python 3 does not convert implicitly: concatenating bytes and str raises a TypeError.
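To see why the implicit decode fails, here is a small sketch (Python 3, where the decode has to be written out) that tries to decode a byte string starting with 0xa8, the byte from the question's traceback, with each candidate codec. The helper name try_decode is illustrative only.

```python
def try_decode(data, encoding):
    """Return 'ok' if data decodes with the codec, else the codec's error reason."""
    try:
        data.decode(encoding)
        return "ok"
    except UnicodeDecodeError as exc:
        return "fails: " + exc.reason

raw = b"\xa8\x10\x00"  # 0xa8 is the byte reported in the question's traceback

for codec in ("utf-8", "ascii", "iso-8859-1"):
    print(codec, try_decode(raw, codec))
```

UTF-8 and ASCII both reject the leading 0xa8; only ISO-8859-1, which assigns a character to every byte value, succeeds.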

The message digest returned by hashlib.md5(...).digest() is a byte string, and I suppose you want the result of the whole operation to be a byte string as well. If so, convert the nonce and cnonce strings to byte strings:

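# encode before hashing: hashlib.md5() requires bytes on Python 3
a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
# note that UTF-8 may not be the encoding required by your counterpart, please check
# b"..." % (...) works on Python 2.6 and on Python 3.5+ (PEP 461), but not on 3.0-3.4
a1 = b"%s:%s:%s" % (a1, challenge["nonce"].encode("UTF-8"), cnonce.encode("UTF-8"))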
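A version-neutral sketch of the same byte-string approach: every piece stays bytes and is joined with b":".join(...), which behaves the same on Python 2.6 and on every Python 3 release. The function name compute_a1, the mapping of the question's self.domain to the realm argument, and the UTF-8 default are assumptions. RFC 2831 also defines an optional trailing authzid field in A1; this sketch omits it, as the question's code does.

```python
import hashlib

def compute_a1(username, realm, password, nonce, cnonce, encoding="utf-8"):
    """Hypothetical helper: build the DIGEST-MD5 A1 value entirely as bytes.

    Text arguments are encoded with `encoding` (an assumption; confirm what
    your server expects). Byte-string arguments are used unchanged.
    """
    def as_bytes(value):
        return value if isinstance(value, bytes) else value.encode(encoding)

    # Inner hash: H(username ":" realm ":" password) -> 16 raw octets
    inner = hashlib.md5(b":".join(as_bytes(v) for v in (username, realm, password))).digest()
    # Append the nonces to the raw digest, still as bytes
    return b":".join([inner, as_bytes(nonce), as_bytes(cnonce)])


a1 = compute_a1("user", "example.com", "secret", "abc123", b"xyz789")
print(len(a1))   # 30 (16-byte digest + ":abc123:xyz789")
print(a1[16:])   # b':abc123:xyz789' on Python 3
```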

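Alternatively (not recommended), you can convert the byte string returned by digest() to a unicode string. Because the first 256 Unicode code points (U+0000 to U+00FF) coincide with ISO-8859-1, decoding with that codec maps each byte to exactly one character. This might serve your needs, provided you encode the combined string back with ISO-8859-1 before hashing or sending it: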

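a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
# ISO-8859-1 maps each digest byte to one code point; encode with it again later
a1 = "%s:%s:%s" % (a1.decode("ISO-8859-1"), challenge["nonce"], cnonce)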
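A quick Python 3 check of both halves of that claim: decoding with ISO-8859-1 yields one character per byte and round-trips losslessly, whereas encoding the same text as UTF-8 turns every byte at or above 0x80 into two bytes, which would corrupt the digest.

```python
all_bytes = bytes(bytearray(range(256)))  # every possible octet, as in a raw digest

text = all_bytes.decode("iso-8859-1")           # one character per byte
print(len(text))                                # 256
print(text.encode("iso-8859-1") == all_bytes)   # True: lossless round trip
print(len(text.encode("utf-8")))                # 384: bytes >= 0x80 become two bytes
```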
缪败 2024-09-15 00:21:34


The problem is that "%s:%s:%s" became a unicode string once you imported unicode_literals.
The output of the hash is a "regular" (byte) string, so Python tried to decode it into a unicode string and failed, as expected: the hash output is supposed to look like noise, so it is rarely valid in the default encoding.
Change your code to this:

# b':' is a byte string on Python 2.6 and 3; str(':') would be text on Python 3
a1 = a1 + b':' + challenge["nonce"] + b':' + cnonce

I'm assuming cnonce and challenge["nonce"] are regular (byte) strings. If they are unicode strings, encode them explicitly to control the conversion:

a1 += b':' + challenge["nonce"].encode('UTF-8') + b':' + cnonce.encode('UTF-8')
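If you can't be sure in advance whether the nonces arrive as byte strings or text, a small normalizing helper keeps the + concatenation working either way. The helper to_bytes and the sample values below are hypothetical, and the UTF-8 default is an assumption to check against your server.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)

digest = b"\xa8" * 16                 # placeholder for a 16-octet MD5 digest
challenge = {"nonce": "servernonce"}  # hypothetical server nonce (text)
cnonce = b"clientnonce"               # hypothetical client nonce (bytes)

a1 = digest + b":" + to_bytes(challenge["nonce"]) + b":" + to_bytes(cnonce)
print(a1[16:])  # b':servernonce:clientnonce'
```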