Python Unicode Ascii，序数不在范围内，令人沮丧的错误

发布于 2025-01-04 07:33:20 字数 1161 浏览 1 评论 0原文

这是我的问题...

数据库以 unicode 存储所有内容。 hashlib.sha256().digest() 接受 str 并返回 str。

当我尝试用数据填充哈希函数时，我得到了著名的错误：

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 1: ordinal not in range(128)

这是我的数据

>>> db_digest
u"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> 
>>> hashlib.sha256(db_digest)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\x90' in position 1: ordinal not in range(128)
>>> 
>>> asc_db_digest
"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> hashlib.sha256(asc_db_digest)
<sha256 HASH object @ 0x7f7da0f04300>

所以我所要求的只是将 db_digest 转换为 asc_db_digest

编辑我重新表述了这个问题，因为我似乎一开始就没有正确认识到这个问题。

原文

Here is my problem...

Database stores everything in unicode.
hashlib.sha256().digest() accepts str and returns str.

When I try to stuff hash function with the data, I get the famous error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 1: ordinal not in range(128)

This is my data

>>> db_digest
u"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> 
>>> hashlib.sha256(db_digest)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\x90' in position 1: ordinal not in range(128)
>>> 
>>> asc_db_digest
"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> hashlib.sha256(asc_db_digest)
<sha256 HASH object @ 0x7f7da0f04300>

So all I am asking for is a way to turn db_digest into asc_db_digest

Edit
I have rephrased the question as it seems I haven't recognized teh problem correctly at first place.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

↙温凉少女 2025-01-11 07:33:20

如果您有一个仅包含 0 到 255（字节）代码点的 unicode 字符串，您可以使用 raw_unicode_escape 编码将其转换为 Python str：

>>> db_digest = u"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> hash_digest = "'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> db_digest.encode('raw_unicode_escape')
"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> db_digest.encode('raw_unicode_escape') == hash_digest
True

If you have a unicode string that only contains code points from 0 to 255 (bytes) you can convert it to a Python str using the raw_unicode_escape encoding:

>>> db_digest = u"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> hash_digest = "'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> db_digest.encode('raw_unicode_escape')
"'\x90\x017~1\xe0\xaf4\xf2\xec\xd5]:j\xef\xe6\x80\x88\x89\xfe\xf7\x99,c\xff\xb7\x06hXR\x99\xad\x91\x93lM:\xafT\xc9j\xec\xc3\xb7\xea[\x80\xe0e\xd6\\\xd8\x16'\xcb6\xc8\xaa\xdf\xc9 :\xff"
>>> db_digest.encode('raw_unicode_escape') == hash_digest
True

回复收藏 0 原文

故人爱我别走 2025-01-11 07:33:20

哈希对字节进行操作（Python 2.x 中的 bytes、str），而不是字符串（2.x 中的unicode、str< /code> 在 3.x 中）。因此，无论如何您都必须提供字节。尝试：

hashlib.sha1(salt.encode('utf-8') + data).digest()

hashes operates on bytes (bytes, str in Python 2.x), not strings (unicode in 2.x, str in 3.x). Therefore, you must supply bytes anyways. Try:

hashlib.sha1(salt.encode('utf-8') + data).digest()

回复收藏 0 原文

鲜血染红嫁衣 2025-01-11 07:33:20

哈希值将包含 0-255 范围内的“字符”。这些都是有效的 Unicode 字符，但不是 Unicode 字符串。你需要以某种方式转换它。最好的解决方案是将其编码为 base64 之类的内容。

还有一个 hacky 解决方案可以将返回的字节直接转换为伪 Unicode 字符串，就像您的数据库似乎正在做的那样：

hash_unicode = u''.join([unichr(ord(c)) for c in hash_digest])

您也可以采用其他方法，但这更危险，因为“字符串”将包含外部字符0-127 的 ASCII 范围，当您尝试使用它时可能会抛出错误。

asc_db_digest = ''.join([chr(ord(c)) for c in db_digest])

The hash will contain "characters" that are in the range 0-255. These are all valid Unicode characters, but it's not a Unicode string. You need to convert it somehow. The best solution would be to encode it into something like base64.

There's also a hacky solution to convert the bytes returned directly into a pseudo-Unicode string, exactly as your database appears to be doing it:

hash_unicode = u''.join([unichr(ord(c)) for c in hash_digest])

You can also go the other way, but this is more dangerous as the "string" will contain characters outside of the ASCII range of 0-127 and might throw errors when you try to use it.

asc_db_digest = ''.join([chr(ord(c)) for c in db_digest])

回复收藏 0 原文

~没有更多了~

关于作者

九命猫

暂无简介

文章

25 人气

关注发私信

友情链接

文江博客

Python Unicode Ascii，序数不在范围内，令人沮丧的错误

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

紫罗兰の梦幻

-2134

liuxuanli

意中人

○愚か者の日

xxhui

友情链接

Python Unicode Ascii，序数不在范围内，令人沮丧的错误

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

紫罗兰の梦幻

-2134

liuxuanli

意中人

○愚か者の日

xxhui

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。