Python: Convert mixed UTF-8 encoded characters to text
Using a RESTful service, I have a Python 3.x script that downloads text data from a vendor and writes it to a text file. The data contains text that includes UTF-8 encoded characters. Here's an example of the text I receive:
b'Sample data plus some Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f\xe2\x80\x94\x80\x94\x80\x94\x80\x94 and then more data'
Note that this is stored in a variable, say str_data. I'd like to convert those encoded characters before storing the data in a database. When I check type(str_data), I get <class 'str'>, even though the value looks like a <class 'bytes'> literal (e.g., b'stuff'). I have tried everything I can think of (encode(), decode(), etc.), but to no avail. The output I want is this:
Sample data plus some Japanese characters なぜ日銀だけ違うのか?— and then more data
Any help would be great. Thank you.
Update
If it will help, here's how I pulled down the data.
resp = requests.get(get_url)
f = open(self.export_file, "w")
f.write(str(resp.content))
f.close()
If I don't use str() on my write, like so...
resp = requests.get(get_url)
f = open(self.export_file, "w")
f.write(resp.content)
f.close()
I get the following...
TypeError: write() argument must be str, not bytes
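For reference, the two snippets above behave this way because str() applied to a bytes object returns its repr (the b'...' text, escapes included), while a file opened with "w" only accepts str. Below is a minimal sketch of two ways around that, under some assumptions: `payload` is a made-up stand-in for resp.content, and the temporary file names are hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for resp.content; not the vendor's real payload.
payload = "Sample なぜ".encode("utf-8")

# str() on bytes returns the repr, so the file would hold the literal b'...' text.
as_repr = str(payload)

with tempfile.TemporaryDirectory() as tmp:
    # Option 1: open in binary mode and write the bytes untouched.
    bin_path = os.path.join(tmp, "export.bin")
    with open(bin_path, "wb") as f:
        f.write(payload)
    with open(bin_path, "rb") as f:
        raw_back = f.read()

    # Option 2: decode to str first, then write text with an explicit encoding.
    txt_path = os.path.join(tmp, "export.txt")
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(payload.decode("utf-8"))
    with open(txt_path, encoding="utf-8") as f:
        round_trip = f.read()

print(as_repr)
print(round_trip)
```

Option 2 only works when the payload is valid UTF-8. Otherwise decode() raises UnicodeDecodeError unless an error handler is passed.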
Comments (1)
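Some of the bytes in that string are not valid UTF-8, which is why you're having trouble. The Japanese characters are valid, though. The problem is the sequence \xe2\x80\x94\x80\x94\x80\x94\x80\x94: only the first three bytes (\xe2\x80\x94) form a complete character, and the \x80 and \x94 bytes that follow are continuation bytes with no lead byte in front of them.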
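One possible way to get the desired output is sketched below. It assumes the saved text is exactly what str(resp.content) produced, and that the undecodable bytes can be dropped. The sample is a shortened version of the string in the question, and `str_data` mirrors the variable name used there. ast.literal_eval turns the b'...' text back into bytes, and the `errors` argument of decode() controls what happens to invalid bytes.

```python
import ast

# A str that looks like a bytes literal, as written by str(resp.content).
# Shortened sample based on the question's data.
str_data = r"b'\xe3\x81\xaa\xe3\x81\x9c\xe2\x80\x94\x80\x94\x80\x94\x80\x94 and then more data'"

# Safely evaluate the literal to recover the original bytes object.
raw = ast.literal_eval(str_data)

# A strict decode fails because of the stray continuation bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="ignore" silently drops invalid bytes;
# errors="replace" substitutes U+FFFD so the damage stays visible.
cleaned = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")

print(strict_ok)
print(cleaned)
print(replaced)
```

Whether the stray bytes were meant to be further dashes that lost their lead byte cannot be determined from the data alone, so errors="replace" may be the safer choice when the loss should stay visible. Writing resp.content in binary mode from the start avoids the need for literal_eval.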