Python: Convert mixed decoded UTF-8 characters to text

Posted on 2025-02-03 22:01:53


Using a RESTful service, I have a Python 3.x script that downloads text data from a vendor and writes it to a text file. The data contains UTF-8 characters that show up as escaped byte sequences. Here's an example of the text I receive:

b'Sample data plus some Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f\xe2\x80\x94\x80\x94\x80\x94\x80\x94 and then more data'

Note that this is stored in a variable, say str_data. I'd like to convert those escaped bytes back into characters before storing the text in a database. When I check type(str_data) I get <class 'str'>, even though the value looks like a bytes literal (e.g., b'stuff'), whose type would be <class 'bytes'>. I have tried everything I can think of (encode(), decode(), etc.), but to no avail. The output I want is this:

Sample data plus some Japanese characters なぜ日銀だけ違うのか？— and then more data
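A minimal illustration of why encode() and decode() don't help, assuming str_data was produced by calling str() on a bytes object (which the update below suggests). str() returns the bytes object's repr, so every escape becomes four ordinary characters. The two-character sample is a hypothetical shortened stand-in for the vendor payload.

```python
# Hypothetical, shortened stand-in for the vendor payload.
raw = "なぜ".encode("utf-8")  # the bytes b'\xe3\x81\xaa\xe3\x81\x9c'

# str() on a bytes object returns its repr, not decoded text.
str_data = str(raw)
print(type(str_data))  # <class 'str'>
print(str_data)        # b'\xe3\x81\xaa\xe3\x81\x9c'

# Each escape is now four ordinary characters: backslash, 'x', two hex digits.
print(len(str_data))   # 27

# Encoding the str therefore yields bytes for those backslashes and hex digits,
# not the original UTF-8 bytes.
print(str_data.encode("utf-8") == raw)  # False

# Decoding the actual bytes is what produces the characters.
print(raw.decode("utf-8"))  # なぜ
```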

Any help would be great. Thank you.

Update

In case it helps, here's how I download the data:

  resp = requests.get(get_url)
  f = open(self.export_file, "w")
  f.write(str(resp.content))
  f.close()
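The str(resp.content) call in that snippet is what turns the bytes into their repr text. A sketch of fixing it at the point where the file is written, assuming the vendor sends UTF-8: decode resp.content explicitly and open the file with an explicit encoding. requests also offers resp.text, but it decodes with whatever encoding requests infers for the response, so an explicit decode is the more predictable choice here. The network call is replaced by a hypothetical `content` value and the path is a temporary file; errors="replace" substitutes U+FFFD for stray non-UTF-8 bytes instead of raising.

```python
import os
import tempfile

# Hypothetical stand-in for resp.content; the real script would call requests.get().
content = b"Sample \xe6\x97\xa5\xe9\x8a\x80\xef\xbc\x9f\xe2\x80\x94\x80\x94 more data"

# Decode bytes into text; stray non-UTF-8 bytes become U+FFFD instead of raising.
text = content.decode("utf-8", errors="replace")

# Hypothetical output path standing in for self.export_file.
export_file = os.path.join(tempfile.mkdtemp(), "export.txt")

# Write the decoded text (not str(bytes)) with an explicit encoding.
with open(export_file, "w", encoding="utf-8") as f:
    f.write(text)

# Reading it back gives real characters rather than backslash escapes.
with open(export_file, encoding="utf-8") as f:
    saved = f.read()
print(saved)
```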

If I don't wrap resp.content in str(), like so...

  resp = requests.get(get_url)
  f = open(self.export_file, "w")
  f.write(resp.content)
  f.close()

I get the following...

TypeError: write() argument must be str, not bytes
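That TypeError comes from opening the file in text mode ("w"), which only accepts str. An alternative sketch, assuming it is acceptable to store the raw payload and decode it later: open the file in binary mode ("wb"), which accepts bytes unchanged. The payload and path below are hypothetical stand-ins for resp.content and self.export_file.

```python
import os
import tempfile

# Hypothetical stand-in for resp.content.
content = b"\xe3\x81\xaa\xe3\x81\x9c\xe2\x80\x94"

# Hypothetical output path standing in for self.export_file.
export_file = os.path.join(tempfile.mkdtemp(), "export.bin")

# Text mode ("w") accepts only str, which is where the TypeError comes from.
with open(export_file, "w", encoding="utf-8") as f:
    try:
        f.write(content)
    except TypeError as exc:
        print(exc)  # write() argument must be str, not bytes

# Binary mode ("wb") accepts bytes and saves the payload byte-for-byte.
with open(export_file, "wb") as f:
    f.write(content)

# Later, read the bytes back and decode them when the text is needed.
with open(export_file, "rb") as f:
    text = f.read().decode("utf-8", errors="replace")
print(text)  # なぜ—
```

Keeping the raw bytes on disk defers the decoding decision (and the choice of error handler) to the step that loads the text into the database.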


〆凄凉。 2025-02-10 22:01:53

>>> import ast
>>> ast.literal_eval(str_data).decode('utf-8', errors='replace')
'Sample data plus some Japanese characters なぜ日銀だけ違うのか？—������ and then more data'

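Some of the bytes in that string are not valid UTF-8, which is why you're having trouble: the \x80\x94 pairs that follow the em dash (\xe2\x80\x94) are stray continuation bytes with no lead byte. The Japanese characters are valid UTF-8, though.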
