Python DictWriter writing UTF-8 encoded CSV files
- I have a list of dictionaries containing unicode strings.
- csv.DictWriter can write a list of dictionaries into a CSV file.
- I want the CSV file to be encoded in UTF-8.
- The csv module cannot handle converting unicode strings into UTF-8.

The csv module documentation has an example for converting everything to UTF-8:

    def utf_8_encoder(unicode_csv_data):
        for line in unicode_csv_data:
            yield line.encode('utf-8')

It also has a UnicodeWriter class.

But how do I make DictWriter work with these? Wouldn't they have to inject themselves in the middle of it, to catch the disassembled dictionaries and encode them before it writes them to the file? I don't get it.
Comments (6)
You can use some proxy class to encode dict values as needed, like this:
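A minimal sketch of what such a proxy might look like, offered as an illustration rather than as the answer's own code. The names PY2, to_utf8, ConvertingDict, and write_row are hypothetical. The sketch assumes CPython's DictWriter fetches each value through get(), which is an implementation detail rather than documented behavior. The version guard makes the encoding step a no-op on Python 3, where the csv module accepts str directly.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def to_utf8(value):
    """Encode text to UTF-8 bytes on Python 2; return everything else unchanged."""
    if PY2 and isinstance(value, unicode):  # noqa: F821 (name exists on Python 2 only)
        return value.encode("utf-8")
    return value


class ConvertingDict(dict):
    """Hypothetical proxy: a dict whose get() runs each value through a converter."""

    def __init__(self, data, convert=to_utf8):
        dict.__init__(self, data)
        self.convert = convert

    def get(self, key, default=None):
        # Assumption: DictWriter looks values up with get(), so converting here is enough.
        return self.convert(dict.get(self, key, default))


def write_row(writer, row):
    # Wrap the row just before handing it to csv.DictWriter.
    writer.writerow(ConvertingDict(row))
```

The converter is a parameter so the proxy can be exercised with any function, which also makes it easy to check that DictWriter really goes through get().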
When you call csv.writer with your content, the idea is to pass the content through utf_8_encoder, as it would give you the (UTF-8) encoded content.
My solution is a bit different. While all the solutions above focus on making the dict unicode-compatible, mine makes DictWriter compatible with unicode. This approach is even suggested in the Python docs (1).
The classes UTF8Recoder, UnicodeReader, and UnicodeWriter are taken from the Python docs. UnicodeWriter.writerow was changed slightly too.
Use it as regular DictWriter/DictReader.
Here is the code:
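As a minimal sketch of that approach, and not the answer's original listing, the wrapper below adapts the documentation's UnicodeWriter pattern to csv.DictWriter on Python 3. The class name UnicodeDictWriter and its parameters are hypothetical. It assumes the target is a binary stream, which mirrors the Python 2 situation: rows are formatted into an in-memory text queue, then encoded and written out.

```python
import codecs
import csv
import io


class UnicodeDictWriter:
    """Hypothetical DictWriter look-alike that writes encoded bytes to a binary stream."""

    def __init__(self, stream, fieldnames, encoding="utf-8", **kwds):
        self.queue = io.StringIO()  # csv formats text rows into this buffer
        self.writer = csv.DictWriter(self.queue, fieldnames, **kwds)
        self.stream = stream        # binary file-like object
        self.encoder = codecs.getincrementalencoder(encoding)()

    def _flush_queue(self):
        # Move the formatted text from the queue to the stream as encoded bytes.
        data = self.queue.getvalue()
        self.stream.write(self.encoder.encode(data))
        # Reset the queue; truncate() alone does not rewind the position.
        self.queue.seek(0)
        self.queue.truncate(0)

    def writeheader(self):
        self.writer.writeheader()
        self._flush_queue()

    def writerow(self, rowdict):
        self.writer.writerow(rowdict)
        self._flush_queue()

    def writerows(self, rowdicts):
        for rowdict in rowdicts:
            self.writerow(rowdict)
```

It would be used like a regular DictWriter, for example UnicodeDictWriter(open("foo.csv", "wb"), ["name", "city"]). On Python 3 wrapping the binary stream in io.TextIOWrapper would likely be simpler; the queue is kept here only to mirror the documentation's pattern.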
You can convert the values to UTF-8 on the fly as you pass the dict to DictWriter.writerow(). For example:
Output foo.csv:
UPDATE: The third-party unicodecsv module implements this 7-year-old answer for you. An example follows below this code. There's also a Python 3 solution that doesn't require a third-party module.
Original Python 2 Answer
If using Python 2.7 or later, use a dict comprehension to remap the dictionary to UTF-8 before passing it to DictWriter:
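The remapping might be sketched as follows; this is an illustration under stated assumptions, not the answer's original code. The names PY2, utf8_row, and write_rows are hypothetical. The version guard is an addition so that the same file also runs on Python 3, where the remapping is skipped because the csv module takes str directly and the file object does the encoding.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def utf8_row(row):
    """Return a copy of row with text values encoded as UTF-8 (Python 2 only)."""
    if not PY2:
        # On Python 3, encoding here would make csv write b'...' reprs.
        return dict(row)
    return {k: v.encode("utf-8") if isinstance(v, unicode) else v  # noqa: F821
            for k, v in row.items()}


def write_rows(path, fieldnames, rows):
    # Python 2 wants a binary file; Python 3 wants a text file with an explicit encoding.
    if PY2:
        f = open(path, "wb")
    else:
        f = open(path, "w", encoding="utf-8", newline="")
    with f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow(utf8_row(row))
```

On Python 2 the field names themselves would also need to be byte strings or plain ASCII, since writeheader() passes them to the writer unchanged.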
You can use this idea to update UnicodeWriter to DictUnicodeWriter:
Python 2 unicodecsv Example:
Python 3:
Additionally, Python 3's built-in csv module supports Unicode natively:
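A minimal Python 3 sketch of that native support, assuming the rows are plain dicts of str values. The function name write_dicts and its parameters are illustrative only.

```python
import csv


def write_dicts(path, fieldnames, rows):
    """Write a list of dicts to a UTF-8 encoded CSV file (Python 3)."""
    # newline="" lets the csv module control line endings;
    # encoding selects the bytes that end up on disk.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_dicts("foo.csv", ["name", "city"], [{"name": "José", "city": "Zürich"}]). If the file is meant for a spreadsheet program that guesses encodings, passing "utf-8-sig" instead writes a byte order mark first, which may help with detection.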
There is a simple workaround using the wonderful UnicodeCSV module. After installing it, just change the line
to
And it automagically begins playing nice with UTF-8.
Note: Switching to Python 3 will also rid you of this problem (thanks jamescampbell for the tip). And it's something one should do anyway.