CSV, DictWriter, unicode and utf-8
I am having problems with DictWriter and non-ASCII characters. A short version of my problem:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import csv
f = codecs.open("test.csv", 'w', 'utf-8')
writer = csv.DictWriter(f, ['field1'], delimiter='\t')
writer.writerow({'field1':u'å'.encode('utf-8')})
f.close()
This gives the following traceback:
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    writer.writerow({'field1':u'å'.encode('utf-8')})
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/csv.py", line 124, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I am a bit lost, as DictWriter ought to be able to work with UTF-8 from what I have read in the documentation.
Comments (1)
The object you obtain with codecs.open expects a unicode string in its write method; that's the whole point. csv.DictWriter, of course, calls that method with a UTF-8-encoded byte string instead. Python 2 then tries to decode that byte string with the default ASCII codec before re-encoding it, fails on byte 0xc3, and raises the exception. Change the creation of f to f = open("test.csv", 'wb') (taking codecs out of the picture) and things should work just fine.
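A minimal sketch of the fix described in this answer, for illustration only. The helper name write_rows and its parameters are hypothetical and not part of the original post. The Python 2 branch follows the answer (plain binary file, values encoded to UTF-8 by the caller); the Python 3 branch is an assumption added here, since on Python 3 the csv module writes text and the file object does the encoding.

```python
# -*- coding: utf-8 -*-
import csv
import sys

PY2 = sys.version_info[0] == 2


def write_rows(path, fieldnames, rows, delimiter='\t'):
    """Write dict rows as UTF-8 CSV (hypothetical helper, not from the post)."""
    if PY2:
        # Python 2: csv emits byte strings, so use a plain binary file
        # (no codecs.open) and encode unicode values to UTF-8 ourselves.
        f = open(path, 'wb')

        def prepare(value):
            if isinstance(value, unicode):  # noqa: F821 (Python 2 only)
                return value.encode('utf-8')
            return value
    else:
        # Python 3 (assumption beyond the original answer): csv emits str,
        # so a text-mode file opened with encoding='utf-8' does the encoding.
        f = open(path, 'w', encoding='utf-8', newline='')

        def prepare(value):
            return value

    try:
        writer = csv.DictWriter(f, fieldnames, delimiter=delimiter)
        for row in rows:
            writer.writerow(dict((k, prepare(v)) for k, v in row.items()))
    finally:
        f.close()


if __name__ == "__main__":
    # Same data as in the question: one field containing a non-ASCII character.
    write_rows("test.csv", ['field1'], [{'field1': u'å'}])
```

In both branches the point is the same: exactly one layer performs the UTF-8 encoding. The original error came from encoding twice, once by hand and once more inside the codecs.open wrapper.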