Creating a UTF-8 CSV file in Python

Posted on 2024-09-06 21:51:39

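I can't create a UTF-8 CSV file in Python.

I'm trying to read its docs, and in the examples section it says:

For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:

OK. So I have this code: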

values = (unicode("Ñ", "utf-8"), unicode("é", "utf-8"))
f = codecs.open('eggs.csv', 'w', encoding="utf-8")
writer = UnicodeWriter(f)
writer.writerow(values)

And I keep getting this error:

line 159, in writerow
    self.stream.write(data)
  File "/usr/lib/python2.6/codecs.py", line 686, in write
    return self.writer.write(data)
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)

Can someone please shed some light on what I am doing wrong, given that I set the encoding everywhere before calling the UnicodeWriter class?

# Python 2 imports needed by the recipe below
import codecs
import csv
import cStringIO


class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)



妳是的陽光 2024-09-13 21:51:39

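You don't have to use codecs.open; UnicodeWriter takes Unicode input and takes care of encoding everything into UTF-8. By the time UnicodeWriter writes to the file handle you passed to it, everything is already UTF-8 encoded (so it works with a normal file that you opened with open).

By using codecs.open, you first convert your Unicode objects to UTF-8 byte strings inside UnicodeWriter and then try to re-encode those byte strings to UTF-8 again, as if they were Unicode strings. In Python 2 the stream has to decode a byte string implicitly with the ASCII codec before it can encode it, and that is what fails with the UnicodeDecodeError on byte 0xc3.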
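The hidden step described above can be made visible with a small sketch. It is written for Python 3, not the Python 2.6 of the question, so the implicit ASCII decode is spelled out as an explicit call; the variable names are only illustrative.

```python
# What UnicodeWriter hands to the stream: text already encoded as UTF-8 bytes.
already_encoded = "Ñ,é\r\n".encode("utf-8")

# Python 2 decodes a byte string with the ASCII codec before re-encoding it.
# This explicit decode stands in for that hidden step (an illustration only).
try:
    already_encoded.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The first byte of the UTF-8 encoding of "Ñ" is 0xc3, which is outside the ASCII range, so the decode fails in the same way as in the traceback.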


装纯掩盖桑 2024-09-13 21:51:39

As you have figured out, it works if you use plain open.

The reason is that you tried to encode to UTF-8 twice. Once in

f = codecs.open('eggs.csv', 'w', encoding="utf-8")

and then again in UnicodeWriter.writerow

# ... and reencode it into the target encoding
data = self.encoder.encode(data)

To check this, take your original code and comment out that line.

Greetz
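To illustrate the "encode exactly once" idea from this answer, here is a rough Python 3 adaptation of the recipe. It is a sketch under assumptions, not the code from the docs: the class name EncodeOnceWriter is hypothetical, io.StringIO replaces cStringIO, and the target is assumed to be a binary stream so that no second encoding layer exists.

```python
import codecs
import csv
import io


class EncodeOnceWriter:
    """Hypothetical Python 3 adaptation: format rows as text, encode once."""

    def __init__(self, binary_stream, dialect=csv.excel, encoding="utf-8", **kwds):
        self.queue = io.StringIO()  # in-memory text buffer for csv.writer
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = binary_stream  # must accept bytes, not text
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow(row)
        text = self.queue.getvalue()
        # Reset the buffer so the next row starts from an empty queue.
        self.queue.seek(0)
        self.queue.truncate(0)
        # The only encoding step: text -> bytes in the target encoding.
        self.stream.write(self.encoder.encode(text))


buffer = io.BytesIO()
EncodeOnceWriter(buffer).writerow(["Ñ", "é"])
result = buffer.getvalue()
```

Because the stream receives bytes and does no encoding of its own, the data passes through a single encoder, which is the situation the plain open call gives you in the original code.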



ゞ记忆︶ㄣ 2024-09-13 21:51:39

I ran into the csv / unicode challenge a while back and tossed this up on bitbucket: http://bitbucket.org/knownactress/dude_csv .. it might work for you if your needs are simple :)


花伊自在美 2024-09-13 21:51:39

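You don't need to "double-encode" everything.

Your application should work entirely in Unicode.

Do your encoding only in codecs.open, to write UTF-8 bytes to an external file. Do no other encoding within your application.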
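As a sketch of that advice in Python 3 (an assumption, since the question targets Python 2.6): strings are already Unicode, and the built-in open takes an encoding argument, so the csv module can write to the file directly. The temporary directory is only there to keep the example self-contained.

```python
import csv
import os
import tempfile

values = ("Ñ", "é")  # Python 3 str objects are already Unicode

path = os.path.join(tempfile.mkdtemp(), "eggs.csv")

# Encoding happens only here, at the file boundary.
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(values)

# Read the raw bytes back to confirm what was written.
with open(path, "rb") as f:
    raw = f.read()
```

Nothing in the application code calls encode or decode; the file object is the single place where text becomes UTF-8 bytes.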


About the author: 伴我老
