Python Unicode CSV export (using Django)

Posted on 2024-09-27 08:40:00

I'm using a Django app to export a string to a CSV file. The string is a message that was submitted through a front end form. However, I've been getting this error when a unicode single quote is provided in the input.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' 
  in position 200: ordinal not in range(128)

I've been trying to convert the unicode to ascii using the code below, but still get a similar error.

UnicodeEncodeError: 'ascii' codec can't encode characters in 
position 0-9: ordinal not in range(128)

I've sifted through dozens of websites and learned a lot about Unicode; however, I'm still not able to convert this Unicode to ASCII. I don't care if the algorithm removes the Unicode characters. The commented lines show some of the options I've tried, but the error persists.

import csv
import unicodedata

...

#message = unicode( unicodedata.normalize(
#                            'NFKD',contact.message).encode('ascii','ignore'))
#dmessage = (contact.message).encode('utf-8','ignore')
#dmessage = contact.message.decode("utf-8")
#dmessage = "%s" % dmessage
dmessage = contact.message

csv_writer.writerow([
        dmessage,
])

Does anyone have any advice on removing Unicode characters so I can export them to CSV? This seemingly easy problem has kept my head spinning. Any help is much appreciated.
Thanks,
Joe

伤痕我心 2024-10-04 08:40:00

You can't encode the Unicode character u'\u2019' (U+2019 Right Single Quotation Mark) into ASCII, because ASCII doesn't have that character in it. ASCII is only the basic Latin alphabet, digits and punctuation; you don't get any accented letters or ‘smart quotes’ like this character.

So you will have to choose another encoding. Normally the sensible thing to do would be to export to UTF-8, which can hold any Unicode character. Unfortunately for you, if your target users are using Office (and they probably are), they're not going to be able to read UTF-8-encoded characters in CSV. Instead, Excel will read the files using the system default code page for that machine (also misleadingly known as the ‘ANSI’ code page), and end up with mojibake like â€™ instead of ’.
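
The mojibake described above can be reproduced in a few lines. This is only a minimal sketch, assuming Python 3 (where every str is Unicode); the variable names are illustrative and not part of the original answer.

```python
# Encode the right single quotation mark as UTF-8, then misread those
# bytes as code page 1252, which is what a cp1252-defaulting reader does.
smart_quote = "\u2019"
utf8_bytes = smart_quote.encode("utf-8")   # three bytes: E2 80 99
misread = utf8_bytes.decode("cp1252")      # each byte becomes one character

print(misread)  # â€™
```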

So that means you have to guess the user's system default code page if you want the characters to show up correctly. For Western users, that will be code page 1252. Users with non-Western Windows installs will see the wrong characters, but there's nothing you can do about that (other than organise a letter-writing campaign to Microsoft to just drop the stupid nonsense with ANSI already and use UTF-8 like everyone else).

Code page 1252 can contain U+2019 (’), but obviously there are many more characters it can't represent. To avoid getting UnicodeEncodeError for those characters you can use the ignore argument (or replace to replace them with question marks).

dmessage= contact.message.encode('cp1252', 'ignore')

Alternatively, you can give up and remove all non-ASCII characters, so that everyone gets an equally bad experience regardless of locale:

dmessage= contact.message.encode('ascii', 'ignore')
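
The one-liners above are written for Python 2, where the csv module works with byte strings. As a rough sketch of the same idea under Python 3 (an assumption, since the question does not say which version is in use), the encoding and error handling can be applied by the stream the csv writer writes to. The helper name write_messages_csv and its parameters are hypothetical.

```python
import csv
import io

def write_messages_csv(binary_stream, messages, encoding="cp1252", errors="replace"):
    """Write one message per row, encoding the text as it is written."""
    # newline="" lets the csv module control line endings itself.
    text_stream = io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, newline=""
    )
    writer = csv.writer(text_stream)
    for message in messages:
        writer.writerow([message])
    text_stream.flush()
    text_stream.detach()  # leave binary_stream open for the caller

# Hypothetical sample data: a smart quote (representable in cp1252)
# and two CJK characters (not representable, so replaced by "?").
buffer = io.BytesIO()
write_messages_csv(buffer, ["It\u2019s fine", "\u4f60\u597d"])
print(buffer.getvalue())  # b'It\x92s fine\r\n??\r\n'
```

Passing errors="ignore" would drop the unrepresentable characters instead of replacing them, and encoding="ascii" gives the "equally bad experience" variant.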

一袭白衣梦中忆 2024-10-04 08:40:00

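Encoding is a pain, but if you're working in Django, have you tried smart_unicode(str) from django.utils.encoding? I find that usually does the trick.

The only other option I've found is to use Python's built-in encode() and decode() on strings, but you have to specify the encoding for those and, honestly, it's a pain.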

以歌曲疗慰 2024-10-04 08:40:00

[caveat: I'm not a djangoist; django may have a better solution].

General non-django-specific answer:

If you have a smallish number of known non-ASCII characters and there are user-acceptable ASCII equivalents for them, you can set up a translation table and use the unicode.translate method:

smashcii = {
    0x2019 : u"'",
    # etc
}

smashed = input_string.translate(smashcii)
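
A slightly fuller sketch of the translation-table approach, assuming Python 3 (where the method is str.translate). The table entries beyond U+2019, the helper name smash_to_ascii, and the NFKD fallback borrowed from the question's commented-out attempt are illustrative choices, not part of the original answer.

```python
import unicodedata

# Map code points to ASCII stand-ins; extend with whatever the data needs.
SMASHCII = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
    0x2013: "-",   # en dash
}

def smash_to_ascii(text):
    """Apply the table, then strip whatever non-ASCII characters remain."""
    translated = text.translate(SMASHCII)
    # NFKD splits accented letters into a base letter plus combining marks,
    # so encoding with "ignore" keeps the base letter and drops the mark.
    decomposed = unicodedata.normalize("NFKD", translated)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(smash_to_ascii("Joe\u2019s caf\u00e9 \u201cmenu\u201d"))  # Joe's cafe "menu"
```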
