Python module like csv.DictReader with full UTF-8 support
I need to import data from a CSV file in my project, and I need an object like DictReader, but with full UTF-8 support. Does anyone know of a module or app that does this?
Comments (2)
Your data is NOT encoded in UTF-8. It is (mostly) encoded in cp1252. The data appears to include Spanish names. The most prevalent non-ASCII character is '\xd1' (i.e. Latin capital letter N with tilde); this is the character that caused the exception.
One of the non-ASCII characters in the file is '\x8d'. It is NOT in cp1252. It appears where the letter A should appear in the name VASQUEZ. Of the others, '\x94' (curly double quote in cp1252) appears in the middle of a name. The remaining ones may also represent errors.
I suggest that you run this little code fragment to print lines with suspicious characters in them:
and fix up the data.
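A minimal sketch of such a check, written for Python 3 rather than the Python 2.7 used in this answer. The names `suspicious_lines` and `report` are hypothetical, and "suspicious" is assumed here to mean any byte outside the ASCII range:

```python
import re

# Assumption: any byte outside 7-bit ASCII counts as "suspicious".
NON_ASCII = re.compile(rb"[\x80-\xff]")


def suspicious_lines(path):
    """Yield (line_number, raw_line) for each line containing a non-ASCII byte."""
    # Read in binary mode so no decoding is attempted on the raw data.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, 1):
            if NON_ASCII.search(raw):
                yield lineno, raw


def report(path):
    """Print each suspicious line with its line number."""
    for lineno, raw in suspicious_lines(path):
        # repr() shows the offending bytes as \x.. escapes.
        print(lineno, repr(raw))
```

Working on raw bytes keeps the check independent of whichever encoding the file turns out to use.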
Then you need a csv DictReader with full and generalised decoding support. Full means decoding the fieldnames aka dict keys as well as the data. Generalised means no hardcoding of the encoding.
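As a rough illustration of "full and generalised", here is a sketch for Python 3, where the csv module works on already-decoded text (under Python 2.7, as used in this answer, each key and value would instead have to be decoded from bytes). The function name `decoded_dict_rows` is hypothetical:

```python
import csv


def decoded_dict_rows(path, encoding, **csv_kwargs):
    """Yield each CSV row as a dict with decoded keys and values.

    The encoding is supplied by the caller rather than hardcoded.
    """
    # newline="" is what the csv module expects for files it reads.
    with open(path, encoding=encoding, newline="") as f:
        # DictReader takes the fieldnames (dict keys) from the first row,
        # so they are decoded together with the data.
        for row in csv.DictReader(f, **csv_kwargs):
            yield dict(row)
```

It could be called as `list(decoded_dict_rows("data.csv", "cp1252"))`, where `data.csv` is a placeholder filename. A byte such as '\x8d', which Python's cp1252 codec does not define, would raise `UnicodeDecodeError`, which is one reason to fix the data first.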
import csv
Output:
and here is what you get with your sample file (first data row only, Python 2.7.1, Windows 7):
As the answer to this post said:
You can see my example code below. I'm using your CSV file (see comments).
Output:
You can see that the 'Ñ' is correctly encoded.