Data encoding error
I get the following encoding error while copying data from CSV files into a database table.
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xf8
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
I am not using any encoding or decoding commands. To copy data from a file into a table, I use the following code.
cur.copy_from(myFile, myTable)
These files contain a lot of special characters and weird data, but I want to store all of it.
EDIT
The table is:
create table myTable(id integer, name character varying(10000));
and a sample of the CSV file is:
"1";"This is |_|¨^~~ || ¨text wuth special charater like Bjш;; ø"
"2";"Test data -._.- (2010/10/11) "
Comments (3)
You write that you are not specifying any encoding, and it seems like psycopg2 defaults to UTF-8 then. 0xf8 isn't a valid single-byte UTF-8 sequence (only bytes 0x00 to 0x7f can stand alone). Is your source file possibly in ISO-8859-1, where 0xf8 corresponds to ø?
Edit:
There are several places where this problem could be addressed, and which of them is correct depends on your situation.
If you will have to import ISO-8859-1 files repeatedly, you might want to work with the connection's encoding (client_encoding) to make your script consistent.
If you only need to do this import once, why not simply convert the files to the expected format outside of Python, for example with iconv or recode?
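Both routes can be sketched in Python. This is only a sketch under assumptions: the files are taken to be ISO-8859-1 as guessed above, `conn` is assumed to be an open psycopg2 connection, and the function names, the `;` separator and the paths are made up for illustration. `transcode_to_utf8` does in Python roughly what `iconv -f ISO-8859-1 -t UTF-8` would do outside it; `copy_latin1_file` instead leaves the file alone and declares the client encoding before the copy.

```python
import shutil


def transcode_to_utf8(src_path, dst_path, source_encoding="iso-8859-1"):
    """Rewrite a text file as UTF-8, decoding it from source_encoding.

    The source encoding is an assumption; a wrong guess produces
    wrong characters rather than an error for ISO-8859-1.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        shutil.copyfileobj(src, dst)


def copy_latin1_file(conn, path, table, sep=";"):
    """Declare LATIN1 as the client encoding, then COPY the raw bytes.

    conn is assumed to be a psycopg2 connection; the server converts
    the incoming bytes from LATIN1 to the database encoding.
    """
    conn.set_client_encoding("LATIN1")
    with open(path, "rb") as f, conn.cursor() as cur:
        cur.copy_from(f, table, sep=sep)
    conn.commit()
```

Note that `copy_from` uses PostgreSQL's text COPY format, in which double quotes are ordinary characters, so the quoted sample rows from the question would probably need unquoting or a CSV-format COPY as well.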
What is the data type of the column in the database? It should fit whatever you want to put into it.
If you want to store byte data, use a binary data type.
If you want to store text data, use a character data type.
You cannot expect your database to store a .jpg file as text, simply because it isn't text.
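One rough way to see which case a given input falls into is to try decoding its bytes. The helper below is a hypothetical illustration, not part of psycopg2 or PostgreSQL, and it assumes the database expects UTF-8; bytes that fail the check are either text in some other encoding or not text at all.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


jpeg_header = b"\xff\xd8\xff\xe0"          # first bytes of a typical JPEG file
latin1_text = "Bjø".encode("iso-8859-1")   # contains the byte 0xf8
utf8_text = "Bjø".encode("utf-8")

print(is_valid_utf8(jpeg_header))  # False
print(is_valid_utf8(latin1_text))  # False
print(is_valid_utf8(utf8_text))    # True
```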
If you want to store it as is, you cannot use a character data type, or at least not one where validity against an encoding is checked. It sounds like the input data is not UTF-8 encoded.
You can either fix the encoding or switch to another data type.
If you have multiple input files with different encodings, you might run into interesting comparison problems when you try to compare characters that don't exist in all encodings.
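To illustrate the comparison problem, here is a small sketch of how the same byte reads under two single-byte encodings. The choice of cp1251 as the second encoding and the `representable` helper are assumptions made for the example, not something stated in the question.

```python
raw = b"\xf8"  # the byte from the error message

as_latin1 = raw.decode("iso-8859-1")  # ø (U+00F8)
as_cp1251 = raw.decode("cp1251")      # ш (U+0448)

# Identical bytes, different characters once decoded.
print(as_latin1 == as_cp1251)  # False


def representable(char: str, encoding: str) -> bool:
    """Return True if char can be encoded in the given encoding."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Each character exists in only one of the two encodings.
print(representable(as_cp1251, "iso-8859-1"))  # False
print(representable(as_latin1, "cp1251"))      # False
```

Storing raw bytes from mixed sources therefore leaves equal byte values that mean different things, which is why fixing the encoding on the way in is usually the safer choice for text.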