pysqlite insert unicode data 8-bit bytestring error

Posted on 2024-11-04 08:37:14

I know similar permutations of this question have been asked before, but the answers don't seem to shed light on what I am doing wrong here.

I am trying to insert this row:
(Pdb) print row
['886', '39', '83474', '0', '0', '0', '0', '0', '1.00', 'D', '20070813', 'R', 'C', 'B', "SOCK 4PK", '\xe9\x9e\x8b\xe5\xad\x90\xe5\xb0\xba\xe5\xaf\xb86-9.5/24-27.5CM', 'PR']

into this table:
CREATE TABLE item ("whs" int,"dept" int,"item" int,"dsun" int,"oh" int,"ohrtv" int,"adjp" int,"adjn" int,"sell" text,"stat" text,"lsldt" int,"cat1" text,"cat2" text,"cat3" text,"des1" text,"sgn3" text,"unit" text);

The sgn3 column seems to be causing the problem. It is defined as TEXT, and the data to be inserted is UTF-8. Why am I receiving the sqlite3 error?

ProgrammingError: 'You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestr...= str). It is highly recommended that you instead just switch your application to Unicode strings.'

Here is the code doing the insert:

query = 'insert into %s values(%s)' % (
    self.tablename,
    ','.join(['?' for field in row])
)
self.con.execute(query, row)

And here is the procedure that creates the generator of records to be inserted:

def encode_utf_8(self, csv_data, csv_encoding):
    """Decodes from 'csv_encoding' and encodes to utf-8.  

    Accepts any open csv file encoding using any scheme recognized by 
    python. Returns a generator.  

    """
    for line in csv_data:
        try:
            yield line.decode(csv_encoding).encode('utf-8')
        except UnicodeDecodeError:
            next


﹉夏雨初晴づ 2024-11-11 08:37:14

That is one of the most helpful error messages that I've ever seen. Just do what it says: feed it unicode objects, not UTF-8-encoded str objects. In other words, drop the .encode('utf-8'), or follow it later with a .decode('utf-8'). What exactly is csvdata?
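
As a rough illustration of "feed it unicode objects", here is a minimal sketch. It assumes Python 3 (where str is the Unicode type and bytes plays the role of the 8-bit str from Python 2), an in-memory database, and a cut-down two-column version of the item table; none of that comes from the question itself.

```python
import sqlite3

# The UTF-8 encoded value from the question's row, written as bytes.
raw = b'\xe9\x9e\x8b\xe5\xad\x90\xe5\xb0\xba\xe5\xaf\xb86-9.5/24-27.5CM'

# Decode once, so that what reaches sqlite3 is text rather than bytes.
text = raw.decode('utf-8')

# Assumption: an in-memory database and a reduced table, for illustration only.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE item ("des1" text, "sgn3" text)')
con.execute('insert into item values (?, ?)', ('SOCK 4PK', text))

# The value comes back as the same Unicode string that went in.
stored = con.execute('SELECT sgn3 FROM item').fetchone()[0]
```

The only change relative to the question's flow is that the value is decoded before the insert instead of being passed along as encoded bytes.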

If you ever get a UnicodeDecodeError in your existing code:

(1) You should do something much more useful than what you intended to do with it (sweep it under the carpet)

(2) You may wish to change next to pass
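
One possible way to act on both points is sketched below, assuming Python 3. The function name decode_lines and the choice to log a warning are illustrative assumptions, not something the answer prescribes; the point is only that a failed line gets reported instead of vanishing.

```python
import logging

log = logging.getLogger(__name__)

def decode_lines(lines, encoding):
    """Yield each line decoded to text and report lines that fail to decode.

    Hypothetical helper: 'lines' is any iterable of bytes objects.
    """
    for lineno, line in enumerate(lines, 1):
        try:
            yield line.decode(encoding)
        except UnicodeDecodeError as exc:
            # Record the failure so bad input is visible, then move on.
            # A bare 'next' here would be an expression that does nothing.
            log.warning('line %d is not valid %s: %s', lineno, encoding, exc)
```

Whether to skip, substitute, or abort on a bad line depends on the data; logging is just the least surprising default for a sketch.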

Response to comment

haha, it is a very useful error message

haha??? I wasn't joking; it tells you exactly what to do.

csvdata is a csv file in this case encoding using big5 in python 2.x

What are you calling "a csv file":

(1) csvdata = open('my_big5_file', 'rb')
(2) csvdata = csv.reader(open('my_big5_file', 'rb'))
(3) other; please specify 

if I chose not to encode to utf-8, my rows are ascii right?

Utterly wrong. bytes_read_from_file.decode('big5') produces a unicode object. You may like to read the Python Unicode HOWTO.
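
A small sketch of that point, assuming Python 3, where the unicode type is called str. The sample text is made up, and the bytes are produced by encoding it so that they stand in for data read from a Big5 file.

```python
# Stand-in for bytes read from a Big5 file (assumed sample data).
bytes_read_from_file = '中文'.encode('big5')

# Decoding yields a Unicode string, not ASCII.
text = bytes_read_from_file.decode('big5')

# The decoded text cannot be represented in ASCII at all.
try:
    text.encode('ascii')
    is_ascii = True
except UnicodeEncodeError:
    is_ascii = False
```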

so i need to explicitly change them to unicode before saving to the database?

No, they are unicode already. However depending on what csvdata is, you may want to encode into utf8 to get them through the csv mechanism and then decode them later.
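
The encode-then-decode detour described above is specific to the Python 2 csv module, which works on byte strings. As a hedged sketch of the Python 3 equivalent, where csv works on text directly, the file can be decoded before it reaches csv.reader. The helper name read_rows and the sample data are assumptions made for illustration.

```python
import csv
import io

def read_rows(binary_file, encoding):
    """Yield csv rows as lists of Unicode strings.

    Hypothetical helper: 'binary_file' is a file object opened in 'rb' mode.
    """
    # Decode the byte stream first; newline='' lets csv handle line endings.
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline='')
    for row in csv.reader(text_file):
        yield row

# Usage with an in-memory stand-in for a Big5 encoded csv file.
sample = '886,39,中文\r\n'.encode('big5')
rows = list(read_rows(io.BytesIO(sample), 'big5'))
```

Each field in rows is already text, so it can be passed to sqlite3 as a parameter without any further encoding.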
