Encoding problem in Python x64
I'm trying to write a little script that fills a sqlite table from a list of files saved in a text file. The code so far is this:
    import os
    import _sqlite3
    import sys

    print sys.path[0]
    mydir = sys.path[0]
    print (mydir)

    def listdir(mydir):
        lis=[]
        for root, dirs, files in os.walk(mydir):
            for name in files:
                lis.append(os.path.join(root,name))
        return lis

    filename = "list.txt"
    print ("writting in %s" % filename)
    file = open(filename, 'w' )
    for i in listdir(mydir):
        file.write(i)
        file.write("\n")
    file.close()

    con = _sqlite3.connect("%s/conection"%mydir)
    c=con.cursor()
    c.execute(''' drop table files ''')
    c.execute('create table files (name text, other text)')
    file = open(filename,'r')
    for line in file :
        a = 1
        for t in [("%s"%line, "%i"%a)]:
            c.execute('insert into files values(?,?)',t)
            a=a+1
    c.execute('select * from files')
    print c.fetchall()
    con.commit()
    c.close()
When I run it, I get the following:
    Traceback (most recent call last):
      File "C:\Users\josh\FORGE.py", line 32, in <module>
        c.execute('insert into files values(?,?)',t)
    ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
I've tried the unicode() built-in function, but it still won't work: it says it can't decode the character 0xed, or something like that.

I know the problem is the encoding of the strings in the list, but I can't find a way to get them right. Any ideas? Thanks in advance!
Comments (1)
(zero). Please reformat your code.

- After "for line in file:", do something like line = line.decode('encoding-of-the-file'), where the encoding is something like utf-8 or iso-8859-1. You have to know your input encoding.
- If you don't know the encoding, or don't care about a clean decoding, you can guess the most probable encoding and use line.decode('utf-8', 'ignore'), which omits every byte that cannot be decoded. You can also pass 'replace', which substitutes the Unicode replacement character (\ufffd) for those bytes.
- Internally, and when communicating with the database, use only unicode objects, e.g. u'this is unicode'.

(3). Don't use file as a variable name, since it shadows the built-in file type.

Also look here: Best Practices for Python UnicodeDecodeError
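To make the points above concrete, here is a minimal sketch rather than a drop-in fix. Several things in it are assumptions, not taken from the question: the sample file name, the guess that the list is encoded as iso-8859-1, the in-memory database, the use of the public sqlite3 module instead of _sqlite3, and the helper names load_lines and store_names. It also assumes the counter was meant to increase per row. The byte literals behave like Python 2 str objects, and the sketch is written so that it should also run on Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import sqlite3
import tempfile

# Hypothetical sample line: byte 0xED is 'í' in ISO-8859-1,
# but it is not a valid UTF-8 sequence on its own.
raw = b"m\xedsica.txt\n"

# Guessing UTF-8: 'ignore' drops the undecodable byte,
# 'replace' substitutes U+FFFD for it.
ignored = raw.decode("utf-8", "ignore")
replaced = raw.decode("utf-8", "replace")


def load_lines(path, encoding):
    """Read the file as bytes and return a list of decoded text lines."""
    with io.open(path, "rb") as handle:
        return [chunk.decode(encoding).rstrip(u"\r\n") for chunk in handle]


def store_names(con, names):
    """Insert text names into the files table, numbering them from 1."""
    cur = con.cursor()
    cur.execute("create table if not exists files (name text, other text)")
    rows = [(name, u"%d" % number) for number, name in enumerate(names, 1)]
    cur.executemany("insert into files values (?, ?)", rows)
    con.commit()


# Write the sample bytes to a temporary file standing in for list.txt.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(raw)
tmp.close()

# Assumed encoding: iso-8859-1. Use whatever your list file really uses.
names = load_lines(tmp.name, "iso-8859-1")
os.remove(tmp.name)

con = sqlite3.connect(":memory:")
store_names(con, names)
rows = con.execute("select name, other from files").fetchall()
con.close()
```

The key design choice is to decode once, right where the bytes enter the program, so that everything handed to the database is already text and the ProgrammingError about 8-bit bytestrings should not come up.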