Python: Encoding problem
I want to copy data from one database to another, so I wrote a Python script for this purpose.
Names are in German, but I don't think that will be a problem for understanding my question.
The script does the following:
db = mysql.connect(db='', charset="utf8", use_unicode=True, **v.MySQLServer[server]);
...
cursor = db.cursor();
cursor.execute('select * from %s.%s where %s = %d;' % (eingangsDatenbankName, tabelle, syncFeldname, v.NEU))
daten = cursor.fetchall()
for zeile in daten:
    sql = 'select * from %s.%s where ' % (hauptdatenbankName, tabelle)
    ...
    for i in xrange(len(spalten)):
        sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])
The method "db_util.formatierFeld" looks like this:
def formatierFeld(inhalt, feldTyp):
    if inhalt.lower() == "none":
        return "NULL"
    #Stringtypen
    if "char" in feldTyp.lower() or "text" in feldTyp.lower() or "blob" in feldTyp.lower() or "date".lower() in feldTyp.lower() or "time" in feldTyp.lower():
        return '"%s"' % inhalt
    else:
        return '%s' % inhalt
Well, to some of you this will seem quite odd, but I can assure you I MUST do it this way, so please no discussion about style etc.
Okay, when running this code I get the following error message whenever I run into words with umlauts.
Traceback (most recent call last):
  File "db_import.py", line 222, in <module>
    main()
  File "db_import.py", line 219, in main
    importieren(server, lokaleMaschine, dbEingang, dbHaupt)
  File "db_import.py", line 145, in importieren
    sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
Actually, I do not understand why this string can't be built that way. In my opinion this should work, since I explicitly tell the program to use unicode here.
Does anybody have a guess as to what is going wrong here?
Comments (1)
The error is made more difficult to interpret by the deep nesting of expressions you have.
In the line
sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])
where does the exception come from? It's difficult to say. However, I would suppose that it comes from str(zeile[i]). If zeile[i] is unicode containing non-ASCII characters, then you cannot convert it to a byte string using str, because str implicitly falls back to the ASCII codec, which is why the message names 'ascii'. Instead, you must encode it to a byte string using a codec which can represent all of the characters it contains.
However...
This is pointless if zeile[i] is a unicode string. First you try to encode it to a byte string, then you try to decode it back into a unicode string. You could skip all that and just use zeile[i]. formatierFeld doesn't really matter, because execution never gets that far.
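To make the answer's point concrete, here is a minimal sketch. It is written to run on both Python 2 and Python 3, so the implicit ASCII step that str() performs on Python 2 is spelled out as an explicit .encode("ascii"). The names text_type and to_text and the sample value are hypothetical and not part of the original script, and treating byte values as UTF-8 is an assumption.

```python
# Sketch only: text_type, to_text and the sample value are hypothetical,
# not taken from the original script.
try:
    text_type = unicode  # Python 2: the unicode text type
except NameError:
    text_type = str      # Python 3: str is already the text type

value = u"M\xfcller"  # sample cell value with a u-umlaut at position 1

# On Python 2, str(value) implicitly encodes with the ASCII codec.
# The explicit call below reproduces that step and fails the same way.
try:
    value.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A codec that can represent the character works ...
encoded = value.encode("utf-8")
# ... but decoding it straight back only reproduces the original text.
round_trip = encoded.decode("utf-8")


def to_text(cell):
    """Return a database cell as text without a detour through str()."""
    if isinstance(cell, text_type):
        return cell                  # already text: pass through unchanged
    if isinstance(cell, bytes):
        return cell.decode("utf-8")  # assumption: byte values are UTF-8
    return text_type(cell)           # numbers, dates, None -> "None"


print(ascii_failed)
print(round_trip == value)
print(to_text(None))
```

If the driver really does hand back unicode objects for text columns (which use_unicode=True suggests, though that is not verified here), the call in the loop could plausibly become db_util.formatierFeld(to_text(zeile[i]), feldTypen[i]). None still arrives as the text "None", so the existing check in formatierFeld keeps working.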