Python: encoding problem

Posted on 2024-11-09 08:42:06


I want to copy data from one database to another, so I wrote a Python script for that purpose.

The names are in German, but I don't think that will make my question harder to understand.

The script does the following:

db = mysql.connect(db='', charset="utf8", use_unicode=True, **v.MySQLServer[server]);
...
cursor = db.cursor();

cursor.execute('select * from %s.%s where %s = %d;' % (eingangsDatenbankName, tabelle, syncFeldname, v.NEU))
daten = cursor.fetchall()

for zeile in daten:
    sql = 'select * from %s.%s where ' % (hauptdatenbankName, tabelle)
    ...
    for i in xrange(len(spalten)):
        sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])

The function "db_util.formatierFeld" looks like this:

def formatierFeld(inhalt, feldTyp):

    if inhalt.lower() == "none":
        return "NULL"    #Stringtypen
    if "char" in feldTyp.lower() or "text" in feldTyp.lower() or "blob" in feldTyp.lower() or "date".lower() in feldTyp.lower() or "time" in feldTyp.lower():
        return '"%s"' % inhalt 
    else:
        return '%s' % inhalt

Well, to some of you this will seem quite odd, but I can assure you I MUST do it this way, so please no discussion about style etc.

Okay, when I run this code, I get the following error message as soon as it hits a word with an umlaut:

Traceback (most recent call last):
  File "db_import.py", line 222, in <module>
    main()
  File "db_import.py", line 219, in main
    importieren(server, lokaleMaschine, dbEingang, dbHaupt)
  File "db_import.py", line 145, in importieren
    sql += " %s, " %  db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)

Actually, I do not understand why this string can't be built that way. In my opinion this should work, since I explicitly tell the program to use unicode here.

Does anybody have a guess what is going wrong here?


橪书 2024-11-16 08:42:06

The deep nesting of expressions in your code makes this error harder to interpret.

In the line

sql += " %s, " % db_util.formatierFeld(unicode(str(zeile[i]), "utf-8"), feldTypen[i])

where does the exception come from? It's difficult to say. However, I would suppose that it comes from str(zeile[i]). If zeile[i] is a unicode object containing non-ASCII characters, then you cannot convert it to a byte string using str: in Python 2, str implicitly encodes with the default ASCII codec, which is why the error names the 'ascii' codec. Instead, you must encode it explicitly to a byte string using a codec that can represent all of the characters it contains.
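To make that concrete, here is a minimal sketch of the failing step. It assumes zeile[i] holds something like u"M\xfcller" (a made-up sample value, not taken from your data), and try_encode is a hypothetical helper. The implicit encode that Python 2's str performs is written out as an explicit .encode("ascii"), so the snippet also runs on Python 3.

```python
# Sketch only: spells out the ASCII encode that Python 2's str() applies
# implicitly to a unicode object. The sample value is an assumption.

name = u"M\xfcller"  # u'\xfc' (u-umlaut) sits at index 1


def try_encode(text, codec):
    """Return (True, encoded bytes) on success, or (False, error message)."""
    try:
        return True, text.encode(codec)
    except UnicodeEncodeError as exc:
        return False, str(exc)


# ASCII has no byte for u'\xfc', so this attempt fails.
ascii_ok, ascii_result = try_encode(name, "ascii")

# UTF-8 can represent every character in the string, so this one succeeds.
utf8_ok, utf8_result = try_encode(name, "utf-8")

print(ascii_ok, ascii_result)
print(utf8_ok, utf8_result)
```

The ASCII attempt fails with the same kind of UnicodeEncodeError as in your traceback, while the UTF-8 attempt returns a byte string.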

However...

unicode(str(zeile[i]), "utf-8")

This is pointless if zeile[i] is already a unicode string: first you try to encode it to a byte string, then you try to decode it straight back into a unicode string. You could skip all that and just use zeile[i]. formatierFeld doesn't really matter here, because execution never gets that far.
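A rough sketch of what the loop body could look like without the round trip, split into separate steps so that a traceback points at the step that actually fails. Several things here are my assumptions, not part of your script: format_row is a hypothetical helper, formatierFeld is a condensed copy of the function from the question, the sample row is made up, and text columns are assumed to arrive as unicode objects (which use_unicode=True is meant to provide). Byte strings, for example from blob columns, would still need an explicit .decode with the right codec.

```python
try:
    text_type = unicode  # Python 2: the unicode text type
except NameError:
    text_type = str      # Python 3: str is already the text type


def formatierFeld(inhalt, feldTyp):
    """Condensed copy of the question's helper: quote string-like column types."""
    if inhalt.lower() == "none":
        return "NULL"
    typ = feldTyp.lower()
    if any(marker in typ for marker in ("char", "text", "blob", "date", "time")):
        return '"%s"' % inhalt
    return "%s" % inhalt


def format_row(zeile, feldTypen):
    """Format one row step by step (hypothetical helper, not in the original)."""
    teile = []
    for wert, feldTyp in zip(zeile, feldTypen):
        # Convert straight to text; no str()/unicode(..., "utf-8") round trip.
        inhalt = text_type(wert)
        teile.append(formatierFeld(inhalt, feldTyp))
    return ", ".join(teile)


# Made-up row standing in for one tuple returned by cursor.fetchall().
zeile = (7, u"M\xfcller", None)
feldTypen = ("int(11)", "varchar(50)", "varchar(50)")
ergebnis = format_row(zeile, feldTypen)
```

Because each conversion sits on its own line, an exception would identify either the text conversion or formatierFeld instead of one long nested expression.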
