Getting UTF-8 strings from a remote database

Posted 2024-10-09 01:33:42

My application downloads some data from a remote MySQL database. The problem is that the database stores strings as UTF-8, but the data I receive is decoded as ASCII. How can I get around this?

The code:

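cursor = conn.cursor()          # conn: an already-open connection to the remote MySQL database
query = """MY QUERY HERE"""
cursor.execute(query)
result = cursor.fetchall()      # the string columns in these rows are the ones with the encoding problem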

Answer by 君勿笑, 2024-10-16 01:33:42

Perhaps an example is in order. Here I create a unicode string u, encode it as UTF-8, and decode that from UTF-8 back to a unicode string. Then I try to encode it as ASCII, which raises UnicodeEncodeError because the extended character in this string (u'\u2020', the dagger †) has no ASCII representation. Finally, I encode it as ASCII with the 'replace' error handler, which substitutes "?" for anything that can't be encoded:

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u = u'abc\u2020123'
>>> u
u'abc\u2020123'
>>> u.encode('utf8')
'abc\xe2\x80\xa0123'
>>> s = _
>>> s.decode('utf8')
u'abc\u2020123'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2020' in position 3: ordinal not in range(128)
>>> u.encode('ascii', 'replace')
'abc?123'
>>>
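The session above is Python 2, where str holds bytes and unicode holds text. As a sketch, assuming Python 3 (where str is already unicode and encoded data is bytes), the same round trip looks like this:

```python
u = 'abc\u2020123'                # in Python 3, str literals are already unicode text
s = u.encode('utf8')              # encoding yields bytes: b'abc\xe2\x80\xa0123'
roundtrip = s.decode('utf8')      # decoding the UTF-8 bytes gives back the original str

try:
    u.encode('ascii')             # U+2020 (DAGGER) has no ASCII representation
except UnicodeEncodeError as exc:
    print(exc)                    # 'ascii' codec can't encode character '\u2020' in position 3: ordinal not in range(128)

replaced = u.encode('ascii', 'replace')   # unencodable characters become b'?'
print(replaced)                   # b'abc?123'
```

The only real change from Python 2 is the types: decode() is called on bytes and encode() on str, so mixing them up fails loudly instead of silently producing mojibake.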

Presumably you're getting UTF-8 strings back from the database. You should decode these from UTF-8 into unicode strings, and then probably re-encode them on output in whatever encoding the consumer of your program's output expects. Typically you want a model something like this:

Input data -- transform from input encoding to unicode [string.decode('utf8')]
Process data -- dealing only with unicode objects
Output result -- transform from unicode to output encoding [string.encode('utf8')]

This cleanly separates encoding and decoding from the rest of the program and keeps encoding-handling code from spreading all over your application, since the core only ever deals with unicode.
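A minimal sketch of that decode / process / encode model, assuming Python 3. The rows are hypothetical stand-ins for what cursor.fetchall() might return if the driver hands back raw UTF-8 bytes, and the helper names decode_rows, process and encode_lines are illustrative only, not part of any database API:

```python
from typing import Iterable, List


def decode_rows(rows: Iterable[tuple], encoding: str = 'utf8') -> List[tuple]:
    """Input boundary: decode bytes columns to str; leave ints, None, etc. untouched."""
    return [
        tuple(v.decode(encoding) if isinstance(v, (bytes, bytearray)) else v for v in row)
        for row in rows
    ]


def process(rows: List[tuple]) -> List[str]:
    """Core logic: only ever sees str objects (here: upper-case the name, append the count)."""
    return [f"{name.upper()}: {count}" for name, count in rows]


def encode_lines(lines: Iterable[str], encoding: str = 'utf8') -> bytes:
    """Output boundary: encode once, right before the data leaves the program."""
    return '\n'.join(lines).encode(encoding)


# Hypothetical (name, count) rows, as a driver returning undecoded UTF-8 might produce them.
raw_rows = [(b'caf\xc3\xa9', 3), (b'abc\xe2\x80\xa0123', 1)]

text_rows = decode_rows(raw_rows)          # [('café', 3), ('abc†123', 1)]
output = encode_lines(process(text_rows))
print(output)                              # b'CAF\xc3\x89: 3\nABC\xe2\x80\xa0123: 1'
```

Only decode_rows and encode_lines know anything about encodings; process never touches bytes, which is exactly the separation described above.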
