How do I correctly compare unicode strings from psycopg2 in Python?
I have a problem comparing a UTF-8 string obtained from a PostgreSQL database:
>>> db_conn = psycopg2.connect("dbname='foo' user='foo' host='localhost' password='xxx'")
>>> db_cursor = db_conn.cursor()
>>> sql_com = ("""SELECT my_text FROM table WHERE id = 1""")
>>> db_cursor.execute(sql_com)
>>> sql_result = db_cursor.fetchone()
>>> db_conn.commit()
>>> db_conn.close()
>>> a = sql_result[0]
>>> a
u'M\xfcnchen'
>>> type(a)
<type 'unicode'>
>>> print a
München
>>> b = u'München'
>>> type(b)
<type 'unicode'>
>>> print b
München
>>> a == b
False
I am really confused about why this is so. Can someone tell me how I should compare a string containing an umlaut from the database to another string so that the comparison is true? My database is UTF8:
postgres@localhost:$ psql -l
List of databases
Name | Owner | Encoding
-----------+----------+----------
foo | foo | UTF8
Comments (2)
This is clearly a problem with the locale of your console.
u"München" is u'M\xfcnchen' in Unicode and 'M\xc3\xbcnchen' in UTF-8. The latter, taken as ISO8859-1 or CP1252, is what your typed München turns into (it reads as MÃ¼nchen). Psycopg2 seems to supply you with correct Unicode values, as it should.
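The mismatch described above can be reproduced without a database. The following is a minimal sketch, assuming the typed literal reached Python as UTF-8 bytes that were decoded as ISO8859-1; the names from_db, utf8_bytes, and typed are illustrative and do not come from the question. It is written to run on Python 3, while the question itself uses Python 2.

```python
# The value psycopg2 returned in the question: a proper Unicode string.
from_db = u'M\xfcnchen'

# Assumption: the console sent UTF-8 bytes, but they were decoded as
# ISO8859-1, so each byte became its own character.
utf8_bytes = from_db.encode('utf-8')
typed = utf8_bytes.decode('iso8859-1')

print(repr(utf8_bytes))           # the UTF-8 byte sequence
print(len(from_db), len(typed))   # the misdecoded string is one character longer
print(from_db == typed)           # False: the code points differ

# Writing the literal with an escape avoids depending on the console encoding.
print(from_db == u'M\xfcnchen')   # True
```

If this is what happened, correcting the console locale or writing non-ASCII characters as escapes should make the comparison succeed.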
What do you get if you type type(b)?
Maybe you don't need to explicitly turn the string into a unicode literal, as Python will handle this automatically.
EDIT: I get this from my Python CLI:
while you are getting your print result in a different encoding.
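Since both strings print identically, comparing their code points may be more informative than comparing their printed form. The following is a rough sketch: code_points is a hypothetical helper, and the value assigned to b is an assumption about what the typed literal contained, not something shown in the question.

```python
def code_points(text):
    """Return the code points of a string as hex strings."""
    return [hex(ord(ch)) for ch in text]

a = u'M\xfcnchen'        # value from the database, as shown in the question
b = u'M\xc3\xbcnchen'    # assumed content of the typed literal

print(code_points(a))
print(code_points(b))

# If b is UTF-8 misread as ISO8859-1, reversing the misreading
# recovers the intended text.
repaired = b.encode('iso8859-1').decode('utf-8')
print(repaired == a)
```

In the interactive session, repr(a) and repr(b) give the same information directly.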