cx_Oracle - Encoding query results as raw
EDIT:
The following print shows my intended value (both sys.stdout.encoding and sys.stdin.encoding are 'UTF-8').
Why is the variable's value different from its printed value? I need to get the raw value into a variable.
>>> username = 'Jo\xc3\xa3o'
>>> username.decode('utf-8').encode('latin-1')
'Jo\xe3o'
>>> print username.decode('utf-8').encode('latin-1')
João
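The transcoding in the edit changes only how the text is stored as bytes, not the text itself. A minimal Python 3 sketch (the question uses Python 2, where byte strings are str and text is unicode); the variable names are illustrative:

```python
# Python 3 sketch: one piece of text, two byte encodings.
text = 'Jo\u00e3o'                 # 'João': four characters, U+00E3 is 'ã'

utf8 = text.encode('utf-8')        # b'Jo\xc3\xa3o': 'ã' takes two bytes
latin1 = text.encode('latin-1')    # b'Jo\xe3o': 'ã' takes one byte

# The edit's transcoding: decode the UTF-8 bytes, re-encode the text as Latin-1.
transcoded = utf8.decode('utf-8').encode('latin-1')

print(len(utf8), len(latin1))      # 5 4
print(transcoded == latin1)        # True
print(latin1.decode('latin-1') == utf8.decode('utf-8'))  # True: same text
```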
Original question:
I'm having an issue querying a DB and decoding the values into Python.
I confirmed my database character set (NLS_CHARACTERSET) using
select property_value from database_properties where property_name='NLS_CHARACTERSET';
'''AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines
UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate
characters encoded using UTF-8 (or six bytes per character)'''
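The byte counts in the quote can be checked with a small Python 3 sketch. As far as I know, Python's standard library has no codec for Oracle's older UTF8 character set (CESU-8), so it is emulated here by splitting a character beyond U+FFFF into a UTF-16 surrogate pair and UTF-8-encoding each half with the 'surrogatepass' error handler. The function names are hypothetical:

```python
# Python 3 sketch: AL32UTF8 (standard UTF-8) vs. an emulation of Oracle's "UTF8" (CESU-8).

def al32utf8_bytes(ch: str) -> bytes:
    """Standard UTF-8, which is what AL32UTF8 stores."""
    return ch.encode('utf-8')


def cesu8_bytes(ch: str) -> bytes:
    """Emulated CESU-8: a character beyond U+FFFF becomes two 3-byte surrogates."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return ch.encode('utf-8')          # BMP characters are identical to UTF-8
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)             # high (lead) surrogate
    low = 0xDC00 + (cp & 0x3FF)            # low (trail) surrogate
    # 'surrogatepass' lets the UTF-8 codec encode lone surrogates.
    return (chr(high) + chr(low)).encode('utf-8', 'surrogatepass')


emoji = '\U0001F600'                       # a character beyond U+FFFF
print(len(al32utf8_bytes(emoji)))          # 4
print(len(cesu8_bytes(emoji)))             # 6
print(cesu8_bytes('\u00e3') == al32utf8_bytes('\u00e3'))  # True
```

'ã' (U+00E3) lies well below U+FFFF, so it is stored as the same two bytes in either character set; the distinction in the quote does not affect 'João'.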
os.environ["NLS_LANG"] = ".AL32UTF8"
....
conn_data = str('%s/%s@%s') % (db_usr, db_pwd, db_sid)
sql = "select user_name apex.users where user_id = '%s'" % userid
...
cursor.execute(sql)
ldap_username = cursor.fetchone()
...
Printing the value shows:
>>> print ldap_username
'Jo\xc3\xa3o'
I've tried both of the following, and they return the same result:
>>> ldap_username.decode('utf-8')
u'Jo\xe3o'
>>> unicode(ldap_username, 'utf-8')
u'Jo\xe3o'
whereas
>>> u'João'.encode('utf-8')
'Jo\xc3\xa3o'
How do I get the query result back to the proper 'João'?
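The escaped form u'Jo\xe3o' is a display convention: \xe3 is the code point U+00E3, which is 'ã'. A minimal Python 3 sketch of the decode step; fetched is a hypothetical stand-in for the value read from the cursor, and ascii() is used because it produces the same escaped form that Python 2's repr shows for unicode objects:

```python
# Python 3 sketch: what decoding the fetched UTF-8 bytes produces.
fetched = b'Jo\xc3\xa3o'           # stand-in for the UTF-8 bytes read from the cursor
decoded = fetched.decode('utf-8')

print(decoded == 'Jo\u00e3o')     # True: identical to the literal 'João'
print([ord(c) for c in decoded])  # [74, 111, 227, 111]; 227 == 0xE3 is 'ã'
print(ascii(decoded))             # 'Jo\xe3o': an escaped display of the same text
print(decoded.encode('utf-8') == fetched)  # True: the round trip is lossless
```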
Answer:
You already have the proper 'João', methinks. The difference between
>>> 'Jo\xc3\xa3o'
and
>>> print 'Jo\xc3\xa3o'
is that the former calls repr on the object, while the latter calls str (or probably unicode, in your case). It's just how the string is represented. Some examples might make this more clear:
Notice how the second and third results are identical. The original ldap_username is currently a byte string (a Python 2 str), not a unicode object. You can see this at the Python prompt: a byte string is displayed as 'byte string', while a unicode object is displayed as u'unicode string', the key being the leading u. So, as your ldap_username reads as 'Jo\xc3\xa3o' and is a byte string, the following applies:
Summed up: you need to determine the type of the string (use type when unsure) and, based on that, either decode it to unicode or encode it to a byte string.
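In Python 3 the two Python 2 types map to bytes (byte string) and str (text), and isinstance can replace the manual type check. A minimal sketch; to_text and to_bytes are hypothetical helper names, and UTF-8 is assumed to match the AL32UTF8 setting in the question:

```python
# Python 3 sketch of "check the type, then decode or encode".
from typing import Union


def to_text(value: Union[bytes, str], encoding: str = 'utf-8') -> str:
    """Decode raw bytes to text; return text unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text: decoding again would be an error


def to_bytes(value: Union[bytes, str], encoding: str = 'utf-8') -> bytes:
    """Encode text to bytes; return bytes unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(type(b'Jo\xc3\xa3o').__name__)                    # bytes
print(to_text(b'Jo\xc3\xa3o') == to_text('Jo\u00e3o'))  # True
print(to_bytes('Jo\u00e3o'))                            # b'Jo\xc3\xa3o'
```

Making both helpers pass through values that are already of the target type means they can be called without knowing in advance what the driver returned.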