cx_Oracle - Encoding query result to Raw

Posted 2024-12-11 17:30:57

EDIT:

The following print shows my intended value.

(both sys.stdout.encoding and sys.stdin.encoding are 'UTF-8').

Why is the variable value different than its print value? I need to get the raw value into a variable.

>>> username = 'Jo\xc3\xa3o'
>>> username.decode('utf-8').encode('latin-1')
'Jo\xe3o'
>>> print username.decode('utf-8').encode('latin-1')
João

Original question:

I'm having an issue querying a DB and decoding the values into Python.

I confirmed my database character set using

select property_value from database_properties where property_name='NLS_CHARACTERSET';

'''AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines 
UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate
characters encoded using UTF-8 (or six bytes per character)'''

os.environ["NLS_LANG"] = ".AL32UTF8"

....
conn_data = str('%s/%s@%s') % (db_usr, db_pwd, db_sid)

sql = "select user_name from apex.users where user_id = '%s'" % userid

...

cursor.execute(sql)
ldap_username = cursor.fetchone()
...

where

print ldap_username
>>'Jo\xc3\xa3o'

I've tried both of the following (which return the same result):

ldap_username.decode('utf-8')
>>u'Jo\xe3o'
unicode(ldap_username, 'utf-8')
>>u'Jo\xe3o'

where

u'João'.encode('utf-8')
>>'Jo\xc3\xa3o'

How do I get the query result back to the proper 'João'?

夜夜流光相皎洁 2024-12-18 17:30:57

You already have the proper 'João', methinks. The difference between >>> 'Jo\xc3\xa3o' and >>> print 'Jo\xc3\xa3o' is that the former calls repr on the object, while the latter calls str (or probably unicode, in your case). It's just how the string is represented.

Some examples might make this more clear:

>>> print 'Jo\xc3\xa3o'.decode('utf-8')
João
>>> 'Jo\xc3\xa3o'.decode('utf-8')
u'Jo\xe3o'
>>> print repr('Jo\xc3\xa3o'.decode('utf-8'))
u'Jo\xe3o'

Notice how the second and third results are identical. The original ldap_username is currently a byte string (a plain Python 2 str holding UTF-8 encoded bytes), not a Unicode object. You can see this at the Python prompt: a byte string is displayed as 'byte string', while a Unicode object is displayed as u'Unicode string'. The key is the leading u.
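The same distinction can be reproduced on Python 3, where byte strings and text are separate types. This is only a minimal sketch, assuming Python 3 (the session above is Python 2), and the variable names are illustrative. It uses ascii() to produce the escaped form that the Python 2 prompt echoes.

```python
# Python 3 sketch: one name, three views of it.
raw = b'Jo\xc3\xa3o'           # UTF-8 bytes, like the value in the question
text = raw.decode('utf-8')     # text string; the two bytes c3 a3 become one character

escaped = ascii(text)          # escaped representation, similar to the Python 2 echo
readable = str(text)           # the text itself, which is what print() writes

print(repr(raw))               # byte-level view, shown with a b prefix
print(escaped)                 # escaped view of the decoded text
print(len(raw), len(text))     # 5 bytes versus 4 characters
```

The escaped form and the readable form describe the same object, so nothing needs converting just because the prompt shows \xe3.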

So, since your ldap_username reads as 'Jo\xc3\xa3o' and is a byte string, the following applies:

>>> 'Jo\xc3\xa3o'.decode('utf-8')
u'Jo\xe3o'
>>> print 'Jo\xc3\xa3o'.decode('utf-8') # To Unicode...
João
>>> u'João'.encode('utf-8')             # ... back to UTF-8 bytes
'Jo\xc3\xa3o'

Summed up: you need to determine the type of the string (use type when unsure) and, based on that, either decode it to Unicode or encode it to a byte string.
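That type check can be sketched as a pair of small helpers. This is a rough illustration, assuming Python 3 (where the types are bytes and str; on Python 2 they would be str and unicode) and assuming the data is UTF-8, as in the question. The names to_text and to_bytes are hypothetical and are not part of cx_Oracle or the standard library.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value              # already text, nothing to do
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding it first if it is text."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value              # already bytes, nothing to do
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


name = to_text(b'Jo\xc3\xa3o')    # decode once, at the boundary
print(ascii(name))                # escaped view of the decoded name
print(to_bytes(name) == b'Jo\xc3\xa3o')
```

Decoding once where data enters the program and encoding once where it leaves keeps the rest of the code working with a single string type.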
