Unicode-encoded strings from Active Directory via python-ldap
I have already asked about this problem, but after some more testing I decided to create a new question with more specific information:
I am reading user accounts from our Active Directory with python-ldap (and Python 2.7). This works well, but I have problems with special characters. When printed on the console, they look like UTF-8 encoded strings. The goal is to write them into a MySQL DB, but I can't get those strings into proper UTF-8 in the first place.
Example (fullentries is my array with all the AD entries):
fullentries[23][1].decode('utf-8', 'ignore')  # return value is discarded, so fullentries itself is not changed
print fullentries[23][1].encode('utf-8', 'ignore')
print fullentries[23][1].encode('latin1', 'ignore')
print repr(fullentries[23][1])
A second test, with a string entered by hand:
testentry = "M\xc3\xbcller"
testentry.decode('utf-8', 'ignore')  # return value is discarded, so testentry itself is not changed
print testentry.encode('utf-8', 'ignore')
print testentry.encode('latin1', 'ignore')
print repr(testentry)
The output of the first example is:
M\xc3\xbcller
M\xc3\xbcller
u'M\\xc3\\xbcller'
Edit: If I try to replace the double backslashes with .replace('\\\\', '\\'), the output remains the same.
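Why that `.replace()` changes nothing: `repr()` doubles each backslash for display, but the string itself contains single backslashes, so there is no double-backslash pair to replace. A minimal Python 3 sketch (the question uses Python 2.7, where the same reasoning applies to the `u'...'` value); `ad_value` is a hypothetical name standing in for `fullentries[23][1]`:

```python
# Hypothetical stand-in for fullentries[23][1]. The question printed its repr
# as u'M\\xc3\\xbcller', which means the text contains SINGLE backslashes.
ad_value = "M\\xc3\\xbcller"

# repr() escapes each backslash, so it shows two where the string has one.
print(repr(ad_value))          # 'M\\xc3\\xbcller'
print(len(ad_value))           # 13 characters, not 7 bytes
print(ad_value.count("\\"))    # 2 single backslashes

# There is no two-backslash sequence in the actual text, so this is a no-op.
replaced = ad_value.replace("\\\\", "\\")
print(replaced == ad_value)    # True
```

In other words, the value is literal escape text (a backslash, an `x`, two hex digits) rather than UTF-8 bytes.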
The output of the second example:
Müller
M�ller
'M\xc3\xbcller'
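The second test behaves as expected once bytes and text are kept apart: the literal is seven UTF-8 bytes, decoding them gives Müller, and re-encoding as Latin-1 produces the single byte 0xFC, which a UTF-8 console cannot display, hence M�ller. A Python 3 sketch of the same steps (in the question's Python 2.7, the literal is a byte `str`):

```python
raw = b"M\xc3\xbcller"          # 7 bytes: "ü" is 0xC3 0xBC in UTF-8
text = raw.decode("utf-8")      # 'Müller', 6 characters
latin1 = text.encode("latin-1") # "ü" becomes the single byte 0xFC
garbled = latin1.decode("utf-8", errors="replace")  # 0xFC is invalid UTF-8

print(len(raw), len(text))      # 7 6
print(latin1)                   # b'M\xfcller'
print(ascii(garbled))           # 'M\ufffdller' (shown as M�ller on a UTF-8 console)
```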
Is there any way to get the AD output properly encoded? I have already read a lot of documentation, but it all states that LDAPv3 gives you strictly UTF-8 encoded strings, and Active Directory uses LDAPv3.
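If the server really returns UTF-8, the usual approach is to keep the raw attribute values as bytes and decode them exactly once, strictly, as UTF-8. The sketch below assumes python-ldap 3.x on Python 3, where `search_s()` returns `(dn, attrs)` tuples whose attribute values are lists of `bytes`; the function name `decode_attrs`, the sample entry, and the binary-attribute skip list are illustrative assumptions:

```python
from typing import Dict, List


def decode_attrs(attrs: Dict[str, List[bytes]]) -> Dict[str, List[str]]:
    """Decode every text attribute value of one LDAP entry as UTF-8.

    Assumes attribute names are str and values are lists of raw bytes.
    Binary attributes are not text and would usually fail to decode,
    so they are skipped here.
    """
    binary = {"objectGUID", "objectSid"}  # hypothetical skip list; extend as needed
    decoded = {}
    for name, values in attrs.items():
        if name in binary:
            continue
        # Strict decoding: a UnicodeDecodeError means the data is not UTF-8,
        # which is more informative than silently dropping bytes with 'ignore'.
        decoded[name] = [v.decode("utf-8") for v in values]
    return decoded


# A hand-made entry in the (dn, attrs) shape described above.
entry = ("CN=Example User,OU=Users,DC=example,DC=com",
         {"sn": [b"M\xc3\xbcller"], "objectGUID": [b"\x01\x02"]})
attrs = entry[1]
print(ascii(decode_attrs(attrs)))   # {'sn': ['M\xfcller']}
```

Decoding strictly at this single boundary keeps everything after it (including the MySQL insert) working with real text instead of bytes or escape sequences.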
My older question on this topic is here: Writing UTF-8 String to MySQL with Python
Edit: Added repr(s) output
First, know that printing to a Windows console is often the step that garbles data, so for your tests you should print repr(s) to see the precise bytes you have in your string. You need to find out how the data from AD is encoded. Again, print repr(s) will let you see the content of the data.

UPDATED:
OK, it looks like you're getting strange strings somehow. There might be a way to get them in a better form, but you can adapt to them in any case, though it isn't pretty:
You might want to look into whether you can get the data in a more natural format.
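The adaptation mentioned in this answer could look like the following Python 3 sketch (an illustration, not necessarily the code the answer originally contained). It assumes the value is an all-ASCII string in which each UTF-8 byte appears as a literal `\xNN` escape, as in the `u'M\\xc3\\xbcller'` repr from the question: interpret the escapes, map the resulting code points back to bytes with Latin-1, then decode those bytes as UTF-8. The function name `unescape_utf8_bytes` is hypothetical:

```python
def unescape_utf8_bytes(s: str) -> str:
    r"""Turn text like 'M\xc3\xbcller' (literal \xNN escapes of UTF-8 bytes)
    back into the decoded text.

    Assumes `s` is pure ASCII. 'unicode_escape' also interprets other
    backslash sequences (\n, \t, \\ ...), so only use this on values known
    to be escaped this way.
    """
    # ASCII bytes -> interpret \xNN escapes: each escape becomes one code
    # point in U+0000..U+00FF, i.e. one original byte per character.
    as_code_points = s.encode("ascii").decode("unicode_escape")
    # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # recovering the original UTF-8 byte sequence, which is then decoded.
    return as_code_points.encode("latin-1").decode("utf-8")


print(ascii(unescape_utf8_bytes("M\\xc3\\xbcller")))  # 'M\xfcller'
```

Because this repairs the symptom after the fact, it is worth checking first whether the escaping is introduced somewhere in your own pipeline.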