Python3 unicode strings and pyobjc
I'm converting a lot of python2 scripts that use pyobjc to python3, and having trouble getting them to work. The problem seems to relate to the Unicode changes in python3.
The following call to a pyobjc method works in python2:
import Quartz as Quartz
filename = '/path/to/myfile.pdf'
provider = Quartz.CGDataProviderCreateWithFilename(filename)
but in python 3, I get ValueError: depythonifying 'char', got 'str' of 1
This can be fixed by encoding the string first:
filenameNonU = filename.encode('utf-8')
provider = Quartz.CGDataProviderCreateWithFilename(filenameNonU)
... and the script works, unless the string includes non-ASCII characters (e.g. Ä∂∫ß), in which case I get: ValueError: depythonifying 'char', got 'int' of wrong magnitude
Using the codec 'raw-unicode-escape' works for the ASCII range, and does not flag an error for strings with Unicode chars, but the method just returns None, so it seems like it's just a question of getting the right codec.
So, my question is: what do I need to do to get my strings in the same format as python2 was using, so that the pyobjc method will deal with them correctly?
python2 returns something like:
A\xcc\x88\xc6\x92\xe2\x88\x82
for Unicode characters higher than 128; and I get the same result in python3 when encoded as utf-8, except for the b prefix.
raw_unicode_escape gives something like A\\u0308\\u0192\\u2202, which is a different format.
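To make the comparison concrete, here is a small sketch of the two encodings applied to the same string. The sample text (A followed by a combining diaeresis, ƒ and ∂) is my reading of the bytes quoted above, so treat it as an assumption; nothing here touches pyobjc.

```python
# Sample text assumed from the byte sequences quoted above:
# "A" + COMBINING DIAERESIS (U+0308) + "ƒ" (U+0192) + "∂" (U+2202)
text = "A\u0308\u0192\u2202"

# UTF-8 produces the same byte values python2 showed, as a bytes object.
utf8_bytes = text.encode("utf-8")

# raw_unicode_escape writes code points above 0xFF as literal \uXXXX text,
# so the result is plain ASCII and no longer names the same file.
raw_bytes = text.encode("raw_unicode_escape")

print(utf8_bytes)  # b'A\xcc\x88\xc6\x92\xe2\x88\x82'
print(raw_bytes)   # b'A\\u0308\\u0192\\u2202'

# Only the UTF-8 form decodes back to the original text as UTF-8.
print(utf8_bytes.decode("utf-8") == text)  # True
```

If that is right, it would explain why raw_unicode_escape raises no error yet the method returns None: the bytes are valid, but they spell a different path.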
It's no coincidence that the methods with this problem use pointers as their arguments in ObjC. But one of the benefits of python is that it (up to now) handles things like types and pointers invisibly.
Comments (1)
I've got in touch with Ronald Oussoren, the maintainer of pyobjc, and he's confirmed there's a bug causing the problem with characters above 255.
This has now been fixed in pyobjc 8.5.
For the avoidance of doubt, the correct encoding for strings passed as arguments should be utf8.
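As a rough sketch of how that might look in a script: the helper names to_c_string and create_provider below are made up for illustration and are not part of pyobjc. Only Quartz.CGDataProviderCreateWithFilename comes from the question, and the non-ASCII case assumes pyobjc 8.5 or later.

```python
import os


def to_c_string(path):
    """Return `path` as UTF-8 bytes for a C `const char *` argument.

    Hypothetical helper, not a pyobjc API.
    """
    p = os.fspath(path)      # accepts str, bytes, or pathlib.Path
    if isinstance(p, bytes):
        return p             # already bytes: pass through unchanged
    return p.encode("utf-8")


def create_provider(path):
    """Hypothetical wrapper around the call from the question."""
    # Imported lazily so to_c_string() can be used without pyobjc installed.
    import Quartz
    return Quartz.CGDataProviderCreateWithFilename(to_c_string(path))
```

Calling create_provider('/path/to/myfile.pdf') would then behave like the encoded call in the question, with the encoding kept in one place.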