Python 3 Unicode strings and PyObjC

Posted 2025-01-20 02:41:15

I'm converting many Python 2 scripts that use PyObjC to Python 3, and I'm having trouble getting them to work. The problem seems to be related to the Unicode changes in Python 3.

The following call to a PyObjC function works in Python 2:

import Quartz
filename = '/path/to/myfile.pdf'
provider = Quartz.CGDataProviderCreateWithFilename(filename)

but in Python 3, I get: ValueError: depythonifying 'char', got 'str' of 1
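One plausible reading of this error is that PyObjC converts the argument element by element into C char values. This is an assumption; it does not come from PyObjC's documentation. In Python 2, indexing a str gave a one-character byte string. In Python 3, indexing a str gives a one-character text string, and indexing bytes gives an int. The standard-library snippet below shows the difference:

```python
filename = "/path/to/myfile.pdf"

# In Python 3, str is text: each element is itself a 1-character str.
first_char = filename[0]
print(type(first_char).__name__, len(first_char))  # str 1

# bytes is a sequence of small integers (0..255), not of 1-character strings.
encoded = filename.encode("utf-8")
print(type(encoded[0]).__name__, encoded[0])  # int 47
```

The two element types match the two error messages: "got 'str' of 1" for the text string, and "got 'int'" once the string has been encoded.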

This can be fixed by encoding the string first:

filenameNonU = filename.encode('utf-8')
provider = Quartz.CGDataProviderCreateWithFilename(filenameNonU)

... and the script works, unless the string contains non-ASCII characters (e.g. Ä∂∫ß). In that case I get: ValueError: depythonifying 'char', got 'int' of wrong magnitude
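A likely explanation is that each byte is range-checked against a signed C char (-128..127). This is an assumption, not confirmed PyObjC behaviour. ASCII bytes are all below 128, but every byte of a multi-byte UTF-8 sequence is 128 or above. The helper out_of_range_bytes below is hypothetical and only illustrates that check:

```python
# Assumption: PyObjC stores each element in a *signed* C char (-128..127).
SIGNED_CHAR_MIN, SIGNED_CHAR_MAX = -128, 127


def out_of_range_bytes(data):
    """Hypothetical helper: byte values that do not fit in a signed C char."""
    return [b for b in data if not SIGNED_CHAR_MIN <= b <= SIGNED_CHAR_MAX]


ascii_path = "/path/to/myfile.pdf".encode("utf-8")
# "Ä∂∫ß" written with escapes (precomposed Ä, U+00C4).
non_ascii_path = "/path/to/\u00c4\u2202\u222b\u00df.pdf".encode("utf-8")

print(out_of_range_bytes(ascii_path))      # []
print(out_of_range_bytes(non_ascii_path))  # [195, 132, 226, 136, 130, 226, 136, 171, 195, 159]
```

Only the bytes produced by the non-ASCII characters fall outside the signed range. This fits the observation that ASCII-only paths work.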

Using the 'raw-unicode-escape' codec works for the ASCII range, and it raises no error for strings with non-ASCII characters. However, the method then just returns None, so it seems to be only a question of finding the right codec.
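The raw_unicode_escape behaviour can be checked with the standard codecs. Per the Python codecs documentation, this codec writes code points up to U+00FF as single Latin-1 bytes and everything above as literal \uXXXX text.

Take the string from the question: A followed by a combining diaeresis, then ƒ and ∂. Every output byte is ASCII, which avoids the range error. But the bytes spell a different path, one containing backslashes. Presumably no such file exists, so the function returns None; that last step is an inference, not something verified here.

```python
name = "A\u0308\u0192\u2202"  # 'A' + U+0308 COMBINING DIAERESIS, 'ƒ', '∂'

escaped = name.encode("raw_unicode_escape")
print(escaped)                          # b'A\\u0308\\u0192\\u2202'
print(all(b < 128 for b in escaped))    # True: only ASCII bytes
print(escaped == name.encode("utf-8"))  # False: not the bytes Python 2 used

# Code points U+0080..U+00FF are NOT escaped: 'ß' (U+00DF) becomes the single
# byte 0xDF, which is outside the ASCII range again.
print("\u00df".encode("raw_unicode_escape"))  # b'\xdf'
```

So raw_unicode_escape only appears to work for this particular string. A precomposed Ä or a ß would produce a byte above 127 again.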

So, my question is: what do I need to do to get my strings into the same format Python 2 was using, so that the PyObjC method will handle them correctly?

Python 2 returns something like:

A\xcc\x88\xc6\x92\xe2\x88\x82

for characters outside the ASCII range (code points 128 and above). I get the same result in Python 3 when encoding with UTF-8, except for the b prefix.
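This can be checked directly. The Python 2 value quoted above is exactly the UTF-8 encoding of the string, and the b prefix is only how Python 3 displays a bytes object. The string is in decomposed (NFD) form, meaning A plus a combining diaeresis rather than a single Ä. That is typical of file names coming from HFS+ on macOS, an observation about these bytes rather than something the question states:

```python
import unicodedata

py2_value = b"A\xcc\x88\xc6\x92\xe2\x88\x82"  # bytes quoted from Python 2 above

name = "A\u0308\u0192\u2202"
print(name.encode("utf-8") == py2_value)  # True
print(py2_value.decode("utf-8") == name)  # True

# A precomposed "Ä" encodes differently; NFD splits it into "A" + U+0308.
print("\u00c4".encode("utf-8"))                             # b'\xc3\x84'
print(unicodedata.normalize("NFD", "\u00c4") == "A\u0308")  # True
```

This suggests that the UTF-8 bytes passed in Python 3 already match what Python 2 passed. What changed is more likely how those bytes are converted than the encoding itself.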

raw_unicode_escape gives something like A\\u0308\\u0192\\u2202, which is a different format.

It's no coincidence that the methods with this problem take pointer arguments in Objective-C (CGDataProviderCreateWithFilename, for example, takes a const char *). But one of the benefits of Python is that it has, up to now, handled things like types and pointers invisibly.
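At the C level, a UTF-8 path with bytes above 127 is an ordinary NUL-terminated char * string. Stored in a signed char, those bytes simply read as negative numbers. The ctypes sketch below uses only the standard library and is independent of PyObjC. It supports the reading that the failure comes from the per-element conversion rather than from the encoding, which is an inference, not a documented fact:

```python
import ctypes

path = "/path/to/\u00c4\u2202\u222b\u00df.pdf".encode("utf-8")

# A NUL-terminated char array, as a C function taking `const char *` expects.
buf = ctypes.create_string_buffer(path)
print(len(buf) == len(path) + 1)  # True: one extra byte for the terminating NUL
print(buf.value == path)          # True: the high bytes survive unchanged

# The same bytes viewed as *signed* chars: values above 127 wrap to negatives.
signed = (ctypes.c_byte * len(path)).from_buffer_copy(path)
print(list(signed)[9:11])  # [-61, -124], i.e. the UTF-8 bytes 0xC3 0x84 of 'Ä'
```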
