Is it possible to use wide-character members in Python extension objects?
It's simple to create a member for an object in a Python C extension with a base type of `char *`, using the `T_STRING` define in the `PyMemberDef` declaration.

Why does there not seem to be an equivalent for `wchar_t *`? And if there actually is one, what is it?

e.g. `struct object` contains `char *text`, and the `PyMemberDef` array has `{"text", T_STRING, offsetof(struct object, text), READONLY, "This is a normal character string."}`

versus something like `struct object` contains `wchar_t *wtext`, and the `PyMemberDef` array has `{"wtext", T_WSTRING, offsetof(struct object, wtext), READONLY, "This is a wide character string"}`

I understand that something like `PyUnicode_AsString()` and its related methods can be used to encode the data in UTF-8, store it in a basic char string, and decode it later. Doing it that way would require wrapping the generic `getattr` and `setattr` methods/functions with ones that account for the encoded text. It's also not very useful when you want character arrays of fixed element size within a struct and don't want the effective number of characters that can be stored in them to vary.
Comments (1)
Using `wchar_t` directly is not portable. Instead, Python defines the `Py_UNICODE` type as the storage unit for a Unicode character.

Depending on the platform, `Py_UNICODE` may be defined as `wchar_t` if available, or as an unsigned short, int, or long, the width of which varies with how Python is configured (UCS2 vs UCS4) and with the architecture and C compiler used. You can find the relevant definitions in `unicodeobject.h`.

For your use case, your object can have an attribute that is a Unicode string, using `T_OBJECT`. You can perform type checking in the object's initializer.
If you ever need to iterate over the low-level characters in the Unicode string, there is a macro which returns a `Py_UNICODE *`.