Can wide-character members be used in Python extension objects?

Posted on 2024-11-11 03:34:58

It's simple to create a member for an object in a Python C extension with a base type of char *, using the T_STRING define in the PyMemberDef declaration.

Why does there not seem to be an equivalent for wchar_t *? And if there actually is one, what is it?

e.g.

struct object contains char *text

PyMemberDef array has {"text", T_STRING, offsetof(struct object, text), READONLY, "This is a normal character string."}

versus something like

struct object contains wchar_t *wtext

PyMemberDef array has {"wtext", T_WSTRING, offsetof(struct object, wtext), READONLY, "This is a wide character string"}

I understand that something like PyUnicode_AsUTF8String() and its related functions can be used to encode the data as UTF-8, store it in a plain char string, and decode it later. But doing it that way would require wrapping the generic getattr and setattr functions with ones that account for the encoded text, and it's not very useful when you want character arrays of fixed element size within a struct and don't want the effective number of characters that can be stored in them to vary.
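To make the fixed-size concern concrete, here is a minimal sketch in plain C (no Python API involved). The helper name utf8_bytes_needed is hypothetical, and the code assumes each wchar_t holds one whole code point, which does not hold for UTF-16 surrogate pairs on platforms with a 16-bit wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store the wide string s
 * as UTF-8, excluding the terminating NUL.
 * Assumption: each wchar_t is one whole code point (no surrogate pairs). */
static size_t utf8_bytes_needed(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++) {
        unsigned long cp = (unsigned long)*s;
        if (cp < 0x80UL)
            n += 1;   /* ASCII range */
        else if (cp < 0x800UL)
            n += 2;   /* e.g. Latin letters with accents */
        else if (cp < 0x10000UL)
            n += 3;   /* rest of the Basic Multilingual Plane */
        else
            n += 4;   /* supplementary planes */
    }
    return n;
}
```

Two strings with the same wcslen() can need different numbers of UTF-8 bytes, so a char buffer of fixed size holds a varying number of characters, while a wchar_t buffer of N elements always holds N - 1 code units plus the terminator.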


Comments (1)

萤火眠眠 2024-11-18 03:34:58

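Using a wchar_t directly is not portable, because its width differs between platforms (16 bits on Windows, 32 bits on most Unix-like systems). Instead, Python defines the Py_UNICODE type as the storage unit for a Unicode character.

Depending on the platform, Py_UNICODE may be defined as wchar_t if available, or an unsigned short/integer/long, the width of which will vary depending on how Python is configured (UCS2 vs UCS4) and the architecture and C compiler used. You can find the relevant definitions in unicodeobject.h. Note that this describes Python 2 and Python 3 releases before 3.3; from 3.3 onward, Py_UNICODE is a deprecated typedef of wchar_t.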

For your use case, your object can have an attribute that is a Unicode string, using T_OBJECT:

static struct PyMemberDef attr_members[] = {
  { "wtext", T_OBJECT, offsetof(PyAttrObject, wtext), READONLY, "wide string"}
  ...

You can perform type checking in the object's initializer:

...
if (!PyUnicode_CheckExact(arg)) {
    /* A wrong argument type is conventionally reported as TypeError. */
    PyErr_SetString(PyExc_TypeError, "arg must be a unicode string");
    return NULL;
}
Py_INCREF(arg);
self->wtext = arg;
...

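If you ever need to iterate over the low-level characters in the Unicode string, there is a macro which returns a Py_UNICODE *. Both PyUnicode_GetSize() and PyUnicode_AS_UNICODE() were deprecated in Python 3.3 and removed in Python 3.12, where PyUnicode_GetLength() and PyUnicode_ReadChar() take their place, so the loop below applies to older interpreters only:

Py_ssize_t i;
Py_ssize_t size = PyUnicode_GetSize(self->wtext);
Py_UNICODE *chars = PyUnicode_AS_UNICODE(self->wtext);
for (i = 0; i < size; i++) {
    // use chars[i]
    ...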