
Can I use wide-character members in Python extension objects?

Posted 2024-11-11 03:34:58

With the T_STRING type code in a PyMemberDef declaration, it is very easy to create a member for an object in a Python C extension whose underlying type is char *.

Why does there seem to be no equivalent for wchar_t *? If there is one, what is it?

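For example,

a struct object containing char *text

has the PyMemberDef array entry {"text", T_STRING, offsetof(struct object, text), READONLY, "This is a normal string."}

Similarly,

a struct object containing wchar_t *wtext

would have the PyMemberDef array entry {"wtext", T_WSTRING, offsetof(struct object, wtext), READONLY, "This is a Wide String"}, where T_WSTRING is the hypothetical type code I am looking for.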

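I know that PyUnicode_AsString() and its relatives can be used to encode the data as UTF-8, store it in a plain char string, and decode it again later. However, doing so requires wrapping the generic getattr and setattr methods/functions around the encoded text, and it is not very useful when you want a fixed-size character array in the struct and do not want the number of valid characters it can hold to vary.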
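The fixed-size concern can be made concrete. In UTF-8 a code point takes between 1 and 4 bytes, so a fixed char buffer holds a content-dependent number of characters. This is an illustrative sketch only: utf8_len and utf8_total are hypothetical helper names, not part of the Python C API, and the input is assumed to be valid Unicode scalar values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-8 bytes needed for one code point.
   Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate). */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;  /* ASCII */
    if (cp < 0x800)   return 2;  /* e.g. Latin-1 Supplement, Greek, Cyrillic */
    if (cp < 0x10000) return 3;  /* rest of the BMP, e.g. CJK ideographs */
    return 4;                    /* supplementary planes, e.g. emoji */
}

/* Hypothetical helper: total UTF-8 bytes for an array of n code points. */
static size_t utf8_total(const uint32_t *cps, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_len(cps[i]);
    return total;
}
```

With these counts, a 16-byte buffer (not counting the terminating NUL) fits 16 ASCII characters but only 5 CJK ideographs, which is the variation a fixed-width wide-character array would avoid.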


萤火眠眠 2024-11-18 03:34:58

Using a wchar_t directly is not portable. Instead, Python defines the Py_UNICODE type as the storage unit for a Unicode character.
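The portability problem is easy to observe, because the C standard leaves the size of wchar_t to the implementation. This minimal sketch (wchar_width and wchar_holds_all_code_points are hypothetical helpers) reports it; the 2-byte and 4-byte figures in the comments are the common cases, not guarantees.

```c
#include <stddef.h>  /* wchar_t, size_t */
#include <wchar.h>   /* WCHAR_MAX */

/* Width of wchar_t in bytes for this compiler and platform:
   commonly 2 on Windows (MSVC) and 4 on Linux and macOS. */
static size_t wchar_width(void)
{
    return sizeof(wchar_t);
}

/* Nonzero if a single wchar_t can represent every Unicode code point
   (U+0000..U+10FFFF); zero where wchar_t is a 16-bit type. */
static int wchar_holds_all_code_points(void)
{
    return (unsigned long)WCHAR_MAX >= 0x10FFFFUL;
}
```

Where wchar_t is 16 bits, one unit cannot hold a character outside the Basic Multilingual Plane, so code that treats one wchar_t as one character behaves differently across platforms.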

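Depending on the platform, Py_UNICODE may be defined as wchar_t if available, or as an unsigned short, int, or long whose width varies with how Python was configured (UCS2 vs UCS4), the architecture, and the C compiler. You can find the relevant definitions in unicodeobject.h. (This applies to Python 2 and to Python 3 before 3.3. Since PEP 393 in Python 3.3, there are no separate UCS2 and UCS4 builds, and Py_UNICODE is a deprecated typedef of wchar_t.)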
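The UCS2/UCS4 distinction matters because, on a narrow (UCS2) build, a character outside the Basic Multilingual Plane occupies two 16-bit units (a UTF-16 surrogate pair), so the buffer length no longer equals the number of characters. The helper below is a hypothetical illustration of that counting, written with plain integer types so it does not depend on the Python headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many storage units n code points occupy when
   each unit is 16 bits wide (narrow/UCS2 build, UTF-16 style) versus
   32 bits wide (wide/UCS4 build). unit_bits is expected to be 16 or 32. */
static size_t units_needed(const uint32_t *cps, size_t n, int unit_bits)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        if (unit_bits == 16 && cps[i] > 0xFFFF)
            units += 2;  /* outside the BMP: stored as a surrogate pair */
        else
            units += 1;  /* fits in a single unit */
    }
    return units;
}
```

For the text "hi" followed by U+1F600, a UCS4 build needs 3 units while a UCS2 build needs 4.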

For your use case, your object can have an attribute that is a Unicode string, using T_OBJECT:

static struct PyMemberDef attr_members[] = {
  { "wtext", T_OBJECT, offsetof(PyAttrObject, wtext), READONLY, "wide string"},
  ...
  {NULL}  /* sentinel */
};

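You can perform the type check in the object's initializer (its tp_init function):

...
/* tp_init reports failure by returning -1 (a tp_new function would return NULL) */
if (!PyUnicode_CheckExact(arg)) {
    PyErr_SetString(PyExc_TypeError, "arg must be a unicode string");
    return -1;
}
PyObject *old = self->wtext;  /* release any previous value after storing the new one */
Py_INCREF(arg);
self->wtext = arg;
Py_XDECREF(old);
...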

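If you ever need to iterate over the low-level characters in the Unicode string, there is a macro which returns a Py_UNICODE *. This applies to Python 2 and to Python 3 up to 3.11: PyUnicode_GetSize and PyUnicode_AS_UNICODE were deprecated in 3.3 and removed in 3.12, where PyUnicode_GetLength and PyUnicode_READ_CHAR are the replacements.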

Py_ssize_t i;
Py_ssize_t size = PyUnicode_GetSize(self->wtext);
Py_UNICODE *chars = PyUnicode_AS_UNICODE(self->wtext);
for (i = 0; i < size; i++) {
    // use chars[i]
    ...
}
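On a narrow (UCS2) build, indexing chars[i] as in the loop above yields UTF-16 code units, so a character outside the BMP shows up as two surrogate halves. A minimal sketch of reassembling them follows, written against plain uint16_t rather than Py_UNICODE so it compiles without the Python headers; next_code_point is a hypothetical name, and unpaired surrogates are simply passed through.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the code point that starts at units[*i] in a
   16-bit (UTF-16 / narrow-build) buffer of length n, and advance *i past it.
   Requires *i < n. An unpaired surrogate is returned unchanged. */
static uint32_t next_code_point(const uint16_t *units, size_t n, size_t *i)
{
    uint32_t hi = units[*i];
    (*i)++;
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < n) {     /* high surrogate */
        uint32_t lo = units[*i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {           /* matching low surrogate */
            (*i)++;
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;  /* BMP character or unpaired surrogate */
}
```

Calling it repeatedly until the index reaches the buffer length walks the string one character at a time; for example, the units 0xD83D 0xDE00 decode to U+1F600.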
