Testing wchar_t* for convertible characters
I'm working on talking to a library that handles strings as wchar_t arrays. I need to convert these to char arrays so that I can hand them over to Python (using SWIG and Python's PyString_FromString function). Obviously not all wide characters can be converted to chars. According to the documentation for wcstombs, I ought to be able to do something like
wcstombs(NULL, wideString, wcslen(wideString))
to test the string for unconvertible characters; it's supposed to return -1 if there are any. However, in my test case it's always returning -1. Here's my test function:
void getString(wchar_t* target, int size) {
    int i;
    for(i = 0; i < size; ++i) {
        target[i] = L'a' + i;
    }
    printf("Generated %d characters, nominal length %d, compare %d\n", size,
           wcslen(target), wcstombs(NULL, target, size));
}
This is generating output like this:
Generated 32 characters, nominal length 39, compare -1
Generated 16 characters, nominal length 20, compare -1
Generated 4 characters, nominal length 6, compare -1
Any idea what I'm doing wrong?
On a related note, if you know of a way to convert directly from wchar_t*s to Python unicode strings, that'd be welcome. :) Thanks!
Answer:
Clearly, as you found, it's essential to zero-terminate your input data.
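A minimal corrected sketch of the test function, assuming a POSIX-style C library such as glibc, where wcstombs with a NULL destination ignores its length argument and scans up to the terminating L'\0'. (Its third argument is a limit in output bytes, not a count of wide characters, so passing wcslen or size there was never quite the right unit.) The sketch also uses size_t with %zu, since wcslen and wcstombs return size_t and failure is reported as (size_t)-1. Note that even with the terminator in place, the 32-character case ends at L'a' + 31, which is U+0080 and outside ASCII, so in the default "C" locale that case may still legitimately report -1; calling setlocale(LC_ALL, "") at startup switches to the user's locale.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Variant of the question's getString: writes `size` characters starting at
 * L'a', then the terminating L'\0' the original was missing, so target must
 * have room for size + 1 wide characters. Returns the byte count reported by
 * wcstombs, or (size_t)-1 if a character is unconvertible in this locale. */
static size_t getString(wchar_t *target, size_t size)
{
    size_t i, nominal, needed;

    assert(target != NULL);
    for (i = 0; i < size; ++i) {
        target[i] = L'a' + (wchar_t)i;
    }
    target[size] = L'\0';                /* the missing terminator */

    nominal = wcslen(target);            /* now equals size */
    needed = wcstombs(NULL, target, 0);  /* POSIX: length ignored for NULL dest */

    /* size_t needs %zu; failure is printed as -1 like the original output. */
    printf("Generated %zu characters, nominal length %zu, compare %ld\n",
           size, nominal, needed == (size_t)-1 ? -1L : (long)needed);
    return needed;
}
```

For example, with wchar_t buf[5], getString(buf, 4) leaves L"abcd" in buf and, in an ASCII-compatible locale, reports 4 bytes.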
Regarding the final paragraph, I would convert from wide to UTF-8 and call PyUnicode_FromString.
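One way to do the wide-to-UTF-8 step without depending on the process locale is a small hand-written encoder. This is a sketch under a loudly stated assumption: each wchar_t holds one Unicode code point (UTF-32), as with glibc and macOS. On Windows wchar_t is 16-bit UTF-16, and this version rejects surrogates rather than combining surrogate pairs. The name wide_to_utf8 is hypothetical, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encodes a NUL-terminated wide string as UTF-8.
 * ASSUMPTION: each wchar_t holds one Unicode code point (UTF-32), true for
 * glibc and macOS but not Windows, where wchar_t is UTF-16.
 * Returns a malloc'd NUL-terminated buffer the caller must free, or NULL on
 * an invalid code point (surrogate, negative, above U+10FFFF) or no memory. */
static char *wide_to_utf8(const wchar_t *ws)
{
    size_t len, i;
    unsigned char *out, *p;

    assert(ws != NULL);
    len = wcslen(ws);
    if (len > (SIZE_MAX - 1) / 4)
        return NULL;                   /* buffer size would overflow */
    out = malloc(len * 4 + 1);         /* at most 4 UTF-8 bytes per code point */
    if (out == NULL)
        return NULL;

    for (p = out, i = 0; i < len; ++i) {
        /* If wchar_t is signed, negative values become huge and are rejected. */
        unsigned long cp = (unsigned long)ws[i];

        if (cp < 0x80) {
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {  /* surrogates are not code points */
                free(out);
                return NULL;
            }
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            free(out);
            return NULL;
        }
    }
    *p = '\0';
    return (char *)out;
}
```

The returned buffer can then be passed to PyUnicode_FromString, which decodes UTF-8, and freed afterwards. Python's C API also offers PyUnicode_FromWideChar(w, size), which builds a unicode object directly from a wchar_t buffer and may avoid the intermediate copy; check its documentation for your Python version.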
Note that I am assuming you are using Python 2.x; it's presumably all different in Python 3.x.