Using iconv while keeping the code correct
I'm currently using iconv to convert documents with different encodings.
The iconv() function has the following prototype:
    size_t iconv(
        iconv_t cd,
        const char** inbuf,
        size_t* inbytesleft,
        char** outbuf,
        size_t* outbytesleft
    );
So far, I have only had to convert buffers of type char*, but I realized I might also have to convert buffers of type wchar_t*. In fact, iconv even has a dedicated encoding name, "wchar_t", for such buffers. This encoding adapts to the operating system's settings: on my computers, it refers to UCS-2 on Windows and to UTF-32 on Linux.
But here lies the problem: if I have a wchar_t* buffer, I can reinterpret_cast it to a char* buffer to use it with iconv, but then I face implementation-defined behavior: I cannot be sure that all compilers will behave the same way regarding the cast.
What should I do here?
Comments (1)
reinterpret_cast<char const*> is safe and not implementation-defined, at least not on any real implementation. The language explicitly allows any object to be reinterpreted as an array of characters, and the way you get that array of characters is by using reinterpret_cast.
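To make the cast concrete, here is a minimal sketch of feeding a wchar_t buffer to iconv. The helper name wide_to_utf8 is hypothetical and does not come from the question. The sketch assumes a POSIX-style iconv whose input parameter is char** (as in glibc) and an implementation that accepts the "WCHAR_T" encoding name. Implementations that declare the parameter as const char**, as in the prototype quoted in the question, would need a const-qualified pointer instead.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): converts a wide string
// to UTF-8 through iconv's "WCHAR_T" encoding name.
// Assumes the POSIX signature where the input parameter is char**.
std::string wide_to_utf8(std::wstring input) {
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // View the wchar_t storage as raw bytes. This is the cast under
    // discussion: accessing any object through a char pointer is allowed.
    char* in = reinterpret_cast<char*>(&input[0]);
    std::size_t in_left = input.size() * sizeof(wchar_t);

    std::string output;
    char chunk[256];
    while (in_left > 0) {
        char* out = chunk;
        std::size_t out_left = sizeof chunk;
        std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        // Keep whatever was converted before checking for errors.
        output.append(chunk, sizeof chunk - out_left);
        // E2BIG only means the chunk filled up, so loop again.
        if (rc == static_cast<std::size_t>(-1) && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    iconv_close(cd);
    return output;
}
```

The byte count passed to iconv is the element count multiplied by sizeof(wchar_t), which is what lets the same code cope with 2-byte wchar_t on Windows and 4-byte wchar_t on Linux.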