Using iconv while maintaining code correctness

Posted on 2024-12-02 22:55:59

I'm currently using iconv to convert documents with different encodings.

The iconv() function has the following prototype:

size_t iconv (
  iconv_t cd,
  const char* * inbuf,
  size_t * inbytesleft,
  char* * outbuf,
  size_t * outbytesleft
);
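
For reference, here is a minimal sketch of how I call this function on plain char* buffers. The helper name convert and the 256-byte chunk size are only illustrative. It also assumes the POSIX/glibc declaration, where inbuf is a char** rather than the const char** shown above, so the input is copied into a mutable std::string.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
// Assumes the POSIX signature of iconv(), where inbuf is `char**`.
std::string convert(const char* from, const char* to, std::string input) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first, source second
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string output;
    char chunk[256];
    char* in = &input[0];
    size_t in_left = input.size();  // remaining input, in bytes

    while (in_left > 0) {
        char* out = chunk;
        size_t out_left = sizeof chunk;
        // iconv advances `in`/`out` and decrements both byte counters.
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;  // save errno before any other call can change it
        output.append(chunk, sizeof chunk - out_left);
        // E2BIG only means the chunk is full; any other error is fatal here.
        if (rc == static_cast<size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed");
        }
    }
    // Stateful target encodings would need one more call with a null inbuf to flush.
    iconv_close(cd);
    return output;
}
```

A call such as convert("ISO-8859-1", "UTF-8", "caf\xE9") would then return the same text encoded as UTF-8, provided the platform's iconv knows both encoding names.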

So far, I have only had to convert buffers of type char*, but I have realized that I may also have to convert buffers of type wchar_t*. In fact, iconv even has a dedicated encoding name, "wchar_t", for such buffers. This encoding adapts to the platform: on my computers, it refers to UCS-2 on Windows and to UTF-32 on Linux.

But here lies the problem: if I have a wchar_t* buffer, I can reinterpret_cast it to char* in order to pass it to iconv, but then I face implementation-defined behavior: I cannot be sure that all compilers will behave the same way regarding the cast.
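
To make the cast concrete, here is a minimal sketch of what I have in mind. The helper name wide_to_utf8 is hypothetical. It assumes the POSIX/glibc declaration of iconv (char** for inbuf) and that the implementation accepts "WCHAR_T" as an encoding name. The output is sized at four bytes per wchar_t, which I believe is enough for UTF-8 whether wchar_t is 16 or 32 bits wide.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a wide string to UTF-8 via iconv's "WCHAR_T" encoding.
// Assumes the POSIX signature of iconv(), where inbuf is `char**`.
std::string wide_to_utf8(std::wstring input) {
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    // The cast in question: view the wchar_t array as a sequence of bytes.
    char* in = reinterpret_cast<char*>(&input[0]);
    size_t in_left = input.size() * sizeof(wchar_t);  // iconv counts bytes, not characters

    // Assumption: 4 output bytes per input wchar_t is always sufficient for UTF-8.
    std::string output(input.size() * 4, '\0');
    char* out = &output[0];
    size_t out_left = output.size();

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == static_cast<size_t>(-1)) throw std::runtime_error("iconv failed");

    output.resize(output.size() - out_left);  // drop the unused tail
    return output;
}
```

The only wchar_t-specific parts are the reinterpret_cast and the multiplication by sizeof(wchar_t); everything else is the same as for a char* buffer.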

What should I do here?
