iconv encoding conversion problem

Posted 2024-08-19 18:41:03 · 1171 characters · 9 views · 0 comments

I am having trouble converting strings from UTF-8 to GB2312. My convert function is below:

void convert(const char *from_charset,const char *to_charset, char *inptr, char *outptr)
{
    size_t inleft = strlen(inptr);
    size_t outleft = inleft;
    iconv_t cd;     /* conversion descriptor */

    if ((cd = iconv_open(to_charset, from_charset)) == (iconv_t)(-1)) 
    {
            fprintf(stderr, "Cannot open converter from %s to %s\n", from_charset, to_charset);
            exit(8);
    }

    /* return code of iconv() */
    int rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc == -1) 
    {
            fprintf(stderr, "Error in converting characters\n");

            if(errno == E2BIG)
                    printf("errno == E2BIG\n");
            if(errno == EILSEQ)
                    printf("errno == EILSEQ\n");
            if(errno == EINVAL)
                    printf("errno == EINVAL\n");

            iconv_close(cd);
            exit(8);
    }
    iconv_close(cd);
}

This is an example of how I used it:

int len = 1000;
char *result = new char[len];
convert("UTF-8", "GB2312", some_string, result);

Edit: most of the time I get an E2BIG error.


Comments (2)

叹梦 2024-08-26 18:41:03

outleft should be the size of the output buffer (e.g. 1000 bytes), not the size of the incoming string.
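
A minimal sketch of the question's function with that change applied. It assumes a hypothetical extra `outsize` parameter carrying the real capacity of the output buffer (1000 in the question's example), and it returns the number of bytes written, or -1 with `errno` set, instead of calling `exit()`. The name `convert_fixed` is illustrative only.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts the nul-terminated string `input` from `from_charset` to
 * `to_charset`, writing at most `outsize` bytes into `out`.
 * Returns the number of bytes written, or -1 on failure with errno set
 * (e.g. E2BIG if `out` was too small). Does not nul-terminate `out`. */
ptrdiff_t convert_fixed(const char *from_charset, const char *to_charset,
                        const char *input, char *out, size_t outsize)
{
    char *inptr = (char *) input;   /* iconv() takes char ** on glibc */
    size_t inleft = strlen(input);
    char *outptr = out;
    size_t outleft = outsize;       /* capacity of the OUTPUT buffer */

    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t) -1)
        return -1;

    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t) -1)
        /* flush any pending shift state (matters for stateful encodings) */
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);

    int saved_errno = errno;        /* iconv_close() may clobber errno */
    iconv_close(cd);
    if (rc == (size_t) -1) {
        errno = saved_errno;
        return -1;
    }
    return outptr - out;            /* bytes actually written */
}
```

With the question's buffer this would be called as `convert_fixed("UTF-8", "GB2312", some_string, result, len)`; the return value tells the caller how many bytes of `result` are valid.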

When converting, the string length usually changes in the process and you cannot know how long it is going to be until afterwards. E2BIG means that the output buffer wasn't large enough, in which case you need to give it more output buffer space (notice that it has already converted some of the data and adjusted the four variables passed to it accordingly).
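
To illustrate the second point: when `iconv()` fails with E2BIG, the bytes it already wrote are valid and `inbuf`/`inleft` point at the first unconverted input, so the caller can drain the output and simply call it again. The sketch below uses a hypothetical `convert_in_chunks()` with an 8-byte chunk chosen only to force E2BIG; it copies each chunk into a caller-supplied destination and resumes, assuming the chunk can hold at least one output character (it bails out otherwise).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts `input` with an already-opened descriptor `cd`, giving iconv()
 * only a small fixed chunk of output space per call. Each chunk is appended
 * to `dst` (capacity `dstcap`). Returns total bytes produced, or -1. */
ptrdiff_t convert_in_chunks(iconv_t cd, const char *input,
                            char *dst, size_t dstcap)
{
    char chunk[8];                     /* deliberately tiny to force E2BIG */
    char *inbuf = (char *) input;      /* iconv() takes char ** on glibc */
    size_t inleft = strlen(input);
    size_t total = 0;
    int flushing = 0;                  /* set once all input is consumed */

    for (;;) {
        char *outbuf = chunk;
        size_t outleft = sizeof chunk;
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outbuf, &outleft)  /* emit final shift state */
            : iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        int err = (rc == (size_t) -1) ? errno : 0;

        /* Bytes written before an E2BIG are valid output: keep them. */
        size_t produced = (size_t) (outbuf - chunk);
        if (total + produced > dstcap)
            return -1;                 /* destination too small */
        memcpy(dst + total, chunk, produced);
        total += produced;

        if (err == E2BIG) {
            if (produced == 0)
                return -1;             /* chunk cannot hold even one character */
            continue;                  /* inbuf/inleft were advanced: resume */
        }
        if (err != 0)
            return -1;                 /* EILSEQ, EINVAL, ... */
        if (flushing)
            return (ptrdiff_t) total;
        flushing = 1;                  /* input done; make one flush call */
    }
}
```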

迷雾森÷林ヴ 2024-08-26 18:41:03

As others have noted, E2BIG means that the output buffer wasn't large enough for the conversion and you were using the wrong value for outleft.

But I've also noticed some other possible problems with your function. Namely, with the way your function works, your caller has no way of knowing how many bytes are in the output string. Your convert() function neither nul-terminates the output buffer nor does it have a means of telling its caller the number of bytes it wrote to outptr.
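
As a sketch of one way to cover both gaps, assuming the caller prefers an explicit length over relying on a terminator alone: the hypothetical `convert_alloc()` grows a heap buffer on E2BIG (the same grow-and-retry idea as the answer's code) and stores the byte count in `*outlen`. It still appends four zero bytes for callers that want a terminated string. Unlike the answer's code, it treats a trailing incomplete sequence (EINVAL) as an error.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Converts the nul-terminated `input`; returns a malloc()ed buffer (caller
 * frees) holding the converted bytes followed by 4 zero bytes, and stores the
 * byte count (excluding the padding) in *outlen. Returns NULL on failure. */
char *convert_alloc(const char *from_charset, const char *to_charset,
                    const char *input, size_t *outlen)
{
    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t) -1)
        return NULL;

    char *inbuf = (char *) input;      /* iconv() takes char ** on glibc */
    size_t inleft = strlen(input);
    size_t cap = inleft + 16;          /* initial guess; doubled on E2BIG */
    char *output = malloc(cap + 4);    /* +4 for the zero padding */
    size_t used = 0;
    int flushing = 0;

    while (output) {
        char *outbuf = output + used;
        size_t outleft = cap - used;
        size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outbuf, &outleft)
            : iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        int err = (rc == (size_t) -1) ? errno : 0;
        used = (size_t) (outbuf - output);   /* keep what was converted */

        if (err == E2BIG) {            /* out of room: grow and resume */
            char *tmp = realloc(output, cap * 2 + 4);
            if (!tmp)
                break;
            output = tmp;
            cap *= 2;
            continue;
        }
        if (err != 0)                  /* EILSEQ or EINVAL: give up */
            break;
        if (flushing) {                /* input consumed and state flushed */
            iconv_close(cd);
            memset(output + used, 0, 4);
            *outlen = used;
            return output;
        }
        flushing = 1;
    }
    free(output);                      /* free(NULL) is a no-op */
    iconv_close(cd);
    return NULL;
}
```

For UTF-16 or UCS-2 output, `strlen()` on the result stops at the first zero byte (after one character for ASCII input), which is why the length is returned separately.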

If you want to deal with nul-terminated strings (and it appears that's what you want to do since your input string is nul-terminated), you might find the following approach to be much better:


#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

char *
convert (const char *from_charset, const char *to_charset, const char *input)
{
 size_t inleft, outleft, converted = 0;
 char *output, *outbuf, *tmp;
 const char *inbuf;
 size_t outlen;
 iconv_t cd;

 if ((cd = iconv_open (to_charset, from_charset)) == (iconv_t) -1)
  return NULL;

 inleft = strlen (input);
 inbuf = input;

 /* we'll start off allocating an output buffer which is the same size
  * as our input buffer. */
 outlen = inleft;

 /* we allocate 4 bytes more than what we need for nul-termination... */
 if (!(output = malloc (outlen + 4))) {
  iconv_close (cd);
  return NULL;
 }

 do {
  errno = 0;
  outbuf = output + converted;
  outleft = outlen - converted;

  converted = iconv (cd, (char **) &inbuf, &inleft, &outbuf, &outleft);
  if (converted != (size_t) -1 || errno == EINVAL) {
   /*
    * EINVAL  An incomplete multibyte sequence has been encountered
    *         in the input.
    *
    * We'll just truncate it and ignore it.
    */
   break;
  }

  if (errno != E2BIG) {
   /*
    * EILSEQ An invalid multibyte sequence has been  encountered
    *        in the input.
    *
    * Bad input, we can't really recover from this. 
    */
   iconv_close (cd);
   free (output);
   return NULL;
  }

  /*
   * E2BIG   There is not sufficient room at *outbuf.
   *
   * We just need to grow our outbuffer and try again.
   */

  converted = outbuf - output;  /* bytes already written; resume from here */
  outlen += inleft * 2 + 8;

  if (!(tmp = realloc (output, outlen + 4))) {
   iconv_close (cd);
   free (output);
   return NULL;
  }

  output = tmp;
  outbuf = output + converted;
 } while (1);

 /* flush the iconv conversion */
 iconv (cd, NULL, NULL, &outbuf, &outleft);
 iconv_close (cd);

 /* Note: not all charsets can be nul-terminated with a single
  * nul byte. UCS2, for example, needs 2 nul bytes and UCS4
  * needs 4. I hope that 4 nul bytes is enough to terminate all
  * multibyte charsets? */

 /* nul-terminate the string */
 memset (outbuf, 0, 4);

 return output;
}