UTF-32 to UTF-8 converter in C, buffer is full of nulls/zeros

Posted 2024-10-31 06:34:59

I've been trying forever to get this working. The program is supposed to take two arguments, one for the buffer size and another for a file name, and convert that file from UTF-32 to UTF-8. I've been using the fgetc() function to fill an int array with the Unicode code points. I've tested printing out the contents of my buffer, and it has all these null characters instead of each code point.

For example, for a file consisting of only the character 'A':
buffer [0] is 0
buffer [1] is 0
buffer [2] is 0
buffer [3] is 41

The code points for anything above U+7F end up getting split apart across several buffer entries.

Here is the code for initializing my buffer:

int main(int argc, char** argv) {
  if (argc != 3) {
    printf("Must input a buffer size and a file name :D");
    return 0;
  }

  FILE* input = fopen(argv[2], "r");
  if (!input) {
    printf("The file %s does not exist.", argv[2]);
    return 0;
  } else {
    int bufferLimit = atoi(argv[1]);
    int buffer[bufferLimit];
    int charReplaced = 0;
    int fileEndReached = 0;
    int i = 0;
    int j = 0;

    while(1) {
      // fill the buffer with the characters from the file.
      for(i = 0; i < bufferLimit; i++){
        buffer[i] = fgetc(input);
        // if EOF reached, move onto next step and mark that
        // it has finished.
        if (buffer[i] == EOF) {
          fileEndReached = 1;
          break;
        }
      }
      // output buffer of chars until EOF or end of buffer
      for(j = 0; j < i; j++) {
        if(buffer[j] == EOF) {
          break;
        }
        // check for Character Replacements
        charReplaced += !convert(buffer[j]);
      }
      if(fileEndReached != 0) {
        break;
      } 
    }  
    //return a 1 if any Character Replacements were used
    if(charReplaced != 0) {
      return 1;
    }
  }
}

Comments (1)

疾风者 2024-11-07 06:34:59

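fgetc() returns a single byte, not a Unicode code point. A UTF-32 code unit is four bytes, so each code point in the file takes four fgetc() calls to read.

From there on, everything is built on that false assumption, and the whole thing falls down.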
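To make that concrete, here is a minimal sketch of one way to rebuild a code point from four fgetc() calls and then encode it as UTF-8. The helper names read_utf32be and encode_utf8 are hypothetical (they are not from the question, and this is not the asker's convert function). The sketch assumes the file is big-endian UTF-32 with no BOM, which is what the 0 0 0 41 dump for 'A' suggests, and that the file was opened in binary mode ("rb").

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one big-endian UTF-32 code unit (4 bytes).
 * Returns 1 on success, 0 on a clean EOF, -1 if the file ends mid-unit. */
int read_utf32be(FILE *in, uint32_t *cp)
{
    uint32_t value = 0;
    for (int k = 0; k < 4; k++) {
        int byte = fgetc(in);            /* one byte (0..255) or EOF */
        if (byte == EOF)
            return k == 0 ? 0 : -1;
        value = (value << 8) | (uint32_t)byte;
    }
    *cp = value;
    return 1;
}

/* Hypothetical helper: encode cp as UTF-8 into out.
 * Returns the number of bytes written (1..4), or 0 if cp is a
 * surrogate or lies above U+10FFFF and so cannot be encoded. */
int encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                    /* surrogates are not valid */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

In the fill loop, something like read_utf32be(input, &cp) would take the place of the single fgetc(input) call, so each buffer slot holds a whole code point rather than one byte of it. A little-endian file would need the bytes combined in the opposite order.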
