UTF-32 to UTF-8 converter in C, buffer is full of nulls/zeros

Posted on 2024-10-31 06:34:59

I've been trying forever to get this working. The program is supposed to take two arguments, one for the buffer size and another for a file name, and convert that file from UTF-32 to UTF-8. I've been using the fgetc() function to fill an int array with the Unicode code points. I've tested this by printing out the contents of my buffer, and it has all these null characters instead of one code point per element.

For example, for a file consisting of only the character 'A':
buffer [0] is 0
buffer [1] is 0
buffer [2] is 0
buffer [3] is 41

The codepoints for anything above U+7F end up getting split apart.
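The output above is what one would expect if the file is big-endian UTF-32, because fgetc() returns a single byte per call, so each 4-byte code unit ends up spread over four array slots. Below is a minimal sketch of assembling one code point from four bytes. It assumes big-endian input with no byte order mark, and `read_utf32be` is a hypothetical helper name, not something from the program in question.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original program).
 * Reads four bytes and combines them into one code point, assuming
 * big-endian UTF-32 with no byte order mark.
 * Returns 1 on success, 0 on EOF or a truncated code unit. */
static int read_utf32be(FILE *in, uint32_t *out) {
    unsigned char b[4];
    if (fread(b, 1, 4, in) != 4) {
        return 0;
    }
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 1;
}
```

With something like this, the fill loop would store one value per call in the buffer and stop when the helper returns 0, rather than comparing each element against EOF. A little-endian file would need the shifts in the opposite order.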

Here is the code for initializing my buffer:

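#include <stdio.h>
#include <stdlib.h>

// convert() is defined elsewhere in the program and is not shown here.

int main(int argc, char** argv) {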
  if (argc != 3) {
    printf("Must input a buffer size and a file name :D");
    return 0;
  }

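  // open in binary mode so the bytes are read exactly as stored
  FILE* input = fopen(argv[2], "rb");
  if (!input) {
    // argv[2] is the file name; argv[1] is the buffer size
    printf("The file %s does not exist.", argv[2]);
    return 0;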
  } else {
    int bufferLimit = atoi(argv[1]);
    int buffer[bufferLimit];
    int charReplaced = 0;
    int fileEndReached = 0;
    int i = 0;
    int j = 0;

    while(1) {
      // fill the buffer with the characters from the file.
      for(i = 0; i < bufferLimit; i++){
        buffer[i] = fgetc(input);
        // if EOF reached, move onto next step and mark that
        // it has finished.
        if (buffer[i] == EOF) {
          fileEndReached = 1;
          break;
        }
      }
      // output buffer of chars until EOF or end of buffer
      // only the first i elements were filled; buffer[i] may be past the end
      for(j = 0; j < i; j++) {
        if(buffer[j] == EOF) {
          break;
        }
        // check for Character Replacements
        charReplaced += !convert(buffer[j]);
      }
      if(fileEndReached != 0) {
        break;
      } 
    }  
    //return a 1 if any Character Replacements were used
    if(charReplaced != 0) {
      return 1;
    }
  }
}
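
convert() is not shown in the question. As a rough sketch of what the encoding step could look like once a whole code point is available, the hypothetical function `encode_utf8` below writes the UTF-8 bytes for one code point into a caller-supplied array. The name, the signature, and the policy of substituting U+FFFD for surrogates and for values above U+10FFFF are all assumptions, not the original program's behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the conversion step (not the asker's convert()).
 * Writes 1 to 4 bytes into out and returns the number of bytes written.
 * *replaced is set to 1 if U+FFFD was substituted, otherwise 0. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4], int *replaced) {
    *replaced = 0;
    /* Surrogates and values past U+10FFFF are not valid scalar values. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD;
        *replaced = 1;
    }
    if (cp < 0x80) {            /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {           /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {         /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

A caller could pass the returned length to fwrite() to emit the bytes, and add up the `replaced` flags to decide the exit status the way charReplaced is used above.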
