Dynamically storing information from a file in C

Posted 2024-11-29 10:55:36

I'm new to C and trying to learn a few things. What I'm trying to do is read in a file and store the information. Since the format will be CSV, the plan is to read in each character, determine whether it's a number or a comma, and store the numbers in a linked list. The problem I'm having is reading in numbers that are more than one character long, like the following example.

5,2,24,5

Here's the code I've got so far, and it's just not giving back the output I expect. The output is below the code sample.

#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

struct list {
  float value;
  struct list * next;
  struct list * prev;
};

int main( int argc, char *argv[] ){
  FILE *infile;
  char *token = NULL;
  char  my_char;

  /* Open the file. */
  // The file name should be in argv[1]
  if((infile = fopen(argv[1], "r")) == NULL) {
    printf("Error Opening File.\n");
    printf("ERROR: %s\n", strerror(errno));
    exit(1);
  }

  while((my_char = (char)fgetc(infile)) != EOF){
    //Is my_char a number?
    if(isdigit(my_char)){
      if(token == NULL){
        token = (char *)malloc(sizeof(char));
        memset(token, '\0', 1);
        strcpy(token, &my_char);
        printf("length of token -> %d\n", strlen(token));
        printf("%c\n", *token);
      } else {
        token = (char *)realloc(token, sizeof(token) + 1);
        strcat(token, &my_char);
        printf("%s\n", token);
      }
    }
  }

  free(token);
  fclose(infile);
}

And here is the output:

[estest@THEcomputer KernelFunctions]$ nvcc linear_kernel.cu -o linear_kernel.exe
[estest@THEcomputer KernelFunctions]$ ./linear_kernel.exe iris.csv
length of token -> 5
5
5a#1a#
5a#1a#3a#
5a#1a#3a#5a#
5a#1a#3a#5a#1a#
5a#1a#3a#5a#1a#4a#
*** glibc detected *** ./linear_kernel.exe: realloc(): invalid next size: 0x0000000001236350 ***

I don't understand why the length of the token is 5 when I expect it to be 1, or where the strange-looking characters that follow the 5 (represented by 'a#') come from. Can anyone help me understand this a little better?

三生路 2024-12-06 10:55:36

char *token = NULL;

token = (char *)realloc(token, sizeof(token) + 1);

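token is a pointer. sizeof doesn't give you the allocated size of the chunk of memory to which it points; it gives you the size of the pointer object itself. Pointers are typically 4 bytes on 32-bit systems and 8 bytes on 64-bit systems (the 16-digit address in the glibc message suggests the latter), so you're always reallocating to sizeof(char *) + 1 bytes, no matter how long the string has grown.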
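A minimal sketch of the fix, assuming you keep the one-character-at-a-time approach: keep the string's length in a variable and request length + 2 bytes on each call (the new character plus the terminator). The helper name `append_char` is hypothetical.

```c
#include <stdlib.h>

/* Append c to the heap string *token, whose current length is *len.
   The length is tracked explicitly because sizeof(*token) would only
   report the size of a char pointer, never the size of the block.
   realloc(NULL, n) behaves like malloc(n), so *token may start as NULL.
   Returns 0 on success, -1 if realloc() fails (the old string is kept). */
int append_char(char **token, size_t *len, char c)
{
    /* room for the existing characters, the new one, and the '\0' */
    char *p = realloc(*token, *len + 2);

    if (p == NULL)
        return -1;

    p[*len] = c;
    p[*len + 1] = '\0';
    *token = p;
    (*len)++;
    return 0;
}
```

Start with `char *token = NULL; size_t len = 0;`, call `append_char(&token, &len, my_char)` for each digit, and `free(token)` when done.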

Some more suggestions:

exit(1);

exit(EXIT_FAILURE) is more portable.

char my_char;

while((my_char = (char)fgetc(infile)) != EOF){

fgetc returns an int, not a char. The value is either the next character read from the file (represented as an unsigned char and then converted to int, so typically in the range 0..255) or the value EOF (which is typically -1). If plain char is signed on your system, an input character that happens to be 255 will cause your loop to terminate prematurely. If plain char is unsigned, your loop will never end: converting the negative EOF value to an unsigned char gives a non-negative value (255 when EOF is -1), which is promoted back to int for the comparison and never equals EOF. Either way the fix is the same: make my_char an int.
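A short sketch of the int-based loop. `count_bytes` is a hypothetical helper that only counts bytes, which is enough to show that a 0xFF byte in the input is counted rather than ending the loop early.

```c
#include <stdio.h>

/* Count the bytes in a stream. c is an int so that every byte value
   fgetc() can return (0..255) stays distinct from EOF. Storing the
   result in a plain char would either make a 0xFF byte look like EOF
   (signed char) or make EOF never match (unsigned char). */
long count_bytes(FILE *f)
{
    long count = 0;
    int c;

    while ((c = fgetc(f)) != EOF)
        count++;
    return count;
}
```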

token = (char *)malloc(sizeof(char));

Don't cast the result of malloc(). It's not necessary (malloc() returns a void* so it can be converted implicitly), and it can hide errors. sizeof(char) is 1 by definition. Just write:

token = malloc(1);

And always check the return value; malloc() returns NULL on failure.

memset(token, '\0', 1);

Simpler: *token = '\0';

Allocating a single byte and then calling realloc() to add one more byte at a time is likely to be terribly inefficient.
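To put rough numbers on that, this hypothetical helper counts how many reallocations each growth strategy needs (it does not allocate anything itself). Growing by one byte needs about one realloc() per character, while doubling needs only about log2 of the final size, and each realloc() may have to copy the whole block.

```c
#include <stddef.h>

/* Count the reallocations needed to reach a capacity of at least
   `needed` bytes, starting from 1 byte, when growing either by one
   byte at a time (doubling == 0) or by doubling (doubling != 0). */
size_t growth_steps(size_t needed, int doubling)
{
    size_t cap = 1, steps = 0;

    while (cap < needed) {
        cap = doubling ? cap * 2 : cap + 1;
        steps++;
    }
    return steps;
}
```

For a 1024-byte string, growing by one byte takes 1023 reallocations and doubling takes 10.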

strcat(token, &my_char);

The second argument to strcat() must be a pointer to a string. &my_char is of the right type, but if the byte following my_char in memory doesn't happen to be a '\0', Bad Things Can Happen.

This is not an exhaustive review.

Recommended reading: the comp.lang.c FAQ.

居里长安 2024-12-06 10:55:36

The main issue appears to be a problem with null-terminated strings. The malloc call allocates 1 byte, but strcpy copies bytes until it reaches a null terminator (a zero byte). The behavior is therefore undefined, since the bytes after my_char are "random" values from the stack.

You need to allocate (and realloc) one byte more than the length of the string to make room for the null terminator. Also, the strcpy and strcat calls are not valid for a source "string" that is really just a single character. To keep the basic logic you are implementing, you can simply assign the character value to the appropriate position in the token array. Alternatively, you could declare my_char as a two-byte character array and set the second byte to a 0 terminator, which allows strcpy and strcat to be used. For example,

char my_char[2];
my_char[1] = '\0';

And then it would be necessary to change the usage of my_char accordingly (assign the value to my_char[0], and remove the & in the strcpy/strcat calls). The compiler warnings/errors would help address those changes.
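A sketch of that alternative, keeping the question's strcpy()-then-strcat() structure. `append_with_strcat` is a hypothetical name, and the caller is assumed to track the current length in `len`.

```c
#include <stdlib.h>
#include <string.h>

/* Append c to *token using a two-byte array: my_char[1] is always '\0',
   so my_char is a valid one-character string that strcpy()/strcat()
   can safely read. The block is always resized to length + 2 bytes
   (existing characters, the new one, and the terminator).
   Returns 0 on success, -1 on allocation failure. */
int append_with_strcat(char **token, size_t *len, char c)
{
    char my_char[2];
    char *p;

    my_char[0] = c;
    my_char[1] = '\0';

    p = realloc(*token, *len + 2);
    if (p == NULL)
        return -1;

    if (*len == 0)
        strcpy(p, my_char);   /* first character: nothing to append to yet */
    else
        strcat(p, my_char);   /* p already holds a terminated string */

    *token = p;
    (*len)++;
    return 0;
}
```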

仅一夜美梦 2024-12-06 10:55:36

You're allocating only 1 byte of data for your string in your code:

token = (char *)malloc(sizeof(char));
memset(token, '\0', 1);

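However, one byte only has room for the terminator. As soon as you store a character in it, the string is no longer null-terminated within the memory you own, so string functions read (and strcpy()/strcat() write) past the end of the allocation. What you're most likely seeing is extra junk that was in the memory after your buffer.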

洛阳烟雨空心柳 2024-12-06 10:55:36

For one, it would be a lot easier for you to read 1 whole line at a time as opposed to 1 character at a time. You can then use strtok() to split the line by the commas.
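A minimal sketch of splitting one line with strtok() and converting each field with strtof(). `parse_line` is a hypothetical helper; note that strtok() treats consecutive commas as one separator, so empty fields are silently skipped.

```c
#include <stdlib.h>
#include <string.h>

/* Split one CSV line on commas (and a trailing "\r\n" or "\n") and
   convert each field to float. strtok() modifies `line` in place, so
   it must be a writable buffer, not a string literal.
   Returns the number of values stored in out (at most max). */
size_t parse_line(char *line, float *out, size_t max)
{
    size_t n = 0;
    char *field = strtok(line, ",\r\n");

    while (field != NULL && n < max) {
        out[n++] = strtof(field, NULL);
        field = strtok(NULL, ",\r\n");
    }
    return n;
}
```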

There are a few problems with your code:

token = (char *)malloc(sizeof(char));

This will only allocate 1 byte. C strings have to be null-terminated, so even a string of length 1 requires 2 bytes of allocated space.

strcpy(token, &my_char);
strcat(token, &my_char);

my_char is a single character, not a null-terminated string (which is what strcpy() and strcat() expect).
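For comparison, a sketch of building a proper one-character string (the helper name `char_to_string` is hypothetical):

```c
#include <stdlib.h>

/* Build a one-character C string from c. A string of length 1 needs
   2 bytes: the character itself plus the '\0' terminator that
   strlen(), strcpy() and strcat() look for.
   Returns NULL on allocation failure; the caller frees the result. */
char *char_to_string(char c)
{
    char *s = malloc(2);

    if (s == NULL)
        return NULL;
    s[0] = c;
    s[1] = '\0';
    return s;
}
```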

sizeof(token)

This is not what you mean to do. It returns the size of a pointer (which is the type of token). You probably want something like strlen(), but you'd have to refactor your code to make sure you're working with null-terminated strings rather than single characters.

千仐 2024-12-06 10:55:36

Your my_char should be an int because that's what fgetc returns; using a char means you may never detect your EOF condition:

int my_char;
/*...*/
while((my_char = fgetc(infile)) != EOF) {

The EOF value is an int that does not correspond to any valid character, which is how you can detect the end of a file while reading it one byte at a time. From the fine manual:

If the integer value returned by fgetc() is stored into a variable of type char and then compared against the integer constant EOF, the comparison may never succeed, because sign-extension of a variable of type char on widening to integer is implementation-defined.

Others have pointed out your memory errors so I'll leave those alone.
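One related detail: fgetc() also returns EOF on a read error, not only at end of file. A sketch that checks ferror() after the loop to tell the two apart; `read_digits` and its caller-supplied fixed-size buffer are assumptions for illustration.

```c
#include <ctype.h>
#include <stdio.h>

/* Read digits one byte at a time into buf (size cap, cap >= 1), always
   leaving it '\0'-terminated; digits that do not fit are dropped.
   fgetc() returns EOF both at end of file and on a read error, so
   ferror() is checked after the loop to tell the two apart.
   Returns the number of digits stored, or -1 on a read error. */
long read_digits(FILE *f, char *buf, size_t cap)
{
    size_t len = 0;
    int c;   /* int, not char, so EOF is distinguishable */

    while ((c = fgetc(f)) != EOF) {
        if (isdigit(c) && len + 1 < cap)
            buf[len++] = (char)c;
    }
    buf[len] = '\0';

    if (ferror(f))
        return -1;
    return (long)len;
}
```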

星軌x 2024-12-06 10:55:36

while((my_char = (char)fgetc(infile)) != EOF){

This is bad times. fgetc returns int, which can represent more values than char: every byte value 0..255 plus EOF, which is typically -1. Since you're storing the result in a char, how do you expect to represent the character 0xff? You can't; if plain char is signed (the usual case on x86), you'll end up treating it as EOF. You should do this:

int c;

while ((c=fgetc(infile)) != EOF)
{
   char my_char = c;

Next up...

       token = (char *)malloc(sizeof(char));

You should check the return value of malloc. You should also consider allocating more than you need up front, otherwise every call to realloc could potentially have to copy the characters that you've seen so far. You will get better algorithmic complexity by, say, making every allocation size a power of 2. Also, unlike C++, in C you don't need to cast from void*.

       memset(token, '\0', 1);
       strcpy(token, &my_char);

This is not what you think it means. (&my_char)[1] must be zero for this to work, so this is undefined behavior. You should try this:

token[0] = my_char;
token[1] = 0;

Also, you only allocated 1 char. You need 2 for this to work.

       token = (char *)realloc(token, sizeof(token) + 1);

sizeof does not magically remember how much you allocated last time; it only yields the compile-time size of its operand's type, in this case sizeof(char*), which is typically 4 or 8 on 32- or 64-bit systems respectively. You need to track the real allocation size in a variable. Also, assigning realloc's result straight back to token leaks memory on failure: realloc returns NULL but leaves the old block allocated, and you have just overwritten your only pointer to it. You should do this instead:

 void *ptr = realloc(token, new_length);
 if (!ptr) { /* TODO: handle error */ }
 token = ptr;
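The same pattern wrapped in a small reusable helper (the name `grow_buffer` is hypothetical):

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes (assumed nonzero) without leaking on
   failure: the result goes into a temporary first, and *buf is only
   updated if realloc() succeeded. On failure *buf still points at the
   original, intact block, which the caller remains responsible for
   freeing. Returns 0 on success, -1 on failure. */
int grow_buffer(char **buf, size_t new_size)
{
    void *ptr = realloc(*buf, new_size);

    if (ptr == NULL)
        return -1;
    *buf = ptr;
    return 0;
}
```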

Moving on...

       strcat(token, &my_char);

This has the same undefined behavior as the last use of &my_char as if it was a C string. Also, even if it did work it is wasteful, since strcat must traverse the entire string to find the end.

Summary of my suggestions follows:

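#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
   FILE *infile;
   int c;                    /* int, not char, so EOF stays distinguishable */
   size_t alloc_size = 0;    /* bytes currently allocated for token */
   size_t current_len = 0;   /* characters stored, not counting the '\0' */
   char *token = NULL;
   void *ptr;

   if (argc < 2 || (infile = fopen(argv[1], "r")) == NULL)
   {
      fprintf(stderr, "Error opening file.\n");
      return EXIT_FAILURE;
   }

   while ((c = fgetc(infile)) != EOF)
   {
      if (isdigit(c))
      {
         // Need room for the new character plus the '\0' terminator.
         if (alloc_size < current_len + 2)
         {
            if (!alloc_size)
            {
               // Set some arbitrary start size...
               //
               alloc_size = 64;
            }
            else
            {
               alloc_size *= 2;
            }

            if (!token)
               ptr = malloc(alloc_size);
            else
               ptr = realloc(token, alloc_size);

            if (!ptr)
            {
               free(token);   // on failure the old block is still ours to free
               fclose(infile);
               return EXIT_FAILURE;
            }
            token = ptr;      // only replace token once the allocation succeeded
         }

         token[current_len++] = c;
         token[current_len] = 0;
      }
   }

   /* TODO: do something with token... */

   free(token);
   fclose(infile);
   return 0;
}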
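The summary leaves token unused. Since the question's goal is to store the numbers in its struct list, here is a sketch of that last step, assuming the loop is changed to finish a token at each comma or newline and convert it with strtof(). `list_append` and `list_free` are hypothetical helper names.

```c
#include <stdlib.h>

/* The question's node type. */
struct list {
  float value;
  struct list * next;
  struct list * prev;
};

/* Append value at the tail of a doubly linked list described by *head
   and *tail (both NULL for an empty list).
   Returns 0 on success, -1 if malloc() fails. */
int list_append(struct list **head, struct list **tail, float value)
{
    struct list *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->value = value;
    node->next = NULL;
    node->prev = *tail;

    if (*tail != NULL)
        (*tail)->next = node;
    else
        *head = node;   /* first node */
    *tail = node;
    return 0;
}

/* Free every node in the list. */
void list_free(struct list *head)
{
    while (head != NULL) {
        struct list *next = head->next;
        free(head);
        head = next;
    }
}
```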

轻拂→两袖风尘 2024-12-06 10:55:36

An implementation of strcpy can be as simple as

while(*dest++ = *src++);

So the memory pointed to by src is expected to contain a '\0' terminator. In your case, src is &my_char, a single char that is not '\0'. Hence strcpy reads past my_char and writes past the end of the one-byte allocation, corrupting the heap; that corruption is what glibc's realloc() check eventually reports. This does not happen with a call like strcpy(buff, "abcd"), because a string literal always includes its terminating null: the compiler stores it as abcd\0 in the program's data.
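A sketch of such a loop written out as a function (`my_strcpy` is a hypothetical name; real library implementations are usually more optimized):

```c
/* Copy bytes from src to dest until a '\0' has been copied. Nothing
   else stops the loop, so src must point at a terminated string; given
   &my_char, it keeps copying whatever memory follows the char. */
char *my_strcpy(char *dest, const char *src)
{
    char *d = dest;

    while ((*d++ = *src++) != '\0')
        ;   /* the '\0' itself is copied before the loop stops */
    return dest;
}
```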

To solve your problem in general, reading a whole line with fgets (or POSIX getline) and splitting it with strtok is a better and easier approach.
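A minimal sketch of the line-based approach using the standard fgets() and strtok(). `sum_csv` is hypothetical, and it assumes every line fits in a 256-byte buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a CSV stream line by line with fgets() and split each line with
   strtok(). Assumes no line is longer than the buffer (a longer line
   would be split across two fgets() calls). Adds each field's numeric
   value to *sum and returns the number of fields read. */
size_t sum_csv(FILE *f, double *sum)
{
    char line[256];   /* assumed maximum line length */
    size_t fields = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        char *tok = strtok(line, ",\r\n");
        while (tok != NULL) {
            *sum += strtod(tok, NULL);
            fields++;
            tok = strtok(NULL, ",\r\n");
        }
    }
    return fields;
}
```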
