Storing information from a file dynamically using C
I'm new to C and trying to learn a few things. What I'm trying to do is read in a file and store the information. Since the format will be CSV, the plan is to read in each character, determine whether it's a digit or a comma, and store the numbers in a linked list. The problem I'm having is reading in numbers that are more than one character long, as in the following example.
5,2,24,5
Here's the code I've got so far. It's just not giving back the output I expect; the output is below the code sample.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

struct list {
    float value;
    struct list * next;
    struct list * prev;
};

int main( int argc, char *argv[] ){
    FILE *infile;
    char *token = NULL;
    char my_char;

    /* Open the file. */
    // The file name should be in argv[1]
    if((infile = fopen(argv[1], "r")) == NULL) {
        printf("Error Opening File.\n");
        printf("ERROR: %s\n", strerror(errno));
        exit(1);
    }

    while((my_char = (char)fgetc(infile)) != EOF){
        //Is my_char a number?
        if(isdigit(my_char)){
            if(token == NULL){
                token = (char *)malloc(sizeof(char));
                memset(token, '\0', 1);
                strcpy(token, &my_char);
                printf("length of token -> %d\n", strlen(token));
                printf("%c\n", *token);
            } else {
                token = (char *)realloc(token, sizeof(token) + 1);
                strcat(token, &my_char);
                printf("%s\n", token);
            }
        }
    }

    free(token);
    fclose(infile);
}
And here is the output:
[estest@THEcomputer KernelFunctions]$ nvcc linear_kernel.cu -o linear_kernel.exe
[estest@THEcomputer KernelFunctions]$ ./linear_kernel.exe iris.csv
length of token -> 5
5
5a#1a#
5a#1a#3a#
5a#1a#3a#5a#
5a#1a#3a#5a#1a#
5a#1a#3a#5a#1a#4a#
*** glibc detected *** ./linear_kernel.exe: realloc(): invalid next size: 0x0000000001236350 ***
I don't understand why the length of the token is 5 when I expect it to be 1, or where the strange-looking characters that follow the 5 (represented here by 'a#') come from. Can anyone help me understand this a little better?
token is a pointer. sizeof doesn't give you the allocated size of the chunk of memory to which it points; it gives you the size of the pointer object itself. Pointers are typically 4 or 8 bytes, so sizeof(token) + 1 always reallocates to the same fixed size (5 or 9 bytes), no matter how long the string has grown.

Some more suggestions:

Use exit(EXIT_FAILURE) rather than exit(1); it is more portable.

fgetc returns an int, not a char, so char my_char is the wrong type. The value is either the next character read from the file (represented as an unsigned char and then converted to int, so typically in the range 0..255) or the value EOF (which is typically -1). If plain char is signed on your system, an input character that happens to be 255 will cause your loop to terminate prematurely; if plain char is unsigned, your loop may never end, because EOF is converted to an unsigned value (typically 255) that never compares equal to EOF. Either way, make my_char an int and drop the (char) cast.

Don't cast the result of malloc(). It's not necessary (malloc() returns a void*, so it can be converted implicitly), and it can hide errors. sizeof(char) is 1 by definition, so just write malloc(1). And always check the return value; malloc() returns NULL on failure.

Instead of memset(token, '\0', 1), this is simpler:
*token = '\0';

Allocating a single byte, then realloc()ating one additional byte at a time, is likely to be terribly inefficient.

The second argument to strcat() must be a pointer to a string. &my_char is of the right type, but if the byte following my_char in memory doesn't happen to be a '\0', Bad Things Can Happen.

This is not an exhaustive review.

Recommended reading: the comp.lang.c FAQ.
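To make the sizeof point concrete, here is a minimal sketch. The function name sizeof_of_token and the 100-byte allocation are made up for illustration and do not come from the question.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: reports what sizeof yields for a pointer that
 * refers to a 100-byte heap block. */
size_t sizeof_of_token(void)
{
    char *token = malloc(100);        /* no cast needed in C */
    size_t reported;

    if (token == NULL)                /* malloc returns NULL on failure */
        return 0;

    reported = sizeof(token);         /* size of the pointer object itself */
    free(token);
    return reported;                  /* sizeof(char *), never 100 */
}
```

Because sizeof is evaluated from the type alone, the only way to know how many bytes token refers to is to remember the size in a separate variable.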
The main issue appears to be a problem with null-terminated strings. The malloc call allocates 1 byte, but strcpy copies bytes until it reaches a null terminator (a zero byte). So the results are not well defined, since the byte after my_char is a "random" value from the stack.

You need to allocate (and realloc) one byte more than the length of the string to allow for the null terminator. The strcpy and strcat calls are also not valid here, because the source "string" is actually just a single character. To keep the basic logic that you are implementing, simply assign the character value to the appropriate position in the token array and write the terminator after it. Alternatively, you could declare my_char as a two-byte character array and set the second byte to a 0 terminator so that strcpy and strcat can be used (for example, char my_char[2] with my_char[1] set to '\0'). You would then need to change the uses of my_char accordingly (assign the value to my_char[0], and remove the & in the strcpy/strcat calls). The compiler warnings and errors will help you find those places.
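The two-byte array alternative could look roughly like the sketch below. The helper name append_with_strcat, its parameters, and the fixed-size destination buffer are assumptions for illustration; the question's code uses a heap-allocated token instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends the character ch to the string in dst,
 * where dst has room for cap bytes in total.
 * Returns 0 on success, -1 if the result would not fit. */
int append_with_strcat(char *dst, size_t cap, int ch)
{
    char one[2];                      /* one character plus its terminator */

    one[0] = (char)ch;
    one[1] = '\0';                    /* now `one` really is a C string */

    if (strlen(dst) + 2 > cap)        /* old chars + new char + '\0' */
        return -1;

    strcat(dst, one);                 /* valid: both arguments are strings */
    return 0;
}
```

A caller would start from an empty string such as char buf[16] = ""; and call the helper once per digit read.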
Your malloc(sizeof(char)) call allocates only 1 byte of data for your string. Because you zero out only that one byte, the string is not necessarily null-terminated once anything more is written to it. What you're most likely seeing is extra junk from the memory that follows.
For one, it would be a lot easier to read one whole line at a time rather than one character at a time. You can then use strtok() to split the line at the commas.

There are a few problems with your code:

token = (char *)malloc(sizeof(char));
This only allocates 1 byte. C strings have to be null-terminated, so even a string of length 1 requires 2 bytes of allocated space.

strcpy(token, &my_char);
my_char is a single character, not a null-terminated string (which is what strcpy() and strcat() expect).

token = (char *)realloc(token, sizeof(token) + 1);
This is not what you mean to do. sizeof(token) gives you the size of a pointer (which is the type of token). You probably want something like strlen(), but you'd have to refactor your code to make sure you're using null-terminated strings rather than single characters.
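The line-at-a-time approach might be sketched as follows. The function name parse_csv_line and its parameters are hypothetical, and strtof is used here as one reasonable way to convert each field; none of this appears in the original posts.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits `line` at commas and stores up to `max`
 * parsed values in `out`. Returns the number of values stored.
 * Note: strtok modifies `line` and skips empty fields. */
size_t parse_csv_line(char *line, float *out, size_t max)
{
    size_t count = 0;
    char *field = strtok(line, ",\r\n");

    while (field != NULL && count < max) {
        char *end;
        float value = strtof(field, &end);

        if (end != field)             /* at least one character was converted */
            out[count++] = value;
        field = strtok(NULL, ",\r\n");
    }
    return count;
}
```

A caller would typically fill a buffer with fgets(line, sizeof line, infile) inside a loop and pass it to this function; each parsed value could then be appended to the linked list from the question.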
Your my_char should be an int because that's what fgetc returns; with a char you cannot reliably detect the EOF condition. The EOF value is an int that is not a valid char, and that's how you can detect the end of a file while reading it one byte at a time. Per the manual, fgetc returns the character read as an unsigned char converted to an int, or EOF on end of file or error.

Others have pointed out your memory errors, so I'll leave those alone.
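A minimal sketch of the read loop with an int variable is shown below. The function name count_digits is made up for illustration; it only counts digit characters rather than building tokens.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the decimal digits in a stream. */
long count_digits(FILE *in)
{
    int ch;                           /* int, so EOF stays distinct from byte 0xFF */
    long digits = 0;

    while ((ch = fgetc(in)) != EOF) {
        if (isdigit(ch))              /* safe: ch is in the unsigned char range here */
            digits++;
    }
    return digits;
}
```

With ch declared as int, a 0xFF byte in the input is read as 255 and the loop keeps going until the real end of the file.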
Casting the result of fgetc to char is bad news. fgetc returns int, which can represent more values than char. EOF is typically -1. Since you're storing the result in a char, how do you expect to represent the character 0xff? You won't; you'll end up treating it as EOF. Declare my_char as an int and drop the cast.

Next up, the malloc call. You should check the return value of malloc. You should also consider allocating more than you need up front; otherwise every call to realloc could potentially have to copy the characters that you've seen so far. You will get better algorithmic complexity by, say, making every allocation size a power of 2. Also, unlike C++, in C you don't need to cast from void*.

strcpy(token, &my_char) does not do what you think it does. (&my_char)[1] must be zero for this to work, so this is undefined behavior. Assign the character directly and write the terminator yourself instead. Also, you only allocated 1 char; you need 2 for this to work.

sizeof does not magically remember how much you allocated last time. It only takes the compile-time size of the type of its operand, in this case sizeof(char*), which would be 4 or 8 on 32-bit or 64-bit systems respectively. You need to track the real allocation size in a variable. This kind of realloc is also prone to leaking memory on failure: assign the result to a temporary pointer, check it for NULL, and only then overwrite token.

Moving on, strcat(token, &my_char) has the same undefined behavior as the earlier use of &my_char as if it were a C string. Even if it did work it would be wasteful, since strcat must traverse the entire string to find the end.
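The suggestions above (track the allocation size, grow in powers of two, keep the old pointer if realloc fails, and append without strcat) could be combined along these lines. The struct strbuf type, the strbuf_push name, and the starting capacity of 16 are assumptions for this sketch, not the answerer's lost code.

```c
#include <stdlib.h>

struct strbuf {
    char  *data;   /* NULL until the first push */
    size_t len;    /* characters stored, excluding the terminator */
    size_t cap;    /* bytes currently allocated */
};

/* Appends c; returns 0 on success, -1 on allocation failure
 * (in which case the buffer is left unchanged). */
int strbuf_push(struct strbuf *b, char c)
{
    if (b->len + 2 > b->cap) {                     /* need room for c and '\0' */
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        char *grown = realloc(b->data, new_cap);   /* temporary: don't lose b->data */

        if (grown == NULL)
            return -1;
        b->data = grown;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;                         /* direct assignment, no strcat */
    b->data[b->len] = '\0';
    return 0;
}
```

Starting from struct strbuf b = { NULL, 0, 0 }; works because realloc with a null pointer behaves like malloc. When a comma is reached, the caller can convert b.data to a number and reset b.len to 0 to reuse the same allocation.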
The implementation of strcpy is essentially a loop that copies bytes from src to dst until it has copied a '\0'. So the memory pointed to by src is expected to end with at least one '\0' character. In your case, the source is a single char that is not null. Hence strcpy reads beyond it and writes past the end of the one-byte destination, which is undefined behavior and can end in a fault. This is not observed when a call like strcpy(buff, "abcd") is made, because the compiler stores the literal as abcd\0, terminator included.

To solve your problem in general, reading each line with fgets and splitting it with strtok will be a better and easier approach.
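For reference, a strcpy-like loop can be sketched as below. This is an illustration under the name sketch_strcpy, not the code of any particular C library; real implementations are usually more optimized.

```c
/* Illustrative stand-in for strcpy: copies bytes up to and including
 * the first '\0' found in src. */
char *sketch_strcpy(char *dst, const char *src)
{
    char *ret = dst;

    while ((*dst++ = *src++) != '\0')
        ;                             /* the loop stops only at a terminator */
    return ret;
}
```

Nothing in the loop knows how large either buffer is, which is why passing the address of a lone char as src keeps reading into whatever memory follows it.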