Correct way to read a text file into a buffer in C?
I'm dealing with small text files that I want to read into a buffer while I process them, so I've come up with the following code:
...
char source[1000000];
FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
while((symbol = getc(fp)) != EOF)
{
strcat(source, &symbol);
}
fclose(fp);
}
...
Is this the correct way of putting the contents of the file into the buffer, or am I abusing strcat()?
I then iterate through the buffer thus:
for(int x = 0; (c = source[x]) != '\0'; x++)
{
//Process chars
}
There are quite a few things wrong with this code:

- If the file is larger than sizeof(source), this is prone to buffer overflows.
- You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. strcat() expects its second argument to point to a NUL-terminated string, and &symbol points to a single variable, not a string. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().

The following are two alternatives to consider using instead.
If you know the maximum buffer size ahead of time:
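Here is a minimal sketch for the known-maximum case. It assumes the caller passes the buffer and its total capacity, and the function name read_into_buffer is hypothetical. A single fread() call with an element size of 1 reads up to cap - 1 bytes, which leaves room for the terminating NUL. That way the character loop in the question still stops where it should.

```c
#include <stdio.h>

/* Reads up to cap - 1 bytes from fp into buf with a single fread() call
   (element size 1, i.e. bytes) and NUL-terminates the result, so buf can
   never overflow. Returns the number of bytes stored, or (size_t)-1 on a
   read error. Any input beyond cap - 1 bytes is left unread. */
size_t read_into_buffer(FILE *fp, char *buf, size_t cap)
{
    if (cap == 0)
        return 0;  /* no room even for the terminating NUL */

    size_t len = fread(buf, 1, cap - 1, fp);
    if (ferror(fp))
        return (size_t)-1;
    buf[len] = '\0';
    return len;
}
```

With the question's declarations, this would be read_into_buffer(fp, source, sizeof source) after a successful fopen(). A file longer than the buffer is silently truncated instead of overflowing it.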
Or, if you do not:
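When the size is not known in advance, one common approach works like this: seek to the end, ask ftell() for the size, allocate that many bytes plus one, then seek back and read everything with fread(). The sketch below uses a hypothetical function name and assumes fp refers to a seekable regular file. Open the file in binary mode ("rb") if you need an exact byte count, because the C standard does not guarantee that ftell() on a text stream returns one.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole of fp into a newly malloc'd, NUL-terminated buffer.
   The size comes from seeking to the end and calling ftell(), so fp must
   be seekable. Stores the length in *out_len (if non-NULL) and returns
   the buffer, or NULL on error. The caller must free() the result. */
char *read_whole_file(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0L, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);  /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    size_t len = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Remember to free() the buffer when you are done with it.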
If you're on a Linux system, once you have the file descriptor (fileno() gives you the one behind a FILE *), you can get a lot of information about the file using fstat():
http://linux.die.net/man/2/stat
so you might take the file size from st_size and allocate your buffer from that. This avoids seeking to the beginning and end of the file.
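Below is a POSIX-only sketch of the fstat() route, with a hypothetical function name. fileno() gives the descriptor behind a FILE *, and st_size from fstat() supplies the allocation size without any fseek() calls. The sketch assumes fp is at the start of a regular file (for example, freshly opened) and that no unflushed writes are pending on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* POSIX-only: sizes the buffer from fstat() on the descriptor behind fp
   (obtained with fileno()) instead of seeking to the end and back.
   Reads from the current position, so fp should be at the start.
   Returns a malloc'd, NUL-terminated buffer (caller frees) or NULL. */
char *read_file_fstat(FILE *fp, size_t *out_len)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0 || !S_ISREG(st.st_mode))
        return NULL;  /* st_size is only meaningful for regular files */

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size + 1);  /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    size_t len = fread(buf, 1, size, fp);  /* may be short if the file shrank */
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```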
Yes, you would probably be arrested for your terrible abuse of strcat!
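Take a look at getline(): it reads the data a line at a time, but importantly you can limit the number of characters you read, so you don't overflow the buffer. (Strictly, in standard C the line reader that takes such a limit is fgets(); POSIX getline() instead grows a malloc'd buffer as needed.)
strcat() is relatively slow because it has to search the entire string for its end on every insertion, so appending n characters one at a time costs on the order of n²/2 character scans in total.
You would normally keep a pointer to the current end of the string storage and pass that (with the space remaining) to the line-reading call as the position to read the next line into.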
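Here is a sketch of that end-pointer pattern. It assumes standard fgets() as the size-limited line reader and a caller-supplied buffer, and read_lines is a hypothetical name. Because end always points at the current terminating NUL, only the newly read line gets scanned. Repeated strcat() calls would rescan the whole buffer each time.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads fp line by line into buf, whose total capacity (including the
   NUL) is cap. 'end' always points at the current terminating NUL, so each
   fgets() call writes directly after the previous line and strlen() only
   scans the line just read. Stops early, without overflowing, when the
   buffer is full. Returns the number of characters stored. */
size_t read_lines(FILE *fp, char *buf, size_t cap)
{
    if (cap == 0)
        return 0;

    char *end = buf;
    size_t used = 0;
    *end = '\0';

    /* fgets() needs room for at least one character plus the NUL. */
    while (cap - used > 1) {
        size_t room = cap - used;
        int limit = room > INT_MAX ? INT_MAX : (int)room;
        if (fgets(end, limit, fp) == NULL)
            break;                      /* EOF or read error */
        size_t n = strlen(end);         /* length of this chunk only */
        end += n;
        used += n;
    }
    return used;
}
```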
See this article from JoelOnSoftware for why you don't want to use strcat. Look at fread for an alternative: use it with 1 as the element size (its second argument) when you're reading bytes or characters.
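fread() also works without knowing the file size. You read fixed-size chunks and grow the buffer with realloc() until a short read signals end of file or an error. The sketch below assumes a hypothetical CHUNK size of 4096 bytes and uses a hypothetical function name. Because it never seeks, it also works on pipes and standard input.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096  /* hypothetical read size; any positive value works */

/* Reads all of fp with fread() (element size 1, i.e. bytes) into a
   realloc-grown buffer, doubling the capacity whenever fewer than
   CHUNK + 1 bytes are free. Returns a NUL-terminated buffer that the
   caller must free(), or NULL on error. */
char *read_all_chunks(FILE *fp, size_t *out_len)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;

    for (;;) {
        if (cap - len < CHUNK + 1) {            /* room for a chunk + NUL */
            size_t new_cap = cap ? cap * 2 : CHUNK + 1;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, CHUNK, fp);
        len += n;
        if (n < CHUNK)
            break;                              /* EOF or read error */
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```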
Why don't you just use the array of chars you have? This ought to do it:
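Here is a minimal sketch of that idea, not necessarily the answerer's exact code. Each getc() result goes into the next index of the array, the loop stops one short of the end to leave room for the NUL, and the string is terminated once at the end. The question does not show how symbol is declared. Here it is an int, which getc() needs so that EOF can be distinguished from a valid character. The function name copy_chars is hypothetical.

```c
#include <stdio.h>

/* Copies characters from fp into source[] with getc(), one index at a
   time, stopping at EOF or one short of the end so there is room for the
   terminating NUL. Returns the number of characters stored. */
size_t copy_chars(FILE *fp, char *source, size_t size)
{
    size_t i = 0;
    int symbol;  /* int, not char, so EOF can be told apart from a byte */

    if (size == 0)
        return 0;
    /* Check the bound first so no character is consumed that can't be stored. */
    while (i < size - 1 && (symbol = getc(fp)) != EOF)
        source[i++] = (char)symbol;
    source[i] = '\0';
    return i;
}
```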
Not tested, but should work. And yes, it could be better implemented with fread; I'll leave that as an exercise for the reader.
Methinks you want fread:
http://www.cplusplus.com/reference/clibrary/cstdio/fread/
Have you considered mmap()? You can read from the file directly as if it were already in memory.
http://beej.us/guide/bgipc/output/html/multipage/mmap.html
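Here is a POSIX sketch of the mmap() approach, where map_file is a hypothetical name. It opens the file, takes the size from fstat(), and maps the file read-only. The mapped bytes are not NUL-terminated. A loop like the one in the question would therefore have to run up to the returned length instead of stopping at '\0'.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the file at path read-only. On success returns a pointer to its
   bytes and stores the length in *out_len; release with munmap(p, len).
   The mapping is NOT NUL-terminated, so iterate using the length. */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  /* mmap() rejects length 0 */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

Call munmap() on the returned pointer and length when you are done with the data. Because mmap() rejects a length of 0, this sketch returns NULL for an empty file as well as on errors.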