Correct way to read a text file into a buffer in C?

Posted 2024-08-17 05:13:12

I'm dealing with small text files that I want to read into a buffer while I process them, so I've come up with the following code:

```c
...
char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}
...
```

Is this the correct way of putting the contents of the file into the buffer, or am I abusing strcat()?

I then iterate through the buffer thus:

```c
for(int x = 0; (c = source[x]) != '\0'; x++)
{
    //Process chars
}
```


ι不睡觉的鱼゛ 2024-08-24 05:13:12

```c
char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}
```

There are quite a few things wrong with this code:

- It is very slow: you read the file one character at a time, and every strcat() call has to rescan source from the beginning to find its terminating '\0', so the loop does O(n²) work for an n-character file.
- If the file is larger than sizeof(source), the buffer overflows.
- Really, when you look at it more closely, this code should not work at all. As stated in the man pages:

> The strcat() function appends a copy of the null-terminated string s2 to the end of the null-terminated string s1, then adds a terminating '\0'.

You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().
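
To make the man-page contract concrete, here is a sketch of what a *valid* strcat() loop would need: the destination must start out as an empty string, and each character must be wrapped in its own two-byte, NUL-terminated string. The helper name append_chars_strcat is hypothetical, and this is shown only to illustrate the contract; it is still O(n²) because every call rescans source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that uses strcat() the way its contract requires:
   the destination starts as "" and each char is wrapped in a real string.
   Every call still rescans source, so this stays O(n^2); illustration only. */
size_t append_chars_strcat(FILE *fp, char *source, size_t cap)
{
    int symbol;                          /* int, so EOF stays distinguishable */
    if (cap == 0)
        return 0;
    source[0] = '\0';                    /* destination must be NUL-terminated */
    while (strlen(source) < cap - 1 && (symbol = getc(fp)) != EOF) {
        char tmp[2] = { (char)symbol, '\0' };   /* a genuine 1-char string */
        strcat(source, tmp);             /* scans source to find its end */
    }
    return strlen(source);
}
```

The bounds check (cap - 1) is what prevents the overflow; the two-byte tmp array is what makes the strcat() call legal.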

The following are two alternatives to consider using instead.

If you know the maximum buffer size ahead of time:

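```c
#include <stdio.h>
#define MAXBUFLEN 1000000

int main(void)
{
    /* static: a 1 MB array may not fit on the stack on some platforms. */
    static char source[MAXBUFLEN + 1];
    FILE *fp = fopen("foo.txt", "r");
    if (fp != NULL) {
        /* Reads at most MAXBUFLEN bytes; a longer file is silently truncated. */
        size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
        if ( ferror( fp ) != 0 ) {
            fputs("Error reading file\n", stderr);
        } else {
            source[newLen] = '\0'; /* newLen <= MAXBUFLEN, so this is in bounds */
        }

        fclose(fp);
    }
    return 0;
}
```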
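
One thing the fixed-size version above does not tell you is whether the file was cut off at MAXBUFLEN. A minimal sketch, assuming you want to detect that case: after a completely full read, probe one more byte with getc(); if it is not EOF, the file was larger than the buffer. The helper name read_bounded and its truncated out-parameter are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read at most cap - 1 bytes of fp into buf and
   NUL-terminate it. Sets *truncated to 1 if the file had more data.
   Returns the number of bytes stored, or (size_t)-1 on a read error. */
size_t read_bounded(FILE *fp, char *buf, size_t cap, int *truncated)
{
    *truncated = 0;
    if (cap == 0)
        return 0;
    size_t len = fread(buf, 1, cap - 1, fp);
    if (ferror(fp))
        return (size_t)-1;
    buf[len] = '\0';
    /* A full read may have ended exactly at EOF, so probe one more byte. */
    if (len == cap - 1 && getc(fp) != EOF)
        *truncated = 1;
    return len;
}
```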

Or, if you do not:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *source = NULL;
    FILE *fp = fopen("foo.txt", "r");
    if (fp != NULL) {
        /* Go to the end of the file. */
        if (fseek(fp, 0L, SEEK_END) == 0) {
            /* Get the size of the file. */
            long bufsize = ftell(fp);
            if (bufsize == -1) { fclose(fp); return 1; }

            /* Allocate our buffer to that size, plus one for the '\0'. */
            source = malloc(sizeof(char) * (bufsize + 1));
            if (source == NULL) { fclose(fp); return 1; }

            /* Go back to the start of the file. */
            if (fseek(fp, 0L, SEEK_SET) != 0) { free(source); fclose(fp); return 1; }

            /* Read the entire file into memory. newLen can be smaller than
               bufsize in text mode (e.g. "\r\n" -> "\n" on Windows). */
            size_t newLen = fread(source, sizeof(char), bufsize, fp);
            if ( ferror( fp ) != 0 ) {
                fputs("Error reading file\n", stderr);
                free(source);
                source = NULL;
            } else {
                source[newLen] = '\0';
            }
        }
        fclose(fp);
    }

    /* ... use source here (it is NULL if anything failed) ... */

    free(source); /* Don't forget to call free() later! free(NULL) is a no-op. */
    return 0;
}
```
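
If you need this in more than one place, the same fseek()/ftell() steps can be packaged as a function. This is a sketch; the name read_whole_file is hypothetical, and it takes an already-open FILE * so the caller chooses the file and mode. As with the code above, it only works on seekable streams (not pipes or stdin), and the length is taken from fread()'s return value rather than from ftell().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of a seekable stream into a malloc'd,
   NUL-terminated buffer using fseek()/ftell(). Stores the length in
   *out_len (if non-NULL). Returns NULL on error; caller must free(). */
char *read_whole_file(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);
    if (size < 0 || fseek(fp, 0L, SEEK_SET) != 0)
        return NULL;

    char *buf = malloc((size_t)size + 1);     /* +1 for the '\0' */
    if (buf == NULL)
        return NULL;

    /* Trust fread()'s count, not ftell(): text mode may yield fewer bytes. */
    size_t len = fread(buf, 1, (size_t)size, fp);
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```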

幽梦紫曦～ 2024-08-24 05:13:12

If you're on Linux (or another POSIX system), once you have the file descriptor you can get a lot of information about the file using fstat():

http://linux.die.net/man/2/stat

So you might have:

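```c
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    int fd = open("TheFile.txt", O_RDONLY);   /* get file descriptor */
    if (fd == -1 || fstat(fd, &st) == -1) {
        perror("TheFile.txt");
        return 1;
    }
    /* the size of the file is now in st.st_size */
    close(fd);
    return 0;
}
```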

This avoids seeking to the beginning and end of the file.
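
If you already have a FILE * from fopen() (as in the question), POSIX fileno() gives you its descriptor, so the fstat() call can be wrapped as below. This is a sketch assuming a POSIX system; the helper name stream_size is hypothetical. Flush any pending writes on the stream before calling it, since fstat() only sees what has actually reached the file.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict ISO C modes */
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size in bytes of the regular file behind fp,
   or -1 on error. POSIX-only: fileno() + fstat(). */
long long stream_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))       /* st_size is only meaningful for regular files */
        return -1;
    return (long long)st.st_size;
}
```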

别念他 2024-08-24 05:13:12

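Yes, you would probably be arrested for gross abuse of strcat()!

Take a look at fgets() (or POSIX getline()): it reads the data one line at a time. Importantly, fgets() lets you limit the number of characters read, so you don't overflow the buffer; getline() instead grows its own buffer with realloc() as needed.

strcat() is relatively slow because every call has to scan the entire string to find its end before it can append anything.
Normally you would keep a pointer to the current end of the string storage and pass that to fgets() as the place to read the next line into.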

☆獨立☆ 2024-08-24 05:13:12

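See this article on Joel on Software for why you don't want to use strcat().

Look at fread() as an alternative. When you are reading bytes or chars, use it with 1 as the size argument.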
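
A minimal sketch of fread() with a size of 1: the return value is then simply the number of bytes (chars) read, which makes chunked reading easy to account for. The helper name count_bytes and the 4096-byte chunk size are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in fp by reading it in
   chunks. With size 1, fread() returns a byte count directly. */
long long count_bytes(FILE *fp)
{
    char chunk[4096];
    long long total = 0;
    size_t n;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += (long long)n;          /* n bytes landed in chunk[0..n-1] */
    return ferror(fp) ? -1 : total;
}
```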

弃爱 2024-08-24 05:13:12

Why don't you just use the array of chars you have? This ought to do it:

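```c
int c;                          /* int, so EOF can be told apart from a char */
while (i < sizeof source - 1 && (c = getc(fp)) != EOF)
    source[i++] = (char)c;
source[i] = '\0';
```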
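
Spelled out as a complete function, with the bounds check and terminator that the two-line version above leaves implicit. The helper name read_chars is hypothetical; the buffer and its size are passed in, so it works with the question's char source[1000000], and the question's `(c = source[x]) != '\0'` loop then stops at the terminator.

```c
#include <stdio.h>

/* Hypothetical helper: store characters from fp in source[0..cap-2] using
   a running index (O(1) per char, no strcat()), then NUL-terminate.
   Returns the number of characters stored. */
size_t read_chars(FILE *fp, char *source, size_t cap)
{
    size_t i = 0;
    int c;                               /* int, so EOF is distinguishable */
    if (cap == 0)
        return 0;
    while (i < cap - 1 && (c = getc(fp)) != EOF)
        source[i++] = (char)c;
    source[i] = '\0';
    return i;
}
```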

隔岸观火 2024-08-24 05:13:12

Not tested, but it should work. And yes, it could be better implemented with fread(); I'll leave that as an exercise for the reader.

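```c
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_SIZE 100
#define STEP_SIZE 100

/* fp is an already-open FILE *. */
char *buffer = malloc(DEFAULT_SIZE);
if (buffer == NULL) exit(1);
size_t buffer_sz = DEFAULT_SIZE;
size_t i = 0;
int c;
while ((c = fgetc(fp)) != EOF) {    /* test the read itself, not feof() */
  buffer[i] = (char)c;
  i++;
  if (i >= buffer_sz) {             /* always keep room for the '\0' */
    buffer_sz += STEP_SIZE;
    char *tmp = realloc(buffer, buffer_sz);
    if (tmp == NULL) { free(buffer); exit(1); } //ensure we don't have a memory leak
    buffer = tmp;
  }
}
buffer[i] = 0;
```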
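
The fread() version left as an exercise above might look like the following sketch. It reads in large blocks instead of one character at a time, doubles the capacity instead of growing by a fixed STEP_SIZE (so the number of realloc() calls grows only logarithmically with file size), and, unlike the fseek()/ftell() approach, also works on streams that cannot seek, such as pipes or stdin. The name read_stream and the 4096-byte starting capacity are hypothetical choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp (seekable or not) into a malloc'd,
   NUL-terminated buffer, growing it with realloc(). Stores the length in
   *out_len (if non-NULL). Returns NULL on error; caller must free(). */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Read as much as fits, leaving one byte for the terminator. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (ferror(fp)) {
            free(buf);
            return NULL;
        }
        if (feof(fp))
            break;
        if (len == cap - 1) {             /* full: double the capacity */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {            /* old buf is still valid; free it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```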
