The correct way to read a text file into a buffer in C?

Posted on 2024-08-17 05:13:12

I'm dealing with small text files that I want to read into a buffer while I process them, so I've come up with the following code:

...
char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}
...

Is this the correct way of putting the contents of the file into the buffer, or am I abusing strcat()?

I then iterate through the buffer thus:

for(int x = 0; (c = source[x]) != '\0'; x++)
{
    //Process chars
}

Comments (8)

ι不睡觉的鱼゛ 2024-08-24 05:13:12
char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}

There are quite a few things wrong with this code:

  1. It is very slow (you are reading the file one character at a time, and strcat() has to rescan the buffer from the start on every call).
  2. If the file size exceeds sizeof(source), this will overflow the buffer.
  3. Really, when you look at it more closely, this code should not work at all. As stated in the man pages:

The strcat() function appends a copy of the null-terminated string s2 to the end of the null-terminated string s1, then add a terminating `\0'.

You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().
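
To make the man-page requirement concrete, here is a minimal sketch of what strcat() would need in order to append a single character legitimately: a destination that is already NUL-terminated and a source that is a real one-character string. The helper name append_char and its cap parameter are hypothetical, not from the original post. It is still quadratic, so it illustrates the contract rather than a recommended approach.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append one character to the NUL-terminated string
 * held in dst, which has room for cap bytes in total.
 * Returns 0 on success, -1 if the character does not fit. */
int append_char(char *dst, size_t cap, char c)
{
    size_t len = strlen(dst);      /* dst must already be NUL-terminated */
    if (len + 2 > cap)             /* need room for c and the '\0' */
        return -1;
    char tmp[2] = { c, '\0' };     /* a real one-character string */
    strcat(dst, tmp);              /* both arguments are now valid strings */
    return 0;
}
```

The caller has to start with dst[0] = '\0'. If source in the question is a local array it starts out uninitialized, so even this version would be undefined on it until that first byte is set.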

The following are two alternatives to consider using instead.

If you know the maximum buffer size ahead of time:

#include <stdio.h>
#define MAXBUFLEN 1000000

char source[MAXBUFLEN + 1];
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
    if ( ferror( fp ) != 0 ) {
        fputs("Error reading file", stderr);
    } else {
        source[newLen++] = '\0'; /* Just to be safe. */
    }

    fclose(fp);
}

Or, if you do not:

#include <stdio.h>
#include <stdlib.h>

char *source = NULL;
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    /* Go to the end of the file. */
    if (fseek(fp, 0L, SEEK_END) == 0) {
        /* Get the size of the file; ftell() returns -1 on error. */
        long bufsize = ftell(fp);

        /* Go back to the start of the file, then allocate the buffer
           (one extra byte for the terminating '\0'). */
        if (bufsize != -1 && fseek(fp, 0L, SEEK_SET) == 0) {
            source = malloc(sizeof(char) * ((size_t)bufsize + 1));
        }

        if (source != NULL) {
            /* Read the entire file into memory. */
            size_t newLen = fread(source, sizeof(char), (size_t)bufsize, fp);
            if (ferror(fp) != 0) {
                fputs("Error reading file", stderr);
            }
            source[newLen] = '\0'; /* Terminate whatever was read. */
        } else {
            fputs("Error sizing or allocating buffer", stderr);
        }
    }
    fclose(fp);
}

free(source); /* Don't forget to call free() later! */
幽梦紫曦~ 2024-08-24 05:13:12

If you're on a Linux system, once you have the file descriptor you can get a lot of information about the file using fstat():

http://linux.die.net/man/2/stat

so you might have:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    int fd = open("TheFile.txt", O_RDONLY); /* get file descriptor */
    if (fd == -1)
        return 1;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return 1;
    }
    /* the size of the file is now in st.st_size */
    close(fd);
    return 0;
}

This avoids seeking to the end of the file and back to the beginning.
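
As a rough sketch of how that size could feed the actual read, the following combines fstat() with a stdio stream via fileno(), which is POSIX rather than standard C. The function name read_file_fstat and its out_len parameter are made up for illustration, and it assumes a regular file whose size does not change while it is being read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: read a whole regular file into a malloc'd,
 * NUL-terminated buffer, sizing it with fstat(). Returns NULL on failure.
 * If out_len is not NULL it receives the number of bytes read. */
char *read_file_fstat(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    struct stat st;
    char *buf = NULL;
    /* fileno() gives the descriptor behind the stdio stream (POSIX). */
    if (fstat(fileno(fp), &st) == 0 && S_ISREG(st.st_mode)) {
        size_t size = (size_t)st.st_size;
        buf = malloc(size + 1);            /* +1 for the terminator */
        if (buf != NULL) {
            size_t n = fread(buf, 1, size, fp);
            if (ferror(fp)) {
                free(buf);
                buf = NULL;
            } else {
                buf[n] = '\0';
                if (out_len != NULL)
                    *out_len = n;
            }
        }
    }
    fclose(fp);
    return buf;
}
```

The caller owns the returned buffer and should free() it when done.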

别念他 2024-08-24 05:13:12

Yes - you would probably be arrested for your terrible abuse of strcat!

Take a look at fgets(): it reads the data a line at a time and, importantly, it takes a limit on the number of characters to read, so you don't overflow the buffer. (POSIX getline() also reads a line at a time, but it grows the buffer for you rather than taking a limit.)

strcat is relatively slow because it has to search the entire string for the end on every character insertion.
You would normally keep a pointer to the current end of the string storage and pass that to fgets as the position to read the next line into.
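
The end-pointer idea can be sketched roughly as follows, using standard fgets(). The helper name read_lines and its parameters are hypothetical. It assumes a text file without embedded NUL bytes, since strlen() is used to find how much each call stored.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read lines from fp into buf (capacity cap bytes),
 * appending each line at the current end instead of rescanning from the
 * start. Returns the number of characters stored, excluding the '\0'. */
size_t read_lines(FILE *fp, char *buf, size_t cap)
{
    char *end = buf;               /* current end of the stored text */
    size_t left = cap;             /* bytes still free, including '\0' */

    if (cap == 0)
        return 0;
    buf[0] = '\0';
    while (left > 1) {
        /* fgets() takes an int and stores at most chunk - 1 characters. */
        int chunk = left > INT_MAX ? INT_MAX : (int)left;
        if (fgets(end, chunk, fp) == NULL)
            break;                 /* EOF or read error */
        size_t n = strlen(end);    /* only scans the piece just read */
        end += n;
        left -= n;
    }
    *end = '\0';                   /* stay terminated even after an error */
    return (size_t)(end - buf);
}
```

Because each call writes at end rather than at buf, the total work stays linear in the file size, unlike repeated strcat() calls.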

☆獨立☆ 2024-08-24 05:13:12

See this article from JoelOnSoftware for why you don't want to use strcat.

Look at fread for an alternative. Use it with 1 for the size when you're reading bytes or characters.

弃爱 2024-08-24 05:13:12

Why don't you just use the array of chars you have? This ought to do it:

   source[i] = getc(fp); 
   i++;
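
Those two lines still need a loop, an EOF check and a bounds check around them. One possible shape, with the hypothetical helper name read_chars and a cap parameter that are not part of the original answer:

```c
#include <stdio.h>

/* Hypothetical helper: copy characters from fp into source (capacity cap),
 * stopping at EOF or when only room for the terminator is left.
 * Returns the number of characters stored. */
size_t read_chars(FILE *fp, char *source, size_t cap)
{
    size_t i = 0;
    int c;                         /* int, so EOF is distinguishable */

    if (cap == 0)
        return 0;
    while (i + 1 < cap && (c = getc(fp)) != EOF)
        source[i++] = (char)c;
    source[i] = '\0';              /* lets the '\0'-based loop in the question work */
    return i;
}
```

Storing the result of getc() in an int first matters: assigning it straight into a char makes EOF indistinguishable from a valid byte on some platforms.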
隔岸观火 2024-08-24 05:13:12

Not tested, but it should work. And yes, it could be better implemented with fread; I'll leave that as an exercise for the reader.

#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_SIZE 100
#define STEP_SIZE 100

/* fp is assumed to be an already-open FILE * */
size_t buffer_sz=DEFAULT_SIZE;
size_t i=0;
int c; /* int rather than char, so EOF can be detected */
char *buffer=malloc(buffer_sz);
if(buffer==NULL){ exit(1); }
while((c=fgetc(fp))!=EOF){
  buffer[i]=(char)c;
  i++;
  if(i>=buffer_sz){
    buffer_sz+=STEP_SIZE;
    char *tmp=realloc(buffer,buffer_sz);
    if(tmp==NULL){ free(buffer); exit(1);} //ensure we don't have a memory leak
    buffer=tmp;
  }
}
buffer[i]=0;
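
For the fread variant left as an exercise, one possible sketch follows. The function name read_all, the 4096-byte starting size and the doubling policy are all assumptions rather than anything from the answer. Unlike the fseek/ftell approach, it does not need to know the file size up front, so it should also work on pipes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything remaining on fp into a growing,
 * NUL-terminated heap buffer. Returns NULL on read or allocation failure.
 * If out_len is not NULL it receives the number of bytes read. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always leave one byte free for the terminator. */
        len += fread(buf + len, 1, cap - 1 - len, fp);
        if (len < cap - 1)         /* short read: EOF or error */
            break;
        cap *= 2;                  /* buffer full: double it */
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the file size, whereas growing by a fixed step makes it linear.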
娇纵 2024-08-24 05:13:12

Have you considered mmap()? You can read from the file directly as if it were already in memory.

http://beej.us/guide/bgipc/output/html/multipage/mmap.html
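
A minimal POSIX sketch of that idea, assuming a regular file: the function name count_newlines_mmap is hypothetical, and counting newlines simply stands in for whatever per-character processing is needed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count newline characters by mapping the file
 * read-only instead of copying it into a buffer. Returns -1 on error. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {         /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    const char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    /* The mapping is NOT NUL-terminated: iterate by length. */
    for (size_t i = 0; i < size; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, size);
    return count;
}
```

Note that the '\0'-terminated loop from the question cannot be used on a mapping as-is, because the mapped bytes end exactly where the file ends.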
