Why is "while( !feof(file) )" always wrong?

Posted on 2024-10-26 17:42:46

What is wrong with using feof() to control a read loop? For example:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    char *path = "stdin";
    FILE *fp = argc > 1 ? fopen(path=argv[1], "r") : stdin;

    if( fp == NULL ){
        perror(path);
        return EXIT_FAILURE;
    }

    while( !feof(fp) ){  /* THIS IS WRONG */
        /* Read and process data from file… */
    }
    if( fclose(fp) != 0 ){
        perror(path);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

What is wrong with this loop?

单身狗的梦 2024-11-02 17:42:46

TL;DR

while(!feof(file)) is wrong because it tests for something that is irrelevant and fails to test for something that you need to know. The result is that you are erroneously executing code that assumes that it is accessing data that was read successfully, when in fact this never happened.

I'd like to provide an abstract, high-level perspective. So continue reading if you're interested in what while(!feof(file)) actually does.

Concurrency and simultaneity

I/O operations interact with the environment. The environment is not part of your program, and not under your control. The environment truly exists "concurrently" with your program. As with all things concurrent, questions about the "current state" don't make sense: There is no concept of "simultaneity" across concurrent events. Many properties of state simply don't exist concurrently.

Let me make this more precise: Suppose you want to ask, "do you have more data". You could ask this of a concurrent container, or of your I/O system. But the answer is generally unactionable, and thus meaningless. So what if the container says "yes" – by the time you try reading, it may no longer have data. Similarly, if the answer is "no", by the time you try reading, data may have arrived. The conclusion is that there simply is no property like "I have data", since you cannot act meaningfully in response to any possible answer. (The situation is slightly better with buffered input, where you might conceivably get a "yes, I have data" that constitutes some kind of guarantee, but you would still have to be able to deal with the opposite case. And with output the situation is certainly just as bad as I described: you never know if that disk or that network buffer is full.)

So we conclude that it is impossible, and in fact unreasonable, to ask an I/O system whether it will be able to perform an I/O operation. The only possible way we can interact with it (just as with a concurrent container) is to attempt the operation and check whether it succeeded or failed. At that moment where you interact with the environment, then and only then can you know whether the interaction was actually possible, and at that point you must commit to performing the interaction. (This is a "synchronisation point", if you will.)

EOF

Now we get to EOF. EOF is the response you get from an attempted I/O operation. It means that you were trying to read or write something, but when doing so you failed to read or write any data, and instead the end of the input or output was encountered. This is true for essentially all the I/O APIs, whether it be the C standard library, C++ iostreams, or other libraries. As long as the I/O operations succeed, you simply cannot know whether further, future operations will succeed. You must always first try the operation and then respond to success or failure.

Examples

In each of the examples, note carefully that we first attempt the I/O operation and then consume the result if it is valid. Note further that we always must use the result of the I/O operation, though the result takes different shapes and forms in each example.

C stdio, read from a file:
```
  for (;;) {
      size_t n = fread(buf, 1, bufsize, infile);
      consume(buf, n);
      if (n == 0) { break; }
  }
```
The result we must use is n, the number of elements that were read (which may be as little as zero).

C stdio, scanf:
```
  for (int a, b, c; scanf("%d %d %d", &a, &b, &c) == 3; ) {
      consume(a, b, c);
  }
```
The result we must use is the return value of scanf, the number of elements converted.

C++, iostreams formatted extraction:
```
  for (int n; std::cin >> n; ) {
      consume(n);
  }
```
The result we must use is std::cin itself, which can be evaluated in a boolean context and tells us whether the stream has not failed, i.e. whether the extraction succeeded (the conversion is equivalent to !fail()).

C++, iostreams getline:

  for (std::string line; std::getline(std::cin, line); ) {
      consume(line);
  }

The result we must use is again std::cin, just as before.

POSIX, write(2) to flush a buffer:

  char const * p = buf;
  ssize_t n = bufsize;
  for (ssize_t k = bufsize; (k = write(fd, p, n)) > 0; p += k, n -= k) {}
  if (n != 0) { /* error, failed to write complete buffer */ }

The result we use here is k, the number of bytes written. The point here is that we can only know how many bytes were written after the write operation.
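
The same idea can be packaged as a small function. This is only a sketch: write_all is a hypothetical helper name, and retrying on EINTR and treating a zero-byte write as a failure are assumptions about what a caller would typically want, not something the loop above prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
   Returns 0 on success, -1 on failure (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t k = write(fd, p, len);  /* attempt first, then inspect k */
        if (k < 0) {
            if (errno == EINTR)
                continue;               /* interrupted before writing: retry */
            return -1;
        }
        if (k == 0)
            return -1;                  /* assumption: no progress is a failure */
        p += k;                         /* skip what was actually written */
        len -= (size_t)k;
    }
    return 0;
}
```

The caller never asks "can I write?" beforehand; it only reacts to what each write call reports.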

POSIX getline()
```
  char *buffer = NULL;
  size_t bufsiz = 0;
  ssize_t nbytes;
  while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1)
  {
      /* Use nbytes of data in buffer */
  }
  free(buffer);
```
The result we must use is nbytes, the number of bytes up to and including the newline (or up to the end of the file, if the file did not end with a newline).

Note that the function explicitly returns -1 (and not EOF!) when an error occurs or it reaches EOF.

You may notice that we very rarely spell out the actual word "EOF". We usually detect the error condition in some other way that is more immediately interesting to us (e.g. failure to perform as much I/O as we had desired). In every example there is some API feature that could tell us explicitly that the EOF state has been encountered, but this is in fact not a terribly useful piece of information. It is much more of a detail than we often care about. What matters is whether the I/O succeeded, more so than how it failed.

A final example that actually queries the EOF state: Suppose you have a string and want to test that it represents an integer in its entirety, with no extra bits at the end except whitespace. Using C++ iostreams, it goes like this:

  std::string input = "   123   ";   // example

  std::istringstream iss(input);
  int value;
  if (iss >> value >> std::ws && iss.get() == EOF) {
      consume(value);
  } else {
      // error, "input" is not parsable as an integer
  }

We use two results here. The first is iss, the stream object itself, to check that the formatted extraction to value succeeded. But then, after also consuming whitespace, we perform another I/O operation, iss.get(), and expect it to fail as EOF, which is the case if the entire string has already been consumed by the formatted extraction.

In the C standard library you can achieve something similar with the strto*l functions by checking that the end pointer has reached the end of the input string.
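
For illustration, here is a minimal sketch of the strtol approach. The helper name parse_whole_int is hypothetical, and restricting the input to base 10 and to the range of int are assumptions made to mirror the C++ example; adjust as needed.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if `s` is a
   complete decimal integer, optionally surrounded by whitespace; else 0. */
int parse_whole_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);        /* attempt the conversion first */

    if (end == s)                   /* no digits were converted */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                   /* value does not fit in an int */
    while (isspace((unsigned char)*end))
        end++;                      /* skip trailing whitespace, like std::ws */
    if (*end != '\0')               /* leftover characters after the number */
        return 0;

    *out = (int)v;
    return 1;
}
```

With this sketch, parse_whole_int("   123   ", &v) succeeds and stores 123, while "123abc" and an all-whitespace string are rejected.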

回首观望 2024-11-02 17:42:46

It's wrong because (in the absence of a read error) it enters the loop one more time than the author expects. If there is a read error, the loop never terminates.

Consider the following code:

/* WARNING: demonstration of bad coding technique!! */

#include <stdio.h>
#include <stdlib.h>

FILE *Fopen(const char *path, const char *mode);

int
main(int argc, char **argv)
{
    FILE *in = argc > 1 ? Fopen(argv[1], "r") : stdin;
    unsigned count = 0;

    /* WARNING: this is a bug */
    while( !feof(in) ) {  /* This is WRONG! */
        fgetc(in);
        count++;
    }
    printf("Number of characters read: %u\n", count);
    return EXIT_SUCCESS;
}

FILE *
Fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if( f == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}

This program will consistently print one greater than the number of characters in the input stream (assuming no read errors). Consider the case where the input stream is empty:

$ ./a.out < /dev/null
Number of characters read: 1

In this case, feof() is called before any data has been read, so it returns false. The loop is entered, fgetc() is called (and returns EOF), and count is incremented. Then feof() is called and returns true, causing the loop to abort.
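
To make the off-by-one easy to reproduce, the two loop shapes can be put side by side. This is a sketch only: the function names are made up for illustration, and the buggy version still has the infinite-loop problem on a read error.

```c
#include <assert.h>
#include <stdio.h>

/* The buggy pattern: tests feof() before attempting a read.
   (It would also spin forever if a read error occurred.) */
unsigned count_with_feof_loop(FILE *in)
{
    unsigned count = 0;

    while (!feof(in)) {     /* WRONG: the EOF indicator is not set yet */
        fgetc(in);          /* the final call returns EOF but is still counted */
        count++;
    }
    return count;
}

/* The fix: attempt the read and test its result. */
unsigned count_with_read_loop(FILE *in)
{
    unsigned count = 0;

    while (fgetc(in) != EOF)
        count++;
    return count;
}
```

On an empty stream the first function returns 1 and the second returns 0; on a stream holding "abc" they return 4 and 3.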

This happens in all such cases. feof() does not return true until after a read on the stream encounters the end of file. The purpose of feof() is NOT to check if the next read will reach the end of file. The purpose of feof() is to determine the status of a previous read function and distinguish between an error condition and the end of the data stream. If fread() returns 0, you must use feof/ferror to decide whether an error occurred or if all of the data was consumed. Similarly if fgetc returns EOF. feof() is only useful after fread has returned zero or fgetc has returned EOF. Before that happens, feof() will always return 0.

It is always necessary to check the return value of a read (either an fread(), or an fscanf(), or an fgetc()) before calling feof().
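
The same ordering applies to fread(): loop on the return value, and consult ferror()/feof() only once it reports that nothing more was read. A minimal sketch, where count_bytes is a hypothetical helper and the 4096-byte buffer is an arbitrary choice:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: adds up the number of bytes readable from `in`.
   Returns 0 if the stream ended normally, -1 if a read error occurred. */
int count_bytes(FILE *in, unsigned long *total)
{
    char buf[4096];
    size_t n;

    *total = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        *total += n;        /* use the result of each read */

    /* fread returned 0: only now do ferror()/feof() carry information. */
    if (ferror(in))
        return -1;          /* the read failed */
    return 0;               /* no error, so the end of file was reached */
}
```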

Even worse, consider the case where a read error occurs. In that case, fgetc() returns EOF, feof() returns false, and the loop never terminates. In all cases where while(!feof(p)) is used, there must be at least a check inside the loop for ferror(), or at the very least the while condition should be replaced with while(!feof(p) && !ferror(p)) or there is a very real possibility of an infinite loop, probably spewing all sorts of garbage as invalid data is being processed.

In summary, although I cannot state with certainty that there is never a situation in which it may be semantically correct to write "while(!feof(f))" (although there must be another check inside the loop with a break to avoid an infinite loop on a read error), it is the case that it is almost certainly always wrong. And even if a case ever arose where it would be correct, it is so idiomatically wrong that it would not be the right way to write the code. Anyone seeing that code should immediately hesitate and say, "that's a bug". And possibly slap the author (unless the author is your boss, in which case discretion is advised).

EDIT: one way to write the code correctly, demonstrating correct usage of feof and ferror:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    FILE *in = stdin;
    unsigned count = 0;

    while( getc(in) != EOF ){
        count++;
    }
    if( feof(in) ){
        printf("Number of characters read: %u\n", count);
    } else if( ferror(in) ){
        perror("stdin");
    } else {
        assert(0);
    }
    return EXIT_SUCCESS;
}

此生挚爱伱 2024-11-02 17:42:46

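No, it's not always wrong. If your loop condition is "while we haven't tried to read past end of file", then you use while (!feof(f)). This is however not a common loop condition; usually you want to test for something else (such as "can I read more"). while (!feof(f)) isn't wrong, it's just used wrong.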

与往事干杯 2024-11-02 17:42:46

feof() indicates whether one has tried to read past the end of file. That means it has little predictive value: if it is true, you are sure that the next input operation will fail (you aren't sure the previous one failed, BTW), but if it is false, you aren't sure the next input operation will succeed. Moreover, input operations may fail for reasons other than the end of file (a format error for formatted input, or a pure IO failure such as a disk failure or network timeout for all input kinds), so even if you could be predictive about the end of file (and anybody who has tried to implement Ada's version, which is predictive, will tell you it can be complex if you need to skip spaces, and that it has undesirable effects on interactive devices, sometimes forcing the input of the next line before the handling of the previous one starts), you would still have to be able to handle a failure.

So the correct idiom in C is to loop with the IO operation success as loop condition, and then test the cause of the failure. For instance:

while (fgets(line, sizeof(line), file)) {
    /* note that fgets doesn't strip the terminating \n; checking for its
       presence lets you handle lines longer than sizeof(line), not shown here */
    ...
}
if (ferror(file)) {
   /* IO failure */
} else if (feof(file)) {
   /* format error (not possible with fgets, but would be with fscanf) or end of file */
} else {
   /* format error (not possible with fgets, but would be with fscanf) */
}
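
The long-line handling that the comment leaves out could look roughly like the following. It is a sketch under assumptions: count_lines is a hypothetical helper, the buffer is kept deliberately tiny so that long lines are actually split, and input containing NUL bytes is not considered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts lines in `file`, tolerating lines longer than
   the buffer. Returns 0 at end of file, -1 on a read error. */
int count_lines(FILE *file, unsigned long *lines)
{
    char chunk[16];         /* deliberately small to exercise long lines */
    int in_line = 0;        /* nonzero while inside an unterminated line */

    *lines = 0;
    while (fgets(chunk, sizeof chunk, file)) {
        size_t len = strlen(chunk);
        if (len > 0 && chunk[len - 1] == '\n') {
            (*lines)++;     /* this chunk completes a line */
            in_line = 0;
        } else {
            in_line = 1;    /* the line continues in the next chunk, or the file ends */
        }
    }
    if (ferror(file))
        return -1;          /* fgets stopped because of a read error */
    if (in_line)
        (*lines)++;         /* final line had no trailing newline */
    return 0;
}
```

The loop condition is still the success of fgets itself; the newline check only decides whether a chunk finishes a line.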

雅心素梦 2024-11-02 17:42:46

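The other answers to this question are very good, but rather long. If you just want the TL;DR, it's this:

feof(F) is poorly named. It does not mean "check whether F is at end of file right now"; rather, it tells you why a previous attempt failed to get any data from F.

The end-of-file state can easily change, because a file can grow or shrink, and a terminal reports EOF once each time you press ^D (in "cooked" mode, on an otherwise empty line).

Unless you really care why a previous read failed to return any data, you're better off forgetting that the feof function exists.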

水晶透心 2024-11-02 17:42:46

feof() is not very intuitive. In my very humble opinion, the FILE's end-of-file state should be set to true if any read operation results in the end of file being reached. Instead, you have to manually check if the end of file has been reached after each read operation. For example, something like this will work if reading from a text file using fgetc():

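#include <stdio.h>

int main(int argc, char *argv[])
{
  FILE *in = fopen("testfile.txt", "r");
  if (in == NULL) {
    perror("testfile.txt");
    return 1;
  }

  while(1) {
    int c = fgetc(in);  /* int, not char, so EOF stays distinguishable from data */
    if (feof(in) || ferror(in)) break;  /* stop at end of file or on a read error */
    printf("%c", c);
  }

  fclose(in);
  return 0;
}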

It would be great if something like this would work instead:

#include <stdio.h>

int main(int argc, char *argv[])
{
  FILE *in = fopen("testfile.txt", "r");

  while(!feof(in)) {
    printf("%c", fgetc(in));
  }

  fclose(in);
  return 0;
}
