Reading in a text file - 1 character at a time. Using C

Posted on 2024-10-01 22:46:24

I'm trying to read in a text file line by line and process each character individually.

For example, one line in my text file might look like this:
ABC XXXX XXXXXXXX ABC

The number of spaces in a line will vary, but every line has the same total number of characters (including spaces).

This is what I have so far...

char currentLine[100];
fgets(currentLine, 22, inputFile);

I'm then trying to iterate through the currentLine array and work with each character...

for (j = 0; j<22; j++) {
    if (&currentLine[j] == 'x') {
        // character is an x... do something
     }
}

Can anyone help me with how I should be doing this?

As you can probably tell - I've just started using C.

Comments (4)

半夏半凉 2024-10-08 22:46:24

Something like the following is the canonical way to process a file character by character:

#include <stdio.h>
#include <stdlib.h>   /* exit() is declared here */

int main(int argc, char **argv)
{

    FILE *fp;
    int c;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s file.txt\n", argv[0]);
        exit(1);
    }
    /* "r" is text mode in standard C; the "t" in "rt" is a non-standard extension */
    if (!(fp = fopen(argv[1], "r"))) {
        perror(argv[1]);
        exit(1);
    }
    while ((c = fgetc(fp)) != EOF) {

        // now do something with each character, c.

    }
    fclose(fp);
    return 0;
}

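Note that c is declared as int, not char: fgetc() returns each character as an unsigned char converted to int, or EOF, a negative value distinct from every such character. Storing the result in a char would lose that distinction.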
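To make the int-versus-char point concrete, here is a minimal sketch (count_chars is a hypothetical helper name, not from any library) that counts every byte in a stream. Because c is an int, a byte with value 0xFF comes back as 255 and is counted. Had c been a plain char on a platform where char is signed, that byte would typically convert to -1, the usual value of EOF, and the loop would stop early.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in fp, one fgetc() call at a time.
   c must be an int: fgetc() returns each byte as an unsigned char converted
   to int (0..UCHAR_MAX), or the negative value EOF at end of file or on error. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}
```

With a stream containing the three bytes 'a', 0xFF, 'b', this returns 3.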

For more complex parsing, reading the file a line at a time is generally the right approach. You will, however, want to be much more defensive against input data that is not formatted correctly. Essentially, write the code to assume that the outside world is hostile. Never assume that the file is intact, even if it is a file that you just wrote.

For example, you are using a 100-character buffer to read lines, but telling fgets() to use only 22 bytes of it (at most 21 characters plus the terminating null), probably because you know that 22 is the "correct" line length. The extra buffer space is fine, but you should allow for the possibility that the file contains a line of the wrong length. Even if that is an error, you have to decide how to handle it: either resynchronize your process or abandon it.
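One way to handle wrong-length lines is sketched below, under some assumptions: read_record and the REC_* status codes are hypothetical names, every valid line is assumed to hold exactly `expected` characters, and buf must have room for at least expected + 2 bytes (the characters, the '\n', and the '\0'). A line too long for the buffer is read and discarded up to its '\n', so the next call starts at the beginning of the next line rather than in the middle of the bad one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for read_record(). */
enum { REC_EOF = 0, REC_OK = 1, REC_BAD_LENGTH = -1 };

/* Read one line from fp into buf (size bytes) and strip the trailing '\n'.
   Returns REC_OK if the line holds exactly `expected` characters,
   REC_BAD_LENGTH if it is shorter or longer, or REC_EOF at end of input.
   Assumes size >= expected + 2. */
int read_record(FILE *fp, char *buf, size_t size, size_t expected)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return REC_EOF;

    size_t len = strcspn(buf, "\n");   /* characters before any '\n' */
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* complete line: drop the newline */
    } else if (!feof(fp)) {
        /* The buffer filled before a '\n' arrived: the line is too long.
           Skip the rest of it so the next call is back in sync. */
        int c;
        while ((c = fgetc(fp)) != EOF && c != '\n') {
            /* discard */
        }
        return REC_BAD_LENGTH;
    }
    /* Here buf holds a whole line (possibly the last one, without '\n'). */
    return len == expected ? REC_OK : REC_BAD_LENGTH;
}
```

The caller then chooses the policy: skip and report the bad record (resynchronize), or stop processing altogether (abandon).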

Edit: I've added a skeleton of the rest of the program for the canonical simple case. There are a couple of things to point out there for new users of C. First, I've assumed a simple command-line interface to get the name of the file to process, and used argc to verify that an argument is really present. If not, I print a brief usage message using argv[0], which by convention names the current program in some useful way, and exit with a non-zero status.

I open the file for reading in text mode. The distinction between text and binary modes is unimportant on Unix platforms, but can be important on others, especially Windows. Since the discussion is about processing the file a character at a time, I'm assuming that the file is text and not binary. If fopen() fails, it returns NULL; on POSIX systems it also sets errno to a code describing why it failed (ISO C alone does not require fopen() to set errno). The call to perror() translates errno into a human-readable message and prints it after the string you supply. Here I've supplied the name of the file we attempted to open, so the result will look something like "foo.txt: No such file or directory". We also exit with non-zero status in this case. I haven't bothered, but it is often sensible to exit with distinct non-zero status codes for distinct failures, which can help shell scripts make better sense of errors.

Finally, I close the file. In principle, I should also test fclose() for failure. For a process that only reads a file, most error conditions will already have shown up as some kind of content error, and the close adds no useful status. For file writing, however, you might not discover certain I/O errors until the call to fclose(). When writing a file, it is good practice to check return codes and to expect to handle I/O errors at any call that touches the file.
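For the writing case, a minimal sketch of checking every call that touches the file might look like this (write_text_file is a hypothetical helper, and returning -1 on any failure is just one possible convention). Output is buffered, so data may not reach the operating system until fclose() flushes it, which is why its return value is checked too.

```c
#include <stdio.h>

/* Hypothetical helper: write text to the file at path.
   Returns 0 on success, -1 on any I/O error (after reporting it). */
int write_text_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        perror(path);
        return -1;
    }
    if (fputs(text, fp) == EOF) {
        perror(path);
        fclose(fp);              /* already failing; close without checking */
        return -1;
    }
    if (fclose(fp) == EOF) {     /* flushes buffered data: errors can surface here */
        perror(path);
        return -1;
    }
    return 0;
}
```

On the reading side, the equivalent check is to call ferror(fp) after the fgetc() loop ends, since the loop stops on EOF for both end of file and a read error.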

戏蝶舞 2024-10-08 22:46:24

You don't need the address-of operator (&). You're trying to compare the value of currentLine[j] to 'x', not its address.
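A small sketch of comparing values rather than addresses (count_char is a hypothetical helper). currentLine[j] is a char; &currentLine[j] is a char *, and comparing a pointer with an integer constant like 'x' is a constraint violation that the compiler must diagnose. Note also that the comparison is case-sensitive: the sample line in the question contains uppercase 'X', which 'x' will never match.

```c
#include <stddef.h>

/* Hypothetical helper: count how many characters in the string s equal target.
   s[j] is the character value itself; &s[j] would be its address. */
size_t count_char(const char *s, char target)
{
    size_t n = 0;
    for (size_t j = 0; s[j] != '\0'; j++) {
        if (s[j] == target)      /* compare the value, not the address */
            n++;
    }
    return n;
}
```

For "ABC XXXX XXXXXXXX ABC", counting 'X' gives 12, while counting 'x' gives 0.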

无所的.畏惧 2024-10-08 22:46:24

ABC XXXX XXXXXXXX ABC has 21 characters. There's also the line break (22 chars) and the terminating null byte (23 chars).

You need fgets(currentLine, 23, inputFile); to read the full line.

But you declared currentLine as an array of 100. Why not use all of it?

fgets(currentLine, sizeof currentLine, inputFile);

Using all of it doesn't mean that fgets will store more than one line per call: fgets always stops after reading a '\n' (or when the buffer is full, or at end of file).
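The size arithmetic can be checked with a small experiment, sketched here with a hypothetical helper, count_fgets_calls, that counts how many successful fgets() calls it takes to read a stream to the end with a given size argument.

```c
#include <stdio.h>

/* Hypothetical helper: count the successful fgets() calls needed to read fp
   to the end, storing at most size - 1 characters per call.
   size must be between 2 and 128 (the local buffer's capacity). */
int count_fgets_calls(FILE *fp, int size)
{
    char buf[128];
    int calls = 0;

    if (size < 2 || size > (int)sizeof buf)
        return -1;
    while (fgets(buf, size, fp) != NULL)
        calls++;
    return calls;
}
```

For a stream holding the single line "ABC XXXX XXXXXXXX ABC\n", a size of 22 takes two calls (the 21 characters, then the '\n' on its own), while 23 or 100 takes one. With two such lines and a size of 100, it takes two calls: one line per call.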

裸钻 2024-10-08 22:46:24

Try

while (fgets(currentLine, sizeof currentLine, inputFile)) {
    /* stop at the string's terminating '\0' instead of a fixed 22 */
    for (j = 0; currentLine[j] != '\0'; j++) {
        if (/*&*/currentLine[j] == 'x') { /* <--- without & */
            // character is an x... do something
        }
    }
}