C: fgets versus fgetc for reading a line

Posted on 2024-10-20 08:07:34

I need to read a line of text (terminated by a newline) without making assumptions about its length. So I now face two possibilities:

  • Use fgets and check each time if the last character is a newline and continuously append to a buffer
  • Read each character using fgetc and occasionally realloc the buffer

Intuition tells me the fgetc variant might be slower, but then again I don't see how fgets can do it without examining every character (also my intuition isn't always that good). The lines are quite large so the performance is important.

I would like to know the pros and cons of each approach. Thank you in advance.

Comments (5)

童话里做英雄 2024-10-27 08:07:34

I suggest using fgets() coupled with dynamic memory allocation - or you can investigate the interface to getline() that is in the POSIX 2008 standard and available on more recent Linux machines. That does the memory allocation stuff for you. You need to keep tabs on the buffer length as well as its address - so you might even create yourself a structure to handle the information.

Although fgetc() also works, it is marginally fiddlier - but only marginally so. Underneath the covers, it uses the same mechanisms as fgets(). The internals of fgets() may be able to exploit speedier operations - analogous to strchr() - that are not available when you call fgetc() directly.
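For illustration, here is a minimal sketch of the fgets() plus dynamic allocation approach described above. The helper name read_line_fgets, the starting capacity of 128 and the doubling growth are assumptions made for the example, not something the answer prescribes. It also assumes the input contains no embedded NUL bytes, since it uses strlen() to find out how much fgets() stored.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and sizes are illustrative assumptions).
 * Reads one line of any length from fp with repeated fgets() calls.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end of file with nothing read, on a read error, or if allocation
 * fails. The caller frees the result. */
char *read_line_fgets(FILE *fp)
{
    assert(fp != NULL);

    size_t cap = 128;   /* arbitrary starting capacity */
    size_t len = 0;     /* bytes stored so far, excluding the terminator */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* fgets() takes an int size, so clamp the remaining room. */
        size_t room = cap - len;
        int chunk = room > (size_t)INT_MAX ? INT_MAX : (int)room;

        if (fgets(buf + len, chunk, fp) == NULL)
            break;                      /* end of file or read error */

        len += strlen(buf + len);

        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: drop the newline */
            return buf;
        }

        if (cap - len <= 1) {           /* buffer full, line continues */
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
    }

    if (ferror(fp) || len == 0) {
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no trailing newline */
}
```

Typical use would be a loop such as `while ((line = read_line_fgets(stdin)) != NULL) { ...; free(line); }`.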

橙幽之幻 2024-10-27 08:07:34

Does your environment provide the getline(3) function? If so, I'd say go for that.

The big advantage I see is that it allocates the buffer itself (if you want), and will realloc() the buffer you pass in if it's too small. (So this means you need to pass in something gotten from malloc()).

This gets rid of some of the pain of fgets/fgetc, and you can hope that whoever wrote the C library that implements it took care of making it efficient.

Bonus: the man page on Linux has a nice example of how to use it in an efficient manner.
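As a rough sketch of what using getline() looks like (this is not the man page example): the helper name longest_line is made up for illustration, and the code assumes a POSIX.1-2008 system. Starting with a NULL pointer and a capacity of 0 lets getline() allocate the buffer, and the same buffer is reused and grown across calls.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX.1-2008 declarations such as getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: returns the length in bytes of the longest line
 * in fp, counting the trailing newline when there is one. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);

    char *line = NULL;   /* NULL plus capacity 0: getline() allocates for us */
    size_t cap = 0;      /* getline() updates this when it reallocates */
    size_t longest = 0;
    ssize_t n;

    /* getline() returns the number of bytes read, or -1 at EOF or on error. */
    while ((n = getline(&line, &cap, fp)) != -1) {
        if ((size_t)n > longest)
            longest = (size_t)n;
    }

    free(line);          /* the buffer must be freed, even after a failed call */
    return longest;
}
```

Note that the newline is kept in the returned line, so strip it yourself if you do not want it.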

冬天旳寂寞 2024-10-27 08:07:34

If performance matters much to you, you generally want to call getc instead of fgetc. The standard tries to make it easier to implement getc as a macro to avoid function call overhead.

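Past that, the main thing to deal with is probably your strategy in allocating the buffer. Most people use fixed increments (e.g., when/if we run out of space, allocate another 128 bytes). I'd advise instead using a constant factor, so if you run out of space allocate a buffer that's, say, 1 1/2 times the previous size. With a constant factor the number of reallocations grows only logarithmically with the line length, so the total copying work stays proportional to the length of the line.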

Especially when getc is implemented as a macro, the difference between getc and fgets is usually quite minimal, so you're best off concentrating on other issues.
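A minimal sketch of the getc approach with a constant growth factor, under a few assumptions: the helper name read_line_getc and the starting capacity of 64 are invented for the example, and the factor of 1.5 is just the value suggested above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (name and sizes are illustrative assumptions).
 * Reads one line from fp one character at a time with getc(), growing
 * the buffer by a factor of 1.5 whenever it fills up. Returns a malloc'd
 * string without the newline, or NULL at end of file with nothing read
 * or if allocation fails. The caller frees the result. */
char *read_line_getc(FILE *fp)
{
    assert(fp != NULL);

    size_t cap = 64;    /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;              /* int, not char, so EOF can be told apart from data */
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {               /* keep room for the terminator */
            size_t new_cap = cap + cap / 2; /* grow by a constant factor */
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {             /* nothing left to read */
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    return buf;
}
```

An empty line comes back as an empty string, which is distinct from the NULL returned at end of file.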

海螺姑娘 2024-10-27 08:07:34

If you can set a maximum line length, even a large one, then one fgets would do the trick. If not, multiple fgets calls will still be faster than multiple fgetc calls, because fgetc pays its per-call overhead once per character while fgets pays it once per chunk.

A better answer, though, is that it's not worth worrying about the performance difference until and unless you have to. If fgetc is fast enough, what does it matter?

洒一地阳光 2024-10-27 08:07:34

I would allocate a large buffer and then use fgets, checking, reallocing and repeating if you haven't read to the end of the line.

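Each time you read (either via fgetc or fgets) you pay some per-call overhead. stdio streams are normally buffered, so this is not a system call on every read, but the cost of the function call and of locking the stream still adds up over a long line. You want to minimize the number of times that happens, so calling fgets fewer times and iterating in memory is faster.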

If you are reading from a file, mmap()ing in the file is another option.
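To give an idea of the mmap() option, here is a small POSIX-only sketch. The helper name count_newlines_mmap is invented for the example, and it assumes fp refers to a regular file (pipes and terminals cannot be mapped). Once the file is mapped, finding line boundaries is just a memchr() scan over memory, with no buffer management at all.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations such as fileno() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: counts the newline characters in the regular file
 * behind fp by mapping it into memory. Returns -1 on failure. */
long count_newlines_mmap(FILE *fp)
{
    assert(fp != NULL);

    struct stat st;
    int fd = fileno(fp);
    if (fd == -1 || fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                   /* a zero-length mapping is not allowed */

    size_t size = (size_t)st.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    const char *p = data;
    const char *end = data + size;

    /* Each memchr() hit marks the end of one line. */
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++;                        /* continue after the newline */
    }

    munmap(data, size);
    return count;
}
```

The mapped data is not NUL-terminated, so lines have to be handled as pointer plus length rather than as C strings.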
