C fgets versus fgetc for reading a line
I need to read a line of text (terminated by a newline) without making assumptions about its length. So I now face two possibilities:
- Use fgets, checking each time whether the last character is a newline and continuing to append to a buffer
- Read each character using fgetc and occasionally realloc the buffer
Intuition tells me the fgetc variant might be slower, but then again I don't see how fgets can do it without examining every character (also, my intuition isn't always that good). The lines are quite large, so performance is important.
I would like to know the pros and cons of each approach. Thank you in advance.
I suggest using fgets() coupled with dynamic memory allocation, or you can investigate the interface to getline(), which is in the POSIX 2008 standard and available on more recent Linux machines. That does the memory allocation for you. You need to keep tabs on the buffer length as well as its address, so you might even create yourself a structure to handle the information.
Although fgetc() also works, it is marginally fiddlier, but only marginally so. Underneath the covers, it uses the same mechanisms as fgets(). The internals of fgets() may be able to exploit speedier operations, analogous to strchr(), that are not available when you call fgetc() directly.
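The fgets-plus-dynamic-allocation approach, with a small structure tracking the buffer address and its capacity, might look roughly like the sketch below. The names struct line_buf and read_line_fgets, the 128-byte starting size, and the doubling policy are assumptions made for illustration, not anything prescribed in the thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: keeps the buffer address and capacity together. */
struct line_buf {
    char  *data;  /* malloc'd storage, or NULL before first use */
    size_t cap;   /* number of bytes allocated at data */
};

/* Reads one line (including the '\n', if present) into lb.
 * Returns the line length, or -1 on EOF with nothing read or on failure.
 * Assumes the capacity fits in an int and the input has no embedded NULs. */
long read_line_fgets(struct line_buf *lb, FILE *fp)
{
    size_t len = 0;

    if (lb->data == NULL) {
        lb->cap = 128;                      /* arbitrary starting size */
        lb->data = malloc(lb->cap);
        if (lb->data == NULL)
            return -1;
    }

    /* Each fgets call fills the unused tail of the buffer. */
    while (fgets(lb->data + len, (int)(lb->cap - len), fp) != NULL) {
        len += strlen(lb->data + len);
        if (len > 0 && lb->data[len - 1] == '\n')
            return (long)len;               /* got a complete line */

        /* No newline yet: grow the buffer and read the next chunk. */
        size_t new_cap = lb->cap * 2;
        char *p = realloc(lb->data, new_cap);
        if (p == NULL)
            return -1;
        lb->data = p;
        lb->cap = new_cap;
    }

    /* fgets returned NULL: a final line without '\n' is still returned. */
    return len > 0 ? (long)len : -1;
}
```

The same struct line_buf can be passed to successive calls so the buffer is reused across lines; the caller frees lb.data once at the end.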
Does your environment provide the getline(3) function? If so, I'd say go for that.
The big advantage I see is that it allocates the buffer itself (if you want), and will realloc() the buffer you pass in if it's too small. (So this means you need to pass in something obtained from malloc().)
This gets rid of some of the pain of fgets/fgetc, and you can hope that whoever wrote the C library that implements it took care of making it efficient.
Bonus: the man page on Linux has a nice example of how to use it in an efficient manner.
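A minimal sketch of the getline() pattern, assuming a POSIX.1-2008 environment: passing a NULL pointer and a zero size asks getline() to allocate the buffer, which it then reuses and grows on later calls. The function name longest_line is hypothetical and only gives the loop something to compute.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the length in bytes of the longest line
 * in fp (counting the trailing '\n' if present), or 0 for empty input. */
size_t longest_line(FILE *fp)
{
    char   *line = NULL;  /* NULL with cap == 0 lets getline allocate */
    size_t  cap = 0;      /* getline updates this when it reallocates */
    ssize_t n;
    size_t  longest = 0;

    /* getline returns the number of bytes read, or -1 on EOF or error. */
    while ((n = getline(&line, &cap, fp)) != -1) {
        if ((size_t)n > longest)
            longest = (size_t)n;
    }

    free(line);           /* one buffer is reused for every line */
    return longest;
}
```

Because the buffer is owned by the caller, it has to be freed even when getline() fails.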
If performance matters much to you, you generally want to call getc instead of fgetc. The standard tries to make it easier to implement getc as a macro to avoid function call overhead.
Past that, the main thing to deal with is probably your strategy for allocating the buffer. Most people use fixed increments (e.g., when/if we run out of space, allocate another 128 bytes). I'd advise instead using a constant factor, so if you run out of space, allocate a buffer that's, say, 1 1/2 times the previous size.
Especially when getc is implemented as a macro, the difference between getc and fgets is usually quite minimal, so you're best off concentrating on other issues.
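The getc approach with constant-factor growth could be sketched as follows. Growing by a factor (here 1.5) rather than a fixed increment keeps the total amount of copying done by realloc proportional to the line length. The name read_line_getc and the 64-byte starting capacity are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line with getc and returns it as a
 * malloc'd, NUL-terminated string without the trailing newline.
 * Returns NULL at EOF with nothing read, or on allocation failure.
 * The caller frees the result. */
char *read_line_getc(FILE *fp)
{
    size_t cap = 64, len = 0;             /* arbitrary starting capacity */
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {             /* keep room for the final NUL */
            size_t new_cap = cap + cap / 2;   /* grow by a factor of 1.5 */
            char *p = realloc(buf, new_cap);
            if (p == NULL) {
                free(buf);
                return NULL;
            }
            buf = p;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

An empty line comes back as an empty string, which keeps it distinguishable from end of input.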
If you can set a maximum line length, even a large one, then one fgets would do the trick. If not, multiple fgets calls will still be faster than multiple fgetc calls because the overhead of the latter will be greater.
A better answer, though, is that it's not worth worrying about the performance difference until and unless you have to. If fgetc is fast enough, what does it matter?

I would allocate a large buffer and then use fgets, checking, reallocing and repeating if you haven't read to the end of the line.
Each read call (via either fgetc or fgets) carries some overhead, and you want to minimize how often you pay it. stdio buffers its input, so not every call results in a system call, but calling fgets fewer times and iterating in memory is still faster.
If you are reading from a file, mmap()ing the file is another option.
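For the mmap() option, a rough sketch under POSIX assumptions: map the whole file read-only and locate line boundaries with memchr, so no per-line buffer is needed at all. The function name count_lines_mmap is hypothetical, and counting lines is only a stand-in for whatever per-line work is required.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of lines in the file at path
 * (a final line without '\n' counts as a line), or -1 on error. */
long count_lines_mmap(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {                /* mmap rejects a zero length */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                            /* the mapping outlives the fd */
    if (base == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = base, *end = base + size;
    while (p < end) {
        /* The current line spans [p, nl), or [p, end) if no '\n' remains. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        lines++;
        if (nl == NULL)
            break;
        p = nl + 1;
    }

    munmap(base, size);
    return lines;
}
```

This only applies to regular files; pipes and terminals cannot be mapped, so the stdio-based approaches remain necessary for those.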