File handling problem in C programming
I want to read a given input file line by line, process each line (i.e. its words), and then move on to the next line.
So I am using fscanf(fptr,"%s",words) to read a word, and I want it to stop once it reaches the end of the line.
But I guess this is not possible with fscanf, so please tell me how to do it.
I need to read all the words in the current line (i.e. until the end of line is encountered), then move on to the next line and repeat the same process.
Comments (4)
Use fgets(). (Yeah, the link is to cplusplus, but the function originates from C's stdio.h.) You may also use sscanf() to read words from a string, or just strtok() to separate them.
In response to comment: this behavior of fgets() (leaving \n in the string) allows you to determine whether the actual end of line was encountered. Note that fgets() may also read only part of a line from the file if the supplied buffer is not large enough. In your case, just check for \n at the end and remove it if you don't need it. Simple as that.
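The snippet that accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea it describes: read with fgets(), strip the trailing \n if present, then split the line with strtok(). The names chomp, process_line, process_file and LINE_MAX_LEN are hypothetical, and the 256-byte buffer is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed buffer size; pick what suits your data */

/* Remove one trailing '\n' left by fgets(), if present.
   Returns 1 if a newline was removed (a whole line was read), else 0. */
int chomp(char *line)
{
    assert(line != NULL);
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* Split a line on spaces and tabs, print each word, return the word count.
   strtok() modifies the buffer it is given. */
int process_line(char *line)
{
    int words = 0;
    assert(line != NULL);
    for (char *w = strtok(line, " \t"); w != NULL; w = strtok(NULL, " \t")) {
        printf("word: %s\n", w);
        words++;
    }
    return words;
}

/* Read fp one fgets() chunk at a time and process each chunk. */
void process_file(FILE *fp)
{
    char line[LINE_MAX_LEN];
    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        int complete = chomp(line);
        int n = process_line(line);
        /* Without a '\n', either the buffer filled up before the end of the
           line or the file ended without a final newline. */
        printf("%d word(s)%s\n", n, complete ? "" : " (no newline seen)");
    }
}
```

Calling process_file(fptr) on an already opened stream would handle the file line by line. A line longer than the buffer arrives in several chunks, so a word that straddles a chunk boundary would be split in this sketch.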
If you are working on a system with the GNU extensions available, there is a function called getline (man 3 getline) that lets you read a file line by line and allocates extra memory for you if needed. (getline was later standardized in POSIX.1-2008, so it is not limited to GNU systems.) The man page contains an example, which I modified to split the line using strtok (man 3 strtok).
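The modified man page example is not preserved here. As a rough sketch of the same approach, assuming a POSIX.1-2008 system, the function below reads with getline() and splits each line with strtok(); the name count_words_getline is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() and ssize_t */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Print every word of every line in fp and return the total word count. */
long count_words_getline(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;      /* current capacity of the buffer */
    ssize_t len;         /* characters read, or -1 at end of file or on error */
    long total = 0;

    assert(fp != NULL);
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* Treating '\n' as a delimiter drops the newline getline() keeps. */
        for (char *w = strtok(line, " \t\r\n"); w != NULL;
             w = strtok(NULL, " \t\r\n")) {
            printf("%s\n", w);
            total++;
        }
    }
    free(line);          /* the caller owns the buffer getline() allocated */
    return total;
}
```

Because getline() resizes the buffer as needed, there is no fixed line length to guess, which is the main difference from the fgets() approach.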
Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
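A minimal sketch of such a state machine, assuming words are separated by whitespace as classified by isspace(). The names scanner, scanner_feed, scanner_finish and scan_stream are hypothetical, and the sketch only counts words and newlines rather than collecting the words.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Two-state machine: in_word is 1 while inside a word, 0 between words. */
struct scanner {
    int in_word;
    long words;   /* completed words seen so far */
    long lines;   /* newline characters seen so far */
};

/* Advance the machine by one character (an unsigned char value, as
   returned by getc()). */
void scanner_feed(struct scanner *s, int ch)
{
    assert(s != NULL);
    if (isspace(ch)) {
        if (s->in_word) {        /* a word just ended */
            s->words++;
            s->in_word = 0;
        }
        if (ch == '\n')          /* line boundary */
            s->lines++;
    } else {
        s->in_word = 1;
    }
}

/* Call once at end of input to count a final word with no whitespace after it. */
void scanner_finish(struct scanner *s)
{
    assert(s != NULL);
    if (s->in_word) {
        s->words++;
        s->in_word = 0;
    }
}

/* Drive the machine from a stream, one character at a time. */
struct scanner scan_stream(FILE *fp)
{
    struct scanner s = {0, 0, 0};
    int ch;
    assert(fp != NULL);
    while ((ch = getc(fp)) != EOF)
        scanner_feed(&s, ch);
    scanner_finish(&s);
    return s;
}
```

Keeping the per-character step separate from the getc() loop means the same machine could be fed from any other source of characters.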
You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...
If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty), then fgets() is probably your friend, as other answers point out. Even then, the getc() approach may be acceptable as long as the input stream is running in line-buffered mode, which is common for stdin when it is attached to a tty.
Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
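As an illustration of the setvbuf() call mentioned above, here is a small sketch that opens a file and requests full buffering before any I/O happens. The helper name open_buffered and the idea of passing the size as a parameter are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Open path for reading with a fully buffered stream of buf_size bytes.
   Returns NULL if the file cannot be opened or the buffer cannot be set. */
FILE *open_buffered(const char *path, size_t buf_size)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;
    /* setvbuf() must be called before any other operation on the stream.
       Passing NULL asks the library to allocate the buffer itself. */
    if (setvbuf(fp, NULL, _IOFBF, buf_size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

The other standard modes are _IOLBF (line buffered) and _IONBF (unbuffered). Which size works best is something to measure rather than assume.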
Best performance would probably come from a buffer sized in relation to your disk I/O: at least two disk blocks, and probably a lot more than that. Often, even that performance can be beaten by arranging for the input to be a memory-mapped file and relying on the kernel's paging to read and fill the buffer while you process the file as if it were one giant string.
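A rough sketch of the memory-mapped variant, assuming a POSIX system with mmap(). The function names count_words_mem and count_words_mmap are hypothetical, and the sketch only counts whitespace-separated words to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count whitespace-separated words in a memory region of len bytes. */
long count_words_mem(const char *data, size_t len)
{
    long words = 0;
    int in_word = 0;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (isspace((unsigned char)data[i])) {
            in_word = 0;
        } else if (!in_word) {   /* first character of a new word */
            in_word = 1;
            words++;
        }
    }
    return words;
}

/* Map the file at path read-only and count its words; returns -1 on error. */
long count_words_mmap(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {       /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    long n = count_words_mem(p, len);
    munmap(p, len);
    return n;
}
```

The mapped region is not NUL-terminated, so the scan is bounded by the file length rather than by a terminator.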
Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best on your platform. And even then, the simplest expression of your problem may still be the best overall answer, as long as it gets written, debugged and used.
It is possible, with a bit of wickedness ;)
Update: More clarification on evilness
Note that the xstr(MAXLINE) [^\n] part of the specifier reads up to MAXLINE characters, which can be anything except the newline character (i.e. \n). The second part of the specifier, i.e. *[^\n], rejects anything (that's why the * character is there) if the line has more than MAXLINE characters, up to but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is that scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
What if you wanted to read in a loop? A little modification is required: we need to add a getchar() to consume the unmatched newline.
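The code from this answer is not preserved here. The following is a hedged reconstruction of the technique the text describes, not the author's original: a width-limited scanset built with a stringifying macro, an assignment-suppressed scanset to discard the excess, and one extra character read to consume the newline. The helper name read_line_scanf is hypothetical, it uses getc(fp) where the answer mentions getchar() so that it works on any stream, and reporting an empty line as a successful read is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L   /* not needed by the function itself */
#include <assert.h>
#include <stdio.h>

#define MAXLINE 255
#define str(x)  #x
#define xstr(x) str(x)   /* expands MAXLINE to 255 before stringifying */

/* Read one line from fp into line, keeping at most MAXLINE characters and
   discarding the rest of the line. Returns 1 if a line (possibly empty) was
   read, 0 at end of input. */
int read_line_scanf(FILE *fp, char line[MAXLINE + 1])
{
    assert(fp != NULL && line != NULL);
    line[0] = '\0';   /* an empty line leaves the buffer untouched by fscanf */

    /* "%255[^\n]" stores up to 255 non-newline characters;
       "%*[^\n]" then skips whatever is left before the newline. */
    int n = fscanf(fp, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line);
    if (n == EOF)
        return 0;     /* nothing left to read */

    /* Neither scanset matches '\n', so it is still in the stream. */
    int c = getc(fp);
    if (c != '\n' && c != EOF)
        ungetc(c, fp);
    return 1;
}
```

A loop such as while (read_line_scanf(stdin, line)) would then visit the input one line at a time, after which each line can be split into words with sscanf() or strtok().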