File output from a C program behaves strangely when counting lines
I am using C to parse a large flat file and write the relevant lines to an output file. The output file should be around 70,000 lines long.
If I open the file in gedit, it displays exactly as expected, with the correct number of lines and line lengths.
However, running wc -l <file> returns 13,156. So does grep -c "" <file>.
tail <file> returns the last 10 lines that I see in gedit, and head <file> returns the first 10 lines. But tail -n +8000 <file> | head -n 1, which should return the 8,000th line, returns the text that I see on line 34,804 in gedit.
I'd expect these results if I were missing newline characters in the file, but gedit doesn't seem to have a problem with it. Additionally, wc -L <file>, which displays the maximum line length, returns 142 bytes, as expected. The size of the file is a little over 9,000,000 bytes, also as expected.
If wc -L <file> = 142 and wc -c <file> = 9046609, then how can wc -l <file> = 13156?
Does anyone know what I did wrong when writing to this file?
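The mismatch between those figures can be made concrete with a little arithmetic. This is only a sketch built from the numbers reported above; the variable names are illustrative, and the second calculation assumes every line ends in a single-byte terminator.

```shell
bytes=9046609      # reported by wc -c
newlines=13156     # reported by wc -l
max_len=142        # reported by wc -L

# Average bytes per newline-terminated line (integer division).
avg_len=$((bytes / newlines))
echo "average bytes per newline-terminated line: $avg_len"

# Fewest lines the file could hold if no line exceeds max_len bytes
# plus one terminator byte (ceiling division).
min_lines=$(( (bytes + max_len) / (max_len + 1) ))
echo "minimum number of lines: $min_lines"
```

The average works out to 687 bytes, which cannot coexist with a 142-byte maximum if '\n' is the only line terminator. The file would need at least 63,263 lines, so most of them are probably ending in something that wc -l does not count.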
Comments (1)
It's probably some odd combination of carriage return ('\r') and linefeed ('\n') characters.
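To illustrate how such a mix could produce the symptoms in the question, here is a small sketch. The sample file and its contents are hypothetical, and it assumes the editor accepts a lone carriage return as a line break, which gedit appears to do here.

```shell
# Hypothetical sample: three visual lines, but only the last ends in '\n'.
sample=$(mktemp)
printf 'one\rtwo\rthree\n' > "$sample"

# wc -l counts only '\n' bytes, so the CR-terminated lines are not counted.
newline_lines=$(wc -l < "$sample")

# Translating each CR to LF shows the line count an editor would display.
visual_lines=$(tr '\r' '\n' < "$sample" | wc -l)

echo "wc -l sees $((newline_lines)) line(s); the editor would show $((visual_lines))"
rm -f "$sample"
```

With this sample, wc -l reports 1 while the translated stream has 3 lines, the same kind of gap as 13,156 versus roughly 70,000.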
Assuming you have the GNU Coreutils version of "tr", you can use it to count how many times each of these characters occurs in the file, with one command for the linefeeds and a second for the carriage returns.
For a normal Unix-style text file, the second command should print 0. For a Windows-style text file, both should print the same number.
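One way to get those two counts is sketched below. It is not necessarily the exact pair of commands originally posted; the sample file is hypothetical and stands in for the real output file.

```shell
# Hypothetical sample: two CRLF-terminated lines and one LF-only line.
file=$(mktemp)
printf 'alpha\r\nbeta\r\ngamma\n' > "$file"

# First command: delete every byte except linefeeds, then count what is left.
lf_count=$(tr -d -c '\n' < "$file" | wc -c)

# Second command: the same idea for carriage returns.
cr_count=$(tr -d -c '\r' < "$file" | wc -c)

echo "linefeeds: $((lf_count))  carriage returns: $((cr_count))"
rm -f "$file"
```

For this sample the counts are 3 linefeeds and 2 carriage returns. Unequal, nonzero counts like these indicate a file with mixed line endings.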
The "file" command will probably also tell you something useful about the line terminators.