How do I read each line of a file that is delimited only by LF?

Posted on 2024-07-27 17:18:50

I have to read a log file line by line. It is about 6 MB in size and 40,000 lines in total. After testing my program, I discovered that the log file is delimited by LF characters only, so I can't use the ReadLine method of the StreamReader class.

How can I fix this problem?

Edit: I tried using TextReader, but my program still doesn't work:

using (TextReader sr = new StreamReader(strPath, Encoding.Unicode))
{
    // ignore the first three lines of the log file
    sr.ReadLine();
    sr.ReadLine();
    sr.ReadLine();

    int count = 0; // number of lines read
    string strLine;
    while (sr.Peek() != 0)
    {
        strLine = sr.ReadLine();
        if (strLine.Trim() != "")
        {
            InsertData(strLine);
            count++;
        }
    }

    return count;
}

Comments (4)

荒路情人 2024-08-03 17:18:51

TextReader.ReadLine already handles lines terminated just by \n.

From the docs:

"A line is defined as a sequence of characters followed by a carriage return (0x000d), a line feed (0x000a), a carriage return followed by a line feed, Environment.NewLine, or the end of stream marker. The string that is returned does not contain the terminating carriage return and/or line feed. The returned value is a null reference (Nothing in Visual Basic) if the end of the input stream has been reached."

So basically, you should be fine. (I've talked about TextReader rather than StreamReader because that's where the method is declared - obviously it will still work with a StreamReader.)
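
The behaviour quoted from the docs can be checked with a small experiment. The following is only a sketch: the class name LfReadLineDemo and the helper ReadAll are made up for illustration, and a StringReader stands in for the log file so that no file on disk is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LfReadLineDemo
{
    // Hypothetical helper: calls ReadLine() until it returns null (end of input).
    public static List<string> ReadAll(TextReader reader)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        // The sample text mixes LF, CRLF and CR terminators on purpose.
        var lines = ReadAll(new StringReader("first\nsecond\r\nthird\rfourth"));
        foreach (string line in lines)
        {
            // Brackets make stray terminator characters visible if any remained.
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

Each of the four words should come back as its own line with no terminator attached, whichever terminator followed it.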

If you want to iterate through lines easily (and potentially use LINQ against the log file) you may find my LineReader class in MiscUtil useful. It basically wraps calls to ReadLine() in an iterator. So for instance, you can do:

var query = from file in Directory.GetFiles("logs")
            from line in new LineReader(file)
            where !line.StartsWith("DEBUG")
            select line;

foreach (string line in query)
{
    // ...
}

All streaming :)
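
For readers who do not have MiscUtil at hand, the idea of wrapping ReadLine() in an iterator can be sketched as follows. This is not MiscUtil's LineReader; the class LineIteratorSketch and its ReadLines helper are hypothetical names, and the input is an in-memory string rather than a directory of log files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineIteratorSketch
{
    // Hypothetical helper: opens a reader lazily and yields one line per iteration.
    public static IEnumerable<string> ReadLines(Func<TextReader> open)
    {
        using (TextReader reader = open())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    static void Main()
    {
        // LF-only sample text standing in for a log file.
        const string log = "DEBUG a\nINFO b\nDEBUG c\nWARN d\n";

        var query = from line in ReadLines(() => new StringReader(log))
                    where !line.StartsWith("DEBUG")
                    select line;

        foreach (string line in query)
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the helper uses yield return, nothing is read until the foreach starts, and only one line is held in memory at a time. On .NET 4.0 and later, File.ReadLines(path) offers the same lazy behaviour for files without any helper.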

全部不再 2024-08-03 17:18:51

Does File.ReadAllLines(fileName) not correctly load files with LF line endings? Use it if you need the whole file. I saw a site indicating it's slower than another method, but it isn't if you pass the correct Encoding to it (the default is UTF-8), and it's as clean as you can get.

Edit: It does. And if you need streaming, TextReader.ReadLine() correctly handles Unix line endings as well.

Edit again: So does StreamReader. Did you just check the documentation and assume it won't handle LF line endings? I'm looking at it in Reflector and it certainly looks like a proper handling routine.
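
Both suggestions can be tried on a small LF-only file. The sketch below makes some assumptions: the class LfFileSketch and the method CountDataLines are hypothetical, the sample file is written as UTF-8, and InsertData from the question is left out. The thread does not confirm why the question's code failed, but two details are worth checking there: Peek() returns -1 at the end of the stream rather than 0, so testing ReadLine() for null is the safer loop condition, and the Encoding passed to the reader has to match the file (the question uses Encoding.Unicode, which is UTF-16).

```csharp
using System;
using System.IO;
using System.Text;

static class LfFileSketch
{
    // Hypothetical counterpart to the question's loop: skips three header
    // lines, then counts the non-blank lines that follow.
    public static int CountDataLines(string path, Encoding encoding)
    {
        int count = 0;
        using (var reader = new StreamReader(path, encoding))
        {
            for (int i = 0; i < 3; i++)
            {
                reader.ReadLine();
            }

            string line;
            // ReadLine() returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim() != "")
                {
                    count++;
                }
            }
        }
        return count;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Three header lines, then two data rows separated by a blank line, LF only.
            File.WriteAllText(path, "h1\nh2\nh3\nrow1\n\nrow2\n", Encoding.UTF8);

            // Whole-file approach: every LF-terminated line becomes one array element.
            string[] all = File.ReadAllLines(path, Encoding.UTF8);
            Console.WriteLine("ReadAllLines returned " + all.Length + " lines");

            // Streaming approach.
            Console.WriteLine("Data lines: " + CountDataLines(path, Encoding.UTF8));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```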

十年九夏 2024-08-03 17:18:51

I'd have guessed that LF (\n) would be fine, whereas CR-only (\r) might cause problems.

You could read each line one character at a time and process it when you reach the terminator.

After profiling, if this is too slow, you could use app-side buffering with a block read such as Read(char[], int, int). But try the simple character-at-a-time approach first!
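
A minimal sketch of the character-at-a-time approach, assuming that LF is the only terminator of interest. CharAtATimeSketch and ReadLfLines are hypothetical names, and a StringReader stands in for the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class CharAtATimeSketch
{
    // Hypothetical helper: splits on LF only, one Read() call per character.
    public static IEnumerable<string> ReadLfLines(TextReader reader)
    {
        var current = new StringBuilder();
        int c;
        // Read() returns -1 when no more characters are available.
        while ((c = reader.Read()) != -1)
        {
            if (c == '\n')
            {
                yield return current.ToString();
                current.Clear();
            }
            else
            {
                current.Append((char)c);
            }
        }

        // Emit a final line that has no trailing LF.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }

    static void Main()
    {
        foreach (string line in ReadLfLines(new StringReader("one\ntwo\n\nthree")))
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

Unlike ReadLine(), this sketch treats a bare CR as ordinary text, so it only makes sense when the input really is LF-delimited.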

汐鸠 2024-08-03 17:18:51

Or you can use the ReadBlock method and parse the lines yourself.
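
One way this could look, as a sketch only: the class ReadBlockSketch, the method ReadLfLines and the buffer size are illustrative choices, and only LF is treated as a terminator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ReadBlockSketch
{
    // Hypothetical helper: reads fixed-size blocks and splits them on LF.
    public static List<string> ReadLfLines(TextReader reader, int bufferSize)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        var buffer = new char[bufferSize];
        int read;

        // ReadBlock returns the number of characters read; 0 means end of input.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n')
                {
                    current.Append(buffer, start, i - start);
                    lines.Add(current.ToString());
                    current.Clear();
                    start = i + 1;
                }
            }

            // Carry the unfinished tail of this block over to the next one.
            if (read > start)
            {
                current.Append(buffer, start, read - start);
            }
        }

        // Emit a final line that has no trailing LF.
        if (current.Length > 0)
        {
            lines.Add(current.ToString());
        }
        return lines;
    }

    static void Main()
    {
        // A tiny buffer forces lines to span several blocks.
        var lines = ReadLfLines(new StringReader("alpha\nbeta\n\ngamma"), 4);
        foreach (string line in lines)
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

The tiny buffer in Main is deliberate: it exercises the case where a line is split across two blocks, which is the part that is easiest to get wrong when parsing by hand.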
