FileStream position is off after calling ReadLine() from C#

Posted on 2024-09-03 00:54:00

I'm trying to read a (small-ish) file in chunks of a few lines at a time, and I need to return to the beginning of particular chunks.

The problem is, after the very first call to

streamReader.ReadLine();

the streamReader.BaseStream.Position property is set to the end of the file! I assume some buffering is done behind the scenes, but I was expecting this property to reflect the number of bytes I had actually consumed from the file. And yes, the file has more than one line :-)

For instance, calling ReadLine() again will (naturally) return the next line in the file, which does not start at the position previously reported by streamReader.BaseStream.Position.

How can I find the actual position where the 1st line ends, so I can return there later?

I can only think of manually doing the bookkeeping, by adding the lengths of the strings returned by ReadLine(), but even here there are a couple of caveats:

  • ReadLine() strips the new-line character(s), which may have a variable length (is it '\n'? Is it "\r\n"? Etc.)
  • I'm not sure if this would work OK with variable-length characters
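Both caveats can be seen in a small sketch. It assumes the file is UTF-8 and uses a made-up sample line; the string length counts UTF-16 code units, while the file position advances by encoded bytes plus a terminator that ReadLine() has already stripped.

```csharp
using System;
using System.Text;

// Hypothetical sample line: "café €5" written with escapes so the source encoding does not matter.
string line = "caf\u00E9 \u20AC5";

int charCount = line.Length;                       // UTF-16 code units in the string
int utf8Bytes = Encoding.UTF8.GetByteCount(line);  // bytes the same text occupies in a UTF-8 file

// The terminator is not part of the returned string, and its size depends on the file.
int crlfBytes = Encoding.UTF8.GetByteCount("\r\n");
int lfBytes = Encoding.UTF8.GetByteCount("\n");

Console.WriteLine($"chars = {charCount}, UTF-8 bytes = {utf8Bytes}");
Console.WriteLine($"terminator: CRLF = {crlfBytes} bytes, LF = {lfBytes} byte");
```

So summing line.Length alone would undercount the position whenever a line contains non-ASCII characters, and it cannot tell which terminator was used.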

...so right now it seems like my only option is to rethink how I parse the file, so I don't have to rewind.

If it helps, I open my file like this:

using (var reader = new StreamReader(
        new FileStream(
                       m_path, 
                       FileMode.Open, 
                       FileAccess.Read, 
                       FileShare.ReadWrite)))
{...}

Any suggestions?

Comments (4)

疯狂的代价 2024-09-10 00:54:00

If you need to read lines, and you need to go back to previous chunks, why not store the lines you read in a List? That should be easy enough.

You should not depend on calculating a length in bytes based on the length of the string - for the reasons you mention yourself: Multibyte characters, newline characters, etc.
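A minimal sketch of that idea, assuming fixed-size chunks and a small file that fits in memory. The temp file, the chunk size, and the GetChunk helper are illustrative only and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: returns the lines of one chunk from the in-memory list.
static List<string> GetChunk(List<string> all, int chunkIndex, int chunkSize)
{
    int start = chunkIndex * chunkSize;
    int count = Math.Min(chunkSize, all.Count - start);
    return all.GetRange(start, count);
}

// Hypothetical sample file with three chunks of two lines each.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "a1", "a2", "b1", "b2", "c1", "c2" });

// Read every line once and keep it; "rewinding" is then just an index lookup.
var lines = new List<string>();
using (var reader = new StreamReader(
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    while (reader.ReadLine() is string line)
    {
        lines.Add(line);
    }
}
File.Delete(path);

List<string> secondChunk = GetChunk(lines, 1, 2);
Console.WriteLine(string.Join(", ", secondChunk));
```

This trades memory for simplicity, which seems reasonable for the small-ish file described in the question.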

月下伊人醉 2024-09-10 00:54:00

I have done a similar implementation where I needed to access the n-th line in an extremely big text file fast.

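The reason streamReader.BaseStream.Position points to the end of the file is that StreamReader has a built-in buffer, as you suspected: it reads a whole block from the underlying stream up front, so the stream's position runs ahead of the lines you have actually consumed.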
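The effect is easy to reproduce. The sketch below assumes a hypothetical temp file that is much smaller than the reader's buffer, so the first ReadLine() pulls the entire file into memory.

```csharp
using System;
using System.IO;

// Hypothetical sample file: three short lines, UTF-8 without a byte-order mark.
string path = Path.GetTempFileName();
File.WriteAllText(path, "first\nsecond\nthird\n");

string firstLine;
long positionAfterFirstLine;
using (var reader = new StreamReader(
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    firstLine = reader.ReadLine() ?? string.Empty;
    // Position of the underlying stream, not of the reader's logical cursor.
    positionAfterFirstLine = reader.BaseStream.Position;
}

long fileLength = new FileInfo(path).Length;
File.Delete(path);

Console.WriteLine($"first line: {firstLine}");
Console.WriteLine($"BaseStream.Position after one ReadLine(): {positionAfterFirstLine}");
Console.WriteLine($"file length: {fileLength}");
```

For a file larger than the buffer, the reported position would instead land on a buffer boundary, which is equally unrelated to where the line ended.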

Bookkeeping by counting the number of bytes read by each ReadLine() call works for most plain-text files. However, I have encountered cases where unprintable control characters were mixed into the text file. The calculated byte count was then wrong, and my program could no longer seek to the correct location afterwards.

My final solution was to implement the line reader myself. It has worked well so far. This should give an idea of what it looks like:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    int ch;
    int currentLine = 1;
    long offset = 0; // long, so offsets in very large files do not overflow

    while ((ch = fs.ReadByte()) >= 0)
    {
        offset++;

        // Byte 10 is '\n', which ends a line in both "\r\n" and "\n" (UNIX) files
        if (ch == 10)
        {
            currentLine++;

            // offset is now the byte position where line currentLine starts;
            // do something such as logging it together with the line number
        }
    }
}

And to go back to logged offset:

using (FileStream fs = new FileStream(filePath, FileMode.Open))
{
    fs.Seek(yourOffset, SeekOrigin.Begin);
    TextReader tr = new StreamReader(fs);

    string line = tr.ReadLine();
}
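Putting the two snippets together, a self-contained sketch might look like the following. BuildLineIndex and ReadLineAt are hypothetical helper names, the sample file is made up, and the scan assumes an encoding such as UTF-8 or ASCII in which byte 10 only ever means '\n' (it would not hold for UTF-16).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: scans the file once and records the byte offset at which each line starts.
static List<long> BuildLineIndex(string filePath)
{
    var starts = new List<long> { 0 };
    long offset = 0;
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        int b;
        while ((b = fs.ReadByte()) >= 0)
        {
            offset++;
            if (b == 10) starts.Add(offset); // the next line begins right after '\n'
        }
    }
    // If the file ends with '\n', the last entry points at end of file and is not a real line.
    if (starts.Count > 1 && starts[starts.Count - 1] == offset)
    {
        starts.RemoveAt(starts.Count - 1);
    }
    return starts;
}

// Hypothetical helper: seeks to a recorded offset and reads one line with a fresh reader.
static string ReadLineAt(string filePath, long offset)
{
    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        using (var reader = new StreamReader(fs))
        {
            return reader.ReadLine() ?? string.Empty;
        }
    }
}

// Sample file with mixed terminators and a two-byte character (Greek beta) on the second line.
string path = Path.GetTempFileName();
File.WriteAllText(path, "alpha\r\n\u03B2eta\nlast");

List<long> index = BuildLineIndex(path);
string secondLine = ReadLineAt(path, index[1]);
string thirdLine = ReadLineAt(path, index[2]);
File.Delete(path);

Console.WriteLine(string.Join(", ", index));
Console.WriteLine(thirdLine);
```

Because the offsets are counted in raw bytes, mixed terminators and multi-byte characters do not throw the bookkeeping off.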

Also note that there is already a buffering mechanism built into FileStream.

空城之時有危險 2024-09-10 00:54:00

StreamReader isn't designed for this kind of usage, so if this is what you need I suspect that you'll have to write your own wrapper for FileStream.

只想待在家 2024-09-10 00:54:00

A problem with the accepted answer is that if ReadLine() throws an exception, say because the logging framework temporarily locks the file right when you call ReadLine(), then that line is never "saved" into the list, because ReadLine() never returned it. If you catch the exception, you cannot simply retry ReadLine(): StreamReader's internal state and buffer are left in a bad state by the failed call, so you only get part of a line back, and you cannot skip that broken line and seek back to its beginning, as the OP found out.

If you want to get the true seekable location, you need to use reflection to reach StreamReader's private fields, which let you calculate its position inside its own buffer. Granger's solution, seen here: StreamReader and seeking, should work. Or do what other answers to related questions have done: create your own StreamReader that exposes the true seekable location (see the answer at this link: Tracking the position of the line of a streamreader). Those are the only two options I've come across while dealing with StreamReader and seeking; for some reason it was designed in a way that rules out seeking in nearly every situation.

Edit: I used Granger's solution and it works. Just be sure to go in this order: call GetActualPosition(), then set BaseStream.Position to that position, then call DiscardBufferedData(), and finally call ReadLine(); you will get the full line starting from the position you set.
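The last three steps of that sequence can be sketched with framework calls alone. GetActualPosition() belongs to the linked answer and is not part of StreamReader, so this sketch assumes the byte offset is already known and hard-codes it for a made-up sample file.

```csharp
using System;
using System.IO;

// Hypothetical sample file; "two" starts at byte 4 because "one\n" occupies 4 bytes.
string path = Path.GetTempFileName();
File.WriteAllText(path, "one\ntwo\nthree\n");
long secondLineOffset = 4; // assumed to be known, e.g. from earlier bookkeeping

string rereadLine;
using (var reader = new StreamReader(
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    reader.ReadLine(); // "one" (the rest of the file is now sitting in the reader's buffer)
    reader.ReadLine(); // "two"

    // Rewind: move the underlying stream first, then drop the stale buffer.
    reader.BaseStream.Position = secondLineOffset;
    reader.DiscardBufferedData();

    rereadLine = reader.ReadLine() ?? string.Empty;
}
File.Delete(path);

Console.WriteLine(rereadLine);
```

Without the DiscardBufferedData() call, the reader would keep serving lines from its old buffer and ignore the new stream position.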
