How to know the position (line number) of a StreamReader in a text file?

Posted on 2024-07-19 20:19:05

An example (it may not be realistic, but it makes my point):

public void StreamInfo(StreamReader p)
{
    string info = string.Format(
        "The supplied StreamReader read: {0}\n at line {1}",
        p.ReadLine(),
        p.GetLinePosition() - 1);
}

GetLinePosition here is an imaginary extension method of StreamReader.
Is this possible?

Of course I could keep count myself but that's not the question.

Comments (7)

酷炫老祖宗 2024-07-26 20:19:05

I came across this post while looking for a solution to a similar problem where I needed to seek the StreamReader to particular lines. I ended up creating two extension methods to get and set the position on a StreamReader. It doesn't actually provide a line number count, but in practice, I just grab the position before each ReadLine() and if the line is of interest, then I keep the start position for setting later to get back to the line like so:

var index = streamReader.GetPosition();
var line1 = streamReader.ReadLine();

streamReader.SetPosition(index);
var line2 = streamReader.ReadLine();

Assert.AreEqual(line1, line2);

and the important part:

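using System.IO;
using System.Reflection;

public static class StreamReaderExtensions
{
    // These are private field names from the .NET Framework implementation of StreamReader.
    // Other runtimes may name them differently (for example with a leading underscore),
    // in which case GetField returns null and the methods below will fail.
    static readonly FieldInfo charPosField = typeof(StreamReader).GetField("charPos", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    static readonly FieldInfo byteLenField = typeof(StreamReader).GetField("byteLen", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);
    static readonly FieldInfo charBufferField = typeof(StreamReader).GetField("charBuffer", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.DeclaredOnly);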

    public static long GetPosition(this StreamReader reader)
    {
        // shift position back from BaseStream.Position by the number of bytes read
        // into internal buffer.
        int byteLen = (int)byteLenField.GetValue(reader);
        var position = reader.BaseStream.Position - byteLen;

        // if we have consumed chars from the buffer we need to calculate how many
        // bytes they represent in the current encoding and add that to the position.
        int charPos = (int)charPosField.GetValue(reader);
        if (charPos > 0)
        {
            var charBuffer = (char[])charBufferField.GetValue(reader);
            var encoding = reader.CurrentEncoding;
            var bytesConsumed = encoding.GetBytes(charBuffer, 0, charPos).Length;
            position += bytesConsumed;
        }

        return position;
    }

    public static void SetPosition(this StreamReader reader, long position)
    {
        reader.DiscardBufferedData();
        reader.BaseStream.Seek(position, SeekOrigin.Begin);
    }
}

This works quite well for me and, depending on your tolerance for using reflection, I think it is a fairly simple solution.

Caveats:

  1. While I have done some simple testing using various System.Text.Encoding options, pretty much all of the data I consume with this is simple text files (ASCII).
  2. I only ever use the StreamReader.ReadLine() method, and while a brief review of the source for StreamReader seems to indicate this will still work when using the other read methods, I have not really tested that scenario.
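
As an aside, getting back to a particular line can also be sketched without reflection by indexing line-start byte offsets on the underlying stream. This is not the approach above but a different technique, and it rests on assumptions: the names LineOffsetIndex, Build and ReadLineAt are made up for illustration, the stream must be seekable, lines must end in "\n" or "\r\n", and the encoding must be one where a line feed is always the single byte 0x0A (ASCII or UTF-8, without a byte order mark).

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineOffsetIndex
{
    // Scans the stream once and records the byte offset at which each line starts.
    // If the data ends with a line feed, the last recorded offset equals the stream length.
    public static List<long> Build(Stream stream)
    {
        var offsets = new List<long> { 0 };
        stream.Seek(0, SeekOrigin.Begin);
        long position = 0;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            position++;
            if (b == '\n')
                offsets.Add(position); // the next line starts right after the line feed
        }
        return offsets;
    }

    // Seeks to a recorded offset and reads one line from there.
    // The reader is deliberately not disposed so that the caller's stream stays open.
    public static string ReadLineAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        var reader = new StreamReader(stream);
        return reader.ReadLine();
    }
}
```

With an index like this, line n (zero-based) would be read with LineOffsetIndex.ReadLineAt(stream, offsets[n]). The cost is one full pass over the file up front.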

妄断弥空 2024-07-26 20:19:05

No, not really possible. The concept of a "line number" is based upon the actual data that's already been read, not just the position. For instance, if you were to Seek() the reader to an arbitrary position, it's not actually going to read that data, so it wouldn't be able to determine the line number.

The only way to do this is to keep track of it yourself.
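
For what it's worth, "keeping track of it yourself" can be as small as the following sketch. The class and method names (LineCountingDemo, PrintNumberedLines) are made up for illustration, and the count is only meaningful if the reader is consumed from the start through ReadLine() alone.

```csharp
using System.IO;

public static class LineCountingDemo
{
    // Reads the reader line by line and writes each line prefixed with its 1-based number.
    public static void PrintNumberedLines(TextReader reader, TextWriter output)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++; // counts only lines that were actually read
            output.WriteLine("{0}: {1}", lineNumber, line);
        }
    }
}
```

It could be called as LineCountingDemo.PrintNumberedLines(new StreamReader(path), Console.Out), where path is whatever file you are reading.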

养猫人 2024-07-26 20:19:05

It is extremely easy to provide a line-counting wrapper for any TextReader:

public class PositioningReader : TextReader {
    private TextReader _inner;
    public PositioningReader(TextReader inner) {
        _inner = inner;
    }
    public override void Close() {
        _inner.Close();
    }
    public override int Peek() {
        return _inner.Peek();
    }
    public override int Read() {
        var c = _inner.Read();
        if (c >= 0)
            AdvancePosition((Char)c);
        return c;
    }

    private int _linePos = 0;
    public int LinePos { get { return _linePos; } }

    private int _charPos = 0;
    public int CharPos { get { return _charPos; } }

    private int _matched = 0;
    private void AdvancePosition(Char c) {
        if (Environment.NewLine[_matched] == c) {
            _matched++;
            if (_matched == Environment.NewLine.Length) {
                _linePos++;
                _charPos = 0;
                _matched = 0;
            }
        }
        else {
            _matched = 0;
            _charPos++;
        }
    }
}

Drawbacks (for the sake of brevity):

  1. Does not check constructor argument for null
  2. Does not recognize alternate ways to terminate the lines. Will be inconsistent with ReadLine() behavior when reading files separated by raw \r or \n.
  3. Does not override "block"-level methods like Read(char[], int, int), ReadBlock, ReadLine, ReadToEnd. The TextReader implementation works correctly since it routes everything else to Read(); however, better performance could be achieved by
    • overriding those methods so that they route calls to _inner instead of base, and
    • passing the characters read to AdvancePosition. See the sample ReadBlock implementation:

public override int ReadBlock(char[] buffer, int index, int count) {
    var readCount = _inner.ReadBlock(buffer, index, count);    
    for (int i = 0; i < readCount; i++)
        AdvancePosition(buffer[index + i]);
    return readCount;
}
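
Regarding drawback 2, a position tracker that treats "\r", "\n" and "\r\n" each as one line break (the same rule ReadLine() uses) might look like the sketch below. LineTracker and its members are hypothetical names, not part of the class above. Its Advance method could stand in for AdvancePosition.

```csharp
public sealed class LineTracker
{
    // True when the previous character was '\r', so a following '\n' belongs to the same break.
    private bool _lastWasCarriageReturn;

    public int Line { get; private set; }   // zero-based line index
    public int Column { get; private set; } // zero-based column within the current line

    public void Advance(char c)
    {
        if (c == '\n')
        {
            // In "\r\n" the break was already counted when the '\r' arrived.
            if (!_lastWasCarriageReturn)
                StartNewLine();
            _lastWasCarriageReturn = false;
        }
        else if (c == '\r')
        {
            StartNewLine();
            _lastWasCarriageReturn = true;
        }
        else
        {
            Column++;
            _lastWasCarriageReturn = false;
        }
    }

    private void StartNewLine()
    {
        Line++;
        Column = 0;
    }
}
```

Counting the break as soon as the '\r' arrives avoids having to look ahead, at the price of one extra flag.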

回心转意 2024-07-26 20:19:05

No.

Consider that it's possible to seek to any position using the underlying stream object (which could be at any point in any line).
Now consider what that would do to any count kept by the StreamReader.

Should the StreamReader go and figure out which line it's now on?
Should it just keep a number of lines read, regardless of position within the file?

There are more questions than just these that would make this a nightmare to implement, imho.

Smile简单爱 2024-07-26 20:19:05

Here is someone who implemented a StreamReader whose ReadLine() method records the file position.

http://www.daniweb.com/forums/thread35078.html

I guess one should inherit from StreamReader, and then add the extra method to the special class along with some properties (_lineLength + _bytesRead):

 // Reads a line. A line is defined as a sequence of characters followed by
 // a carriage return ('\r'), a line feed ('\n'), or a carriage return
 // immediately followed by a line feed. The resulting string does not
 // contain the terminating carriage return and/or line feed. The returned
 // value is null if the end of the input stream has been reached.
 //
 /// <include file='doc\myStreamReader.uex' path='docs/doc[@for="myStreamReader.ReadLine"]/*' />
 public override String ReadLine()
 {
          _lineLength = 0;
          //if (stream == null)
          //       __Error.ReaderClosed();
          if (charPos == charLen)
          {
                   if (ReadBuffer() == 0) return null;
          }
          StringBuilder sb = null;
          do
          {
                   int i = charPos;
                   do
                   {
                           char ch = charBuffer[i];
                           int EolChars = 0;
                           if (ch == '\r' || ch == '\n')
                           {
                                    EolChars = 1;
                                    String s;
                                    if (sb != null)
                                    {
                                             sb.Append(charBuffer, charPos, i - charPos);
                                             s = sb.ToString();
                                    }
                                    else
                                    {
                                             s = new String(charBuffer, charPos, i - charPos);
                                    }
                                    charPos = i + 1;
                                    if (ch == '\r' && (charPos < charLen || ReadBuffer() > 0))
                                    {
                                             if (charBuffer[charPos] == '\n')
                                             {
                                                      charPos++;
                                                      EolChars = 2;
                                             }
                                    }
                                    _lineLength = s.Length + EolChars;
                                    _bytesRead = _bytesRead + _lineLength;
                                    return s;
                           }
                           i++;
                   } while (i < charLen);
                   i = charLen - charPos;
                   if (sb == null) sb = new StringBuilder(i + 80);
                   sb.Append(charBuffer, charPos, i);
          } while (ReadBuffer() > 0);
          string ss = sb.ToString();
          _lineLength = ss.Length;
          _bytesRead = _bytesRead + _lineLength;
          return ss;
 }

I think there is a minor bug in the code: the length of the string (in characters) is used to calculate the file position instead of the actual number of bytes read, so it lacks support for UTF-8 and UTF-16 encoded files.
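
To illustrate the difference, here is a small sketch that measures a line in bytes rather than characters using Encoding.GetByteCount. The helper name ByteLengthDemo.ByteLength is made up for this example, and how it would be wired into the ReadLine() above is left open.

```csharp
using System.Text;

public static class ByteLengthDemo
{
    // Returns how many bytes the text occupies in the given encoding.
    // Pass the line together with its terminator to get the full on-disk length.
    public static int ByteLength(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }
}
```

For the five-character string "h\u00e9llo" this gives 6 bytes in Encoding.UTF8 and 10 bytes in Encoding.Unicode (UTF-16), so adding s.Length to _bytesRead drifts away from the real file position as soon as a non-ASCII character or a UTF-16 file is involved.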

很糊涂小朋友 2024-07-26 20:19:05

I came here looking for something simple. If you're just using ReadLine() and don't care about using Seek() or anything, just make a simple subclass of StreamReader:

class CountingReader : StreamReader {
    private int _lineNumber = 0;
    public int LineNumber { get { return _lineNumber; } }

    public CountingReader(Stream stream) : base(stream) { }

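    public override string ReadLine() {
        // Count only lines that were actually read; ReadLine() returns null at end of stream.
        string line = base.ReadLine();
        if (line != null)
            _lineNumber++;
        return line;
    }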
}

and then you create it the normal way, say from a FileInfo object named file:

CountingReader reader = new CountingReader(file.OpenRead());

and you just read the reader.LineNumber property.

迷鸟归林 2024-07-26 20:19:05

The points already made with respect to the BaseStream are valid and important. However, there are situations in which you want to read a text and know where in the text you are. It can still be useful to write that up as a class to make it easy to reuse.

I tried to write such a class now. It seems to work correctly, but it's rather slow. It should be fine when performance isn't crucial (it isn't that slow, see below).

I use the same logic to track position in the text regardless of whether you read one char at a time, one buffer at a time, or one line at a time. While I'm sure performance could be improved by abandoning this, it made the class much easier to implement and, I hope, makes the code easier to follow.

I did a very basic performance comparison of the ReadLine method (which I believe is the weakest point of this implementation) to StreamReader, and the difference is almost an order of magnitude. I got 22 MB/s using my class StreamReaderEx, but nearly 9 times as much using StreamReader directly (on my SSD-equipped laptop). While it could be interesting, I don't know how to make a proper reading test; maybe using 2 identical files, each larger than the disk buffer, and reading them alternately? At least my simple test produces consistent results when I run it several times, regardless of which class reads the test file first.

The NewLine symbol defaults to "\r\n" (Environment.NewLine on Windows) but can be set through the constructor to any string of length 1 or 2. The reader considers only this symbol as a newline, which may be a drawback. At least I know Visual Studio has prompted me a fair number of times that a file I open "has inconsistent newlines".

Please note that I haven't included the Guard class; this is a simple utility class and it should be obvious from the context how to replace it. You can even remove it, but you'd lose some argument checking and thus the resulting code would be farther from "correct". For example, Guard.NotNull(s, "s") simply checks that s is not null, throwing an ArgumentNullException (with argument name "s", hence the second parameter) if it is.

Enough babble, here's the code:

public class StreamReaderEx : StreamReader
{
    // NewLine characters (magic value -1: "not used").
    int newLine1, newLine2;

    // The last character read was the first character of the NewLine symbol AND we are using a two-character symbol.
    bool insideNewLine;

    // StringBuilder used for ReadLine implementation.
    StringBuilder lineBuilder = new StringBuilder();


    public StreamReaderEx(string path, string newLine = "\r\n") : base(path)
    {
        init(newLine);
    }


    public StreamReaderEx(Stream s, string newLine = "\r\n") : base(s)
    {
        init(newLine);
    }


    public string NewLine
    {
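        // Omit the second character when a one-character symbol is in use (newLine2 == -1).
        get { return newLine2 < 0 ? ((char)newLine1).ToString() : "" + (char)newLine1 + (char)newLine2; }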
        private set
        {
            Guard.NotNull(value, "value");
            Guard.Range(value.Length, 1, 2, "Only 1 to 2 character NewLine symbols are supported.");

            newLine1 = value[0];
            newLine2 = (value.Length == 2 ? value[1] : -1);
        }
    }


    public int LineNumber { get; private set; }
    public int LinePosition { get; private set; }


    public override int Read()
    {
        int next = base.Read();
        trackTextPosition(next);
        return next;
    }


    public override int Read(char[] buffer, int index, int count)
    {
        int n = base.Read(buffer, index, count);
        for (int i = 0; i 