Reading a text file line by line, with exact offset/position reporting

Posted on 2024-08-28 04:39:03

My simple requirement: read a huge text file (more than a million lines; for this example, assume it's a CSV of some sort) and keep a reference to the beginning of each line for faster lookup in the future (read a line, starting at X).

I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately, that doesn't work as I intended:

Given a file containing the following

Foo
Bar
Baz
Bla
Fasel

and this very simple code

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = sr.BaseStream.Position;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos = sr.BaseStream.Position;
  }
}

the output is:

000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel

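I can imagine that the StreamReader is trying to be helpful/efficient and reads the file in (big) chunks whenever new data is necessary, so BaseStream.Position reports how far the buffer has been filled rather than how far I have actually read. For my purpose this is bad.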
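
The buffering can be reproduced without a file. The following is a minimal sketch, assuming the sample file uses CRLF line endings (which would make it exactly 25 bytes long); the class name `BufferingDemo` is hypothetical and chosen only for illustration.

```csharp
using System.IO;

// Hypothetical demo class; the name does not come from the question.
public static class BufferingDemo
{
    // Returns BaseStream.Position as observed right after the first ReadLine() call.
    public static long PositionAfterFirstLine(byte[] content)
    {
        using (var sr = new StreamReader(new MemoryStream(content)))
        {
            sr.ReadLine();                 // consumes only the first line logically
            return sr.BaseStream.Position; // but the reader has already buffered much more
        }
    }
}
```

Passing the ASCII bytes of "Foo\r\nBar\r\nBaz\r\nBla\r\nFasel" returns 25, the full length of the content, because the reader pulls the entire (small) stream into its internal buffer on the first read.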

The question, finally: is there any way to get the (byte or char) offset while reading a file line by line, without using a raw Stream and handling \r, \n, \r\n, string encoding, etc. manually? Not a big deal, really; I just don't like to build things that might already exist.

Comments (5)

鹤舞 2024-09-04 04:39:03

You could create a TextReader wrapper that tracks the current position in the base TextReader:

public class TrackingTextReader : TextReader
{
    private TextReader _baseReader;
    private int _position;

    public TrackingTextReader(TextReader baseReader)
    {
        _baseReader = baseReader;
    }

    public override int Read()
    {
        int c = _baseReader.Read();
        if (c != -1)
            _position++; // count only characters actually consumed, not the end-of-input marker
        return c;
    }

    public override int Peek()
    {
        return _baseReader.Peek();
    }

    public int Position
    {
        get { return _position; }
    }
}

You could then use it as follows:

string text = @"Foo
Bar
Baz
Bla
Fasel";

using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
    string line;
    int lineStart = trackingReader.Position; // character offset at which the next line begins
    while ((line = trackingReader.ReadLine()) != null)
    {
        Console.WriteLine("{0:d3} {1}", lineStart, line);
        lineStart = trackingReader.Position;
    }
}
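
Note that Position here counts characters, not bytes, so it can be used directly for seeking in a file only when every character occupies exactly one byte. A small sketch of the difference follows; the helper `OffsetConverter.ToByteOffset` is hypothetical and not part of the answer above.

```csharp
using System.Text;

// Hypothetical helper illustrating character offsets versus byte offsets.
public static class OffsetConverter
{
    // Converts a character offset within `text` to the byte offset it would have
    // once the text is encoded with `encoding` (ignoring any byte order mark).
    public static long ToByteOffset(string text, int charOffset, Encoding encoding)
    {
        return encoding.GetByteCount(text.Substring(0, charOffset));
    }
}
```

For example, in "Gr\u00FC\u00DFe\nBar" the second line starts at character offset 6, but at byte offset 8 in UTF-8, since the two non-ASCII letters take two bytes each.
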
黯然#的苍凉 2024-09-04 04:39:03

After searching, testing, and trying some crazy things, here is the code I ended up with (I'm currently using it in my product).

public sealed class TextFileReader : IDisposable
{

    FileStream _fileStream = null;
    BinaryReader _binReader = null;
    StreamReader _streamReader = null;
    List<string> _lines = null;
    long _length = -1;

    /// <summary>
    /// Initializes a new instance of the <see cref="TextFileReader"/> class with default encoding (UTF8).
    /// </summary>
    /// <param name="filePath">The path to text file.</param>
    public TextFileReader(string filePath) : this(filePath, Encoding.UTF8) { }

    /// <summary>
    /// Initializes a new instance of the <see cref="TextFileReader"/> class.
    /// </summary>
    /// <param name="filePath">The path to text file.</param>
    /// <param name="encoding">The encoding of text file.</param>
    public TextFileReader(string filePath, Encoding encoding)
    {
        if (!File.Exists(filePath))
            throw new FileNotFoundException("File (" + filePath + ") is not found.");

        _fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
        _length = _fileStream.Length;
        _binReader = new BinaryReader(_fileStream, encoding);
    }

    /// <summary>
    /// Reads a line of characters from the current stream at the current position and returns the data as a string.
    /// </summary>
    /// <returns>The next line from the input stream, or null if the end of the input stream is reached</returns>
    public string ReadLine()
    {
        if (_binReader.PeekChar() == -1)
            return null;

        string line = "";
        int nextChar = _binReader.Read();
        while (nextChar != -1)
        {
            char current = (char)nextChar;
            if (current.Equals('\n'))
                break;
            else if (current.Equals('\r'))
            {
                int pickChar = _binReader.PeekChar();
                if (pickChar != -1 && ((char)pickChar).Equals('\n'))
                    nextChar = _binReader.Read();
                break;
            }
            else
                line += current;
            nextChar = _binReader.Read();
        }
        return line;
    }

    /// <summary>
    /// Reads some lines of characters from the current stream at the current position and returns the data as a collection of string.
    /// </summary>
    /// <param name="totalLines">The total number of lines to read (set as 0 to read from current position to end of file).</param>
    /// <returns>The next lines from the input stream, or an empty collection if the end of the input stream is reached</returns>
    public List<string> ReadLines(int totalLines)
    {
        if (totalLines < 1 && this.Position == 0)
            return this.ReadAllLines();

        _lines = new List<string>();
        int counter = 0;
        string line = this.ReadLine();
        while (line != null)
        {
            _lines.Add(line);
            counter++;
            if (totalLines > 0 && counter >= totalLines)
                break;
            line = this.ReadLine();
        }
        return _lines;
    }

    /// <summary>
    /// Reads all lines of characters from the current stream (from beginning to end) and returns the data as a collection of strings.
    /// </summary>
    /// <returns>All lines from the input stream, or an empty collection if the stream is empty</returns>
    public List<string> ReadAllLines()
    {
        if (_streamReader == null)
            _streamReader = new StreamReader(_fileStream);
        _streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
        _lines = new List<string>();
        string line = _streamReader.ReadLine();
        while (line != null)
        {
            _lines.Add(line);
            line = _streamReader.ReadLine();
        }
        return _lines;
    }

    /// <summary>
    /// Gets the length of text file (in bytes).
    /// </summary>
    public long Length
    {
        get { return _length; }
    }

    /// <summary>
    /// Gets or sets the current reading position.
    /// </summary>
    public long Position
    {
        get
        {
            if (_binReader == null)
                return -1;
            else
                return _binReader.BaseStream.Position;
        }
        set
        {
            if (_binReader == null)
                return;
            else if (value >= this.Length)
                this.SetPosition(this.Length);
            else
                this.SetPosition(value);
        }
    }

    void SetPosition(long position)
    {
        _binReader.BaseStream.Seek(position, SeekOrigin.Begin);
    }

    /// <summary>
    /// Gets the lines after reading.
    /// </summary>
    public List<string> Lines
    {
        get
        {
            return _lines;
        }
    }

    /// <summary>
    /// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
    /// </summary>
    public void Dispose()
    {
        if (_binReader != null)
            _binReader.Close();
        if (_streamReader != null)
        {
            _streamReader.Close();
            _streamReader.Dispose();
        }
        if (_fileStream != null)
        {
            _fileStream.Close();
            _fileStream.Dispose();
        }
    }

    ~TextFileReader()
    {
        this.Dispose();
    }
}
过期以后 2024-09-04 04:39:03

This is a really tough issue.
After a very long and exhausting survey of different solutions on the internet (including the ones in this thread, thank you!), I had to reinvent the wheel myself.

I had the following requirements:

  • Performance - reading must be very fast, so reading one char at a time or using reflection is not acceptable; buffering is required
  • Streaming - the file can be huge, so reading it into memory entirely is not acceptable
  • Tailing - tailing the file should be possible
  • Long lines - lines can be very long, so the buffer size can't be fixed
  • Stable - in my usage even a single-byte error was immediately visible. Unfortunately, several implementations I found had stability problems

    public class OffsetStreamReader
    {
        private const int InitialBufferSize = 4096;    
        private readonly char _bom;
        private readonly byte _end;
        private readonly Encoding _encoding;
        private readonly Stream _stream;
        private readonly bool _tail;
    
        private byte[] _buffer;
        private int _processedInBuffer;
        private int _informationInBuffer;
    
        public OffsetStreamReader(Stream stream, bool tail)
        {
            _buffer = new byte[InitialBufferSize];
            _processedInBuffer = InitialBufferSize;
    
            if (stream == null || !stream.CanRead)
                throw new ArgumentException("stream");
    
            _stream = stream;
            _tail = tail;
            _encoding = Encoding.UTF8;
    
            _bom = '\uFEFF';
            _end = _encoding.GetBytes(new [] {'\n'})[0];
        }
    
        public long Offset { get; private set; }
    
        public string ReadLine()
        {
            // Underlying stream closed
            if (!_stream.CanRead)
                return null;
    
            // EOF
            if (_processedInBuffer == _informationInBuffer)
            {
                if (_tail)
                {
                    _processedInBuffer = _buffer.Length;
                    _informationInBuffer = 0;
                    ReadBuffer();
                }
    
                return null;
            }
    
            var lineEnd = Search(_buffer, _end, _processedInBuffer);
            var haveEnd = true;
    
            // File ended but no finalizing newline character
            if (lineEnd.HasValue == false && _informationInBuffer + _processedInBuffer < _buffer.Length)
            {
                if (_tail)
                    return null;
                else
                {
                    lineEnd = _informationInBuffer;
                    haveEnd = false;
                }
            }
    
            // No end in current buffer
            if (!lineEnd.HasValue)
            {
                ReadBuffer();
                if (_informationInBuffer != 0)
                    return ReadLine();
    
                return null;
            }
    
            var arr = new byte[lineEnd.Value - _processedInBuffer];
            Array.Copy(_buffer, _processedInBuffer, arr, 0, arr.Length);
    
            Offset = Offset + lineEnd.Value - _processedInBuffer + (haveEnd ? 1 : 0);
            _processedInBuffer = lineEnd.Value + (haveEnd ? 1 : 0);
    
            return _encoding.GetString(arr).TrimStart(_bom).TrimEnd('\r', '\n');
        }
    
        private void ReadBuffer()
        {
            var notProcessedPartLength = _buffer.Length - _processedInBuffer;
    
            // Extend buffer to be able to fit whole line to the buffer
            // Was     [NOT_PROCESSED]
            // Become  [NOT_PROCESSED        ]
            if (notProcessedPartLength == _buffer.Length)
            {
                var extendedBuffer = new byte[_buffer.Length + _buffer.Length/2];
                Array.Copy(_buffer, extendedBuffer, _buffer.Length);
                _buffer = extendedBuffer;
            }
    
            // Copy not processed information to the beginning
            // Was    [PROCESSED NOT_PROCESSED]
            // Become [NOT_PROCESSED          ]
            Array.Copy(_buffer, (long) _processedInBuffer, _buffer, 0, notProcessedPartLength);
    
            // Read more information to the empty part of buffer
            // Was    [ NOT_PROCESSED                   ]
            // Become [ NOT_PROCESSED NEW_NOT_PROCESSED ]
            _informationInBuffer = notProcessedPartLength + _stream.Read(_buffer, notProcessedPartLength, _buffer.Length - notProcessedPartLength);
    
            _processedInBuffer = 0;
        }
    
        private int? Search(byte[] buffer, byte byteToSearch, int bufferOffset)
        {
            for (int i = bufferOffset; i < buffer.Length - 1; i++)
            {
                if (buffer[i] == byteToSearch)
                    return i;
            }
            return null;
        }
    }
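
For comparison, the underlying idea (scan raw bytes for the newline byte and remember where each line starts) can be sketched much more simply when tailing is not needed. This is an illustrative sketch, not the class above: the name `LineIndex` and its methods are hypothetical, and it assumes an encoding such as ASCII or UTF-8 in which '\n' is always the single byte 0x0A.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the answer above.
public static class LineIndex
{
    // Returns the byte offset at which each line starts.
    // Assumes '\n' is always encoded as the single byte 0x0A (ASCII, UTF-8, ...).
    public static List<long> Build(Stream stream)
    {
        var starts = new List<long>();
        var buffer = new byte[64 * 1024];
        long chunkStart = 0;     // absolute offset of the current chunk's first byte
        bool atLineStart = true; // true until the first byte of a line has been seen
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (atLineStart)
                {
                    starts.Add(chunkStart + i);
                    atLineStart = false;
                }
                if (buffer[i] == (byte)'\n')
                    atLineStart = true;
            }
            chunkStart += read;
        }
        return starts;
    }

    // Seeks to a previously recorded offset and decodes a single line from there.
    public static string ReadLineAt(Stream stream, long offset, Encoding encoding)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, encoding, false, 1024, true))
            return reader.ReadLine();
    }
}
```

The index costs one long per line, so a file with a few million lines needs a few tens of megabytes; the offsets could also be written to a side file if that is too much.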
    
不打扰别人 2024-09-04 04:39:03

Though Thomas Levesque's solution works well, here's mine. It uses reflection, so it will be slower, but it's encoding-independent. I also added a Seek extension.

/// <summary>Useful <see cref="StreamReader"/> extensions.</summary>
public static class StreamReaderExtentions
{
    /// <summary>Gets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
    /// <remarks><para>This method is quite slow. It uses reflection to access private <see cref="StreamReader"/> fields. Don't use it too often.</para></remarks>
    /// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
    /// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
    /// <returns>The current position of this stream.</returns>
    public static long GetPosition(this StreamReader streamReader)
    {
        if (streamReader == null)
            throw new ArgumentNullException("streamReader");

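        // Note: these private field names match the .NET Framework StreamReader; other runtimes may name them differently.
        var charBuffer = (char[])streamReader.GetType().InvokeMember("charBuffer", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);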
        var charPos = (int)streamReader.GetType().InvokeMember("charPos", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
        var charLen = (int)streamReader.GetType().InvokeMember("charLen", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);

        var offsetLength = streamReader.CurrentEncoding.GetByteCount(charBuffer, charPos, charLen - charPos);

        return streamReader.BaseStream.Position - offsetLength;
    }

    /// <summary>Sets the position within the <see cref="StreamReader.BaseStream"/> of the <see cref="StreamReader"/>.</summary>
    /// <remarks>
    /// <para><see cref="StreamReader.BaseStream"/> should be seekable.</para>
    /// <para>This method is quite slow. It uses reflection and discards the buffered data of the <see cref="StreamReader"/>. Don't use it too often.</para>
    /// </remarks>
    /// <param name="streamReader">Source <see cref="StreamReader"/>.</param>
    /// <param name="position">The point relative to origin from which to begin seeking.</param>
    /// <param name="origin">Specifies the beginning, the end, or the current position as a reference point for origin, using a value of type <see cref="SeekOrigin"/>. </param>
    /// <exception cref="ArgumentNullException">Occurs when passed <see cref="StreamReader"/> is null.</exception>
    /// <exception cref="ArgumentException">Occurs when <see cref="StreamReader.BaseStream"/> is not seekable.</exception>
    /// <returns>The new position in the stream. This position can differ from <paramref name="position"/> because of the preamble.</returns>
    public static long Seek(this StreamReader streamReader, long position, SeekOrigin origin)
    {
        if (streamReader == null)
            throw new ArgumentNullException("streamReader");

        if (!streamReader.BaseStream.CanSeek)
            throw new ArgumentException("Underlying stream should be seekable.", "streamReader");

        var preamble = (byte[])streamReader.GetType().InvokeMember("_preamble", BindingFlags.DeclaredOnly | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField, null, streamReader, null);
        if (preamble.Length > 0 && position < preamble.Length) // preamble or BOM must be skipped
            position += preamble.Length;

        var newPosition = streamReader.BaseStream.Seek(position, origin); // seek
        streamReader.DiscardBufferedData(); // this updates the buffer

        return newPosition;
    }
}
霊感 2024-09-04 04:39:03

Would this work?

using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
  string line;
  long pos = 0;
  while ((line = sr.ReadLine()) != null) {
    Console.Write("{0:d3} ", pos);
    Console.WriteLine(line);
    pos += line.Length;
  }
}
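
One caveat with this approach: line.Length counts characters and excludes the line terminator that ReadLine strips, so for the sample file the reported positions would drift away from the real offsets. A hedged variant is sketched below, assuming every line uses the same known terminator and the file has no byte order mark (otherwise the preamble length would have to be added); the name `LineStartCalculator` is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper; assumes a uniform, known line terminator and no BOM.
public static class LineStartCalculator
{
    // Returns the byte offset of the start of every line read from `reader`.
    public static List<long> ByteOffsets(TextReader reader, Encoding encoding, string newline)
    {
        var offsets = new List<long>();
        int newlineBytes = encoding.GetByteCount(newline);
        long pos = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            offsets.Add(pos);
            // Encoded size of the line's text plus the terminator that ReadLine removed.
            pos += encoding.GetByteCount(line) + newlineBytes;
        }
        return offsets;
    }
}
```

If the file mixes \n and \r\n, the terminator length is no longer recoverable from ReadLine alone, and scanning the raw bytes is the safer route.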