Reading a text file line by line, with exact offset/position reporting
My simple requirement: read a huge text file (more than a million lines; for this example, assume it's a CSV of some sort) and keep a reference to the beginning of each line for faster lookup in the future (read a line, starting at X).
I tried the naive and easy way first, using a StreamReader and accessing the underlying BaseStream.Position. Unfortunately, that doesn't work as I intended:
Given a file containing the following:
Foo
Bar
Baz
Bla
Fasel
and this very simple code:
    using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
        string line;
        long pos = sr.BaseStream.Position;
        while ((line = sr.ReadLine()) != null) {
            Console.Write("{0:d3} ", pos);
            Console.WriteLine(line);
            pos = sr.BaseStream.Position;
        }
    }
the output is:
000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel
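I can imagine that the reader is trying to be helpful/efficient and reads in (big) chunks whenever new data is necessary, so BaseStream.Position reflects how far the buffer has been filled rather than how far I have read. Here the whole file (25 bytes, assuming CRLF line endings) presumably fits into the first chunk. For my purpose this is bad.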
The question, finally: is there any way to get the (byte or char) offset while reading a file line by line, without using a basic Stream and handling \r, \n, \r\n, string encoding, etc. manually? Not a big deal, really; I just don't like building things that might already exist.
Comments (5)
You could create a TextReader wrapper that tracks the current position in the base TextReader, and then read lines through the wrapper instead of the original reader.
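As a minimal sketch of that idea (not the answerer's original code): the hypothetical class PositionTrackingReader below overrides only Peek() and Read(), relying on the fact that the base TextReader.ReadLine() is implemented in terms of those two methods. Its Position counts characters, not bytes, so it matches byte offsets only for single-byte encodings in a file without a byte-order mark. The demo feeds it a StringReader to stay self-contained; for the question's file you would wrap a StreamReader instead.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: counts the characters consumed from the wrapped reader.
public sealed class PositionTrackingReader : TextReader
{
    private readonly TextReader _inner;

    public PositionTrackingReader(TextReader inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Number of characters (not bytes) consumed so far.
    public long Position { get; private set; }

    // Peeking does not consume anything, so Position is unchanged.
    public override int Peek() => _inner.Peek();

    public override int Read()
    {
        int c = _inner.Read();
        if (c >= 0) Position++;   // -1 signals end of input
        return c;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    public static void Main()
    {
        using (var reader = new PositionTrackingReader(
            new StringReader("Foo\r\nBar\r\nBaz\r\n")))
        {
            long pos = reader.Position;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine("{0:d3} {1}", pos, line);
                pos = reader.Position;   // start of the next line
            }
        }
        // Prints:
        // 000 Foo
        // 005 Bar
        // 010 Baz
    }
}
```

The trade-off is speed: every character goes through a virtual Read() call, which is noticeably slower than StreamReader's buffered ReadLine() on a file with millions of lines.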
After searching, testing, and trying some crazy things, this is the code that solved it for me (I'm currently using it in my product).
This is a really tough issue.
After a very long and exhausting survey of different solutions on the internet (including the solutions from this thread, thank you!), I had to reinvent the wheel and write my own.
I had the following requirements:
Stable - an error of even a single byte becomes visible immediately during use. Unfortunately for me, several implementations I found had stability problems
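One way to keep offsets byte-exact is to avoid decoding while indexing and scan the raw bytes for line feeds instead. The sketch below is an illustration under stated assumptions, not this answerer's implementation: the names LineIndex, Build and ReadLineAt are hypothetical, every line is assumed to end in '\n' (so "\n" and "\r\n" files work, lone "\r" does not), and the encoding is assumed to be ASCII-compatible (ASCII, UTF-8, Latin-1) with no byte-order mark.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineIndex
{
    // Returns the byte offset at which each line starts.
    // Assumption: 0x0A terminates every line and never occurs inside a
    // multi-byte character (true for UTF-8, false for UTF-16).
    public static List<long> Build(Stream stream)
    {
        var offsets = new List<long>();
        var buffer = new byte[64 * 1024];
        long chunkStart = 0;       // absolute offset of buffer[0]
        bool atLineStart = true;   // the next byte seen begins a new line
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (atLineStart)
                {
                    offsets.Add(chunkStart + i);
                    atLineStart = false;
                }
                if (buffer[i] == (byte)'\n') atLineStart = true;
            }
            chunkStart += read;
        }
        return offsets;
    }

    // Seeks to a stored offset and decodes a single line from there.
    public static string ReadLineAt(Stream stream, long offset, Encoding encoding)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, encoding, false, 1024, true))
        {
            return reader.ReadLine();
        }
    }
}

public static class IndexDemo
{
    public static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("Foo\r\nBar\r\nBaz\r\nBla\r\nFasel\r\n");
        using (var stream = new MemoryStream(data))
        {
            List<long> offsets = LineIndex.Build(stream);
            foreach (long offset in offsets)
            {
                Console.WriteLine("{0:d3} {1}", offset,
                    LineIndex.ReadLineAt(stream, offset, Encoding.ASCII));
            }
        }
        // Prints:
        // 000 Foo
        // 005 Bar
        // 010 Baz
        // 015 Bla
        // 020 Fasel
    }
}
```

Because the index is built from bytes alone, a wrong offset shows up immediately as a line that starts mid-record, which makes this kind of error easy to spot in testing.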
Though Thomas Levesque's solution works well, here's mine. It uses reflection, so it will be slower, but it's encoding-independent. I also added a Seek extension.
Would this work: