C# - StreamReader and seeking
Can you use StreamReader to read a normal text file, close the StreamReader partway through after saving the current position, and then open a StreamReader again and start reading from that position?
If not, what else can I use to accomplish the same thing without locking the file?
I tried this, but it doesn't work:
var fs = File.Open(@"C:\testfile.txt", FileMode.Open, FileAccess.Read);
var sr = new StreamReader(fs);
Debug.WriteLine(sr.ReadLine()); // Prints: firstline
var pos = fs.Position;
while (!sr.EndOfStream)
{
    Debug.WriteLine(sr.ReadLine());
}
fs.Seek(pos, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());
// Prints nothing; I expect it to print SecondLine.
Here is the other code I also tried:
var position = -1;
StreamReaderSE sr = new StreamReaderSE(@"c:\testfile.txt");
Debug.WriteLine(sr.ReadLine());
position = sr.BytesRead;
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine(sr.ReadLine());
Debug.WriteLine("Wait");
sr.BaseStream.Seek(position, SeekOrigin.Begin);
Debug.WriteLine(sr.ReadLine());
Comments (6)
I realize this is really belated, but I just stumbled onto this incredible flaw in StreamReader myself: you can't reliably seek when using StreamReader. Personally, my specific need is to read characters but then "back up" if a certain condition is met; it's a side effect of one of the file formats I'm parsing.
Using ReadLine() isn't an option because it's only useful in really trivial parsing jobs. I have to support configurable record/line delimiter sequences and escaped delimiter sequences. Also, I don't want to implement my own buffer just to support "backing up" and escape sequences; that should be the StreamReader's job.
This method calculates the actual position in the underlying stream of bytes on demand. It works for UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, and any single-byte encoding (e.g. code pages 1252, 437, 28591), regardless of the presence of a preamble/BOM. This version will not work for UTF-7, Shift-JIS, or other variable-byte encodings.
When I need to seek to an arbitrary position in the underlying stream, I directly set BaseStream.Position and then call DiscardBufferedData() to get StreamReader back in sync for the next Read()/Peek() call.
And a friendly reminder: don't arbitrarily set BaseStream.Position. If you bisect a character, you'll invalidate the next Read() and, for UTF-16/-32, you'll also invalidate the result of this method.
Of course, this uses Reflection to get at private variables, so there is risk involved. However, this method works with .NET 2.0, 3.0, 3.5, 4.0, 4.0.3, 4.5, 4.5.1, 4.5.2, 4.6, and 4.6.1. Beyond that risk, the only other critical assumption is that the underlying byte buffer is a byte[1024]; if Microsoft changes it the wrong way, the method breaks for UTF-16/-32.
This has been tested against a UTF-8 file filled with Ažテ𣘺 (10 bytes: 0x41 C5 BE E3 83 86 F0 A3 98 BA) and a UTF-16 file filled with A𐐷 (6 bytes: 0x41 00 01 D8 37 DC). The point is to force-fragment characters along the byte[1024] boundaries in all the different ways they could be.
UPDATE (2013-07-03): I fixed the method, which originally used the broken code from that other answer. This version has been tested against data containing characters that require surrogate pairs. The data was put into 3 files, each with a different encoding: one UTF-8, one UTF-16LE, and one UTF-16BE.
UPDATE (2016-02): The only correct way to handle bisected characters is to directly interpret the underlying bytes. UTF-8 is properly handled, and UTF-16/-32 work (given the length of byteBuffer).
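The seek-then-discard step described in this answer can be sketched as follows. This is only an illustration, not the answer's position-calculating method: SeekDemo, ReadLineAt, and Demo are hypothetical names, the sample file is assumed to be UTF-8 without a BOM, and the offset 6 is simply the known length of "first\n". FileShare.ReadWrite is used so that opening the file does not block other processes.

```csharp
using System.IO;
using System.Text;

static class SeekDemo
{
    // Repositions an existing StreamReader at a byte offset and reads the next line.
    // Assumption: the caller knows the offset falls on a character boundary.
    public static string ReadLineAt(StreamReader reader, long byteOffset)
    {
        reader.BaseStream.Seek(byteOffset, SeekOrigin.Begin);
        reader.DiscardBufferedData(); // forget bytes/chars buffered before the seek
        return reader.ReadLine();
    }

    // Writes a three-line sample file, reads it to the end, then jumps back to line two.
    public static string Demo()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "first\nsecond\nthird\n", new UTF8Encoding(false));
            using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (var reader = new StreamReader(fs, new UTF8Encoding(false)))
            {
                reader.ReadToEnd();           // reader and stream are now at end of file
                return ReadLineAt(reader, 6); // "first\n" occupies bytes 0..5
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Without the DiscardBufferedData() call, the reader would keep serving its stale internal buffer, which is why the code in the question prints nothing after its Seek.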
Yes you can, see this:
Update:
Be aware that you can't necessarily use sr.BaseStream.Position for anything useful, because StreamReader uses buffers, so it will not reflect what you have actually read. I guess you are going to have problems finding the true position, because you can't just count characters (different encodings, and therefore different character lengths). I think the best way is to work with FileStreams themselves.
Update:
Use the TGREER.myStreamReader from here: http://www.daniweb.com/software-development/csharp/threads/35078
This class adds BytesRead etc. (it works with ReadLine() but apparently not with other read methods), and then you can do this:
If you just want to search for a start position within a text stream, I added this extension to StreamReader so that I could determine where the edit of the stream should occur. Granted, the logic increments by characters, but for my purposes it works great for getting the position within a text/ASCII-based file based upon a string pattern. You can then use that location as a start point for reading, to write a new file that excludes the data prior to the start point.
The returned position within the stream can be provided to Seek to start from that position within text-based stream reads. It works; I've tested it. However, there may be issues when matching non-ASCII Unicode characters during the matching algorithm. This was based upon American English and the associated character page.
Basics: it scans through a text stream, character-by-character, looking for the sequential string pattern (that matches the string parameter) forward only through the stream. Once the pattern doesn't match the string parameter (i.e. going forward, char by char), then it will start over (from the current position) trying to get a match, char-by-char. It will eventually quit if the match can't be found in the stream. If the match is found, then it returns the current "character" position within the stream, not the StreamReader.BaseStream.Position, as that position is ahead, based on the buffering that the StreamReader does.
As indicated in the comments, this method WILL affect the position of the StreamReader, and it will be set back to the beginning (0) at the end of the method. StreamReader.BaseStream.Seek should be used to run to the position returned by this extension.
Note: the position returned by this extension will also work with BinaryReader.Seek as a start position when working with text files. I actually used this logic for that purpose to rewrite a PostScript file back to disk, after discarding the PJL header information to make the file a "proper" PostScript readable file that could be consumed by GhostScript. :)
The string to search for within the PostScript (after the PJL header) is: "%!PS-", which is followed by "Adobe" and the version.
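A comparable search can be sketched directly on the bytes, which avoids the StreamReader buffering issue altogether. This is a byte-based variant under stated assumptions, not the extension method the answer refers to: PatternSearch and IndexOf are hypothetical names, the pattern is assumed to be pure ASCII, and the stream is assumed to be seekable.

```csharp
using System.IO;
using System.Text;

static class PatternSearch
{
    // Returns the byte offset of the first occurrence of an ASCII pattern
    // in a seekable stream, or -1 if the pattern does not occur.
    public static long IndexOf(Stream stream, string asciiPattern)
    {
        byte[] pattern = Encoding.ASCII.GetBytes(asciiPattern);
        if (pattern.Length == 0) return -1;

        stream.Seek(0, SeekOrigin.Begin);
        long start = 0;  // offset where the current candidate match begins
        int matched = 0; // number of pattern bytes matched so far
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (b == pattern[matched])
            {
                matched++;
                if (matched == pattern.Length) return start;
            }
            else
            {
                // Mismatch: retry from the byte right after the candidate start,
                // so overlapping prefixes (e.g. "aab" in "aaab") are not missed.
                start++;
                matched = 0;
                stream.Seek(start, SeekOrigin.Begin);
            }
        }
        return -1;
    }
}
```

For the PostScript case described above, the returned offset for "%!PS-" could be passed to Seek on the same stream before copying the remainder to a new file.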
From MSDN:
In most of the examples involving StreamReader, you will see reading line by line using ReadLine(). The Seek method comes from the Stream class, which is basically used to read or handle data in bytes.
FileStream.Position (or equivalently, StreamReader.BaseStream.Position) will usually be ahead -- possibly way ahead -- of the TextReader position because of the underlying buffering taking place.
If you can determine how newlines are handled in your text files, you can add up the number of bytes read based on line lengths and end-of-line characters.
For more complex text file encodings you might need to get fancier than this, but it worked for me.
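The byte-counting idea above can be sketched like this. It is a rough illustration with loud assumptions: LineOffsets and OffsetAfterLines are hypothetical names, the file is assumed to have no BOM, and every line read is assumed to end with exactly the newline string passed in (a final line without a terminator would be overcounted).

```csharp
using System.IO;
using System.Text;

static class LineOffsets
{
    // Reads up to `linesToRead` lines and returns the byte offset at which
    // the following line starts, computed from line lengths plus newlines.
    public static long OffsetAfterLines(string path, int linesToRead, Encoding encoding, string newline)
    {
        long offset = 0;
        int newlineBytes = encoding.GetByteCount(newline);
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs, encoding))
        {
            for (int i = 0; i < linesToRead; i++)
            {
                string line = reader.ReadLine();
                if (line == null) break; // fewer lines than requested
                // Bytes of the line's text plus its (assumed) terminator.
                offset += encoding.GetByteCount(line) + newlineBytes;
            }
        }
        return offset;
    }
}
```

The returned offset can be saved, and a later reader can resume by seeking its base stream to that offset before reading. Counting with the encoding's GetByteCount rather than the string length is what keeps multi-byte characters from skewing the total.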
I found that the suggestions above did not work for me -- my use case was simply to back up one read position (I'm reading one char at a time with a default encoding). My simple solution was inspired by the commentary above ... your mileage may vary...
I saved the BaseStream.Position before reading, then determined if I needed to back up... if yes, then set position and invoke DiscardBufferedData().