Return a StreamReader to the beginning when its BaseStream has a BOM
I'm looking for an infallible way to reset a StreamReader to the beginning, particularly when its underlying BaseStream starts with a BOM, but it must also work when no BOM is present. Creating a new StreamReader which reads from the beginning of the stream is also acceptable.
The original StreamReader can be created with any encoding and with detectEncodingFromByteOrderMarks set to either true or false. Also, a read may or may not have been done prior to calling reset.
The Stream can contain arbitrary text, and files starting with the bytes 0xef,0xbb,0xbf can be files with a BOM or files starting with a valid sequence of characters (for example ï»¿ if ISO-8859-1 encoding is used), depending on the parameters used when the StreamReader was created.
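To make the ambiguity concrete, here is a small sketch (written as C# top-level statements, which is an assumption about the toolchain) that decodes the same four bytes used in the test stream with both encodings. Nothing here is specific to StreamReader; it only shows why the bytes alone cannot tell you whether a BOM was intended.

```csharp
using System;
using System.Text;

// The same four bytes as the question's test stream.
byte[] bytes = { 0xef, 0xbb, 0xbf, (byte)'X' };

// As ISO-8859-1, every byte maps to exactly one character: "ï»¿X".
string latin1 = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

// As UTF-8, the first three bytes form the single character U+FEFF.
// Encoding.GetString does not strip it.
string utf8 = new UTF8Encoding(false).GetString(bytes);

Console.WriteLine(latin1.Length);          // 4
Console.WriteLine(utf8.Length);            // 2
Console.WriteLine((int)utf8[0] == 0xFEFF); // True
```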
I've seen other solutions, but they don't work properly when the BaseStream starts with BOM. The StreamReader remembers that it has already detected the BOM, and the first character that is returned when a read is performed is the special BOM character.
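The behavior described above can be reproduced with a short sketch along these lines (C# top-level statements; the variable names are illustrative only). It rewinds the stream and discards the reader's buffers, which is the naive reset. On the runtimes I am aware of, the second read is expected to return U+FEFF rather than 'X', because the reader does not repeat its preamble check after the first read.

```csharp
using System;
using System.IO;

var stream = new MemoryStream(new byte[] { 0xef, 0xbb, 0xbf, (byte)'X' });
var sr = new StreamReader(stream); // UTF-8, BOM detection on by default

int before = sr.Read(); // the BOM is skipped silently, so this is 'X'

// Naive reset: rewind the stream and drop the reader's buffered data.
sr.BaseStream.Position = 0;
sr.DiscardBufferedData();

int after = sr.Read(); // the BOM bytes are now decoded as an ordinary character

Console.WriteLine($"before=0x{before:X4} after=0x{after:X4}");
```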
I could also create a new StreamReader, but I have no way of knowing whether the original StreamReader was created with detectEncodingFromByteOrderMarks set to true or to false.
This is what I have tried first:
// Fails with TestMethod1: the reader returns the BOM character on the next read.
void ResetStream1(ref StreamReader sr) {
    sr.BaseStream.Position = 0;
    sr.DiscardBufferedData();
}

// Fails with TestMethod2: the new reader always detects the BOM.
void ResetStream2(ref StreamReader sr) {
    sr.BaseStream.Position = 0;
    sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, true);
}

// Fails with TestMethod3: the new reader never detects the BOM.
void ResetStream3(ref StreamReader sr) {
    sr.BaseStream.Position = 0;
    sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
}
And these are the test methods:
Stream StreamWithBOM = new MemoryStream(new byte[] {0xef,0xbb,0xbf,(byte)'X'});

[TestMethod]
public void TestMethod1() {
    StreamReader sr = new StreamReader(StreamWithBOM);
    int before = sr.Read(); //reads X
    ResetStream(ref sr);
    int after = sr.Read();
    Assert.AreEqual(before, after);
}

[TestMethod]
public void TestMethod2() {
    StreamReader sr = new StreamReader(StreamWithBOM, Encoding.GetEncoding("ISO-8859-1"), false);
    int before = sr.Read(); //reads ï
    ResetStream(ref sr);
    int after = sr.Read();
    Assert.AreEqual(before, after);
}

[TestMethod]
public void TestMethod3() {
    StreamReader sr = new StreamReader(StreamWithBOM, Encoding.GetEncoding("ISO-8859-1"), true);
    int expected = (int)'X'; //no Read() done before reset
    ResetStream(ref sr);
    int after = sr.Read();
    Assert.AreEqual(expected, after);
}
Finally, I found a solution (see my own answer) which passes all 3 tests, but I want to see if a more elegant or faster solution is possible.
Comments (2)
This does the trick without needing to create a new StreamReader:
GetPreamble() will return an empty byte array if there is no BOM.
This should work with or without the BOM because the UTF8Encoding class (and others, e.g. UTF32Encoding, UnicodeEncoding) has an internal field which tracks whether the BOM is included and is set by the StreamReader when you first do a Read().
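A minimal sketch of the idea described above, assuming the reset simply seeks to the length of `CurrentEncoding.GetPreamble()` and then discards the reader's buffers. The helper name `ResetStream` is borrowed from the question's tests, and it takes the reader by value here because it is not replaced; treat this as one possible reading of the approach rather than a definitive implementation. It is written as C# top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: rewind to just past the preamble (BOM) of the
// reader's current encoding. GetPreamble() is empty when there is none.
static void ResetStream(StreamReader sr)
{
    sr.BaseStream.Position = sr.CurrentEncoding.GetPreamble().Length;
    sr.DiscardBufferedData();
}

var stream = new MemoryStream(new byte[] { 0xef, 0xbb, 0xbf, (byte)'X' });
var sr = new StreamReader(stream, Encoding.GetEncoding("ISO-8859-1"), true);

int before = sr.Read(); // BOM detected on the first read, so this is 'X'
ResetStream(sr);
int after = sr.Read();

Console.WriteLine(before == after);
```

Because the reader is reused, any BOM detection it has already performed is kept, which is what the naive rewind to position 0 gets wrong.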
However, it seems you need to pass in an Encoding to the StreamReader constructor with the BOM identifier flag turned off, and it will then correctly detect the presence of the BOM. If you simply pass the stream as the only parameter, as in TestMethod1 above, then CurrentEncoding is UTF8 with BOM even if your stream has no BOM (that constructor defaults to Encoding.UTF8, whose preamble is the BOM). Setting detectEncodingFromByteOrderMarks to true does not help either, as it already defaults to true.
The tests below both pass, because the default for UTF8Encoding is to have the BOM off.
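As a rough illustration of that claim, the sketch below passes `new UTF8Encoding()` (whose preamble is empty) to the constructor and compares the first character read before and after a reset, once for a stream with a BOM and once without. The helpers `ResetStream` and `ReadAroundReset` are hypothetical names, and the reset itself assumes the seek-past-preamble approach described above. It is written as C# top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: seek past the current encoding's preamble, if any.
static void ResetStream(StreamReader sr)
{
    sr.BaseStream.Position = sr.CurrentEncoding.GetPreamble().Length;
    sr.DiscardBufferedData();
}

// Reads one character, resets, and reads one character again.
static (int before, int after) ReadAroundReset(byte[] bytes)
{
    // new UTF8Encoding() does not emit a BOM, so its preamble is empty
    // unless the reader detects a BOM and switches encodings.
    var sr = new StreamReader(new MemoryStream(bytes), new UTF8Encoding(), true);
    int before = sr.Read();
    ResetStream(sr);
    return (before, sr.Read());
}

var withBom = ReadAroundReset(new byte[] { 0xef, 0xbb, 0xbf, (byte)'X', (byte)'Y', (byte)'Z' });
var withoutBom = ReadAroundReset(new byte[] { (byte)'X', (byte)'Y', (byte)'Z' });

Console.WriteLine(withBom.before == withBom.after);
Console.WriteLine(withoutBom.before == withoutBom.after);
```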