Resetting a StreamReader to the beginning when its BaseStream has a BOM

Posted 2024-11-17 03:54:58

I'm looking for an infallible way to reset a StreamReader to the beginning, particularly when its underlying BaseStream starts with a BOM; it must also work when no BOM is present. Creating a new StreamReader that reads from the beginning of the stream is also acceptable.

The original StreamReader can be created with any encoding and with detectEncodingFromByteOrderMarks set to either true or false. Also, a read may or may not have been performed before the reset is called.

The stream can contain arbitrary text, and a file starting with the bytes 0xef, 0xbb, 0xbf can be either a file with a BOM or a file starting with a valid sequence of characters (for example, ï»¿ if ISO-8859-1 encoding is used), depending on the parameters used when the StreamReader was created.
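As a small illustration of that ambiguity, here is a sketch that reads the same four test bytes two ways. The class and method names are hypothetical, and it assumes a .NET runtime where ISO-8859-1 is available through Encoding.GetEncoding. With ISO-8859-1 and BOM detection off, every byte becomes a character; with UTF-8, the first three bytes match the encoding's preamble and are skipped.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo class: reads the same bytes with two different reader settings.
public static class BomAmbiguityDemo
{
    // EF BB BF is the UTF-8 BOM, followed by the letter X.
    static readonly byte[] Bytes = { 0xEF, 0xBB, 0xBF, (byte)'X' };

    static string ReadAll(Encoding encoding, bool detectEncodingFromByteOrderMarks)
    {
        using (var reader = new StreamReader(
            new MemoryStream(Bytes), encoding, detectEncodingFromByteOrderMarks))
        {
            return reader.ReadToEnd();
        }
    }

    // ISO-8859-1 has no preamble and detection is off: each byte becomes one character.
    public static string AsLatin1() => ReadAll(Encoding.GetEncoding("ISO-8859-1"), false);

    // Encoding.UTF8's preamble is EF BB BF, so the reader skips those bytes.
    public static string AsUtf8() => ReadAll(Encoding.UTF8, true);
}
```

With these settings, AsLatin1() returns "ï»¿X" while AsUtf8() returns "X".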

I've seen other solutions, but they don't work properly when the BaseStream starts with a BOM. The StreamReader remembers that it has already detected the BOM, so after the stream is repositioned, the first character returned by a read is the BOM character itself (U+FEFF).
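A minimal sketch of that failure, following the ResetStream1 approach shown below (NaiveResetDemo is a hypothetical name). It assumes, as the TestMethod1 scenario suggests, that a .NET StreamReader which has already consumed its BOM does not look for it again after DiscardBufferedData(), so the decoder sees the raw BOM bytes.

```csharp
using System.IO;

// Hypothetical demo class reproducing the ResetStream1 approach on a stream with a BOM.
public static class NaiveResetDemo
{
    // Returns the character code read before and after rewinding the base stream.
    public static (int before, int after) ReadBeforeAndAfterNaiveReset()
    {
        var stream = new MemoryStream(new byte[] { 0xEF, 0xBB, 0xBF, (byte)'X' });
        using (var reader = new StreamReader(stream)) // UTF-8, BOM detection on
        {
            int before = reader.Read();     // BOM is skipped, so this is 'X'

            reader.BaseStream.Position = 0; // rewind the underlying bytes
            reader.DiscardBufferedData();   // drop already-decoded characters

            // The reader does not look for the BOM a second time, so EF BB BF
            // is now decoded as the character U+FEFF.
            int after = reader.Read();
            return (before, after);
        }
    }
}
```

In this scenario, before is 'X' and after is 0xFEFF, the BOM character.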

I could also create a new StreamReader, but I have no way of knowing whether the original StreamReader was created with detectEncodingFromByteOrderMarks set to true or false.

This is what I tried first:

    //fails with TestMethod1
    void ResetStream1(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr.DiscardBufferedData();
    }

    //fails with TestMethod2
    void ResetStream2(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, true);
    }

    //fails with TestMethod3
    void ResetStream3(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }

And these are the test methods:

    Stream StreamWithBOM = new MemoryStream(new byte[] {0xef,0xbb,0xbf,(byte)'X'});


    [TestMethod]
    public void TestMethod1() {
        StreamReader sr=new StreamReader(StreamWithBOM);
        int before=sr.Read(); //reads X

        ResetStream(ref sr);
        int after=sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod2() {
        StreamReader sr = new StreamReader(StreamWithBOM,Encoding.GetEncoding("ISO-8859-1"),false);
        int before = sr.Read(); //reads ï

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod3() {
        StreamReader sr = new StreamReader(StreamWithBOM, Encoding.GetEncoding("ISO-8859-1"), true);
        int expected = (int)'X'; //no Read() done before reset

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(expected, after);
    }

Finally, I found a solution (see my own answer) that passes all three tests, but I'd like to know whether a more elegant or faster solution is possible.

Answers (2)

似最初 2024-11-24 03:54:58
    //pass all 3 tests
    void ResetStream(ref StreamReader sr){
        sr.Read(); //ensure that BOM is detected if configured to do so
        sr.BaseStream.Position=0;
        sr=new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }
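The same idea can be sketched as a function that returns the new reader instead of using a ref parameter (RewindReader is a hypothetical name). The Read() call forces the first buffer fill, so any BOM detection has already updated CurrentEncoding. The new reader is created with detectEncodingFromByteOrderMarks set to false but, as far as I can tell, StreamReader still skips a leading preamble that matches the encoding it is given. That would explain why a detected UTF-8 BOM is skipped again while ISO-8859-1 (which has no preamble) keeps the bytes as characters.

```csharp
using System.IO;
using System.Text;

public static class ReaderRewind
{
    // Hypothetical helper mirroring the ResetStream answer above.
    public static StreamReader RewindReader(StreamReader sr)
    {
        // Force the first buffer fill so BOM detection (if enabled) has run
        // and CurrentEncoding reflects the detected encoding.
        sr.Read();
        Encoding encoding = sr.CurrentEncoding;

        sr.BaseStream.Position = 0;

        // Detection off, but a leading preamble matching encoding.GetPreamble()
        // is still skipped. The old reader is not disposed, because that would
        // close the shared BaseStream.
        return new StreamReader(sr.BaseStream, encoding, false);
    }
}
```

Leaving the old reader undisposed is deliberate here; disposing it would also close the stream the new reader depends on.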
半山落雨半山空 2024-11-24 03:54:58

This does the trick without needing to create a new StreamReader:

    void ResetStream(StreamReader sr)
    {
        sr.BaseStream.Position = sr.CurrentEncoding.GetPreamble().Length;
        sr.DiscardBufferedData();
    }

GetPreamble() returns an empty byte array when the current encoding has no BOM, so in that case the position is reset to 0.
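For reference, a sketch of the preamble lengths reported by the encodings that appear in this thread (PreambleLengths is a hypothetical helper class, and ISO-8859-1 is assumed to be available through Encoding.GetEncoding).

```csharp
using System.Text;

// Hypothetical helper: preamble (BOM) length in bytes reported by each encoding.
public static class PreambleLengths
{
    public static int Utf8Static() => Encoding.UTF8.GetPreamble().Length;                  // BOM on
    public static int Utf8Default() => new UTF8Encoding().GetPreamble().Length;            // BOM off by default
    public static int Utf8WithBom() => new UTF8Encoding(true).GetPreamble().Length;        // BOM explicitly on
    public static int Latin1() => Encoding.GetEncoding("ISO-8859-1").GetPreamble().Length; // no BOM exists
    public static int Utf16LE() => Encoding.Unicode.GetPreamble().Length;                 // FF FE
}
```

The expected values are 3, 0, 3, 0 and 2 respectively.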

This should work with or without the BOM because, when StreamReader detects a BOM during the first Read(), it replaces CurrentEncoding with an encoding whose preamble is that BOM (for UTF-8 this is Encoding.UTF8; the same applies to UTF-16 and UTF-32 BOMs). When no BOM is found, the encoding passed to the constructor is kept.
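A quick way to observe this, sketched under the assumption of a .NET StreamReader opened with new UTF8Encoding() (DetectionDemo and its method are hypothetical names): compare the preamble length of CurrentEncoding before and after the first Read().

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: observes CurrentEncoding around the first Read().
public static class DetectionDemo
{
    public static (int beforeRead, int afterRead) PreambleLengthAroundFirstRead(byte[] data)
    {
        // new UTF8Encoding() has no preamble; this constructor leaves BOM detection on.
        using (var reader = new StreamReader(new MemoryStream(data), new UTF8Encoding()))
        {
            int beforeRead = reader.CurrentEncoding.GetPreamble().Length;
            reader.Read(); // first buffer fill: BOM detection happens here
            int afterRead = reader.CurrentEncoding.GetPreamble().Length;
            return (beforeRead, afterRead);
        }
    }
}
```

For the bytes EF BB BF 'X' this should yield (0, 3); for a stream containing only 'X' it should yield (0, 0).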

However, you need to pass the StreamReader constructor an Encoding whose BOM flag is turned off (for example, new UTF8Encoding()); CurrentEncoding then reflects whether a BOM was actually present. If you pass the stream as the only parameter, as in TestMethod1 above, the reader defaults to Encoding.UTF8, whose preamble is the UTF-8 BOM, so CurrentEncoding reports a BOM even if your stream has none. Setting detectEncodingFromByteOrderMarks to true does not help either, as it already defaults to true.
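A sketch of what goes wrong with the single-argument constructor (DefaultCtorPitfall and its method are hypothetical names): on a stream without a BOM, CurrentEncoding still reports a 3-byte preamble, so the preamble-length reset lands on the fourth byte instead of the first.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo: applies the preamble-length reset to a BOM-less stream
// read through the single-argument StreamReader constructor.
public static class DefaultCtorPitfall
{
    public static int ReadAfterPreambleReset()
    {
        var stream = new MemoryStream(Encoding.ASCII.GetBytes("ABCD"));
        using (var reader = new StreamReader(stream)) // defaults to Encoding.UTF8 (3-byte preamble)
        {
            reader.Read(); // 'A'; no BOM found, CurrentEncoding stays Encoding.UTF8

            // GetPreamble().Length is 3 here, not 0
            reader.BaseStream.Position = reader.CurrentEncoding.GetPreamble().Length;
            reader.DiscardBufferedData();
            return reader.Read(); // 'D': the first three characters were skipped
        }
    }
}
```

Here ReadAfterPreambleReset() returns 'D' rather than 'A'.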

The tests below both pass, because new UTF8Encoding() has the BOM turned off by default.

    Stream StreamWithBOM = new MemoryStream(new byte[] { 0xef, 0xbb, 0xbf, (byte)'X' });
    Stream StreamWithoutBOM = new MemoryStream(new byte[] { (byte)'X' });

    [TestMethod]
    public void TestMethod4()
    {
        StreamReader sr = new StreamReader(StreamWithBOM, new UTF8Encoding());
        int before = sr.Read(); //reads X

        ResetStream(sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod5()
    {
        StreamReader sr = new StreamReader(StreamWithoutBOM, new UTF8Encoding());
        int before = sr.Read(); //reads X

        ResetStream(sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }