StreamWriter and UTF-8 Byte Order Marks

Posted 2024-10-21 11:56:48

I'm having an issue with StreamWriter and byte order marks. The documentation seems to state that the Encoding.UTF8 encoding has byte order marks enabled, but when files are written, some have the mark while others don't.

I'm creating the stream writer in the following way:

this.Writer = new StreamWriter(this.Stream, System.Text.Encoding.UTF8);

Any ideas on what could be happening would be appreciated.

Comments (11)

烏雲後面有陽光 2024-10-28 11:56:48

As someone has already pointed out, calling the constructor without the encoding argument does the trick.
However, if you want to be explicit, try this:

using (var sw = new StreamWriter(this.Stream, new UTF8Encoding(false)))

To disable the BOM, the key is to construct the writer with new UTF8Encoding(false) instead of just Encoding.UTF8. This is the same as calling StreamWriter without the encoding argument; internally it does the same thing.

To enable the BOM, use new UTF8Encoding(true) instead.

Update: Since Windows 10 v1903, notepad.exe no longer writes the BOM by default when saving as UTF-8; it is now an opt-in feature.

大姐,你呐 2024-10-28 11:56:48

The issue is due to the fact that you are using the static UTF8 property on the Encoding class.

When the GetPreamble method is called on the Encoding instance returned by the UTF8 property, it returns the byte order mark (a three-byte array), which is written to the stream before any other content (assuming a new stream).

You can avoid this by creating the instance of the UTF8Encoding class yourself, like so:

// As before.
this.Writer = new StreamWriter(this.Stream,
    // Create the encoding yourself. The parameterless constructor
    // (like passing false) prevents the BOM from being written.
    new System.Text.UTF8Encoding());

As per the documentation for the default parameterless constructor (emphasis mine):

This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.

This means that the call to GetPreamble will return an empty array, and therefore no BOM will be written to the underlying stream.
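The difference between these encoding instances can be checked directly by printing what GetPreamble returns for each. The following is only a minimal sketch; the class name PreambleDemo and the helper DescribePreamble are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: formats an encoding's preamble as hex,
    // or returns "(none)" when the preamble is an empty array.
    public static string DescribePreamble(Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        return preamble.Length == 0 ? "(none)" : BitConverter.ToString(preamble);
    }

    static void Main()
    {
        Console.WriteLine(DescribePreamble(Encoding.UTF8));           // EF-BB-BF
        Console.WriteLine(DescribePreamble(new UTF8Encoding()));      // (none)
        Console.WriteLine(DescribePreamble(new UTF8Encoding(false))); // (none)
        Console.WriteLine(DescribePreamble(new UTF8Encoding(true)));  // EF-BB-BF
    }
}
```

StreamWriter writes whatever the preamble is, so an encoding with an empty preamble results in no BOM.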

献世佛 2024-10-28 11:56:48

My answer is based on HelloSam's, which contains all the necessary information.
However, I believe what the OP is asking is how to make sure that the BOM is emitted into the file.

So instead of passing false to the UTF8Encoding constructor, you need to pass true.

    using (var sw = new StreamWriter("text.txt", new UTF8Encoding(true)))

Try the code below, open the resulting files in a hex editor, and see which one contains the BOM and which doesn't.

using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Written without a BOM.
        const string nobomtxt = "nobom.txt";
        File.Delete(nobomtxt);

        using (Stream stream = File.OpenWrite(nobomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false)))
        {
            writer.WriteLine("HelloПривет");
        }

        // Written with a BOM.
        const string bomtxt = "bom.txt";
        File.Delete(bomtxt);

        using (Stream stream = File.OpenWrite(bomtxt))
        using (var writer = new StreamWriter(stream, new UTF8Encoding(true)))
        {
            writer.WriteLine("HelloПривет");
        }
    }
}
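
If a hex editor is not at hand, the same check can be done in code by comparing the first bytes of a file against the UTF-8 preamble. This is a rough sketch only; BomCheck and HasUtf8Bom are hypothetical names, and the temporary file names are chosen just for the demonstration.

```csharp
using System;
using System.IO;
using System.Text;

static class BomCheck
{
    // Hypothetical helper: returns true when the file starts with
    // the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bom = new UTF8Encoding(true).GetPreamble();
        byte[] head = new byte[bom.Length];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        if (total < bom.Length) return false;
        for (int i = 0; i < bom.Length; i++)
        {
            if (head[i] != bom[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        string noBom = Path.Combine(Path.GetTempPath(), "bomcheck-nobom.txt");
        string withBom = Path.Combine(Path.GetTempPath(), "bomcheck-bom.txt");
        File.WriteAllText(noBom, "Hello", new UTF8Encoding(false));
        File.WriteAllText(withBom, "Hello", new UTF8Encoding(true));

        Console.WriteLine(HasUtf8Bom(noBom));   // False
        Console.WriteLine(HasUtf8Bom(withBom)); // True
    }
}
```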

萤火眠眠 2024-10-28 11:56:48

The only time I've seen that constructor not add the UTF-8 BOM is if the stream is not at position 0 when you call it. For example, in the code below, the BOM isn't written:

using (var s = File.Create("test2.txt"))
{
    s.WriteByte(32);
    using (var sw = new StreamWriter(s, Encoding.UTF8))
    {
        sw.WriteLine("hello, world");
    }
}

As others have said, if you're using the StreamWriter(stream) constructor, without specifying the encoding, then you won't see the BOM.

浮生未歇 2024-10-28 11:56:48

Do you use the same constructor of the StreamWriter for every file? Because the documentation says:

To create a StreamWriter using UTF-8 encoding and a BOM, consider using a constructor that specifies encoding, such as StreamWriter(String, Boolean, Encoding).

I was in a similar situation a while ago. I ended up using the Stream.Write method instead of StreamWriter, writing the result of Encoding.GetPreamble() before the result of Encoding.GetBytes(stringToWrite).

沉鱼一梦 2024-10-28 11:56:48

I found this answer useful (thanks to @Philipp Grathwohl and @Nik), but in my case I'm using FileStream to accomplish the task, so, the code that generates the BOM goes like this:

using (FileStream vStream = File.Create(pfilePath))
{
    // Creates the UTF-8 encoding with parameter "encoderShouldEmitUTF8Identifier" set to true
    Encoding vUTF8Encoding = new UTF8Encoding(true);
    // Gets the preamble in order to attach the BOM
    var vPreambleByte = vUTF8Encoding.GetPreamble();

    // Writes the preamble first
    vStream.Write(vPreambleByte, 0, vPreambleByte.Length);

    // Gets the bytes from text
    byte[] vByteData = vUTF8Encoding.GetBytes(pTextToSaveToFile);
    // The using block disposes the stream, so no explicit Close() is needed.
    vStream.Write(vByteData, 0, vByteData.Length);
}

你的心境我的脸 2024-10-28 11:56:48

It seems that if the file already existed and didn't contain a BOM, then it won't contain a BOM when overwritten. In other words, StreamWriter preserves the BOM (or its absence) when overwriting a file.

像极了他 2024-10-28 11:56:48

Could you please show a situation where it doesn't produce it? The only case I can find where the preamble isn't present is when nothing is ever written to the writer. (Jim Mischel seems to have found another case, which is logical and more likely to be your problem; see that answer.)

My test code:

var stream = new MemoryStream();
using(var writer = new StreamWriter(stream, System.Text.Encoding.UTF8))
{
    writer.Write('a');
}
// Print the written bytes as space-separated hex (requires System.Linq).
Console.WriteLine(string.Join(" ",
    stream.ToArray().Select(b => b.ToString("X2"))));

帅哥哥的热头脑 2024-10-28 11:56:48

After reading the source code of StreamWriter, I found that you need to make sure you are creating a new file; only then is the byte order mark added to the file.
https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L267
Code in the Flush method:

if (!_haveWrittenPreamble)
{
    _haveWrittenPreamble = true;
    ReadOnlySpan<byte> preamble = _encoding.Preamble;
    if (preamble.Length > 0)
    {
        _stream.Write(preamble);
    }
}

https://github.com/dotnet/runtime/blob/6ef4b2e7aba70c514d85c2b43eac1616216bea55/src/libraries/System.Private.CoreLib/src/System/IO/StreamWriter.cs#L129
This code sets the value of _haveWrittenPreamble:

// If we're appending to a Stream that already has data, don't write
// the preamble.
if (_stream.CanSeek && _stream.Position > 0)
{
    _haveWrittenPreamble = true;
}
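
One practical consequence of the position check above is that opening a non-empty file in append mode does not add a BOM, even with Encoding.UTF8. The sketch below illustrates this under the assumption that the temp directory is writable; AppendDemo and WriteThenAppend are hypothetical names used only here.

```csharp
using System;
using System.IO;
using System.Text;

static class AppendDemo
{
    // Hypothetical helper: creates a one-character file without a BOM,
    // appends one more character using Encoding.UTF8, and returns the bytes.
    public static byte[] WriteThenAppend(string path)
    {
        File.Delete(path);

        // New file, encoding without a preamble: writes just "a".
        using (var writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            writer.Write("a");
        }

        // Append mode: the stream position is already past 0, so the
        // preamble of Encoding.UTF8 is skipped and only "b" is written.
        using (var writer = new StreamWriter(path, true, Encoding.UTF8))
        {
            writer.Write("b");
        }

        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "append-demo.txt");
        Console.WriteLine(BitConverter.ToString(WriteThenAppend(path))); // 61-62
    }
}
```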

ま昔日黯然 2024-10-28 11:56:48

Using Encoding.Default instead of Encoding.UTF8 solved my problem.

已下线请稍等 2024-10-28 11:56:48

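When FileStream is not used and no encoding is specified, StreamWriter writes UTF-8 without a BOM. A file containing only ASCII characters is byte-for-byte identical to its ANSI form, so editors may report it as ANSI; once a non-ASCII character is present, the file is recognizably UTF-8 without a BOM.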

StreamWriter writer = new StreamWriter("C:\\file.txt");

Passing Encoding.UTF8 creates and writes the file with a BOM. An existing file without a BOM will have one added when it is overwritten. The false argument is the append parameter, so false means overwrite rather than append.

StreamWriter writer = new StreamWriter("C:\\file.txt", false, Encoding.UTF8);
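
To see what the path-only constructor actually writes, the resulting bytes can be dumped for an ASCII string and for a string with a non-ASCII character. This is a small sketch, not a definitive test; DefaultEncodingDemo and WriteWithDefaultEncoding are hypothetical names, and a temp file stands in for C:\file.txt.

```csharp
using System;
using System.IO;

static class DefaultEncodingDemo
{
    // Hypothetical helper: writes text with StreamWriter(path), which
    // overwrites the file using the default encoding, then returns the bytes.
    public static byte[] WriteWithDefaultEncoding(string path, string text)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "default-encoding-demo.txt");

        // ASCII only: no BOM, and the bytes match the ANSI representation.
        Console.WriteLine(BitConverter.ToString(WriteWithDefaultEncoding(path, "abc")));    // 61-62-63

        // U+00E9 (e with acute accent): encoded as two UTF-8 bytes, still no BOM.
        Console.WriteLine(BitConverter.ToString(WriteWithDefaultEncoding(path, "\u00E9"))); // C3-A9
    }
}
```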