Write text files without Byte Order Mark (BOM)?

Posted on 2024-08-25 08:46:42

I am trying to create a text file using VB.Net with UTF8 encoding, without BOM. Can anybody help me, how to do this?

I can write a file with UTF8 encoding, but how do I remove the Byte Order Mark from it?

Edit 1: I have tried code like this:

    Dim utf8 As New UTF8Encoding()
    Dim utf8EmitBOM As New UTF8Encoding(True)
    Dim strW As New StreamWriter("c:\temp\bom\1.html", True, utf8EmitBOM)
    strW.Write(utf8EmitBOM.GetPreamble())
    strW.WriteLine("hi there")
    strW.Close()

    Dim strw2 As New StreamWriter("c:\temp\bom\2.html", True, utf8)
    strw2.Write(utf8.GetPreamble())
    strw2.WriteLine("hi there")
    strw2.Close()

1.html gets created with UTF8 encoding only, and 2.html gets created in ANSI encoding format.

Simplified approach - http://whatilearnttuday.blogspot.com/2011/10/write-text-files-without-byte-order.html

静赏你的温柔 2024-09-01 08:46:42

In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:

1. Explicitly specifying a suitable encoding:

  1. Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter.

  2. Pass the UTF8Encoding instance to the stream constructor.

' VB.NET:
Dim utf8WithoutBom As New System.Text.UTF8Encoding(False)
Using sink As New StreamWriter("Foobar.txt", False, utf8WithoutBom)
    sink.WriteLine("...")
End Using
// C#:
var utf8WithoutBom = new System.Text.UTF8Encoding(false);
using (var sink = new StreamWriter("Foobar.txt", false, utf8WithoutBom))
{
    sink.WriteLine("...");
}

2. Using the default encoding:

If you do not supply an Encoding to StreamWriter's constructor at all, StreamWriter will by default use a UTF-8 encoding without BOM, so the following should work just as well:

' VB.NET:
Using sink As New StreamWriter("Foobar.txt")
    sink.WriteLine("...")
End Using
// C#:
using (var sink = new StreamWriter("Foobar.txt"))
{
    sink.WriteLine("...");
}

Finally, note that omitting the BOM is only permissible for UTF-8, not for UTF-16.
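
To make the difference between the two encodings visible, here is a minimal C# sketch that writes the same text once with System.Text.Encoding.UTF8 and once with a BOM-less UTF8Encoding, then prints the raw bytes. The class name BomDemo, the helper WriteAndReadBack, and the temp file are only illustrative and not part of the answer above.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes text with the given encoding (overwriting the file)
    // and returns the raw bytes that ended up on disk.
    public static byte[] WriteAndReadBack(string path, string text, Encoding encoding)
    {
        using (var sink = new StreamWriter(path, false, encoding))
        {
            sink.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical scratch file

        byte[] withBom = WriteAndReadBack(path, "hi", Encoding.UTF8);
        byte[] withoutBom = WriteAndReadBack(path, "hi", new UTF8Encoding(false));

        Console.WriteLine(BitConverter.ToString(withBom));    // EF-BB-BF-68-69
        Console.WriteLine(BitConverter.ToString(withoutBom)); // 68-69

        File.Delete(path);
    }
}
```

The three extra bytes EF BB BF at the start of the first output are the UTF-8 BOM.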

淡忘如思 2024-09-01 08:46:42

Try this:

Encoding outputEnc = new UTF8Encoding(false); // create encoding with no BOM
TextWriter file = new StreamWriter(filePath, false, outputEnc); // open file with encoding
// write data here
file.Close(); // save and close it
遇到 2024-09-01 08:46:42

Simply use the WriteAllText method from System.IO.File.

Please check the sample from File.WriteAllText.

This method uses UTF-8 encoding without a Byte-Order Mark (BOM), so
using the GetPreamble method will return an empty byte array. If it is
necessary to include a UTF-8 identifier, such as a byte order mark, at
the beginning of a file, use the WriteAllText(String, String,
Encoding) method overload with UTF8 encoding.
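
As a rough illustration of the two overloads described in that quote, the following C# sketch writes the same string with File.WriteAllText(String, String) and with File.WriteAllText(String, String, Encoding), then compares the bytes. The class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class WriteAllTextDemo
{
    // Two-argument overload: UTF-8 without a BOM.
    public static byte[] WriteDefault(string path, string text)
    {
        File.WriteAllText(path, text);
        return File.ReadAllBytes(path);
    }

    // Three-argument overload: the encoding decides whether a BOM is written.
    public static byte[] WriteWithEncoding(string path, string text, Encoding encoding)
    {
        File.WriteAllText(path, text, encoding);
        return File.ReadAllBytes(path);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical scratch file

        Console.WriteLine(BitConverter.ToString(WriteDefault(path, "hi")));                     // 68-69
        Console.WriteLine(BitConverter.ToString(WriteWithEncoding(path, "hi", Encoding.UTF8))); // EF-BB-BF-68-69

        File.Delete(path);
    }
}
```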

◇流星雨 2024-09-01 08:46:42

If you do not specify an Encoding when creating a new StreamWriter the default Encoding object used is UTF-8 No BOM which is created via new UTF8Encoding(false, true).

So to create a text file without the BOM, use one of the constructors that do not require you to provide an encoding:

new StreamWriter(Stream)
new StreamWriter(String)
new StreamWriter(String, Boolean)
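
One way to check which encodings emit a BOM without touching the disk is to look at the length of GetPreamble(). The C# sketch below does that for a few UTF8Encoding variants and for the encoding a StreamWriter picks when none is supplied. PreambleDemo and its helpers are illustrative names only.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Number of BOM bytes the encoding would place at the start of a stream.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    // Preamble length of the encoding chosen by new StreamWriter(Stream).
    public static int DefaultWriterPreambleLength()
    {
        using (var stream = new MemoryStream())
        using (var writer = new StreamWriter(stream))
        {
            return writer.Encoding.GetPreamble().Length;
        }
    }

    public static void Main()
    {
        Console.WriteLine(PreambleLength(Encoding.UTF8));           // 3
        Console.WriteLine(PreambleLength(new UTF8Encoding(true)));  // 3
        Console.WriteLine(PreambleLength(new UTF8Encoding(false))); // 0
        Console.WriteLine(PreambleLength(new UTF8Encoding()));      // 0
        Console.WriteLine(DefaultWriterPreambleLength());           // 0
    }
}
```
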
烟柳画桥 2024-09-01 08:46:42

Interesting note with respect to this: strangely, the static "CreateText()" method of the System.IO.File class creates UTF-8 files without BOM.

In general this is a source of bugs, but in your case it could have been the simplest workaround :)

紫南 2024-09-01 08:46:42

I think Roman Nikitin is right. The meaning of the constructor argument is flipped. False means no BOM and true means with BOM.

You get an ANSI encoding because a file without a BOM that does not contain non-ANSI characters is exactly the same as an ANSI file. Try some special characters in your "hi there" string and you'll see the ANSI encoding change to UTF-8 without BOM.
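
A small C# sketch of that point: for pure ASCII text, BOM-less UTF-8 and a single-byte code page produce identical bytes, and they only diverge once a non-ASCII character appears. ISO-8859-1 is used here as a stand-in for "ANSI" (that choice is an assumption; the actual ANSI code page depends on the system), and the names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AnsiVsUtf8Demo
{
    // True when both encodings produce exactly the same bytes for the text.
    public static bool SameBytes(string text, Encoding a, Encoding b)
    {
        return a.GetBytes(text).SequenceEqual(b.GetBytes(text));
    }

    public static void Main()
    {
        Encoding utf8NoBom = new UTF8Encoding(false);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // stand-in for an ANSI code page

        Console.WriteLine(SameBytes("hi there", utf8NoBom, latin1));   // True
        Console.WriteLine(SameBytes("h\u00E9llo", utf8NoBom, latin1)); // False
    }
}
```

In the second case the character U+00E9 takes two bytes (C3 A9) in UTF-8 but one byte (E9) in ISO-8859-1, which is what lets an editor tell the two apart.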

一页 2024-09-01 08:46:42

XML Encoding UTF-8 without BOM
We need to submit XML data to the EPA and their application that takes our input requires UTF-8 without BOM. Oh yes, plain UTF-8 should be acceptable for everyone, but not for the EPA. The answer to doing this is in the above comments. Thank you Roman Nikitin.

Here is a C# snippet of the code for the XML encoding:

    Encoding utf8noBOM = new UTF8Encoding(false);  
    XmlWriterSettings settings = new XmlWriterSettings();  
    settings.Encoding = utf8noBOM;  
        …  
    using (XmlWriter xw = XmlWriter.Create(filePath, settings))  
    {  
        xDoc.WriteTo(xw);  
        xw.Flush();  
    }    

Checking whether this actually removes the three leading bytes from the output file can be misleading. For example, if you use Notepad++ (www.notepad-plus-plus.org), it will report “Encode in ANSI”. I guess most text editors rely on the BOM to tell whether a file is UTF-8. The way to see this clearly is with a binary tool like WinHex (www.winhex.com). Since I was looking for a before-and-after difference, I used the Microsoft WinDiff application.
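
If no hex tool is at hand, the same check can be sketched in a few lines of C# by reading the first three bytes of the file and comparing them with EF BB BF. HasUtf8Bom and the sample files are hypothetical names for this example.

```csharp
using System;
using System.IO;

public static class BomCheck
{
    // True when the file starts with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // three bytes are available or the end of the file is reached.
            int read;
            while (total < 3 && (read = fs.Read(head, total, 3 - total)) > 0)
            {
                total += read;
            }
        }
        return total == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }

    public static void Main()
    {
        string withBom = Path.GetTempFileName();    // hypothetical sample files
        string withoutBom = Path.GetTempFileName();
        File.WriteAllBytes(withBom, new byte[] { 0xEF, 0xBB, 0xBF, 0x68, 0x69 });
        File.WriteAllBytes(withoutBom, new byte[] { 0x68, 0x69 });

        Console.WriteLine(HasUtf8Bom(withBom));    // True
        Console.WriteLine(HasUtf8Bom(withoutBom)); // False

        File.Delete(withBom);
        File.Delete(withoutBom);
    }
}
```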

爱给你人给你 2024-09-01 08:46:42

For VB.NET, this is how to make it work:

My.Computer.FileSystem.WriteAllText("FileName", Data, False, System.Text.Encoding.ASCII)
无人问我粥可暖 2024-09-01 08:46:42

It might be that your input text contains a byte order mark. In that case, you should remove it before writing.
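
A minimal C# sketch of that idea, assuming the BOM shows up as a leading U+FEFF character in the string (for example after decoding raw bytes with Encoding.UTF8.GetString). StripLeadingBom is a hypothetical helper name.

```csharp
using System;
using System.Text;

public static class BomStripper
{
    // Removes a single leading U+FEFF character, if present.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    public static void Main()
    {
        // GetString does not drop the BOM, so the decoded string starts with U+FEFF.
        string decoded = Encoding.UTF8.GetString(new byte[] { 0xEF, 0xBB, 0xBF, 0x68, 0x69 });

        Console.WriteLine(decoded.Length);                  // 3
        Console.WriteLine(StripLeadingBom(decoded).Length); // 2
    }
}
```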

日裸衫吸 2024-09-01 08:46:42
Dim sWriter As IO.StreamWriter = New IO.StreamWriter(shareworklist & "\" & getfilename() & ".txt", False, Encoding.Default)

This gives you the results you want (I think).
