Write text files without a byte order mark (BOM)?
I am trying to create a text file using VB.Net with UTF-8 encoding, without a BOM. Can anybody help me with how to do this?
I can write the file with UTF-8 encoding, but how do I remove the byte order mark from it?
Edit 1:
I have tried code like this:
Dim utf8 As New UTF8Encoding()
Dim utf8EmitBOM As New UTF8Encoding(True)
Dim strW As New StreamWriter("c:\temp\bom\1.html", True, utf8EmitBOM)
strW.Write(utf8EmitBOM.GetPreamble())
strW.WriteLine("hi there")
strW.Close()
Dim strw2 As New StreamWriter("c:\temp\bom\2.html", True, utf8)
strw2.Write(utf8.GetPreamble())
strw2.WriteLine("hi there")
strw2.Close()
1.html gets created with UTF-8 encoding only, and 2.html gets created in ANSI encoding format.
Simplified approach - http://whatilearnttuday.blogspot.com/2011/10/write-text-files-without-byte-order.html
In order to omit the byte order mark (BOM), your stream must use an instance of UTF8Encoding other than System.Text.Encoding.UTF8 (which is configured to generate a BOM). There are two easy ways to do this:

1. Explicitly specifying a suitable encoding: Call the UTF8Encoding constructor with False for the encoderShouldEmitUTF8Identifier parameter. Pass the UTF8Encoding instance to the stream constructor.

2. Using the default encoding: If you do not supply an Encoding to StreamWriter's constructor at all, StreamWriter will by default use a UTF-8 encoding without BOM, so omitting the argument should work just as well.

Finally, note that omitting the BOM is only unproblematic for UTF-8; in UTF-16 the BOM is what tells the reader the byte order.
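
As an illustration of the two approaches described in this answer, here is a minimal VB.NET sketch. The module name and the file names are hypothetical, and the files go to the temp folder only to keep the example self-contained.

```vbnet
Imports System.IO
Imports System.Text

Module NoBomDemo
    Sub Main()
        ' Hypothetical output folder; adjust to your environment.
        Dim folder As String = Path.GetTempPath()

        ' 1. Explicit encoding: False = do not emit the UTF-8 identifier (BOM).
        Dim utf8NoBom As New UTF8Encoding(False)
        Using writer As New StreamWriter(Path.Combine(folder, "explicit.html"), False, utf8NoBom)
            writer.WriteLine("hi there")
        End Using

        ' 2. Default encoding: no Encoding argument is passed, so StreamWriter
        '    falls back to UTF-8 without a BOM.
        Using writer As New StreamWriter(Path.Combine(folder, "default.html"), False)
            writer.WriteLine("hi there")
        End Using
    End Sub
End Module
```

Note that there is no call to GetPreamble() here: when an encoding has a preamble, StreamWriter emits it on its own at the start of a new file.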
Try this:
Simply use the WriteAllText method from System.IO.File. Please check the sample in the File.WriteAllText documentation.
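
A minimal sketch of that suggestion, assuming the two-argument overload of File.WriteAllText (the one that takes no Encoding). The file name is hypothetical.

```vbnet
Imports System
Imports System.IO

Module WriteAllTextDemo
    Sub Main()
        ' Hypothetical target file in the temp folder.
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "writealltext.txt")

        ' The overload without an Encoding argument writes UTF-8 without a BOM.
        File.WriteAllText(filePath, "hi there" & Environment.NewLine)
    End Sub
End Module
```

Passing System.Text.Encoding.UTF8 as a third argument would bring the BOM back, since that instance is configured to emit it.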
If you do not specify an Encoding when creating a new StreamWriter, the default Encoding object used is UTF-8 without BOM, which is created via new UTF8Encoding(false, true). So to create a text file without the BOM, use one of the constructors that do not require you to provide an encoding.
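
One way to check this claim is to look at the preamble of the encoding a default-constructed StreamWriter reports. This is a small sketch with a hypothetical file name, not part of the original answer.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module DefaultEncodingCheck
    Sub Main()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "default-encoding.txt")

        ' No Encoding argument: StreamWriter picks its default encoding.
        Using writer As New StreamWriter(filePath)
            ' A preamble length of 0 means no BOM will be written.
            Console.WriteLine(writer.Encoding.GetPreamble().Length)   ' 0
            writer.WriteLine("hi there")
        End Using

        ' For comparison, Encoding.UTF8 carries the three-byte BOM EF BB BF.
        Console.WriteLine(Encoding.UTF8.GetPreamble().Length)         ' 3
    End Sub
End Module
```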
Interesting note with respect to this: strangely, the static CreateText() method of the System.IO.File class creates UTF-8 files without a BOM.
In general this is a source of bugs, but in your case it could have been the simplest workaround :)
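
A minimal sketch of the CreateText() workaround, with a hypothetical file name. The byte count at the end is only there to show that nothing was written in front of the text.

```vbnet
Imports System
Imports System.IO

Module CreateTextDemo
    Sub Main()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "createtext.txt")

        ' File.CreateText returns a StreamWriter that encodes as UTF-8 without a BOM.
        Using writer As StreamWriter = File.CreateText(filePath)
            writer.Write("hi there")
        End Using

        ' Eight ASCII characters give eight bytes; a BOM would add three more.
        Console.WriteLine(File.ReadAllBytes(filePath).Length)   ' 8
    End Sub
End Module
```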
I think Roman Nikitin is right. The meaning of the constructor argument is flipped. False means no BOM and True means with BOM.
You get an ANSI encoding because a file without a BOM that contains only ASCII characters is byte-for-byte the same as an ANSI file. Try some special characters in your "hi there" string and you'll see the reported encoding change from ANSI to UTF-8 without BOM.
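
The point about ASCII-only content can be seen directly in the encoded bytes. This is a small sketch; the accented test string is an arbitrary choice, built with ChrW so the example does not depend on how the source file itself is saved.

```vbnet
Imports System
Imports System.Text

Module AsciiVersusUtf8
    Sub Main()
        Dim utf8NoBom As New UTF8Encoding(False)

        ' ASCII-only text: one byte per character, the same bytes an ANSI code page would use.
        Dim plain As Byte() = utf8NoBom.GetBytes("hi there")
        Console.WriteLine(BitConverter.ToString(plain))
        ' 68-69-20-74-68-65-72-65

        ' U+00E9 (e with acute accent) is outside ASCII, so UTF-8 encodes it as two bytes: C3 A9.
        Dim accented As Byte() = utf8NoBom.GetBytes("hi th" & ChrW(&HE9) & "re")
        Console.WriteLine(BitConverter.ToString(accented))
        ' 68-69-20-74-68-C3-A9-72-65
    End Sub
End Module
```

With only the first kind of content, an editor has nothing to tell UTF-8 apart from ANSI, which is why it reports ANSI.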
XML Encoding UTF-8 without BOM
We need to submit XML data to the EPA and their application that takes our input requires UTF-8 without BOM. Oh yes, plain UTF-8 should be acceptable for everyone, but not for the EPA. The answer to doing this is in the above comments. Thank you Roman Nikitin.
Here is a C# snippet of the code for the XML encoding:
Checking whether this actually removes the three leading bytes from the output file can be misleading. For example, if you use Notepad++ (www.notepad-plus-plus.org), it will report “Encode in ANSI”. I guess most text editors count on the BOM to tell whether a file is UTF-8. The way to see this clearly is with a binary tool like WinHex (www.winhex.com). Since I was looking for a before-and-after difference, I used the Microsoft WinDiff application.
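
As a rough VB.NET sketch of the same idea, assuming the XML is written through XmlWriter with XmlWriterSettings: the encoding is set to a UTF8Encoding created with False, and the leading bytes are then checked in code rather than trusting an editor's encoding label. The file name, the element names and the HasUtf8Bom helper are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module XmlNoBom
    ' Hypothetical helper: True when the file starts with the UTF-8 BOM bytes EF BB BF.
    Function HasUtf8Bom(filePath As String) As Boolean
        Dim bytes As Byte() = File.ReadAllBytes(filePath)
        Return bytes.Length >= 3 AndAlso
               bytes(0) = &HEF AndAlso bytes(1) = &HBB AndAlso bytes(2) = &HBF
    End Function

    Sub Main()
        Dim settings As New XmlWriterSettings()
        ' False = do not emit the UTF-8 identifier (BOM).
        settings.Encoding = New UTF8Encoding(False)
        settings.Indent = True

        Dim filePath As String = Path.Combine(Path.GetTempPath(), "submission.xml")
        Using writer As XmlWriter = XmlWriter.Create(filePath, settings)
            writer.WriteStartDocument()
            writer.WriteStartElement("report")
            writer.WriteElementString("message", "hi there")
            writer.WriteEndElement()
            writer.WriteEndDocument()
        End Using

        ' Inspect the raw bytes directly.
        Console.WriteLine(HasUtf8Bom(filePath))
    End Sub
End Module
```

Reading the whole file is fine for small outputs; for large files, reading only the first three bytes would be enough.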
For VB.Net visual basic, this is how to make it work:
It might be that your input text contains a byte order mark. In that case, you should remove it before writing.
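
A minimal sketch of stripping such a mark, assuming the text is already a decoded .NET string, in which case the BOM shows up as a leading U+FEFF character. StripLeadingBom and the file name are hypothetical.

```vbnet
Imports System
Imports System.IO

Module StripBom
    ' Hypothetical helper: removes one leading U+FEFF character, if present.
    Function StripLeadingBom(value As String) As String
        If value.Length > 0 AndAlso value(0) = ChrW(&HFEFF) Then
            Return value.Substring(1)
        End If
        Return value
    End Function

    Sub Main()
        ' Simulated input that still carries the BOM character.
        Dim raw As String = ChrW(&HFEFF) & "hi there"
        Dim cleaned As String = StripLeadingBom(raw)
        Console.WriteLine(cleaned.Length)   ' 8

        ' Written with the default (BOM-less) UTF-8 encoding.
        File.WriteAllText(Path.Combine(Path.GetTempPath(), "cleaned.txt"), cleaned)
    End Sub
End Module
```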
This gives you the results you want (I think).