MXXMLWriter60 does not output the byteOrderMark for UTF-16 encoding
I'm using a variant of the code from "How to make XMLDOMDocument include the XML Declaration?" (which can also be found on MSDN). If I change the encoding to "UTF-16", one would expect the output to be UTF-16, and judging by the output in a text editor, it is. In a hex editor, however, the byte-order mark is missing (despite the byteOrderMark property being set to True), and XML editors reject the document as invalid UTF-16 because of the missing BOM.
What am I overlooking?
' Create and load a DOMDocument object.
Dim xmlDoc As New DOMDocument60
xmlDoc.loadXML("<doc><one>test1</one><two>test2</two></doc>")
' Set properties on the XML writer, including BOM, XML declaration and encoding.
Dim wrt As New MXXMLWriter60
wrt.byteOrderMark = True
wrt.omitXMLDeclaration = False
wrt.encoding = "UTF-16"
wrt.indent = False
' Set the XML writer as the SAX content handler.
Dim rdr As New SAXXMLReader60
Set rdr.contentHandler = wrt
Set rdr.dtdHandler = wrt
Set rdr.errorHandler = wrt
rdr.putProperty "http://xml.org/sax/properties/lexical-handler", wrt
rdr.putProperty "http://xml.org/sax/properties/declaration-handler", wrt
' Now pass the DOM through the SAX handler, and it will call the writer.
rdr.parse xmlDoc
' Let the writer do its thing.
Dim iFileNo As Integer
iFileNo = FreeFile
Open App.Path + "\saved.xml" For Output As #iFileNo
Print #iFileNo, wrt.output
Close #iFileNo
The output looks like:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<doc><one>test1</one><two>test2</two></doc>
Why am I using VB6? It's actually in VBA (same generation, slight subset of VB6), used as the scripting-language for EMC-Captiva's InputAccel/FormWare, so switching is not an option.
The problem is that when you read the writer's output property you get a string. Strings in VB are always UTF-16, so that is what you get regardless of the encoding setting, and since a VB string has no notion of needing a BOM, none is included either.
The encoding and byteOrderMark properties only affect how the writer writes the XML when an implementation of IStream is assigned to the output property.
Try modifying your code around the call to parse as follows:
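A minimal sketch of that change, assuming the project has a reference to Microsoft ActiveX Data Objects so that ADODB.Stream can serve as the IStream implementation (the variable name oStream is illustrative). It reuses xmlDoc, wrt and rdr from the question and replaces everything from the call to parse through the Open/Print/Close block:

```vb
' Assumption: a reference to "Microsoft ActiveX Data Objects" is set,
' which provides ADODB.Stream and the ad* constants used below.
Dim oStream As ADODB.Stream
Set oStream = New ADODB.Stream
oStream.Type = adTypeBinary   ' hold raw bytes, no text conversion
oStream.Open

' Give the writer a stream to write to, instead of reading a string back.
wrt.output = oStream

' Pass the DOM through the SAX reader; the writer encodes into the stream.
rdr.parse xmlDoc

' Save the encoded bytes to disk, overwriting any existing file.
oStream.SaveToFile App.Path + "\saved.xml", adSaveCreateOverWrite
oStream.Close
```

Because the writer now encodes directly into a byte stream, the encoding and byteOrderMark settings should take effect, and the file is written without passing through a VB string.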
This should generate the desired output.