使用 MSXML 以 UTF-8 格式保存 XML

发布于 2024-08-27 16:42:34 字数 647 浏览 4 评论 0原文

我正在尝试加载一个简单的 Xml 文件(以 UTF-8 编码):

<?xml version="1.0" encoding="UTF-8"?>
<Test/>

并在 vbscript 中使用 MSXML 保存它:

Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")

xmlDoc.Load("C:\test.xml")

xmlDoc.Save "C:\test.xml" 

问题是,MSXML 以 ANSI 而不是 UTF-8 保存文件(尽管原始文件以 UTF-8 编码) 8).

MSXML 的 MSDN 文档 说 save( ) 将以 XML 定义的任何编码写入文件:

字符编码基于 XML 声明中的编码属性,例如 .当未指定编码属性时,默认设置为 UTF-8。

但这显然至少在我的机器上不起作用。

MSXML如何保存为UTF-8?

I'm trying to load a simple Xml file (encoded in UTF-8):

<?xml version="1.0" encoding="UTF-8"?>
<Test/>

And save it with MSXML in vbscript:

Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")

xmlDoc.Load("C:\test.xml")

xmlDoc.Save "C:\test.xml" 

The problem is, MSXML saves file in ANSI instead of UTF-8 (despite the original file being encoded in UTF-8).

The MSDN docs for MSXML says that save() will write the file in whatever encoding the XML is defined in:

Character encoding is based on the encoding attribute in the XML declaration, such as . When no encoding attribute is specified, the default setting is UTF-8.

But this is clearly not working at least on my machine.

How can MSXML save in UTF-8?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

囍笑 2024-09-03 16:42:34

您可以使用 MSXML 中的另外两个类将正确编码的 XML 写出到输出流。

这是我写入通用 IStream 的辅助方法:

class procedure TXMLHelper.WriteDocumentToStream(const Document60: IXMLDOMDocument2; const stream: IStream; Encoding: string = 'UTF-8');
var
    writer: IMXWriter;
    reader: IVBSAXXMLReader;
begin
{
    From http://support.microsoft.com/kb/275883
    INFO: XML Encoding and DOM Interface Methods

    MSXML has native support for the following encodings:
        UTF-8
        UTF-16
        UCS-2
        UCS-4
        ISO-10646-UCS-2
        UNICODE-1-1-UTF-8
        UNICODE-2-0-UTF-16
        UNICODE-2-0-UTF-8

    It also recognizes (internally using the WideCharToMultibyte API function for mappings) the following encodings:
        US-ASCII
        ISO-8859-1
        ISO-8859-2
        ISO-8859-3
        ISO-8859-4
        ISO-8859-5
        ISO-8859-6
        ISO-8859-7
        ISO-8859-8
        ISO-8859-9
        WINDOWS-1250
        WINDOWS-1251
        WINDOWS-1252
        WINDOWS-1253
        WINDOWS-1254
        WINDOWS-1255
        WINDOWS-1256
        WINDOWS-1257
        WINDOWS-1258
}

    if Document60 = nil then
        raise Exception.Create('TXMLHelper.WriteDocument: Document60 cannot be nil');
    if stream = nil then
        raise Exception.Create('TXMLHelper.WriteDocument: stream cannot be nil');

    // Set properties on the XML writer - including BOM, XML declaration and encoding
    writer := CoMXXMLWriter60.Create;
    writer.byteOrderMark := True; //Determines whether to write the Byte Order Mark (BOM). The byteOrderMark property has no effect for BSTR or DOM output. (Default True)
    writer.omitXMLDeclaration := False; //Forces the IMXWriter to skip the XML declaration. Useful for creating document fragments. (Default False)
    writer.encoding := Encoding; //Sets and gets encoding for the output. (Default "UTF-16")
    writer.indent := True; //Sets whether to indent output. (Default False)
    writer.standalone := True;

    // Set the XML writer to the SAX content handler.
    reader := CoSAXXMLReader60.Create;
    reader.contentHandler := writer as IVBSAXContentHandler;
    reader.dtdHandler := writer as IVBSAXDTDHandler;
    reader.errorHandler := writer as IVBSAXErrorHandler;
    reader.putProperty('http://xml.org/sax/properties/lexical-handler', writer);
    reader.putProperty('http://xml.org/sax/properties/declaration-handler', writer);


    writer.output := stream; //The resulting document will be written into the provided IStream

    // Now pass the DOM through the SAX handler, and it will call the writer
    reader.parse(Document60);

    writer.flush;
end;

为了保存到文件,我使用 FileStream 调用 Stream 版本:

class procedure TXMLHelper.WriteDocumentToFile(const Document60: IXMLDOMDocument2; const filename: string; Encoding: string='UTF-8');
var
    fs: TFileStream;
begin
    fs := TFileStream.Create(filename, fmCreate or fmShareDenyWrite);
    try
        TXMLHelper.WriteDocumentToStream(Document60, fs, Encoding);
    finally
        fs.Free;
    end;
end;

您可以将函数转换为任何语言你喜欢。这些是德尔福。

You use two other classes in MSXML to write out XML properly encoded to an output stream.

Here is my helper method that writes to a generic IStream:

class procedure TXMLHelper.WriteDocumentToStream(const Document60: IXMLDOMDocument2; const stream: IStream; Encoding: string = 'UTF-8');
var
    writer: IMXWriter;
    reader: IVBSAXXMLReader;
begin
{
    From http://support.microsoft.com/kb/275883
    INFO: XML Encoding and DOM Interface Methods

    MSXML has native support for the following encodings:
        UTF-8
        UTF-16
        UCS-2
        UCS-4
        ISO-10646-UCS-2
        UNICODE-1-1-UTF-8
        UNICODE-2-0-UTF-16
        UNICODE-2-0-UTF-8

    It also recognizes (internally using the WideCharToMultibyte API function for mappings) the following encodings:
        US-ASCII
        ISO-8859-1
        ISO-8859-2
        ISO-8859-3
        ISO-8859-4
        ISO-8859-5
        ISO-8859-6
        ISO-8859-7
        ISO-8859-8
        ISO-8859-9
        WINDOWS-1250
        WINDOWS-1251
        WINDOWS-1252
        WINDOWS-1253
        WINDOWS-1254
        WINDOWS-1255
        WINDOWS-1256
        WINDOWS-1257
        WINDOWS-1258
}

    if Document60 = nil then
        raise Exception.Create('TXMLHelper.WriteDocument: Document60 cannot be nil');
    if stream = nil then
        raise Exception.Create('TXMLHelper.WriteDocument: stream cannot be nil');

    // Set properties on the XML writer - including BOM, XML declaration and encoding
    writer := CoMXXMLWriter60.Create;
    writer.byteOrderMark := True; //Determines whether to write the Byte Order Mark (BOM). The byteOrderMark property has no effect for BSTR or DOM output. (Default True)
    writer.omitXMLDeclaration := False; //Forces the IMXWriter to skip the XML declaration. Useful for creating document fragments. (Default False)
    writer.encoding := Encoding; //Sets and gets encoding for the output. (Default "UTF-16")
    writer.indent := True; //Sets whether to indent output. (Default False)
    writer.standalone := True;

    // Set the XML writer to the SAX content handler.
    reader := CoSAXXMLReader60.Create;
    reader.contentHandler := writer as IVBSAXContentHandler;
    reader.dtdHandler := writer as IVBSAXDTDHandler;
    reader.errorHandler := writer as IVBSAXErrorHandler;
    reader.putProperty('http://xml.org/sax/properties/lexical-handler', writer);
    reader.putProperty('http://xml.org/sax/properties/declaration-handler', writer);


    writer.output := stream; //The resulting document will be written into the provided IStream

    // Now pass the DOM through the SAX handler, and it will call the writer
    reader.parse(Document60);

    writer.flush;
end;

In order to save to a file i call the Stream version with a FileStream:

class procedure TXMLHelper.WriteDocumentToFile(const Document60: IXMLDOMDocument2; const filename: string; Encoding: string='UTF-8');
var
    fs: TFileStream;
begin
    fs := TFileStream.Create(filename, fmCreate or fmShareDenyWrite);
    try
        TXMLHelper.WriteDocumentToStream(Document60, fs, Encoding);
    finally
        fs.Free;
    end;
end;

You can convert the functions to whatever language you like. These are Delphi.

静待花开 2024-09-03 16:42:34

XML 文件中没有任何非 ANSI 文本,因此无论是 UTF-8 还是 ASCII 编码,它都是相同的。在我的测试中,将非 ASCII 文本添加到 test.xml 后,MSXML 始终以 UTF-8 编码保存,并且如果一开始就有 BOM,也会写入 BOM。

http://en.wikipedia.org/wiki/UTF-8
http://en.wikipedia.org/wiki/Byte_order_mark

There isn't any non-ANSI text in your XML file, so it will be identical whether UTF-8 or ASCII encoded. In my tests, after adding non-ASCII text to test.xml, MSXML always saves in UTF-8 encoding and also writes the BOM if there was one to begin with.

http://en.wikipedia.org/wiki/UTF-8
http://en.wikipedia.org/wiki/Byte_order_mark

手长情犹 2024-09-03 16:42:34

当您执行 load 时,msxml 不会将处理指令中的编码复制到创建的文档中。所以它不包含任何编码,看起来 msxml 选择了它喜欢的东西。在我的环境中,它是我不喜欢的 UTF-16。

解决方案是提供处理指令并在那里指定编码。如果您知道该文档没有处理指令,则代码很简单:

Set pi = xmlDoc.createProcessingInstruction("xml", _
         "version=""1.0"" encoding=""windows-1250""")
If xmlDoc.childNodes.Length > 0 Then
  Call xmlDoc.insertBefore(pi, xmlDoc.childNodes.Item(0))
End If

如果该文档可能包含其他处理指令,则必须首先将其删除(因此下面的代码必须位于上面的代码之前)。我不知道如何使用 selectNode 来做到这一点,所以我只是迭代了所有根节点:

For ich=xmlDoc.childNodes.Length-1 to 0 step -1
  Set ch = xmlDoc.childNodes.Item(ich)
  If ch.NodeTypeString = "processinginstruction" and ch.NodeName = "xml" Then
    xmlDoc.removeChild(ch)
  End If
Next ich

抱歉,如果代码没有直接执行,因为我修改了工作版本,这是这样写的在自定义的东西中,而不是 vbscript 中。

When you perform load msxml does not copy encoding from the processing instruction into the created document. So it doesn't contain any encoding and seems like msxml chooses something which it likes. In my environment it's UTF-16 which I don't prefer.

The solution is to provide processing instructions and specify encoding there. If you know that the document has no processing instructions, the code is trivial:

Set pi = xmlDoc.createProcessingInstruction("xml", _
         "version=""1.0"" encoding=""windows-1250""")
If xmlDoc.childNodes.Length > 0 Then
  Call xmlDoc.insertBefore(pi, xmlDoc.childNodes.Item(0))
End If

If it's possible, that the document contained other processing instruction, it must be removed first (so the code below must come before the code above). I don't know how to use selectNode to do it, so I just iterated all root nodes:

For ich=xmlDoc.childNodes.Length-1 to 0 step -1
  Set ch = xmlDoc.childNodes.Item(ich)
  If ch.NodeTypeString = "processinginstruction" and ch.NodeName = "xml" Then
    xmlDoc.removeChild(ch)
  End If
Next ich

Sorry if the code doesn't execute directly, because I modified working version, that was written in something custom, not vbscript.

つ可否回来 2024-09-03 16:42:34

此解决方法似乎有效:将 xml 字符串传递给 ADODB.Stream 以保存文件。

更新:处理指令按照以下方式丢失:

https:// learn.microsoft.com/en-us/previous-versions/windows/desktop/ms755989(v=vs.85)
xml 属性始终返回 Unicode 字符串。也就是说,DOMDocument 的 xml 属性将文档从其原始编码转换为 Unicode。这样一来,原来的编码属性就被删除了。

Sub Facturae()
On Error GoTo ExceptionHandling

Dim i As Integer

'Declare document objects
Dim xDoc As MSXML2.DOMDocument60
Dim xRoot As MSXML2.IXMLDOMElement

'Create new DOMDocument
Set xDoc = New DOMDocument60

'https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms755989(v=vs.85
'Add the XML declaration as a processing instruction:
'Dim xmlDecl As MSXML2.IXMLDOMProcessingInstruction
'Set xmlDecl = xDoc.createProcessingInstruction("xml", "version='1.0' encoding='UTF-8' standalone='yes'")
'xDoc.appendChild xmlDecl

'Create the root element
Set xRoot = xDoc.createElement("fe:Facturae")
xDoc.appendChild xRoot

'The namespace declarations are attributes on the root element, so you can add them using:
xRoot.setAttribute "xmlns:ds", "http://www.w3.org/2000/09/xmldsig#"
xRoot.setAttribute "xmlns:fe", "http://www.facturae.es/Facturae/2009/v3.2/Facturae"
'xDoc.DocumentElement.setAttribute "xmlns:ds", "http://www.w3.org/2000/09/xmldsig#"
'xDoc.DocumentElement.setAttribute "xmlns:fe", "http://www.facturae.es/Facturae/2009/v3.2/Facturae"

'Add child to root
'Create security element
Dim objSecElem As MSXML2.IXMLDOMElement
Set objSecElem = xDoc.createElement("Security")
xRoot.appendChild objSecElem

Dim str(1 To 3) As String
str(1) = "A"
str(2) = "B"
str(3) = "C"

Dim objProp As IXMLDOMElement
For i = 1 To UBound(str)
    Set objProp = xDoc.createElement(str(i))
    objSecElem.appendChild objProp
    objProp.Text = i
Next i

Dim objStream As Stream, strData As String, sFilePath As String
'Debug.Print xDoc.XML
strData = xDoc.XML
sFilePath = ThisWorkbook.Path & "\my_file.xml"

Set objStream = CreateObject("ADODB.Stream")
objStream.Type = adTypeText
objStream.charset = "utf-8"
objStream.LineSeparator = adCRLF
objStream.Open
objStream.WriteText "<?xml version='1.0' encoding='utf-8' standalone='yes'?>", adWriteLine
objStream.WriteText strData, adWriteChar
objStream.SaveToFile sFilePath, adSaveCreateOverWrite

'Save the XML file
'xDoc.Save sFilePath

CleanUp:
    On Error Resume Next
    objStream.Close
    Exit Sub
ExceptionHandling:
    MsgBox "Error: " & Err.Description
    Resume CleanUp
    'https://stackoverflow.com/questions/52205786/excel-vba-global-variables-are-assigned-when-workbook-is-opened-get-erased-if?rq=1
    'error handler
    'Note the unreachable Resume at the end of the error handler. It's unreachable because Resume ExitProcedure will always execute first.
    'So when you get the messagebox, you can use ctrl+break to break into code, which will then take you to the Resume ExitProcedure.
    'You can then drag the yellow arrow over to the unreachable Resume, press F8 to step once which will then take you back to the line that caused the error.
    Resume 'for debugging
End Sub

This workaround seems to work: passing the xml string to ADODB.Stream to save the file.

Update: the processing instruction get lost according to this:

https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms755989(v=vs.85)
The xml property always returns a Unicode string. That is, the xml property for DOMDocument converts the document from its original encoding to Unicode. As a result, the original encoding attribute is removed.

Sub Facturae()
On Error GoTo ExceptionHandling

Dim i As Integer

'Declare document objects
Dim xDoc As MSXML2.DOMDocument60
Dim xRoot As MSXML2.IXMLDOMElement

'Create new DOMDocument
Set xDoc = New DOMDocument60

'https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms755989(v=vs.85
'Add the XML declaration as a processing instruction:
'Dim xmlDecl As MSXML2.IXMLDOMProcessingInstruction
'Set xmlDecl = xDoc.createProcessingInstruction("xml", "version='1.0' encoding='UTF-8' standalone='yes'")
'xDoc.appendChild xmlDecl

'Create the root element
Set xRoot = xDoc.createElement("fe:Facturae")
xDoc.appendChild xRoot

'The namespace declarations are attributes on the root element, so you can add them using:
xRoot.setAttribute "xmlns:ds", "http://www.w3.org/2000/09/xmldsig#"
xRoot.setAttribute "xmlns:fe", "http://www.facturae.es/Facturae/2009/v3.2/Facturae"
'xDoc.DocumentElement.setAttribute "xmlns:ds", "http://www.w3.org/2000/09/xmldsig#"
'xDoc.DocumentElement.setAttribute "xmlns:fe", "http://www.facturae.es/Facturae/2009/v3.2/Facturae"

'Add child to root
'Create security element
Dim objSecElem As MSXML2.IXMLDOMElement
Set objSecElem = xDoc.createElement("Security")
xRoot.appendChild objSecElem

Dim str(1 To 3) As String
str(1) = "A"
str(2) = "B"
str(3) = "C"

Dim objProp As IXMLDOMElement
For i = 1 To UBound(str)
    Set objProp = xDoc.createElement(str(i))
    objSecElem.appendChild objProp
    objProp.Text = i
Next i

Dim objStream As Stream, strData As String, sFilePath As String
'Debug.Print xDoc.XML
strData = xDoc.XML
sFilePath = ThisWorkbook.Path & "\my_file.xml"

Set objStream = CreateObject("ADODB.Stream")
objStream.Type = adTypeText
objStream.charset = "utf-8"
objStream.LineSeparator = adCRLF
objStream.Open
objStream.WriteText "<?xml version='1.0' encoding='utf-8' standalone='yes'?>", adWriteLine
objStream.WriteText strData, adWriteChar
objStream.SaveToFile sFilePath, adSaveCreateOverWrite

'Save the XML file
'xDoc.Save sFilePath

CleanUp:
    On Error Resume Next
    objStream.Close
    Exit Sub
ExceptionHandling:
    MsgBox "Error: " & Err.Description
    Resume CleanUp
    'https://stackoverflow.com/questions/52205786/excel-vba-global-variables-are-assigned-when-workbook-is-opened-get-erased-if?rq=1
    'error handler
    'Note the unreachable Resume at the end of the error handler. It's unreachable because Resume ExitProcedure will always execute first.
    'So when you get the messagebox, you can use ctrl+break to break into code, which will then take you to the Resume ExitProcedure.
    'You can then drag the yellow arrow over to the unreachable Resume, press F8 to step once which will then take you back to the line that caused the error.
    Resume 'for debugging
End Sub
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文