XSLT transform with MSXML doesn't use the right encoding
I'm using IXMLDOMDocument::transformNode from MSXML 3.0 to apply XSLT transforms. Each of the transforms has an xsl:output directive that specifies UTF-8 as the encoding. For example,
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
...
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str="http://exslt.org/strings"
xmlns:math="http://exslt.org/math"
extension-element-prefixes="str math">
<xsl:output encoding="UTF-8" indent="yes" method="xml" />
...
</xsl:stylesheet>
Yet the transformed result is always UTF-16 (and the encoding attribute says UTF-16):
<?xml version="1.0" encoding="UTF-16"?>
Is this a bug in MSXML?
For various reasons, I'd really like to have UTF-8. Is there a workaround? Or do I have to convert the transformed result to UTF-8 myself and patch up the encoding attribute?
Update: I've worked around the problem by accepting the UTF-16 encoding and prepending a byte-order mark, which satisfies the downstream users of the transformed result, but I'm still interested in how to get UTF-8 output.
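For reference, the manual route asked about above (convert the result to UTF-8 and patch up the encoding attribute) might look roughly like the following portable C++ sketch. It assumes the transform result has already been copied out of the BSTR into a std::u16string; the function names Utf16ToUtf8, PatchDeclaration and TransformResultToUtf8 are made up for illustration and are not part of MSXML.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Encode a UTF-16 string as UTF-8, combining surrogate pairs.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a partner
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[++i] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Drop a leading BOM, then rewrite encoding="UTF-16" to UTF-8, but only if it
// appears inside the XML declaration at the very start of the document.
std::u16string PatchDeclaration(std::u16string xml) {
    const std::u16string from = u"encoding=\"UTF-16\"";
    const std::u16string to = u"encoding=\"UTF-8\"";
    if (!xml.empty() && xml[0] == u'\uFEFF') xml.erase(0, 1);
    if (xml.compare(0, 5, u"<?xml") != 0) return xml;  // no declaration
    const std::size_t end = xml.find(u"?>");
    const std::size_t pos = xml.find(from);
    if (end != std::u16string::npos && pos != std::u16string::npos && pos < end)
        xml.replace(pos, from.size(), to);
    return xml;
}

// Convenience wrapper: patch the declaration, then encode as UTF-8 bytes.
std::string TransformResultToUtf8(const std::u16string& xml) {
    return Utf16ToUtf8(PatchDeclaration(xml));
}
```

The patch is deliberately limited to the declaration so that a literal encoding="UTF-16" elsewhere in the document is left alone. This only covers the double-quoted, upper-case form shown in the question.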
Comments (3)
You're probably sending the output to a DOM tree or to a character stream, not to a byte stream. If that's the case then it's not MSXML that's doing the encoding, and whatever does do the final encoding has no knowledge of the xsl:output directive (or indeed, of XSLT).
Supplementing what Michael Kay said (which is spot on, of course), here's a JScript example of how to transform to a stream, using the XSLT serialization in the process:
You may test using this input:
And this stylesheet:
I think it'll be easy for you to adapt the JScript example to C++.
As you noted, BSTRs are all UTF-16. However, I think Michael Ludwig might be on to something here. Have you tried using this method?
You should be able to just use CreateStreamOnHGlobal, stash the resultant IStream ptr into a VARIANT, and pass that in as the outputObject parameter. Theoretically. I haven't actually tried this, though :)
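A rough C++ sketch of that idea follows. It is untested, like the suggestion itself, and it assumes the method in question is IXMLDOMDocument::transformNodeToObject (the one with an outputObject parameter) and that it accepts an IStream passed as VT_UNKNOWN. The helper name TransformToStream is made up for illustration. The code is wrapped in #ifdef _WIN32 because it needs the Windows SDK headers and linking against ole32 and oleaut32.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <oleauto.h>
#include <msxml2.h>

// Hypothetical helper: applies the stylesheet and hands back a memory stream
// holding the serialized result. The caller must Release() *result.
HRESULT TransformToStream(IXMLDOMDocument* source,
                          IXMLDOMNode* stylesheet,
                          IStream** result) {
    if (!source || !stylesheet || !result) return E_POINTER;
    *result = nullptr;

    // Growable in-memory stream; the HGLOBAL is freed with the stream.
    IStream* stream = nullptr;
    HRESULT hr = CreateStreamOnHGlobal(nullptr, TRUE, &stream);
    if (FAILED(hr)) return hr;

    // Stash the stream pointer in a VARIANT. The VARIANT only borrows the
    // reference, so it is not passed to VariantClear afterwards.
    VARIANT output;
    VariantInit(&output);
    V_VT(&output) = VT_UNKNOWN;
    V_UNKNOWN(&output) = stream;

    // Assumption: with a stream as the target, MSXML does the serialization
    // itself and can therefore apply the xsl:output encoding.
    hr = source->transformNodeToObject(stylesheet, output);
    if (FAILED(hr)) { stream->Release(); return hr; }

    // Rewind so the caller can read the bytes from the beginning.
    LARGE_INTEGER start = {};
    hr = stream->Seek(start, STREAM_SEEK_SET, nullptr);
    if (FAILED(hr)) { stream->Release(); return hr; }

    *result = stream;
    return S_OK;
}
#endif  // _WIN32
```

If this works as hoped, the bytes read back from the stream should start with the declaration the stylesheet asked for rather than the UTF-16 one you get from the BSTR returned by transformNode.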