Preserve entity references when transforming XML with XSLT?
How can I preserve entity references when transforming XML with XSLT (2.0)? With all of the processors I've tried, entities get resolved by default. I can use xsl:character-map to handle the character entities, but what about text entities?
For example, this XML:
<!DOCTYPE doc [
<!ENTITY so "stackoverflow">
<!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
]>
<doc>
<text>Hello &so;!</text>
<text>&question;</text>
</doc>
transformed with the following XSLT:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
produces the following output:
<doc>
<text>Hello stackoverflow!</text>
<text>How can I preserve the entity reference when transforming with XSLT??</text>
</doc>
The output should look like the input (minus the doctype declaration for now):
<doc>
<text>Hello &so;!</text>
<text>&question;</text>
</doc>
I'm hoping that I don't have to pre-process the input by replacing all ampersands with &amp;amp; (like &amp;amp;question;) and then post-process the output by replacing all &amp;amp; with &amp;.
Maybe this is processor specific? I'm using Saxon 9.
Thanks!
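For reference, the pre-/post-processing workaround described above can be sketched in Java with the JDK's built-in identity transformer. This is only an illustration under stated assumptions: the class name AmpersandRoundTrip and its method are hypothetical, and the naive string replacement would also unescape any &amp;amp; that a real stylesheet generated on purpose.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AmpersandRoundTrip {
    // The sample document from the question.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "<!ENTITY so \"stackoverflow\">\n"
        + "<!ENTITY question \"How can I preserve the entity reference when transforming with XSLT??\">\n"
        + "]>\n"
        + "<doc>\n<text>Hello &so;!</text>\n<text>&question;</text>\n</doc>";

    // Hypothetical helper: hide every '&' from the parser, run an identity
    // transform, then turn the escaped ampersands back into plain ones.
    static String transformKeepingRefs(String xml) throws Exception {
        // Pre-process: "&so;" becomes "&amp;so;", which parses as plain text.
        String escaped = xml.replace("&", "&amp;");

        // newTransformer() without a stylesheet gives an identity transform;
        // a real stylesheet would be passed to newTransformer(Source) instead.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(escaped)),
                           new StreamResult(out));

        // Post-process: the serializer wrote "&amp;so;", restore "&so;".
        return out.toString().replace("&amp;", "&");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformKeepingRefs(XML));
    }
}
```

The result is no longer guaranteed to be well-formed on its own, because the DOCTYPE that declares the entities is not written back by this sketch.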
If you know what entities will be used and how they are defined, you can do the following (quite primitive and error-prone, but still better than nothing):
when applied on the provided XML document:
the wanted result is produced:
Do note:
The special (RegEx) characters in the replacements must be escaped.
We needed to resort to DOE (disable-output-escaping), which isn't recommended because it violates the principles of the XSLT architecture and processing model. In other words, this solution is a nasty hack.
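The stylesheet from this answer did not survive on this page. As a rough illustration of the same idea (known entity values are swapped back for their references, with regex special characters escaped), here is a minimal Java sketch. It is not the answer's XSLT: the class EntityRestorer and its method are hypothetical names, and it works on already-serialized text rather than inside the transformation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityRestorer {
    // Hypothetical helper: replaces each known entity value found in the
    // text with the matching "&name;" reference.
    static String restoreEntityRefs(String text, Map<String, String> entities) {
        String result = text;
        for (Map.Entry<String, String> entity : entities.entrySet()) {
            // Pattern.quote escapes regex metacharacters in the value
            // (the "??" in the question entity would otherwise break the regex).
            String pattern = Pattern.quote(entity.getValue());
            // quoteReplacement escapes '$' and '\' in the replacement string.
            String replacement =
                Matcher.quoteReplacement("&" + entity.getKey() + ";");
            result = result.replaceAll(pattern, replacement);
        }
        return result;
    }

    // The two entities declared in the question's DOCTYPE.
    static Map<String, String> questionEntities() {
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("question",
            "How can I preserve the entity reference when transforming with XSLT??");
        entities.put("so", "stackoverflow");
        return entities;
    }

    public static void main(String[] args) {
        System.out.println(
            restoreEntityRefs("<text>Hello stackoverflow!</text>", questionEntities()));
    }
}
```

Like the approach it mirrors, this is error-prone: any ordinary text that happens to equal an entity's value is also turned into a reference, and values that overlap each other need to be ordered carefully.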
This can be an especially troublesome issue if you are using something like S1000D. It uses entities and @boardno attributes to link to figures. It's a throwback to its SGML roots.
Because this automatic entity-expansion behavior is correct but undesirable, I often have to drop back to tools like sed, awk and batch scripts to manage certain data analysis tasks when using S1000D as input.
IMHO, this would be a great change proposal for one of the upcoming XSLT specifications: a compliant processor should accept a runtime parameter that turns entity expansion on and off.
If you use a Java implementation of an XSLT 2.0 processor (like Saxon 9 Java), you might want to check whether http://andrewjwelch.com/lexev/ helps. It lets you preprocess XML that contains entity and character references so that they are marked up as XML elements, which you can then transform as necessary.
I use this solution and it works well:
You can keep EntityReference nodes in the document by using a DOM LS parser with the "entities" parameter set to true.
http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/DOMConfiguration.html
The specification says the default value is true, but depending on the parser it could be false, so be aware of that.
To load Xerces:
You can also use the registry as below, but personally I would rather hardcode the implementation I want, as above:
Then, to load your document:
Then, your XML entities are not expanded in the DOM.
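The original snippets for these steps are missing from this page. A minimal sketch of the registry-based variant, assuming the DOM Level 3 Load and Save implementation bundled with the JDK, could look like the following. The class name KeepEntityRefs and the helper methods are hypothetical, and whether a given parser honors the "entities" parameter should be checked rather than taken for granted.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

public class KeepEntityRefs {
    // The sample document from the question.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "<!ENTITY so \"stackoverflow\">\n"
        + "<!ENTITY question \"How can I preserve the entity reference when transforming with XSLT??\">\n"
        + "]>\n"
        + "<doc>\n<text>Hello &so;!</text>\n<text>&question;</text>\n</doc>";

    // Parses the XML with a DOM LS parser; keepEntities controls the
    // DOMConfiguration "entities" parameter mentioned in the answer.
    static Document parse(String xml, boolean keepEntities) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls =
            (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSParser parser =
            ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.getDomConfig().setParameter("entities", keepEntities);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    // Counts EntityReference nodes in the subtree rooted at the given node.
    static int countEntityRefs(Node node) {
        int count = node.getNodeType() == Node.ENTITY_REFERENCE_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            count += countEntityRefs(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document kept = parse(XML, true);
        Document expanded = parse(XML, false);
        System.out.println("entities=true:  "
            + countEntityRefs(kept.getDocumentElement()) + " EntityReference nodes");
        System.out.println("entities=false: "
            + countEntityRefs(expanded.getDocumentElement()) + " EntityReference nodes");
    }
}
```

Setting the parameter explicitly in both directions, as done here, avoids depending on the parser's default.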
Then, because Saxon does not handle unexpanded entities (it fails with the 'Unsupported node type in DOM! 5' error), you cannot use net.sf.saxon.xpath.XPathFactoryImpl; you have to use the default XPathFactory instead, obtained with XPathFactory.newInstance().