XSLT: encoding special characters
I'm using Java with Xalan 2.7 to transform one XML document into another.
The source is a StreamSource wrapping a UTF-8 Reader.
The result is a StreamResult wrapping a ByteArrayOutputStream.
My stylesheet is currently set to output UTF-8 (note that it is version 1.0):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"
              indent="yes"
              omit-xml-declaration="yes"
              encoding="UTF-8"/>
Now I want all special characters in the output to be encoded, something like an XSLT 2.0 character map, so that € becomes &euro; or its hexadecimal equivalent (&#x20AC;).
How can I do this with the least effort?
Comments (2)
I'll assume that by "special" characters you mean anything outside the ASCII range.
If you don't want those characters in the resulting XML, you don't need to specify UTF-8 as the encoding, since the content won't contain any non-ASCII characters directly.
You can simply specify ASCII as the output encoding in your XSLT stylesheet, which makes the XSLT processor emit numeric character references for all non-ASCII characters. Both the JDK 6 default processor and Xalan 2.7 support this.
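As a rough illustration of the suggestion above, here is a minimal sketch that runs an identity stylesheet with an ASCII output encoding through JAXP. The class name `AsciiOutputDemo`, the identity template, and the sample input are assumptions made for the example, and `US-ASCII` is used as the IANA name for the ASCII encoding the answer mentions. It runs on whichever processor `TransformerFactory.newInstance()` picks up (Xalan 2.7 if it is on the classpath, otherwise the JDK's built-in one).

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AsciiOutputDemo {
    // Same xsl:output settings as in the question, except for the encoding.
    // The identity template is an assumption; a real stylesheet would go here.
    private static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" indent=\"yes\" omit-xml-declaration=\"yes\" encoding=\"US-ASCII\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Reader in, byte stream out, as described in the question.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        // The serialized bytes should all be ASCII, so decoding them as ASCII is lossless.
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        // \u20AC is the euro sign; it should come out as a numeric character reference.
        System.out.println(transform("<price>10 \u20AC</price>"));
    }
}
```

Note that this yields numeric references rather than named entities such as &euro;. The only change relative to the question's stylesheet is the `encoding` attribute.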
I once had a similar requirement because I needed to pass unprintable characters through XSLT.
I ended up using a FilterInputStream/FilterOutputStream pair driven by a small finite-state automaton to marshal and unmarshal such a notation.
Hope this gives you some ideas :-)
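The answer does not show its code, so the following is only a guess at what the output half of such a filter could look like. The hypothetical `NcrOutputStream` assumes the wrapped stream receives UTF-8 bytes. It uses a tiny state machine (the count of continuation bytes still expected) to reassemble each non-ASCII code point and write it as a hexadecimal character reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical filter: rewrites non-ASCII UTF-8 sequences as &#xHHHH; references. */
public class NcrOutputStream extends FilterOutputStream {
    private int pending = 0;   // automaton state: continuation bytes still expected
    private int codePoint = 0; // code point assembled so far

    public NcrOutputStream(OutputStream out) {
        super(out);
    }

    // FilterOutputStream routes write(byte[]) and write(byte[], int, int)
    // through this single-byte method, so overriding it is sufficient.
    @Override
    public void write(int b) throws IOException {
        b &= 0xFF;
        if (pending == 0) {
            if (b < 0x80) {
                out.write(b); // plain ASCII passes through unchanged
            } else if ((b & 0xE0) == 0xC0) {
                codePoint = b & 0x1F; pending = 1; // lead byte of a 2-byte sequence
            } else if ((b & 0xF0) == 0xE0) {
                codePoint = b & 0x0F; pending = 2; // lead byte of a 3-byte sequence
            } else if ((b & 0xF8) == 0xF0) {
                codePoint = b & 0x07; pending = 3; // lead byte of a 4-byte sequence
            } else {
                throw new IOException("Invalid UTF-8 lead byte: " + b);
            }
        } else {
            if ((b & 0xC0) != 0x80) {
                throw new IOException("Invalid UTF-8 continuation byte: " + b);
            }
            codePoint = (codePoint << 6) | (b & 0x3F);
            if (--pending == 0) {
                // Sequence complete: emit the reference instead of the raw bytes.
                out.write(String.format("&#x%X;", codePoint)
                    .getBytes(StandardCharsets.US_ASCII));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (NcrOutputStream ncr = new NcrOutputStream(sink)) {
            ncr.write("<price>10 \u20AC</price>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(sink.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

A byte-level filter like this knows nothing about XML structure, so it would also rewrite non-ASCII characters in places where character references are not recognized, such as comments, CDATA sections, or element names. For ordinary text and attribute content, setting the output encoding on the stylesheet is the lower-effort route.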