XSLT transformation with character 8221

Posted 2024-10-14 03:03:05

I'm transforming an XML document using javax.xml.transform.Transformer and XSLT. The document contains the characters “ and ” (Java integer codes 8220 and 8221). These are not the normal quotes.

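When I transform the document, these characters are turned into garbled characters. My difficulty now is how to convert them back into something people can read. I have tried reading the document with DOMReader and SAXReader using utf-8, utf-16, ascii and other encodings, but without success.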

Many thanks for your help. Max.
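The question does not say where the garbled output is viewed, so this is only a minimal sketch under one assumption: the serializer writes UTF-8 bytes and whatever reads them decodes with a different, single-byte charset. It uses only the JDK's javax.xml.transform API; the class name QuoteEncodingDemo is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class QuoteEncodingDemo {

    // Identity transform (no stylesheet) that writes UTF-8 bytes.
    static byte[] transformToUtf8(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        // The "Java integer codes" 8220 and 8221 are U+201C and U+201D.
        System.out.println((int) '\u201C' + " " + (int) '\u201D'); // prints 8220 8221

        byte[] bytes = transformToUtf8("<p>\u201C and \u201D</p>");

        // Decoding with the charset the serializer actually used restores the quotes.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));

        // Decoding the same bytes as ISO-8859-1 turns each 3-byte quote into 3 junk chars.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

If this assumption holds, changing the reader's encoding fixes the display; the document itself is fine.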


丑疤怪 2024-10-21 03:03:05

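These are the Unicode characters U+201C and U+201D. Are you transforming to HTML? If so, and your XSLT specifies HTML output, I would expect it to output &ldquo; and &rdquo;, because those are their character entity references: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references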
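Quoting the XSLT specification:

"The html output method may output a character using a character entity reference, if there is one defined for it in the version of HTML that the output method is using."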

http://www.w3.org/TR/xslt#section-HTML-Output-Method
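To see what a given processor actually does, the sketch below runs the same input through an identity Transformer twice, once with the xml output method and once with html. It is a minimal sketch using the JDK's built-in processor; the class name OutputMethodDemo is made up. Whether the html run prints the raw characters or &ldquo; / &rdquo; is processor-dependent, which is the latitude the quoted sentence grants.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodDemo {

    // Serializes the input unchanged, using the given output method ("xml" or "html").
    static String serialize(String xml, String method) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<p>\u201C and \u201D</p>";
        // xml method: UTF-8 can encode both characters, so they are written as-is.
        System.out.println(serialize(input, "xml"));
        // html method: the serializer may use &ldquo; / &rdquo; instead.
        System.out.println(serialize(input, "html"));
    }
}
```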


幼儿园老大 2024-10-21 03:03:05


This input:

<p> “ and ” </p>

With this stylesheet (just the identity rule):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output:

<p> “ and ” </p>

Only Xalan, with the html serialization method, outputs:

<p> &ldquo; and &rdquo; </p>

So, if you want the characters to render properly, you need to output a proper HTML document.

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" encoding="utf-8"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>
    <xsl:template match="/">
        <html>
            <head>
                <title>Test</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

Output:

<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
        <title>Test</title>
    </head>
    <body>
        <p> “ and ” </p>
    </body>
</html>

Note the proper charset encoding declaration in the generated meta element.
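The html-output stylesheet above can be run from Java roughly like this. A minimal sketch with the stylesheet inlined as a string; HtmlWrapperDemo is a made-up name. With the JDK's built-in processor the result should include the Content-Type meta element, though tag case and indentation are processor details.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlWrapperDemo {

    // The html-output stylesheet from the answer, inlined as a Java string.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" encoding=\"utf-8\"/>"
        + "<xsl:template match=\"@* | node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@* | node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/\">"
        + "<html><head><title>Test</title></head>"
        + "<body><xsl:apply-templates/></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML string.
    static String run(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run("<p>\u201C and \u201D</p>"));
    }
}
```

Writing this string to a file in any encoding other than the declared one would reintroduce the mismatch, so the bytes and the meta charset need to agree.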


天荒地未老 2024-10-21 03:03:05


You need to understand that an XSL transformation is applied not to the XML document per se but to the tree representation of the document(s). Text nodes contain the characters themselves, regardless of how those characters were represented in the input document: once the tree is built, they are the same. During the transformation you just create another tree, which is then serialized.
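A quick way to check the tree-representation point from Java: parse the same text written with decimal references, hex references, and literal characters, and compare the resulting text. A minimal sketch using the JDK's DOM parser; TreeTextDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TreeTextDemo {

    // Parses an XML string and returns the text of its document element.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Three spellings of the same two characters in the input document.
        String dec = textOf("<p>&#8220; and &#8221;</p>");
        String hex = textOf("<p>&#x201C; and &#x201D;</p>");
        String lit = textOf("<p>\u201C and \u201D</p>");
        // After parsing, all three trees hold identical text.
        System.out.println(dec.equals(hex) && hex.equals(lit)); // prints true
    }
}
```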

Some characters, like the ones you mentioned, require special treatment depending on the destination format you choose. When serializing to an XML document they are "escaped", and when serializing to HTML they are not. This is why the first answer gives you a workaround.

However, the difference between these two methods with regard to escaping is just the default value of the "disable-output-escaping" attribute (XSLT 1.0): for XML output it is "no", and for HTML output it is "yes".

So, to fix your issue without changing the whole serialization method, you could write something like this when copying a value that might contain "special" characters:

<xsl:value-of select="/my/node/text()" disable-output-escaping="yes"/>
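For reference, this is one way the xsl:value-of line above could be run through javax.xml.transform. A minimal sketch: the my/node input shape is implied by the select path, while the wrapping out element, the sample text and the class name DoeDemo are made up. XSLT 1.0 does not require processors to support disable-output-escaping, so results can vary by processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DoeDemo {

    // Hypothetical stylesheet: xml output, copying /my/node/text() with escaping disabled.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" encoding=\"utf-8\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<out><xsl:value-of select=\"/my/node/text()\" disable-output-escaping=\"yes\"/></out>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML string.
    static String run(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter w = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run("<my><node>\u201C and \u201D</node></my>"));
    }
}
```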

P.S. In XSLT 2.0, the preferred way to do this kind of thing is with a character map (the xsl:character-map declaration).


About the author: 不喜欢何必死缠烂打