How do I stop XmlSerializer converting &#234; to &amp;#234; in an attribute?

Posted 2024-09-05 01:34:16

I have the following DOM

    <row>
        <link href="B&#252;ro.txt" target="_blank">
            my link
        </link>
    </row>

When I serialize it to a file using the Java XmlSerializer it comes out like this:

    <row>
        <link href="B&amp;#252;ro.txt" target="_blank">
            my link
        </link>
    </row>

Is there any way to control how XmlSerializer handles escaping in attributes? Should I be doing this differently anyway?

Update

I should also say that I am using jre 1.6. I had been using jre 1.5 until recently and I am pretty sure that it was serialized 'correctly' (i.e. the '&' was not escaped)

Clarification

The DOM is created programmatically. Here is an example:

        Document doc = createDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.setAttribute("test1", "&#234;");
        root.setAttribute("test2", "üöä");
        root.appendChild(doc.createTextNode("&#234;"));

        StringWriter sw = new StringWriter();

        serializeDocument(doc, sw);
        System.out.println(sw.toString());

My solution

I didn't really want to do this because it involved a fair amount of code change and testing, but I decided to move the attribute data into a CDATA element. Problem avoided rather than solved.

Comments (2)

望她远 2024-09-12 01:34:17

The problem is that you are building the DOM with attribute values that have already been "escaped" according to the XML conventions. The DOM (of course) doesn't realize that you have done this and is escaping the ampersand.

You should change

    root.setAttribute("test1", "&#234;");

to

    root.setAttribute("test1", "\u00EA");

In other words, use strings consisting of plain Unicode codepoints when constructing the DOM. The XMLSerializer should then replace Unicode characters with character entities as required ... depending on the chosen character encoding for the output document.

EDIT - The reason that you may still be seeing raw characters rather than character entities in the output XML is that the XMLSerializer is using the default encoding for XML, i.e. UTF-8. The way to address this is to use the XMLSerializer(OutputFormat) constructor, passing an OutputFormat that specifies the required character encoding for the XML. (It sounds like you are using "ASCII".) Be sure to use a compatible character encoding for the OutputStream.
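
To make the difference concrete, here is a minimal sketch rather than the asker's actual code. It assumes the JDK's built-in TrAX transformer instead of the Xerces XMLSerializer, and the class and method names (AttributeEscapingSketch, serialize) are hypothetical. It serializes the same attribute once with a pre-escaped value and once with the plain Unicode character, under a chosen output encoding.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of the original question or answers.
class AttributeEscapingSketch {

    // Builds <root test1="..."/> and serializes it, declaring the given output encoding.
    static String serialize(String attributeValue, String encoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        // The DOM stores this string literally; it does not parse XML escapes.
        root.setAttribute("test1", attributeValue);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // The declared encoding decides which characters the serializer must escape.
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        StringWriter sw = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pre-escaped value: the '&' is itself escaped on output.
        System.out.println(serialize("&#234;", "US-ASCII"));
        // Plain Unicode value with an ASCII-only output encoding.
        System.out.println(serialize("\u00EA", "US-ASCII"));
        // Plain Unicode value with UTF-8: no character reference is needed.
        System.out.println(serialize("\u00EA", "UTF-8"));
    }
}
```

With the JDK's default transformer, the first call should reproduce the double escaping from the question, the second should emit a numeric character reference, and the third should emit the raw character. Other serializer implementations may format the reference differently (for example in hexadecimal).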

尸血腥色 2024-09-12 01:34:17

How do you obtain the DOM? Could it have something to do with that? I tried your sample XML with the standard DocumentBuilder (just because I'm more familiar with it) using Sun Java 6 and the latest Xerces-J (2.9.1), which, by the way, deprecates XmlSerializer in favor of LSSerializer or TrAX.

Anyway, using this technique, the serialized document does not even contain the character reference anymore and gets converted to "Büro.txt". I used the following code:

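    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.io.PrintWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.apache.xml.serialize.XMLSerializer; // Xerces-J (deprecated)
    import org.w3c.dom.Document;

    // Sample XML from the question, with the character reference in the attribute
    String xml = "<row>\n"
        + "        <link href=\"B&#252;ro.txt\" target=\"_blank\">\n"
        + "            my link\n" + "        </link>\n" + "    </row>";

    // Parsing resolves &#252; to the character 'ü' in the DOM
    InputStream is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(is);

    // Serialize with the default output format
    XMLSerializer xs = new XMLSerializer();
    xs.setOutputCharStream(new PrintWriter(System.err));

    xs.serialize(doc);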
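
Since XMLSerializer is deprecated, the same round trip can be sketched with the DOM Level 3 LSSerializer that ships with the JDK. This is only an illustration under that assumption; the class and method names (ParseThenSerializeSketch, parse, hrefOf, serialize) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo class; not part of the original answer.
class ParseThenSerializeSketch {

    // Parses XML text into a DOM; character references are resolved by the parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the href attribute of the first <link> element as stored in the DOM.
    static String hrefOf(Document doc) {
        Element link = (Element) doc.getElementsByTagName("link").item(0);
        return link.getAttribute("href");
    }

    // Serializes the document to a String via the Load/Save API.
    static String serialize(Document doc) {
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row><link href=\"B&#252;ro.txt\" target=\"_blank\">my link</link></row>";
        Document doc = parse(xml);
        System.out.println(hrefOf(doc));    // the DOM holds the real character, not the reference
        System.out.println(serialize(doc));
    }
}
```

The point is the same as above: once the parser has resolved the reference, the DOM only contains the character itself, and whether it is written back as a reference depends on the serializer's output encoding.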