Java standard library produces wrong XML 1.1

Posted 2024-10-17 09:06:27

I ran into an interesting problem last week. Run the program below. It is very simple: it first creates a dummy XML file, then reads it with the standard library and writes it back to another file.

Look through the generated gtest2.xml and you will see some content that seems to have come out of nowhere.

In my case, this is a sample of the wrong section (its position varies from machine to machine):

<test>1924</test>
<test>1925</test>
<test>t&gt;24</test>
<test>1927</test>
<test>1928</test>
<test>1929</test>

This does not happen if I change the XML version to 1.0. So is something wrong with my code, or with the JDK?

Here is the test code:

import java.io.File;
import java.io.PrintWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

public class DocumentBuilderCheck {

    public static void main(String[] args) throws Exception {
        String filename = "/tmp/gtest.xml";
        generateXmlFile(filename, 2500);
        Document doc = readXmlFile(filename);

        String filename2 = "/tmp/gtest2.xml";
        writeDocument(doc, filename2);
    }

    private static void writeDocument(Document document, String filename) throws Exception {
        StreamResult streamResult = new StreamResult(filename);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.transform(new DOMSource(document), streamResult);
    }

    private static Document readXmlFile(String filename) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(new File(filename));
        return doc;
    }

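    // Writes a flat XML 1.1 document: <main_tag><test>0000</test>...</main_tag>
    private static void generateXmlFile(String filename, int total)
            throws Exception {
        File f = new File(filename);

        // try-with-resources closes the writer even if a write fails;
        // the charset matches the encoding declared in the XML prolog
        try (PrintWriter pw = new PrintWriter(f, "UTF-8")) {
            pw.write("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
            pw.write("<main_tag>");
            for (int i = 0; i < total; i++) {
                pw.write("<test>" + String.format("%04d", i) + "</test>");
            }
            pw.write("</main_tag>");
        }
    }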
}
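One way to narrow this down is to check whether the parsed DOM is already wrong before any Transformer runs. The sketch below is a hypothetical helper (the class `ParseCheck` and its methods `buildXml` and `countMismatches` are illustrative names, not from the question): it builds the same flat document in memory, parses it with the JDK's default `DocumentBuilderFactory`, and counts `<test>` elements whose text differs from the index they were written with. It assumes that parsing from a byte array behaves like parsing the file; buffering may differ, so a clean result here does not by itself rule out the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of the question): checks whether the parsed
// DOM already contains corrupted <test> values, before any Transformer runs.
public class ParseCheck {

    // Builds the same flat document as generateXmlFile, but in memory.
    static String buildXml(String version, int total) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"").append(version).append("\" encoding=\"UTF-8\"?>");
        sb.append("<main_tag>");
        for (int i = 0; i < total; i++) {
            sb.append("<test>").append(String.format("%04d", i)).append("</test>");
        }
        sb.append("</main_tag>");
        return sb.toString();
    }

    // Parses the document and returns how many <test> elements do not hold
    // the zero-padded index they were written with.
    static int countMismatches(String version, int total) throws Exception {
        byte[] bytes = buildXml(version, total).getBytes(StandardCharsets.UTF_8);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));

        NodeList tests = doc.getElementsByTagName("test");
        int mismatches = 0;
        for (int i = 0; i < tests.getLength(); i++) {
            String expected = String.format("%04d", i);
            String actual = tests.item(i).getTextContent();
            if (!expected.equals(actual)) {
                System.out.println("element " + i + ": expected " + expected + ", got " + actual);
                mismatches++;
            }
        }
        // A missing or extra element also counts as corruption.
        return mismatches + Math.abs(total - tests.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XML 1.0 mismatches: " + countMismatches("1.0", 2500));
        System.out.println("XML 1.1 mismatches: " + countMismatches("1.1", 2500));
    }
}
```

If the XML 1.1 count is non-zero while XML 1.0 stays at zero, the corruption happens during parsing; if both are zero but gtest2.xml is still wrong, the writing step (or the file-based read) deserves a closer look.<test>0001</test></main_tag>")) throw new AssertionError("buildXml output");
if (ParseCheck.countMismatches("1.0", 2500) != 0) throw new AssertionError("XML 1.0 parse should be clean");
if (ParseCheck.countMismatches("1.1", 2500) < 0) throw new AssertionError("count must be non-negative");
</test>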

Comments (1)

无名指的心愿 2024-10-24 09:06:28

I don't know what causes this, but one well-known (?) problem with the JDK is that it often bundles old versions of libraries such as Xerces (XML parser) and Xalan (XSLT processor). Worse, these are sometimes custom builds that take an old release as a baseline and add a set of patches, so it is hard to even verify what behavior to expect.

As a result, the usual recommendation is not to rely on whatever is bundled, but instead to explicitly use the official Xerces/Xalan releases, so that the version in use is known and you can at least check which known issues exist.

So you could try the latest Xerces and Xalan releases to make sure this is not something that has already been fixed.
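Before swapping libraries, it may help to confirm which JAXP implementations are actually in use. The following sketch (the class `JaxpImplCheck` and its helpers are hypothetical names) prints the factory classes chosen by the standard JAXP lookup and tries to load Apache Xerces-J's parser factory by class name through `DocumentBuilderFactory.newInstance(String, ClassLoader)`. It assumes the Xerces-J factory class is `org.apache.xerces.jaxp.DocumentBuilderFactoryImpl`; on a stock JDK the lookup usually returns the repackaged internal `com.sun.org.apache...` classes instead.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.transform.TransformerFactory;

// Prints which JAXP implementations are picked up at runtime, and shows how a
// specific parser implementation could be requested explicitly.
public class JaxpImplCheck {

    // Class name of the parser factory selected by the standard JAXP lookup.
    static String parserImpl() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the transformer factory selected by the standard JAXP lookup.
    static String transformerImpl() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    // Tries to instantiate a named factory class (null loader = context class
    // loader); returns null if the class cannot be loaded or instantiated.
    static DocumentBuilderFactory tryParserImpl(String className) {
        try {
            return DocumentBuilderFactory.newInstance(className, null);
        } catch (FactoryConfigurationError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("DocumentBuilderFactory: " + parserImpl());
        System.out.println("TransformerFactory:     " + transformerImpl());

        // Assumed Xerces-J class name; only loadable if xercesImpl.jar is on the classpath.
        DocumentBuilderFactory xerces =
                tryParserImpl("org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
        System.out.println("Apache Xerces available: " + (xerces != null));
    }
}
```

The same choice can also be made globally with the `javax.xml.parsers.DocumentBuilderFactory` and `javax.xml.transform.TransformerFactory` system properties, which would let the original program be rerun against the official Xerces and Xalan jars without code changes.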
