Fastest and most efficient way to create XML
What is the fastest and most efficient way to create XML documents in Java? There is a plethora of libraries out there (Woodstox, XOM, XStream...), and I am wondering whether anyone has any input. Should I go with a code generation approach (since the XML schema is well known), or a reflection approach at run time?
Edited with additional information:
- A well-defined XML Schema is available and rarely changes
- The requirement is to convert Java objects to XML, and not vice versa
- Thousands of Java objects must be converted to XML per second
- Code generation, code complexity, configuration, maintenance etc. are secondary to higher performance.
If I were to create very simple XML content, I would stick to the JDK API only and introduce no third-party dependencies.
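As a rough illustration of the JDK-only route, here is a minimal sketch that builds a tiny document with the built-in DOM API and serializes it with the built-in Transformer. The class name DomExample, the method orderXml, and the order/item element names are made up for this example; they do not come from the question.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <order id="..."><item>...</item></order> using only JDK classes.
final class DomExample {
    static String orderXml(String id, String item) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            Element order = doc.createElement("order");
            order.setAttribute("id", id);
            doc.appendChild(order);

            Element itemElement = doc.createElement("item");
            itemElement.setTextContent(item); // special characters are escaped on serialization
            order.appendChild(itemElement);

            // Serialize the in-memory tree to a string, without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }
}
```

Calling DomExample.orderXml("553", "Tea & biscuits") returns the serialized element with the ampersand written as an entity. DOM keeps the whole tree in memory, so this suits small documents rather than high-throughput output.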
So for simple XML, and if I were to map an XML file to Java classes (or vice versa), I would go for JAXB. See this tutorial to see how easy it is.
Now.
If I were to create more sophisticated XML output with a constant structure, I would use a templating engine, perhaps Freemarker. Thymeleaf looks nice as well.
And finally.
If I were to create huge XML files very efficiently, I would use SAX.
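SAX itself is an event-based parsing API, so one plausible reading of this advice is to emit SAX events into a serializer instead of building a tree. The sketch below does that with the JDK's TransformerHandler; the class SaxWriterExample, the method rowsXml, and the rows/row element names are assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: streams <rows><row>0</row>...</rows> as SAX events, no tree in memory.
final class SaxWriterExample {
    static String rowsXml(int count) {
        try {
            StringWriter out = new StringWriter(); // for a huge file, use a file-backed Writer instead
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));

            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "rows", "rows", noAttributes);
            for (int i = 0; i < count; i++) {
                char[] text = Integer.toString(i).toCharArray();
                handler.startElement("", "row", "row", noAttributes);
                handler.characters(text, 0, text.length); // escaped by the serializer
                handler.endElement("", "row", "row");
            }
            handler.endElement("", "rows", "rows");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }
}
```

Because each element is written as soon as its events arrive, memory use stays flat regardless of how many rows are produced.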
I hope you can now see that you have plenty of options. Choose the one that best matches your needs :)
And have fun!
Try Xembly, a small open source library that makes XML creation very easy and intuitive:
Xembly is a wrapper around the native Java DOM, and is a very lightweight library (I'm a developer).
Firstly, it's important that the serialization is correct. Hand-written serializers usually aren't. For example, they have a tendency to forget that the string "]]>" can't appear in a text node.
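To make the "]]>" pitfall concrete, here is a minimal sketch of a text-node escaper. The class XmlText and the method escapeText are hypothetical names. It only covers markup characters in element content; it does not handle attribute values or characters that are illegal in XML, which a real serializer also has to deal with.

```java
// Hypothetical helper: escapes character data for use inside an XML text node.
final class XmlText {
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    // Escaping every '>' also guarantees that "]]>" never appears literally.
                    sb.append("&gt;");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A serializer that escapes only '&' and '<' would pass most tests and still emit ill-formed output for text containing "]]>", which is the kind of mistake described above.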
It's not too difficult to write your own serializer that is both correct and fast, if you're a capable Java programmer, but since some very capable Java programmers have been here before I think you're unlikely to beat them by a sufficient margin to make it worth the effort of writing your own code.
Except perhaps that most general-purpose libraries might be slowed down a little by offering serialization options, such as indentation, encoding, or the choice of line endings. You might just squeeze out an extra ounce of performance by avoiding unwanted features.
Also, some general-purpose libraries might check the well-formedness of what you throw at them, for example checking that namespace prefixes are declared (or declaring them if not). You might make it faster if it does no checking. On the other hand, you might create a library that is fast, but a pig to work with. Putting performance above all other objectives is almost invariably a mistake.
As for the performance of available libraries, measure them, and tell us what you find out.
The nicest way I know is using an XPath engine that is able to create Nodes. XMLBeam is able to do this (in a JUnit test here):
This program prints out:
Use XMLStreamWriter.
I ran a microbenchmark serializing one million of these:
with these results:
which gives:
StringBuilder is the most efficient, but that's because it doesn't need to go through all the text searching for ", &, <, and > and converting them into XML entities.
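The benchmarked class did not survive in this copy of the answer, so the following is only a sketch of what writing one object with XMLStreamWriter might look like. The class StaxExample, the method personXml, and the person/name/age fields are assumptions, not the original benchmark code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes <person age="..."><name>...</name></person> with the JDK's StAX writer.
final class StaxExample {
    static String personXml(String name, int age) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("person");
            writer.writeAttribute("age", Integer.toString(age));
            writer.writeStartElement("name");
            writer.writeCharacters(name); // the writer escapes markup characters here
            writer.writeEndElement();     // </name>
            writer.writeEndElement();     // </person>
            writer.writeEndDocument();
            writer.close();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }
}
```

The escaping done inside writeCharacters is the extra work that a plain StringBuilder concatenation skips, which is why the two are not doing equivalent jobs.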
Inspired by Petr's answer, I spent the better part of a day implementing such a benchmark, reading a lot about JMH in the process. The project is here: https://github.com/62mkv/xml-serialization-benchmark
and the results were as follows:
I did not include Xembly because, judging by its description, it looked like overkill for this particular case.
I was a bit surprised that XStream performed so poorly, given that it comes from ThoughtWorks, but that might just be because I did not customize it well enough for this particular case. The default Java 8 standard library StAX implementation of XMLStreamWriter is hands down the best in terms of performance. In terms of developer experience, XStream is the simplest one to use, while XMLStreamWriter requires far more error-prone effort to implement fully, and JAXB takes a well-deserved second place in both categories.
PS: Feedback and suggestions to improve the suite are very much welcome!