XML API for best performance
I have an application that works with a lot of XML data, so I want to ask which is the best API for handling XML in Java. Today I am using the W3C DOM API and, for performance, I want to migrate to some other API.
I build the XML from scratch, do a lot of transforms, import into databases (MySQL, MSSQL, etc.), export from the databases to HTML, modify those XML documents, and more.
Is JDOM the best option? Do you know of anything better than JDOM?
I have heard (from reading around) about Javolution. Does anybody use it?
Which API would you recommend?
3 Answers
If you have vast amounts of data, the main thing is to avoid having to load it all into memory at once (because it will use a vast amount of memory, and because it prevents you from overlapping I/O and processing). Sadly, I believe most DOM and DOM-like libraries (such as DOM4J) do just that, so they are not well suited for processing vast amounts of XML efficiently.
Instead, look at using a streaming API, like SAX or StAX. StAX is, in my experience, usually easier to use.
There are other APIs that try to give you the convenience of DOM with the performance of SAX. Javolution might be one; VTD-XML is another. But to be honest, I find StAX quite easy to work with - it's basically a fancy stream, so you just think in the same way as if you were reading a text file from a stream.
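To make that concrete, a minimal cursor-style StAX reader might look roughly like the sketch below; the file name data.xml and the element name record are placeholders for whatever you actually read, and the code simply prints the text of each matching element as it streams past.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class StaxReadExample {
    public static void main(String[] args) throws Exception {
        // "data.xml" and "record" are placeholder names for your own file and element.
        try (InputStream in = new FileInputStream("data.xml")) {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Pull the next event, much like reading the next line of a text file.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    // Read the element's text content and handle it immediately,
                    // so the whole document never has to sit in memory.
                    System.out.println(reader.getElementText());
                }
            }
            reader.close();
        }
    }
}
```

Only the current event is held at any time, which is why this style keeps memory use flat regardless of how large the input document is.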
One thing you might try is combining JAXB with StAX. The idea is that you stream the file using StAX, then use JAXB to unmarshal chunks within it. For instance, if you were processing an Atom feed, you could open it, read past the header, then work in a loop, unmarshalling entry elements to objects one at a time. This only really works if your format consists of a sequence of independent elements, like Atom; it would be largely useless on something richer like XHTML. You can see examples of this in the JAXB reference implementation and in a guy's blog post.
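A rough sketch of that StAX-plus-JAXB pattern is below, assuming the classic javax.xml.bind API is available; the Entry class, its fields, and the file name feed.xml are simplified placeholders (a real Atom feed also lives in the Atom namespace, which is ignored here for brevity).

```java
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.InputStream;

public class JaxbStaxExample {

    // Placeholder JAXB class for one feed entry; adjust fields and namespaces to your format.
    @XmlRootElement(name = "entry")
    public static class Entry {
        public String title;
        public String id;
    }

    public static void main(String[] args) throws Exception {
        Unmarshaller unmarshaller = JAXBContext.newInstance(Entry.class).createUnmarshaller();
        try (InputStream in = new FileInputStream("feed.xml")) { // placeholder file name
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Whenever the cursor sits on an <entry> start tag, hand it to JAXB;
                // JAXB consumes exactly that element and leaves the cursor just after it.
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                        && "entry".equals(reader.getLocalName())) {
                    Entry e = unmarshaller.unmarshal(reader, Entry.class).getValue();
                    System.out.println(e.title); // process one entry, then move on
                } else {
                    reader.next();
                }
            }
            reader.close();
        }
    }
}
```

The point of the combination is that JAXB gives you typed objects for each chunk while StAX keeps the overall pass streaming, so memory use stays proportional to one entry rather than to the whole feed.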
The answer depends on what performance aspects are important for your application. One factor is whether you are handling large XML documents.
For parsing, DOM-based approaches will not scale well to large documents. If you need to parse large documents, non-DOM parsers such as those using SAX and StAX will be faster and less resource intensive. However, if you need to transform XML after parsing, using either XSL or a DOM API, you are going to need the whole document in memory in any case.
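For comparison, a minimal SAX version might look like the sketch below; it just counts record elements (both data.xml and record are placeholder names), and because SAX is purely event-driven, no tree is ever built in memory.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.File;

public class SaxCountExample {
    public static void main(String[] args) throws Exception {
        final int[] count = {0};
        // The handler receives callbacks as the parser streams through the document.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("record".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new File("data.xml"), handler);
        System.out.println("records: " + count[0]);
    }
}
```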
For creating XML from code, StAX provides a nice API. Since the approach is stream-based, it scales well to writing very large documents.
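As a sketch of that writing side, the StAX XMLStreamWriter can emit a document element by element; the file name out.xml, the element names, and the loop bound below are all made up for illustration.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class StaxWriteExample {
    public static void main(String[] args) throws Exception {
        try (OutputStream out = new FileOutputStream("out.xml")) { // placeholder output file
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("items");
            // Each element goes straight to the stream, so even millions of items
            // never have to be assembled as a tree in memory first.
            for (int i = 0; i < 1_000_000; i++) {
                w.writeStartElement("item");
                w.writeAttribute("id", Integer.toString(i));
                w.writeCharacters("value-" + i);
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();
            w.close();
        }
    }
}
```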
Well, most of the developers I know, and I myself, use dom4j. Maybe, if you have the time, you could write a small performance test using both frameworks; then you will see the difference. I prefer dom4j.