How efficient is XPath compared with using DOM in Dom4J?
For example, consider the following XML:
<root>
<childNode attribute1="value1">
<grandChildNode attrib1="val1" attrib2="val2">some content1
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content2
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content3
</grandChildNode>
</childNode>
<childNode attribute1="value1">
<grandChildNode attrib1="val1" attrib2="val2">some content1
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content2
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content3
</grandChildNode>
</childNode>
<childNode attribute1="value1">
<grandChildNode attrib1="val1" attrib2="val2">some content1
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content2
</grandChildNode>
<grandChildNode attrib1="val1" attrib2="val2">some content3
</grandChildNode>
</childNode>
</root>
Would it be more efficient to use DOM to get the root node and then loop through the childNode and grandChildNode elements, or to use XPath expressions to gather the details of the child and grandchild nodes?
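To make the two options concrete, here is a minimal sketch of both, run against a shortened copy of the sample document. It uses the JDK's built-in javax.xml APIs rather than Dom4J so that it runs without extra dependencies; the class and method names (DomWalkVsXPath, viaDomWalk, viaXPath) are made up for illustration. Note that in this sketch both approaches work on a tree that has already been parsed into memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkVsXPath {
    // Shortened version of the sample document from the question.
    static final String XML =
        "<root>"
        + "<childNode attribute1=\"value1\">"
        + "<grandChildNode attrib1=\"val1\" attrib2=\"val2\">some content1</grandChildNode>"
        + "<grandChildNode attrib1=\"val1\" attrib2=\"val2\">some content2</grandChildNode>"
        + "</childNode>"
        + "<childNode attribute1=\"value1\">"
        + "<grandChildNode attrib1=\"val1\" attrib2=\"val2\">some content3</grandChildNode>"
        + "</childNode>"
        + "</root>";

    // Parse the whole document into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Option 1: start at the root and loop over childNode, then grandChildNode.
    static List<String> viaDomWalk(Document doc) {
        List<String> out = new ArrayList<>();
        Element root = doc.getDocumentElement();
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE || !"childNode".equals(child.getNodeName())) {
                continue; // skip whitespace text nodes and unrelated elements
            }
            for (Node gc = child.getFirstChild(); gc != null; gc = gc.getNextSibling()) {
                if (gc.getNodeType() == Node.ELEMENT_NODE && "grandChildNode".equals(gc.getNodeName())) {
                    out.add(gc.getTextContent().trim());
                }
            }
        }
        return out;
    }

    // Option 2: let an XPath expression select the grandChildNode elements.
    static List<String> viaXPath(Document doc) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/root/childNode/grandChildNode", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(viaDomWalk(doc));
        System.out.println(viaXPath(doc));
    }
}
```

Both methods return the same list of text values; the difference is only in how the nodes are located.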
Comments (1)
If you want to process an XML document in its entirety, parsing XML into a DOM will almost always be the least efficient in terms of deserialisation time, CPU usage and memory usage.
Parsing to a DOM requires around 10-15 times as much memory as the XML document occupies on disk. For example, a 1 megabyte XML document will parse into a DOM taking up 10-15 megabytes of memory.
Only ever parse into a DOM if you intend to modify some or all of the data and then put the result back into an XML document. For all other use cases, DOM is a poor choice.
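As a rough sketch of that modify-and-write-back case, the following parses a document into a DOM, changes an attribute on every matching element, and serialises the result to a string. It uses the JDK's built-in javax.xml APIs, not Dom4J, and the class and method names (DomEditRoundTrip, setAttributeOnAll) are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditRoundTrip {
    // Parse, set attr=value on every <tag> element, and return the edited XML.
    static String setAttributeOnAll(String xml, String tag, String attr, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // The whole tree is in memory, so every element can be edited in place.
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                ((Element) nodes.item(i)).setAttribute(attr, value);
            }

            // An identity transform writes the modified tree back out as XML text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><childNode attribute1=\"value1\">text</childNode></root>";
        System.out.println(setAttributeOnAll(xml, "childNode", "attribute1", "changed"));
    }
}
```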
XPath is often significantly less resource heavy, but this does depend on the length of the document (i.e. how many 'childNode' elements you have) and the location in the document of the data in which you are interested.
XPath memory usage and completion time tend to increase the further down the document you go. For example, let's say you have an XML document with 20,000 childNode elements, each childNode has a unique identifier that you know in advance, and you want to extract a known childNode from the document. Extracting the 18,345th childNode would use much, much, much more memory than extracting the 3rd.
So if you are using XPath to extract all childNode elements, you may find it less efficient than parsing into a DOM. XPath is generally an easy way of extracting a portion of an XML document. I'd not recommend using it for processing all of an XML document.
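For the "extract one known childNode" case, a sketch along these lines shows the idea. It assumes the unique identifier is stored in an attribute called id, which is my own placeholder (the sample document has no such attribute), and it uses the JDK's javax.xml.xpath API rather than Dom4J. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSingleLookup {
    // Hypothetical document: each childNode carries a unique "id" attribute.
    static final String XML =
        "<root>"
        + "<childNode id=\"c1\"><grandChildNode>first</grandChildNode></childNode>"
        + "<childNode id=\"c2\"><grandChildNode>second</grandChildNode></childNode>"
        + "<childNode id=\"c3\"><grandChildNode>third</grandChildNode></childNode>"
        + "</root>";

    // Returns the text of the first grandChildNode under the childNode with the
    // given id, or an empty string if no such childNode exists.
    static String contentOf(String xml, String id) {
        // Restrict the id to simple characters so it can be embedded in the expression safely.
        if (!id.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unsupported id: " + id);
        }
        String expr = "/root/childNode[@id='" + id + "']/grandChildNode[1]";
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)))
                .trim();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentOf(XML, "c2"));
    }
}
```

The predicate [@id='c2'] does the lookup, so the calling code never has to loop over the siblings itself.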
By far the best approach, if you are indeed looking to extract and process all data in an XML document, would be to use a SAX-based reader. This will be both orders of magnitude faster and less resource heavy than any other approach.
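A minimal sketch of a SAX-based reader for documents shaped like the one in the question might look like this. It streams through the input and keeps only the grandChildNode text it is asked to collect, so no tree is built. The class and method names (SaxGrandChildReader, readGrandChildren) are made up for illustration, and the parser is the JDK default, which is not namespace-aware, hence the comparison on qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxGrandChildReader {
    // Streams through the XML and collects the trimmed text of every grandChildNode.
    static List<String> readGrandChildren(String xml) {
        List<String> out = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a grandChildNode element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("grandChildNode".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("grandChildNode".equals(qName)) {
                    out.add(text.toString().trim());
                    text = null;
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<root><childNode attribute1=\"value1\">"
            + "<grandChildNode attrib1=\"val1\" attrib2=\"val2\">some content1\n</grandChildNode>"
            + "<grandChildNode attrib1=\"val1\" attrib2=\"val2\">some content2\n</grandChildNode>"
            + "</childNode></root>";
        System.out.println(readGrandChildren(xml));
    }
}
```

The trade-off is that the handler has to track its own state (here, the text field), which is the price of not holding the document in memory.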
That said, it does also depend on the volume of data you are dealing with. For the example XML document you gave, you won't notice any practical difference. Yes, DOM will be 'slow' and SAX will be 'fast', but we're talking milli- or micro-second differences.
SAX can easily be hundreds or thousands of times faster than DOM; however, if that's the difference between 2 microseconds and 2 milliseconds, you're not going to notice. When you're dealing with a document containing 20,000 childNode elements, 2 seconds versus 200 seconds will become more of a problem.