Why is SAX parsing faster than DOM parsing? And how does StAX work?

Posted 2024-09-25 18:53:48


Somewhat related to: libxml2 from Java

Yes, this question is rather long-winded, sorry. I kept it as dense as I felt possible. I bolded the questions to make it easier to peek at them before reading the whole thing.

Why is SAX parsing faster than DOM parsing? The only thing I can come up with is that w/ SAX you're probably ignoring the majority of the incoming data, and thus not wasting time processing parts of the XML you don't care about. IOW, after parsing w/ SAX, you can't recreate the original input. If you wrote your SAX parser so that it accounted for each and every XML node (and could thus recreate the original), then it wouldn't be any faster than DOM, would it?
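To make the "account for every node" case concrete, here is a minimal sketch of a SAX handler that touches every element, attribute and text callback and rebuilds a simplified copy of the input. The class name `SaxEcho` is hypothetical. The copy is only approximate: no re-escaping is done, and a plain `DefaultHandler` is not told about comments or processing instructions. The sketch does not measure anything, so it does not settle whether such a handler ends up as slow as DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEcho {
    // Rebuilds a simplified copy of the input from SAX callbacks.
    // Every start tag, attribute, text chunk and end tag is handled.
    static String echo(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Factory is not namespace-aware by default, so qName holds the tag name.
                        out.append('<').append(qName);
                        for (int i = 0; i < atts.getLength(); i++) {
                            out.append(' ').append(atts.getQName(i))
                               .append("=\"").append(atts.getValue(i)).append('"');
                        }
                        out.append('>');
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        out.append("</").append(qName).append('>');
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Text may arrive in several chunks; append each one as-is (no escaping).
                        out.append(ch, start, length);
                    }
                });
        return out.toString();
    }
}
```

For example, `echo("<a x=\"1\"><b>t</b><c/></a>")` yields `<a x="1"><b>t</b><c></c></a>`. The empty element comes back expanded because SAX reports it as a start callback followed by an end callback.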

The reason I'm asking is that I'm trying to parse xml documents more quickly. I need to have access to the entire xml tree AFTER parsing. I am writing a platform for 3rd party services to plug into, so I can't anticipate what parts of the xml document will be needed and which parts won't. I don't even know the structure of the incoming document. This is why I can't use jaxb or sax. Memory footprint isn't an issue for me because the xml documents are small and I only need 1 in memory at a time. It's the time it takes to parse this relatively small xml document that is killing me. I haven't used stax before, but perhaps I need to investigate further because it might be the middle ground? If I understand correctly, stax keeps the original xml structure and processes the parts that I ask for on demand? In this way, the original parse time might be quick, but each time I ask it to traverse part of the tree it hasn't yet traversed, that's when the processing takes place?
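On the StAX question: in the standard `javax.xml.stream` API, `XMLStreamReader` is a pull parser. It offers forward-only, read-only access, so parsing work happens as your code calls `next()`. It does not keep the structure it has already passed. To revisit earlier parts you would have to store them yourself or re-parse. The minimal sketch below shows a pull loop that stops as soon as it finds what it wants. The class name `StaxPull` and the method `firstText` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxPull {
    // Returns the text of the first element named elementName, or null if none exists.
    // The cursor only advances when next() is called, and we stop pulling once found.
    static String firstText(String xml, String elementName) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(elementName)) {
                    // Reads text-only content up to the matching end tag.
                    return r.getElementText();
                }
            }
            return null;
        } finally {
            r.close();
        }
    }
}
```

Note that there is no call to step the cursor backwards. With this API, "on demand" means "as you pull forward", not random access into a retained tree.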

If you provide a link that answers most of the questions, I will accept your answer (you don't have to directly answer my questions if they're already answered elsewhere).

Update: I rewrote it using SAX and it now parses documents in 2.1 ms on average. That is an improvement (16% less time) over the 2.5 ms DOM was taking, but it is not the magnitude that I (et al) would've guessed.
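For reference, the 16% figure is the drop in average parse time per document. Expressed as a throughput speedup, the same two averages work out to roughly 1.19x:

```latex
\frac{2.5\,\text{ms} - 2.1\,\text{ms}}{2.5\,\text{ms}} = 0.16
\qquad
\frac{2.5\,\text{ms}}{2.1\,\text{ms}} \approx 1.19
```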

Thanks


夜巴黎 2024-10-02 18:53:48

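Assuming you do nothing other than parse the document, the different parser standards rank as follows:

1. StAX is the fastest

   The event is reported to you.

2. SAX is next

   It does everything StAX does, plus the content is realized automatically (element name, namespaces, attributes, ...).

3. DOM is last

   It does everything SAX does and presents the information as instances of Node.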
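The three levels can be sketched side by side on one small input. The minimal example below counts elements with each API. It does not time anything. It only shows what each API hands your code: StAX gives event types that you pull, SAX pushes callbacks with names and attributes already passed in, and DOM builds a tree of `Node` objects before you query it. The class name `ParserLevels` and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParserLevels {
    static final String XML = "<order id=\"1\"><item sku=\"a\"/><item sku=\"b\"/></order>";

    // StAX: we pull events and only check the event type; names are never requested.
    static int countWithStax(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        r.close();
        return count;
    }

    // SAX: the parser pushes callbacks, passing names and attributes as arguments.
    static int countWithSax(String xml) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    // DOM: the whole document is built as Node objects before we look at it.
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength(); // "*" matches every element
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countWithStax(XML) + " " + countWithSax(XML) + " " + countWithDom(XML));
    }
}
```

Running `main` prints `3 3 3`. All three APIs see the same elements. They differ in how much each one builds before your code gets control.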

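Your use case

If you need to maintain all of the XML, DOM is the standard representation. It integrates cleanly with the XSLT transform (javax.xml.transform), XPath (javax.xml.xpath) and schema validation (javax.xml.validation) APIs. However, if performance is key, you may be able to build your own tree structure using StAX faster than a DOM parser could build a DOM.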
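As an illustration of "build your own tree structure using StAX", here is a minimal sketch that turns `XMLStreamReader` events into a lightweight tree. The `StaxTree` class and its `Node` type are hypothetical, not part of any library. The sketch also simplifies: it keeps only local names (namespaces are dropped), ignores comments and processing instructions, and concatenates an element's text, so mixed-content ordering is lost. It is not measured here, and unlike DOM it does not plug into the XPath or XSLT APIs.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTree {
    // Hypothetical minimal node: name, attributes, children and accumulated text.
    static final class Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Node(String name) { this.name = name; }
    }

    static Node build(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<Node> stack = new ArrayDeque<>(); // open elements, innermost on top
        Node root = null;
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Node n = new Node(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        n.attributes.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    if (stack.isEmpty()) root = n; else stack.peek().children.add(n);
                    stack.push(n);
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    // Text may be split across several events; append to the open element.
                    if (!stack.isEmpty()) stack.peek().text.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    stack.pop();
                    break;
                default:
                    // Comments, processing instructions, etc. are ignored in this sketch.
                    break;
            }
        }
        r.close();
        return root;
    }
}
```

For example, `StaxTree.build("<a x=\"1\"><b>hi</b><b/></a>")` returns a root named `a` with attribute `x` = `1` and two `b` children, the first holding the text `hi`.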