
What is the difference between SAX and DOM?

Posted 2024-11-26 05:20:19, 343 characters, 11 views, 0 comments

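I have been reading some articles about XML parsers and came across SAX and DOM.

SAX is event-based and DOM is a tree model, but I don't understand the difference between these concepts.

From what I understand, event-based means that some kind of event happens to a node. For example, when you click a particular node, it gives you all of its sub-nodes instead of loading all the nodes at once. With DOM parsing, on the other hand, all the nodes are loaded and a tree model is built.

Is my understanding correct?

Please correct me if I am wrong, or explain event-based and tree model to me in a simpler way.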

燕归巢 2024-12-03 05:20:19

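Well, you are close.

In SAX, events are triggered while the XML is being parsed. When the parser encounters the start of a tag, it triggers a tagStarted event (the actual name of the event may differ). Similarly, when it reaches the end of a tag, it triggers tagEnded. Using a SAX parser means you have to handle these events and make sense of the data returned with each one.

In DOM, no events are triggered during parsing. The entire XML is parsed, and a DOM tree (of the nodes in the XML) is generated and returned. Once parsing is done, the user can navigate the tree to access the data held in the various nodes of the XML.

In general, DOM is easier to use, but it has the overhead of parsing the entire XML before you can start using it.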
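To make the contrast concrete, here is a minimal sketch using Python's standard library (my choice of language, the answer does not name one). In Python the SAX callbacks are called startElement, endElement and characters rather than tagStarted/tagEnded, and the sample document and the EventLogger class are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# Illustrative sample document (not from the thread).
XML = b"<note><to>Ann</to><body>Hello</body></note>"


class EventLogger(xml.sax.ContentHandler):
    """Records each event the SAX parser reports, in the order it arrives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(("text", content))


# SAX: the callbacks above run while parsing is still in progress.
logger = EventLogger()
xml.sax.parseString(XML, logger)

# DOM: parsing finishes first, then we navigate the finished tree.
doc = minidom.parseString(XML)
body_text = doc.getElementsByTagName("body")[0].firstChild.data
```

Note that the events are caused by the parser reading the input, not by anything the user clicks on.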


留蓝 2024-12-03 05:20:19

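In a nutshell...

SAX (Simple API for XML): a stream-based processor. Only a small part of the document is in memory at any time, and you "sniff" the XML stream by implementing callback code for events such as tagStarted(). It uses almost no memory, but you can't do "DOM" things like using XPath or traversing the tree.

DOM (Document Object Model): loads the whole thing into memory, which makes it a massive memory hog. Even a medium-sized document can exhaust memory. But you can use XPath, traverse the tree, and so on.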
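As a rough illustration of the "XPath and traverse the tree" side, here is a sketch in Python. It assumes xml.etree.ElementTree, which is an in-memory tree model but not the W3C DOM itself, because it supports a limited XPath subset out of the box. The two-book sample data is invented for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data (not from the thread).
XML = """<bookstore>
  <book category="cooking"><title>Everyday Italian</title></book>
  <book category="web"><title>Learning XML</title></book>
</bookstore>"""

# The whole document is now held in memory as a tree.
root = ET.fromstring(XML)

# XPath-style query: titles of books whose category is "cooking".
cooking_titles = [
    t.text for t in root.findall("./book[@category='cooking']/title")
]

# Free traversal: visit every element in document order.
all_tags = [el.tag for el in root.iter()]
```

Neither the query nor the traversal is available from a plain SAX handler, since no tree exists to run them against.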


谜兔 2024-12-03 05:20:19

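Here it is in simpler words:

DOM

Tree model parser (object-based; a tree of nodes).
DOM builds the whole document as a tree in memory before you can work with it.
Memory is a constraint, because the entire XML document is held in memory.
DOM is read/write (nodes can be inserted or deleted).
DOM parsers are preferred when the XML content is small.
You can search backward and forward for tags and evaluate the information inside them, which makes navigation easy.
Generally slower at run time.

SAX

Event-based parser (a sequence of events).
SAX parses the file as it reads it, i.e. node by node.
Memory is not a constraint, because the XML content is not stored in memory.
SAX is read-only, i.e. nodes cannot be inserted or deleted.
SAX parsers are used when the XML content is large.
SAX reads the XML file from top to bottom, and backward navigation is not possible.
Generally faster at run time.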
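The "read/write" point for DOM can be sketched like this, assuming Python's xml.dom.minidom as the DOM implementation (an assumption on my part, the list above is language-neutral). The input string is a trimmed-down, made-up bookstore document.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<bookstore><book><title>Everyday Italian</title></book></bookstore>"
)
book = doc.getElementsByTagName("book")[0]

# Insert: create a new <year> element with a text child and attach it.
year = doc.createElement("year")
year.appendChild(doc.createTextNode("2005"))
book.appendChild(year)

# Delete: remove the existing <title> element from its parent.
title = book.getElementsByTagName("title")[0]
book.removeChild(title)

# Serialize the modified tree back to markup.
result = doc.documentElement.toxml()
```

With SAX there is no tree to edit; the closest equivalent is to write out a modified copy of the stream as the events go by.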


从﹋此江山别 2024-12-03 05:20:19

You are correct in your understanding of the DOM based model. The XML file will be loaded as a whole and all its contents will be built as an in-memory representation of the tree the document represents. This can be time- and memory-consuming, depending on how large the input file is. The benefit of this approach is that you can easily query any part of the document, and freely manipulate all the nodes in the tree.

The DOM approach is typically used for small XML structures (where small depends on how much horsepower and memory your platform has) that may need to be modified and queried in different ways once they have been loaded.

SAX on the other hand is designed to handle XML input of virtually any size. Instead of the XML framework doing the hard work for you in figuring out the structure of the document and preparing potentially lots of objects for all the nodes, attributes etc., SAX completely leaves that to you.

What it basically does is read the input from the top and invoke callback methods you provide when certain "events" occur. An event might be hitting an opening tag, an attribute in the tag, finding text inside an element or coming across an end-tag.

SAX stubbornly reads the input and tells you what it sees in this fashion. It is up to you to maintain all state-information you require. Usually this means you will build up some sort of state-machine.

While this approach to XML processing is a lot more tedious, it can be very powerful, too. Imagine you just want to extract the titles of news articles from a blog feed. If you read this XML using DOM, it would load all the article contents, all the images etc. that are contained in the XML into memory, even though you are not interested in any of them.

With SAX you can just check whether the element name is (e.g.) "title" whenever your "startTag" event method is called. If so, you know that you need to keep whatever the next "elementText" event offers you. When you receive the "endTag" event call, you check again whether this is the closing element of the "title". After that, you just ignore all further elements, until either the input ends or another "startTag" with a name of "title" comes along. And so on...

You could read through megabytes and megabytes of XML this way, just extracting the tiny amount of data you need.
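The title-extraction idea could look roughly like this in Python. This is only a sketch: the feed markup, the TitleCollector class and its field names are made up, and Python's handler methods are named startElement, characters and endElement rather than "startTag", "elementText" and "endTag".

```python
import xml.sax

# Illustrative feed (not from the thread).
FEED = b"""<feed>
  <entry><title>First post</title><content>Long body we do not need</content></entry>
  <entry><title>Second post</title><content>Another long body</content></entry>
</feed>"""


class TitleCollector(xml.sax.ContentHandler):
    """Tiny state machine: the only state is 'am I inside a <title>?'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it in a buffer.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


collector = TitleCollector()
xml.sax.parseString(FEED, collector)
```

Everything outside a title element is seen by the parser but never stored, which is what keeps memory use flat.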

The downside of this approach is, of course, that you need to do a lot more bookkeeping yourself, depending on what data you need to extract and how complicated the XML structure is. Furthermore, you naturally cannot modify the structure of the XML tree, because you never have it in hand as a whole.

So in general, SAX is suitable for combing through potentially large amounts of data you receive with a specific "query" in mind, but need not modify, while DOM is more aimed at giving you full flexibility in changing structure and contents, at the expense of higher resource demand.


假面具 2024-12-03 05:20:19

You're comparing apples and pears. SAX is a parser that parses serialized DOM structures. There are many different parsers, and "event-based" refers to the parsing method.

Maybe a small recap is in order:

The document object model (DOM) is an abstract data model that describes a hierarchical, tree-based document structure; a document tree consists of nodes, namely element, attribute and text nodes (and some others). Nodes have parents, siblings and children and can be traversed, etc., all the stuff you're used to from doing JavaScript (which incidentally has nothing to do with the DOM).
A DOM structure may be serialized, i.e. written to a file, using a markup language like HTML or XML. An HTML or XML file thus contains a "written out" or "flattened out" version of an abstract document tree.
For a computer to manipulate, or even display, a DOM tree from a file, it has to deserialize, or parse, the file and reconstruct the abstract tree in memory. This is where parsing comes in.

Now we come to the nature of parsers. One way to parse would be to read in the entire document and recursively build up a tree structure in memory, and finally expose the entire result to the user. (I suppose you could call these parsers "DOM parsers".) That would be very handy for the user (I think that's what PHP's XML parser does), but it suffers from scalability problems and becomes very expensive for large documents.
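One way to see how the two approaches relate: a tree-building parser can be layered on top of parse events. The sketch below (Python, with a hypothetical TreeBuilder class and plain dicts standing in for real DOM nodes) is not how any particular DOM implementation works. It only shows that "build the whole tree" amounts to "handle every event and keep everything".

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a nested dict tree from SAX events using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = {
            "tag": name,
            "attrs": dict(attrs.items()),
            "text": "",
            "children": [],
        }
        if self._stack:
            # Attach to the innermost element that is still open.
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1]["text"] += content

    def endElement(self, name):
        self._stack.pop()


builder = TreeBuilder()
xml.sax.parseString(b'<a id="1"><b>hi</b><c/></a>', builder)
tree = builder.root
```

The memory cost is visible here: every node the parser reports stays referenced until the whole tree is discarded.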

On the other hand, event-based parsing, as done by SAX, looks at the file linearly and simply makes call-backs to the user whenever it encounters a structural piece of data, like "this element started", "that element ended", "some text here", etc. This has the benefit that it can go on forever without concern for the input file size, but it's a lot more low-level because it requires the user to do all the actual processing work (by providing call-backs). To return to your original question, the term "event-based" refers to those parsing events that the parser raises as it traverses the XML file.
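The "can go on forever" property can be sketched with Python's incremental SAX interface, where input is fed to the parser piece by piece. The chunks() generator is a hypothetical stand-in for a network or file stream, and ElementCounter is an illustrative handler.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts <item> elements without keeping any of them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def chunks():
    """Hypothetical stand-in for a stream that arrives piece by piece."""
    yield b"<items>"
    for i in range(1000):
        yield b"<item>%d</item>" % i
    yield b"</items>"


parser = xml.sax.make_parser()
counter = ElementCounter()
parser.setContentHandler(counter)

for chunk in chunks():
    # Events are raised as soon as enough input has arrived.
    parser.feed(chunk)
parser.close()
```

The handler's memory use is the same whether the stream holds a thousand items or a billion, because only the counter is retained.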

The Wikipedia article has many details on the stages of SAX parsing.


记忆で 2024-12-03 05:20:19

In practice: book.xml

<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>

DOM presents the XML document as a tree structure in memory.
DOM is a W3C standard.
A DOM parser works on the Document Object Model.
DOM occupies more memory and is preferred for small XML documents.
DOM is easy to navigate, either forward or backward.
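Forward and backward navigation over book.xml might look like this, assuming Python's xml.dom.minidom as the DOM implementation. The sibling_element helper is my own addition: it is needed because the whitespace between tags shows up in the tree as text nodes.

```python
from xml.dom import minidom

BOOK_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""

doc = minidom.parseString(BOOK_XML)
year = doc.getElementsByTagName("year")[0]


def sibling_element(node, step):
    """Follow 'nextSibling' or 'previousSibling' until an element is found."""
    node = getattr(node, step)
    while node is not None and node.nodeType != node.ELEMENT_NODE:
        node = getattr(node, step)
    return node


previous_tag = sibling_element(year, "previousSibling").tagName  # backward
next_tag = sibling_element(year, "nextSibling").tagName          # forward
parent_category = year.parentNode.getAttribute("category")       # upward
```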

SAX presents the XML document as a sequence of events, such as start element: abc, end element: abc.
SAX is not a W3C standard; it was developed by a group of developers.
SAX keeps very little of the document in memory and is preferred for large XML documents.
Backward navigation is not possible, because it processes the document sequentially.
An event fires for each node/element as the parser reaches it, and its sub-nodes are reported through their own events in turn (node: Latin nodus, 'knot').

This XML document, when passed through a SAX parser, will generate a sequence of events like the following:

start element: bookstore
start element: book with an attribute category equal to cooking
start element: title with an attribute lang equal to en
Text node, with data equal to Everyday Italian
....
end element: title
.....
end element: book
end element: bookstore
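A sequence like the one above can be reproduced with a small handler. This is a sketch in Python using xml.sax. The EventPrinter class is illustrative, and the wording of each line simply mimics the listing.

```python
import xml.sax

BOOK_XML = b"""<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""


class EventPrinter(xml.sax.ContentHandler):
    """Turns each SAX event into one human-readable line."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        line = "start element: " + name
        for key, value in attrs.items():
            line += f" with an attribute {key} equal to {value}"
        self.lines.append(line)

    def characters(self, content):
        # Ignore the whitespace-only text between elements.
        if content.strip():
            self.lines.append("Text node, with data equal to " + content.strip())

    def endElement(self, name):
        self.lines.append("end element: " + name)


printer = EventPrinter()
xml.sax.parseString(BOOK_XML, printer)
print("\n".join(printer.lines))
```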
