What is the difference between SAX and DOM?
I read some articles about XML parsers and came across SAX and DOM.
SAX is event-based and DOM is a tree model, but I don't understand the difference between these concepts.
From what I have understood, event-based means some kind of event happens to the node. For example, when one clicks a particular node, it gives all the sub-nodes rather than loading all the nodes at the same time. In the case of DOM parsing, however, it loads all the nodes and builds the tree model.
Is my understanding correct?
Please correct me if I am wrong, or explain event-based and tree model to me in a simpler manner.
Well, you are close.
In SAX, events are triggered while the XML is being parsed. When the parser encounters the start of a tag (e.g. <something>), it triggers a tagStarted event (the actual name of the event may differ). Similarly, when it reaches the end of the tag (</something>), it triggers tagEnded. Using a SAX parser means you need to handle these events and make sense of the data passed with each one.
In DOM, no events are triggered while parsing. The entire XML is parsed, and a DOM tree (of the nodes in the XML) is generated and returned. Once it is parsed, you can navigate the tree to access the data embedded in the various nodes of the XML.
In general, DOM is easier to use, but it has the overhead of parsing the entire XML before you can start using it.
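As a rough illustration of the two styles, here is a minimal sketch using Python's standard library (xml.sax and xml.dom.minidom). The sample document and the names TagLogger, sax_events and dom_body are made up for this example. In Python the callbacks are called startElement and endElement rather than tagStarted and tagEnded.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
XML = "<note><to>Ann</to><body>Hello</body></note>"


class TagLogger(xml.sax.ContentHandler):
    """Records each tag at the moment the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(text):
    # SAX: our callbacks run while the parser walks through the input.
    handler = TagLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def dom_body(text):
    # DOM: parse everything first, then navigate the finished tree.
    doc = minidom.parseString(text)
    body = doc.getElementsByTagName("body")[0]
    return body.firstChild.data


print(sax_events(XML))
print(dom_body(XML))
```

With SAX the program only ever sees a stream of notifications. With DOM it gets a complete tree back and can ask for any node afterwards.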
In just a few words...
SAX (Simple API for XML): a stream-based processor. Only a tiny part of the document is in memory at any time, and you "sniff" the XML stream by implementing callback code for events like tagStarted() etc. It uses almost no memory, but you can't do "DOM" stuff, like using XPath or traversing trees.
DOM (Document Object Model): you load the whole thing into memory, which makes it a massive memory hog. On a constrained system you can run out of memory even with medium-sized documents. But you can use XPath, traverse the tree, etc.
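To show what "only a tiny part in memory" can look like in practice, here is a sketch that feeds a document to Python's xml.sax parser piece by piece. The generator chunks and the names ElementCounter and count_items are hypothetical, and the generated document is just a stand-in for a large file or network stream.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts <item> elements without keeping any of them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def chunks(n_items):
    """Yield a document piece by piece; the whole text never exists at once."""
    yield b"<items>"
    for i in range(n_items):
        yield f"<item id='{i}'>value {i}</item>".encode("utf-8")
    yield b"</items>"


def count_items(pieces):
    parser = xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)  # each piece is parsed, reported, then dropped
    parser.close()
    return handler.count


print(count_items(chunks(100_000)))
```

The handler keeps a single integer as state, so memory use stays roughly flat no matter how many items pass through.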
Here in simpler words:
DOM
Tree model parser (object-based; a tree of nodes).
DOM parses the whole file and builds the complete tree in memory.
Has memory constraints, since the entire document must be held in memory before you can work with it.
DOM is read and write (you can insert or delete nodes).
If the XML content is small, prefer a DOM parser.
Backward and forward search is possible for finding tags and evaluating the information inside them, which makes navigation easy.
Slower at run time.
SAX
Event-based parser (a sequence of events).
SAX parses the file as it reads it, i.e. node by node.
No memory constraints, as it does not keep the XML content in memory.
SAX is read-only, i.e. you can't insert or delete nodes.
Use a SAX parser when the XML content is large.
SAX reads the XML file from top to bottom; backward navigation is not possible.
Faster at run time.
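The read/write and backward-navigation points on the DOM side can be sketched with Python's xml.dom.minidom. The tiny document and the function name edit_and_navigate are invented for this example.

```python
from xml.dom import minidom


def edit_and_navigate(text):
    doc = minidom.parseString(text)
    root = doc.documentElement

    # Insert: create a new node and attach it to the tree.
    root.appendChild(doc.createElement("d"))

    # Delete: remove an existing node.
    b = root.getElementsByTagName("b")[0]
    root.removeChild(b)

    # Navigate backward and upward from an arbitrary node.
    c = root.getElementsByTagName("c")[0]
    previous_name = c.previousSibling.tagName
    parent_name = c.parentNode.tagName

    return root.toxml(), previous_name, parent_name


print(edit_and_navigate("<list><a/><b/><c/></list>"))
```

None of these operations has a SAX counterpart, because a SAX handler never holds the tree. It only sees each node once, in document order.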
You are correct in your understanding of the DOM based model. The XML file will be loaded as a whole and all its contents will be built as an in-memory representation of the tree the document represents. This can be time- and memory-consuming, depending on how large the input file is. The benefit of this approach is that you can easily query any part of the document, and freely manipulate all the nodes in the tree.
The DOM approach is typically used for small XML structures (where small depends on how much horsepower and memory your platform has) that may need to be modified and queried in different ways once they have been loaded.
SAX on the other hand is designed to handle XML input of virtually any size. Instead of the XML framework doing the hard work for you in figuring out the structure of the document and preparing potentially lots of objects for all the nodes, attributes etc., SAX completely leaves that to you.
What it basically does is read the input from the top and invoke callback methods you provide when certain "events" occur. An event might be hitting an opening tag (which is reported together with its attributes), finding text inside an element, or coming across an end tag.
SAX stubbornly reads the input and tells you what it sees in this fashion. It is up to you to maintain all state-information you require. Usually this means you will build up some sort of state-machine.
While this approach to XML processing is a lot more tedious, it can be very powerful, too. Imagine you want to just extract the titles of news articles from a blog feed. If you read this XML using DOM it would load all the article contents, all the images etc. that are contained in the XML into memory, even though you are not even interested in it.
With SAX you can just check whether the element name is (e.g.) "title" whenever your "startTag" event method is called. If so, you know that you need to keep whatever the next "elementText" event offers you. When you receive the "endTag" event call, you check again whether this is the closing element of the "title". After that, you just ignore all further elements until either the input ends or another "startTag" with a name of "title" comes along. And so on...
You could read through megabytes and megabytes of XML this way, just extracting the tiny amount of data you need.
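A minimal sketch of that title-extraction idea, using Python's xml.sax. The feed contents and the names TitleCollector and extract_titles are hypothetical. The generic "startTag", "elementText" and "endTag" events correspond here to startElement, characters and endElement.

```python
import xml.sax

# Hypothetical feed, standing in for a much larger document.
FEED = """<feed>
  <entry><title>First post</title><content>Long body ...</content></entry>
  <entry><title>Second post</title><content>More text ...</content></entry>
</feed>"""


class TitleCollector(xml.sax.ContentHandler):
    """A two-state machine: either inside a <title> element or not."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def extract_titles(text):
    handler = TitleCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


print(extract_titles(FEED))
```

Everything outside a title element is seen by the parser and discarded immediately, which is why the article bodies never accumulate in memory.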
The downside of this approach is, of course, that you need to do a lot more bookkeeping yourself, depending on what data you need to extract and how complicated the XML structure is. Furthermore, you naturally cannot modify the structure of the XML tree, because you never have it in hand as a whole.
So in general, SAX is suitable for combing through potentially large amounts of data you receive with a specific "query" in mind, but need not modify, while DOM is more aimed at giving you full flexibility in changing structure and contents, at the expense of higher resource demand.
You're comparing apples and pears. SAX is a parsing API for serialized DOM structures (that is, XML text), whereas DOM is a data model. There are many different parsers, and "event-based" refers to the parsing method.
Maybe a small recap is in order:
The document object model (DOM) is an abstract data model that describes a hierarchical, tree-based document structure; a document tree consists of nodes, namely element, attribute and text nodes (and some others). Nodes have parents, siblings and children and can be traversed, etc., all the stuff you're used to from doing JavaScript (which incidentally has nothing to do with the DOM).
A DOM structure may be serialized, i.e. written to a file, using a markup language like HTML or XML. An HTML or XML file thus contains a "written out" or "flattened out" version of an abstract document tree.
For a computer to manipulate, or even display, a DOM tree from a file, it has to deserialize, or parse, the file and reconstruct the abstract tree in memory. This is where parsing comes in.
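The serialize/deserialize round trip can be sketched with Python's xml.dom.minidom. The element names and the function build_tree are invented for this example: the tree is first built purely in memory, then written out as XML text, then parsed back into a new tree.

```python
from xml.dom import minidom


def build_tree():
    """Build an abstract document tree directly in memory (no file involved)."""
    doc = minidom.Document()
    root = doc.createElement("article")
    root.setAttribute("lang", "en")
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode("SAX vs DOM"))
    root.appendChild(title)
    doc.appendChild(root)
    return doc


# Serialize: flatten the tree into markup text.
serialized = build_tree().documentElement.toxml()

# Deserialize (parse): rebuild an equivalent tree from that text.
reparsed = minidom.parseString(serialized)

print(serialized)
print(reparsed.documentElement.getAttribute("lang"))
```

The string in serialized is what would normally live in an .xml file. The reparsed tree is a fresh in-memory structure with the same shape as the one built by hand.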
Now we come to the nature of parsers. One way to parse would be to read in the entire document and recursively build up a tree structure in memory, and finally expose the entire result to the user. (I suppose you could call these parsers "DOM parsers".) That would be very handy for the user (I think that's what PHP's XML parser does), but it suffers from scalability problems and becomes very expensive for large documents.
On the other hand, event-based parsing, as done by SAX, looks at the file linearly and simply makes call-backs to the user whenever it encounters a structural piece of data, like "this element started", "that element ended", "some text here", etc. This has the benefit that it can go on forever without concern for the input file size, but it's a lot more low-level because it requires the user to do all the actual processing work (by providing call-backs). To return to your original question, the term "event-based" refers to those parsing events that the parser raises as it traverses the XML file.
The Wikipedia article has many details on the stages of SAX parsing.
In practice, take a file such as book.xml. When this XML document is passed through a SAX parser, it generates a sequence of events such as "start element: abc" and "end element: abc".
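To make such an event sequence concrete, here is a minimal Python sketch using xml.sax. BOOK_XML is a hypothetical stand-in and not the answer's actual book.xml. The names EventLogger and trace are likewise made up for illustration.

```python
import xml.sax

# Hypothetical document; not the original book.xml from this answer.
BOOK_XML = '<book id="1"><title>XML Basics</title><author>A. Writer</author></book>'


class EventLogger(xml.sax.ContentHandler):
    """Writes one log line per parsing event, in the order they occur."""

    def __init__(self):
        super().__init__()
        self.log = []

    def startDocument(self):
        self.log.append("start document")

    def startElement(self, name, attrs):
        self.log.append(f"start element: {name} {dict(attrs.items())}")

    def characters(self, content):
        if content.strip():  # skip whitespace-only text
            self.log.append(f"characters: {content}")

    def endElement(self, name):
        self.log.append(f"end element: {name}")

    def endDocument(self):
        self.log.append("end document")


def trace(text):
    handler = EventLogger()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.log


for line in trace(BOOK_XML):
    print(line)
```

The log opens with "start document" and "start element: book {'id': '1'}", then reports each nested element in document order, and closes with "end element: book" and "end document". Start and end events nest exactly like the tags in the source text.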
Both SAX and DOM are used to parse XML documents. Both have advantages and disadvantages, and either can be used depending on the situation.
SAX:
Parses node by node
Does not store the XML in memory
We can't insert or delete a node
Traverses top to bottom
DOM:
Stores the entire XML document in memory before processing
Occupies more memory
We can insert or delete nodes
Can traverse in any direction
If we need to find a node and do not need to insert or delete anything, we can go with SAX; otherwise we can use DOM, provided we have enough memory.
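For the "just find a node" case, one possible approach with Python's xml.sax is to stop parsing as soon as the node turns up, by raising an exception from the handler. This is a sketch under assumptions: the Found exception, the FindById handler and the id attribute used as the search key are all hypothetical choices for this example.

```python
import xml.sax


class Found(Exception):
    """Raised from the handler to abort parsing once the target is seen."""


class FindById(xml.sax.ContentHandler):
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.seen = 0  # how many elements were visited before stopping

    def startElement(self, name, attrs):
        self.seen += 1
        if attrs.get("id") == self.target_id:
            raise Found(name)


def find_element(text, target_id):
    """Return (element name or None, number of elements visited)."""
    handler = FindById(target_id)
    try:
        xml.sax.parseString(text.encode("utf-8"), handler)
    except Found as hit:
        return hit.args[0], handler.seen
    return None, handler.seen


DOC = "<r><a id='1'/><b id='2'/><c id='3'/></r>"
print(find_element(DOC, "2"))
print(find_element(DOC, "9"))
```

When the target sits early in the document, the rest of the input is never examined. A DOM parser would still have to build the entire tree before the search could begin.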