What's the difference between PHP's DOM and SimpleXML extensions?
I fail to understand why we need two XML parsers in PHP.
Can someone explain the difference between those two?
In a nutshell:
SimpleXml
$root->foo->bar['attribute']
DOM
Both of these are based on libxml and can be influenced to some extent by the libxml functions.
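As a rough illustration of what "influenced by the libxml functions" can mean, here is a minimal sketch in which libxml_use_internal_errors() changes error handling for both extensions at once. The malformed input string is made up for the example.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
// This single switch affects SimpleXML and DOM alike.
libxml_use_internal_errors(true);

$broken = '<a><b></a>'; // deliberately malformed XML (illustrative input)

// SimpleXML returns false on a parse failure.
$sx = simplexml_load_string($broken);
$sxErrors = count(libxml_get_errors());
libxml_clear_errors();

// DOM reports the same kind of failure through the same error buffer.
$doc = new DOMDocument();
$domOk = $doc->loadXML($broken);
$domErrors = count(libxml_get_errors());
libxml_clear_errors();

echo "SimpleXML errors: $sxErrors, DOM errors: $domErrors\n";
```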
Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive, because the behavior of the SimpleXmlElement magically changes depending on its contents.
For instance, when you have <foo bar="1"/>, the object dump of /foo/@bar will be identical to that of /foo, but doing an echo of them will print different results. Moreover, because both of them are SimpleXml elements, you can call the same methods on them, but they will only get applied when the SimpleXmlElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an attribute node, but the point is, an attribute node would not expose that method in the first place.
But that's just my 2c. Make up your own mind :)
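To make the one-node-type point concrete, here is a small sketch comparing what the two extensions hand back for an attribute and for an element. The sample document is a slight variation of the one above, with some text added so the echo difference is visible.

```php
<?php
$xml = '<foo bar="1">text</foo>'; // illustrative document

// SimpleXML: attribute and element are both SimpleXMLElement objects.
$sx = simplexml_load_string($xml);
$sxAttr = $sx->xpath('/foo/@bar')[0];
$sxElem = $sx->xpath('/foo')[0];
$sxAttrClass = get_class($sxAttr);
$sxElemClass = get_class($sxElem);

// Same class, yet the string value differs depending on what is wrapped.
$sxAttrText = (string) $sxAttr;
$sxElemText = (string) $sxElem;

// DOM: the node type is visible in the class of the object.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$domAttr = $xpath->query('/foo/@bar')->item(0);
$domElem = $xpath->query('/foo')->item(0);

echo "$sxAttrClass / $sxElemClass\n";
echo get_class($domAttr), ' / ', get_class($domElem), "\n";
```

Note that the DOM attribute object simply has no setAttribute() method, which is the kind of explicitness the paragraph above is asking for.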
On a side note, there are not just two parsers, but a couple more in PHP. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.
Also see my answer to
I'm going to give the shortest answer possible so that beginners can take it in easily. I'm also slightly simplifying things for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.
DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which is used internally by DOM and SimpleXML. So DOM/SimpleXML are just two ways to use the same parser and they provide ways to convert one object to another.
SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section, although it can read them.
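The CDATA limitation can be worked around by borrowing DOM for one step. This is only a sketch of that idea; the element names are invented for the example.

```php
<?php
$sx = new SimpleXMLElement('<note><body/></note>'); // illustrative document

// SimpleXML has no method for creating a CDATA section, so get the
// DOM view of the same <body> node and create the section there.
$bodyNode = dom_import_simplexml($sx->body);
$cdata = $bodyNode->ownerDocument->createCDATASection('a < b');
$bodyNode->appendChild($cdata);

// Both objects wrap the same underlying tree, so SimpleXML sees the change.
$out = $sx->asXML();
$readBack = (string) $sx->body; // reading CDATA through SimpleXML works

echo $out;
```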
DOM offers a full-fledged implementation of the DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find exactly the same methods in PHP's DOM. There's basically no limitation on what you can do, and it even handles HTML. The flip side of this richness of features is that it is more complex and more verbose than SimpleXML.
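For a feel of the two DOM features mentioned here, the following sketch appends raw XML through a document fragment and then loads an HTML snippet. The markup is invented for the example.

```php
<?php
// appendXML() lives on DOMDocumentFragment: parse a raw XML string
// into a fragment, then attach the fragment to the tree.
$doc = new DOMDocument();
$doc->loadXML('<list><item>one</item></list>');
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<item>two</item><item>three</item>');
$doc->documentElement->appendChild($fragment);
$itemCount = $doc->getElementsByTagName('item')->length;

// The same class can load HTML, using the usual DOM methods afterwards.
$html = new DOMDocument();
$html->loadHTML('<p>Hello <b>world</b></p>');
$boldText = $html->getElementsByTagName('b')->item(0)->textContent;

echo "$itemCount items, bold text: $boldText\n";
```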
Side-note
People often wonder/ask what extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:
TL;DR
Which DOMNodes can be represented by SimpleXMLElement?
The biggest difference between the two libraries is that SimpleXML is mainly a single class: SimpleXMLElement. In contrast, the DOM extension has many classes, most of them subtypes of DOMNode.
So one core question when comparing those two libraries is: which of the many classes DOM offers can be represented by a SimpleXMLElement in the end?
The following is a comparison table containing those DOMNode types that are actually useful as far as dealing with XML is concerned (useful node types). Your mileage may vary, e.g. when you need to deal with DTDs:
[1]: SimpleXML abstracts text nodes as the string value of an element (compare __toString). This only works well when an element contains text only; otherwise text information can get lost.
[2]: Every XML parser can expand CDATA nodes when loading the document. SimpleXML expands these when the LIBXML_NOCDATA option is used with the simplexml_load_* functions or the constructor. (The option works as well with DOMDocument::loadXML().)
As this table shows, SimpleXML has really limited interfaces compared to DOM. Next to the ones in the table, SimpleXMLElement also abstracts access to children and attribute lists, and it provides traversal via element names (property access) and attributes (array access), as well as being a Traversable iterating its "own" children (elements or attributes) and offering namespaced access via the children() and attributes() methods.
As long as all this magic interface suits you, that's fine; however, it cannot be changed by extending from SimpleXMLElement, so as magic as it is, it is just as limited.
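The access styles listed above can be sketched in a few lines. The feed document and its element names are invented for the example; the dc prefix is bound to the Dublin Core namespace URI purely for illustration.

```php
<?php
$xml = <<<XML
<feed xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry id="a1"><title>First</title><dc:creator>Ann</dc:creator></entry>
  <entry id="b2"><title>Second</title><dc:creator>Bob</dc:creator></entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

// Property access walks child elements by name.
$firstTitle = (string) $feed->entry[0]->title;

// Array access with a string key reads an attribute.
$secondId = (string) $feed->entry[1]['id'];

// Traversable: foreach iterates over the matching elements.
$ids = [];
foreach ($feed->entry as $entry) {
    $ids[] = (string) $entry['id'];
}

// Namespaced children are reached through children(); the second
// argument says that 'dc' is a prefix rather than a namespace URI.
$creator = (string) $feed->entry[0]->children('dc', true)->creator;

echo "$firstTitle, $secondId, $creator\n";
```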
To find out which nodetype a SimpleXMLElement object represents, please see:
Here DOM follows the DOMDocument Core Level 1 specs. You can do nearly every imaginable XML handling with that interface. However, it's only Level 1, so compared with modern DOMDocument Levels like 3, it's somewhat limited for some of the cooler stuff. Of course SimpleXML loses out here as well.
SimpleXMLElement allows casting to subtypes. This is very special in PHP. DOM allows this as well, although it's a little bit more work and a more specific node type needs to be chosen.
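One way to read "casting to subtypes" is loading a document straight into your own subclass. The sketch below assumes that reading; the class names ArticleXml and ShoutingElement are hypothetical.

```php
<?php
// Hypothetical subclass: SimpleXML can instantiate it instead of
// SimpleXMLElement when the class name is passed to the loader.
class ArticleXml extends SimpleXMLElement
{
    public function headline(): string
    {
        return strtoupper((string) $this->title);
    }
}

$article = simplexml_load_string(
    '<article><title>Hi</title></article>',
    ArticleXml::class
);
$headline = $article->headline();

// DOM needs a mapping per node class: here every DOMElement created by
// this document becomes a ShoutingElement (also a hypothetical class).
class ShoutingElement extends DOMElement
{
    public function shout(): string
    {
        return strtoupper($this->textContent);
    }
}

$doc = new DOMDocument();
$doc->registerNodeClass('DOMElement', ShoutingElement::class);
$doc->loadXML('<article><title>Hi</title></article>');
$shout = $doc->documentElement->shout();

echo "$headline / $shout\n";
```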
XPath 1.0 is supported by both; the result in SimpleXML is an array of SimpleXMLElements, in DOM a DOMNodeList.
SimpleXMLElement supports casting to string and array (json); the DOMNode classes in DOM do not. They offer casting to array, but only like any other object does (public properties as keys/values).
Common usage patterns of those two extensions in PHP are:
Import SimpleXMLElements into DOM and vice versa. You learn more about DOM and how to use the extension to do stuff you were not able (or not able to find out how) to do with SimpleXMLElement.
You can have fun with both extensions and I think you should know both. The more the better. All the libxml-based extensions in PHP are very good and powerful extensions. And on Stack Overflow, under the php tag, there is a good tradition of covering these libraries well and with detailed information.
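Going back to the XPath and casting remarks above, here is a short sketch of the different result types. The library document is invented for the example.

```php
<?php
$xml = '<lib><book id="1">A</book><book id="2">B</book></lib>'; // illustrative

// SimpleXML: xpath() returns a plain PHP array of SimpleXMLElement objects.
$sx = simplexml_load_string($xml);
$sxResult = $sx->xpath('//book');
$firstBook = (string) $sxResult[0]; // string casting is built in

// DOM: the query goes through DOMXPath and yields a DOMNodeList.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domResult = (new DOMXPath($doc))->query('//book');

// A JSON round trip is a common shortcut to turn SimpleXML into arrays.
$asArray = json_decode(json_encode($sx), true);

echo count($sxResult), ' vs ', $domResult->length, "\n";
```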
As others have pointed out, the DOM and SimpleXML extensions are not strictly "XML parsers"; rather, they are different interfaces to the structure generated by the underlying libxml2 parser.
The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with emphasis on accessing elements by name, and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily using the children() and attributes() methods), and can search a document using an XPath expression. It also includes support for basic manipulation of the content, e.g. adding or overwriting elements or attributes with a new string.
The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to different types of "node", such as entities and CDATA sections, as well as some which are ignored by SimpleXML, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you to rearrange nodes and choose how to represent text content, for instance. The tradeoff is a fairly complex API, with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), there may be less of a "natural PHP" feel, but some programmers may be familiar with it from other contexts.
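A small sketch of that contrast, using a made-up document that contains a comment and a processing instruction next to two elements:

```php
<?php
$xml = '<doc><!-- draft --><?render fast?><a>1</a><b>2</b></doc>'; // illustrative

// SimpleXML only surfaces the element children.
$sx = simplexml_load_string($xml);
$sxNames = [];
foreach ($sx->children() as $child) {
    $sxNames[] = $child->getName();
}

// DOM exposes every node, each with its own class.
$doc = new DOMDocument();
$doc->loadXML($xml);
$root = $doc->documentElement;
$nodeClasses = [];
foreach ($root->childNodes as $node) {
    $nodeClasses[] = get_class($node);
}

// Rearranging nodes: move <b> in front of <a>.
$a = $root->getElementsByTagName('a')->item(0);
$b = $root->getElementsByTagName('b')->item(0);
$root->insertBefore($b, $a);
$rearranged = $doc->saveXML($root);

echo implode(', ', $sxNames), "\n";
echo implode(', ', $nodeClasses), "\n";
echo $rearranged, "\n";
```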
Both interfaces require the full document to be parsed into memory, and effectively wrap up pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the "pull-based" XMLReader or the "event-based" XML Parser may be more appropriate.
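Here is a rough sketch of both ideas: switching wrappers over one shared tree, and pulling through a document with XMLReader instead of building a tree. The documents and the attribute name are invented for the example.

```php
<?php
// 1. Two wrappers, one tree: a change made through DOM is visible
//    through the SimpleXML object, because both point at the same nodes.
$sx = simplexml_load_string('<root><item>1</item></root>');
$dom = dom_import_simplexml($sx);
$dom->setAttribute('checked', 'yes');
$seenBySimpleXml = (string) $sx['checked'];

// And back again: wrap the DOM document in a SimpleXMLElement.
$back = simplexml_import_dom($dom->ownerDocument);
$itemValue = (string) $back->item;

// 2. Pull parsing: XMLReader steps through the document node by node,
//    so the whole tree never has to sit in memory.
$reader = new XMLReader();
$reader->XML('<items><item>1</item><item>2</item><item>3</item></items>');
$sum = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $sum += (int) $reader->readString();
    }
}
$reader->close();

echo "$seenBySimpleXml, $itemValue, sum=$sum\n";
```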
SimpleXML is, as the name states, a simple parser for XML content, and nothing else. You cannot parse, let's say, standard HTML content. It's easy and quick, and therefore a great tool for creating simple applications.
The DOM extension, on the other hand, is much more powerful. It enables you to parse almost any DOM document, including HTML, XHTML and XML. It enables you to open, write and even correct output code, supports XPath and overall more manipulation.
Therefore, its usage is much more complicated, because the library is quite complex, and that makes it a perfect tool for bigger projects where heavy data manipulation is needed.
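To illustrate the HTML point, here is a minimal sketch that feeds the same sloppy markup to both extensions. The markup is made up, and libxml_use_internal_errors() is only there to keep the parser warnings quiet.

```php
<?php
libxml_use_internal_errors(true); // silence parser warnings for this demo

$sloppy = '<ul><li>one<li>two</ul>'; // valid-ish HTML, but not well-formed XML

// SimpleXML insists on well-formed XML and gives up.
$sx = simplexml_load_string($sloppy);
libxml_clear_errors();

// DOM's HTML loader repairs the structure while parsing.
$doc = new DOMDocument();
$doc->loadHTML($sloppy);
libxml_clear_errors();
$liCount = $doc->getElementsByTagName('li')->length;

// Serializing again writes out the corrected markup.
$fixed = $doc->saveHTML();

echo "SimpleXML parsed: ", var_export($sx !== false, true), "\n";
echo "DOM found $liCount list items\n";
```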
Hope that answers your question :)