What's the difference between PHP's DOM and SimpleXML extensions?

Posted on 2024-10-14 18:38:02

I'm failing to comprehend why do we need 2 XML parsers in PHP.

Can someone explain the difference between those two?

Comments (5)

看海 2024-10-21 18:38:02

In a nutshell:

SimpleXml

  • is for simple XML and/or simple use cases
  • has a limited API for working with nodes (e.g. you cannot program to an interface that much)
  • all nodes are of the same kind (element node is the same as attribute node)
  • nodes are magically accessible, e.g. $root->foo->bar['attribute']

DOM

  • is for any XML UseCase you might have
  • is an implementation of the W3C DOM API (found implemented in many languages)
  • differentiates between various Node Types (more control)
  • much more verbose due to explicit API (can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries

Both of these are based on libxml and can be influenced to some extent by the libxml functions.
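To make the last DOM bullet concrete, here is a minimal sketch of calling a PHP function from inside an XPath query. The `<users>` document and the choice of `strtolower` are made up for illustration; `registerNamespace()` and `registerPhpFunctions()` are the DOMXPath methods that enable the `php:` prefix.

```php
<?php
// Hypothetical sample data, invented for this sketch.
$doc = new DOMDocument();
$doc->loadXML('<users><user name="alice"/><user name="Bob"/><user name="ALICE"/></users>');

$xpath = new DOMXPath($doc);
// The php: prefix has to be bound to this namespace URI before use.
$xpath->registerNamespace('php', 'http://php.net/xpath');
// Only expose the functions the query needs.
$xpath->registerPhpFunctions('strtolower');

// Case-insensitive match on @name, done by PHP's strtolower().
$matches = $xpath->query('//user[php:functionString("strtolower", @name) = "alice"]');

foreach ($matches as $user) {
    echo $user->getAttribute('name'), "\n";
}
```

Passing an explicit function name to `registerPhpFunctions()` rather than calling it with no arguments keeps the set of callable functions small, which is probably what you want when the XPath expression is not fully under your control.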


Personally, I don't like SimpleXml too much. That's because I don't like the implicit access to the nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive, because the behavior of the SimpleXmlElement magically changes depending on its contents.
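For comparison, this is roughly what the implicit access looks like next to the explicit DOM equivalent. The document below is a made-up example, not something from the question.

```php
<?php
// Hypothetical sample document, invented for illustration.
$xml = '<root><foo><bar attribute="value">text</bar></foo></root>';

// SimpleXML: child elements are properties, attributes are array keys.
$root = simplexml_load_string($xml);
$viaSimpleXml = (string) $root->foo->bar['attribute'];

// DOM: each step is an explicit method call on a typed node.
$doc = new DOMDocument();
$doc->loadXML($xml);
$bar = $doc->getElementsByTagName('bar')->item(0); // a DOMElement
$viaDom = $bar->getAttribute('attribute');

echo $viaSimpleXml, "\n", $viaDom, "\n";
```

Both lines print the same attribute value; the difference is only in how much of the XML structure leaks into the PHP expression.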

For instance, when you have <foo bar="1"/> the object dump of /foo/@bar will be identical to that of /foo but doing an echo of them will print different results. Moreover, because both of them are SimpleXml elements, you can call the same methods on them, but they will only get applied when the SimpleXmlElement supports it, e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXmlElement will do nothing. Now of course it is correct that you cannot add an attribute to an Attribute Node, but the point is, an attribute node would not expose that method in the first place.
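A small sketch of that point, using the `<foo bar="1"/>` document from the paragraph above. It only checks the class of each node and the string casts; it deliberately does not call `addAttribute()`, since what that does on an attribute node may depend on the PHP version. The DOM half is added for contrast.

```php
<?php
$foo = simplexml_load_string('<foo bar="1"/>');

// SimpleXML: the attribute and the element come back as the same class.
$sxAttr = $foo->xpath('/foo/@bar')[0];
$sxElem = $foo->xpath('/foo')[0];
$sameClass = get_class($sxAttr) === get_class($sxElem);

// Casting to string is where they differ: the attribute yields its value,
// the empty element yields an empty string.
$attrText = (string) $sxAttr;
$elemText = (string) $sxElem;

// DOM: the same two XPath queries return distinct node classes.
$doc = new DOMDocument();
$doc->loadXML('<foo bar="1"/>');
$xpath = new DOMXPath($doc);
$domAttr = $xpath->query('/foo/@bar')->item(0);
$domElem = $xpath->query('/foo')->item(0);

var_dump($sameClass);
var_dump($attrText, $elemText);
var_dump(get_class($domAttr), get_class($domElem));
// An attribute node in DOM does not expose element-only methods at all.
var_dump(method_exists($domAttr, 'setAttribute'));
```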

But that's just my 2c. Make up your own mind :)


On a side note, there are not just two parsers in PHP, but a couple more. SimpleXml and DOM are just the two that parse a document into a tree structure. The others are either pull-based or event-based parsers/readers/writers.

Also see my answer to

倾城°AllureLove 2024-10-21 18:38:02

I'm going to give the shortest answer possible so that beginners can take it in easily. I'm also slightly simplifying things for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.


DOM and SimpleXML aren't actually two different parsers. The real parser is libxml2, which is used internally by DOM and SimpleXML. So DOM/SimpleXML are just two ways to use the same parser and they provide ways to convert one object to another.

SimpleXML is intended to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For instance, SimpleXML cannot create a CDATA section, although it can read them.

DOM offers a full-fledged implementation of the DOM plus a couple of non-standard methods such as appendXML. If you're used to manipulating the DOM in JavaScript, you'll find exactly the same methods in PHP's DOM. There's basically no limitation on what you can do, and it even handles HTML. The flip side of this richness of features is that it is more complex and more verbose than SimpleXML.
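As a rough illustration of two things mentioned above that SimpleXML does not offer on its own, the sketch below creates a CDATA section and appends raw XML through a document fragment. The element names and content are invented for the example.

```php
<?php
// Hypothetical starting document.
$doc = new DOMDocument();
$doc->loadXML('<root><item>one</item></root>');
$root = $doc->documentElement;

// Create a CDATA section explicitly and attach it to a new element.
$note = $doc->createElement('note');
$note->appendChild($doc->createCDATASection('5 < 6 & 7 > 2'));
$root->appendChild($note);

// appendXML() is the non-standard helper: it parses a raw XML string
// into a fragment, which is then inserted like any other node.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<item>two</item><item>three</item>');
$root->appendChild($fragment);

$serialized = $doc->saveXML($root);
echo $serialized, "\n";
```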


Side-note

People often wonder/ask what extension they should use to handle their XML or HTML content. Actually the choice is easy because there isn't much of a choice to begin with:

  • if you need to deal with HTML, you don't really have a choice: you have to use DOM
  • if you have to do anything fancy such as moving nodes or appending some raw XML, again you pretty much have to use DOM
  • if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed) then you can use either. Or both.
  • if your XML document is so big that it doesn't fit in memory, you can't use either and you have to use XMLReader, which is also based on libxml2 and is even more annoying to use, but still plays nicely with the others
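For the last case, a minimal XMLReader sketch might look like the following. The `<items>` document is a made-up stand-in; with a genuinely large file you would call `$reader->open($path)` instead of feeding a string.

```php
<?php
// Hypothetical small document standing in for a large file.
$xml = '<items><item id="1">a</item><item id="2">b</item><item id="3">c</item></items>';

$reader = new XMLReader();
$reader->XML($xml);

$ids = [];
// read() advances one node at a time, so only the current node is held
// in memory rather than the whole tree.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $ids[] = $reader->getAttribute('id');
    }
}
$reader->close();

print_r($ids);
```

The loop is more manual than either tree API because you track state yourself, which is the "annoying to use" part.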

TL;DR

  • SimpleXML is super easy to use but only good for 90% of use cases.
  • DOM is more complex, but can do everything.
  • XMLReader is super complicated, but uses very little memory. Very situational.
ˉ厌 2024-10-21 18:38:02

Which DOMNodes can be represented by SimpleXMLElement?

The biggest difference between the two libraries is that SimpleXML is mainly a single class: SimpleXMLElement. In contrast, the DOM extension has many classes, most of them a subtype of DOMNode.

So one core question when comparing those two libraries is which of the many classes DOM offers can be represented by a SimpleXMLElement in the end?

The following is a comparison table containing those DOMNode types that are actually useful as long as dealing with XML is concerned (useful node types). Your mileage may vary, e.g. when you need to deal with DTDs for example:

+-------------------------+----+--------------------------+-----------+
| LIBXML Constant         |  # | DOMNode Classname        | SimpleXML |
+-------------------------+----+--------------------------+-----------+
| XML_ELEMENT_NODE        |  1 | DOMElement               |    yes    |
| XML_ATTRIBUTE_NODE      |  2 | DOMAttr                  |    yes    |
| XML_TEXT_NODE           |  3 | DOMText                  |  no [1]   |
| XML_CDATA_SECTION_NODE  |  4 | DOMCdataSection          |  no [2]   |
| XML_PI_NODE             |  7 | DOMProcessingInstruction |    no     |
| XML_COMMENT_NODE        |  8 | DOMComment               |    no     |
| XML_DOCUMENT_NODE       |  9 | DOMDocument              |    no     |
| XML_DOCUMENT_FRAG_NODE  | 11 | DOMDocumentFragment      |    no     |
+-------------------------+----+--------------------------+-----------+

As this table shows, SimpleXML has a really limited interface compared to DOM. Besides the node types in the table, SimpleXMLElement also abstracts access to children and attribute lists. It provides traversal via element names (property access) and attributes (array access), it is a Traversable that iterates its "own" children (elements or attributes), and it offers namespaced access via the children() and attributes() methods.

All this magic in the interface is fine as long as it suits you. However, it cannot be changed by extending SimpleXMLElement, so as magical as it is, it is just as limited.

To find out which nodetype a SimpleXMLElement object represents, please see:

Here DOM follows the DOM Core Level 1 specification. You can do nearly every imaginable kind of XML handling with that interface. However, it's only Level 1, so compared with more modern DOM levels such as Level 3, it's somewhat limited for some of the cooler stuff. Of course, SimpleXML loses out here as well.

SimpleXMLElement allows casting to subtypes. This is very special in PHP. DOM allows this as well, although it's a little more work and a more specific node type needs to be chosen.

XPath 1.0 is supported by both. The result in SimpleXML is an array of SimpleXMLElements; in DOM it is a DOMNodeList.
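A short sketch of the same XPath 1.0 expression run through both extensions, to show the two result types. The `<lib>` document and the expression are made up for the example.

```php
<?php
// Hypothetical sample data.
$xml = '<lib><book year="2001">A</book><book year="2010">B</book></lib>';
$expr = '//book[@year > 2005]';

// SimpleXML: xpath() returns a plain PHP array of SimpleXMLElement objects.
$sx = simplexml_load_string($xml);
$sxResult = $sx->xpath($expr);

// DOM: DOMXPath::query() returns a DOMNodeList.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domResult = (new DOMXPath($doc))->query($expr);

echo gettype($sxResult), ': ', (string) $sxResult[0], "\n";
echo get_class($domResult), ': ', $domResult->item(0)->textContent, "\n";
```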

SimpleXMLElement supports casting to string and array (json); the DOMNode classes in DOM do not. They can be cast to an array, but only in the way any other object can (public properties as keys/values).

Common usage patterns of those two extensions in PHP are:

  • You normally start out using SimpleXMLElement. Your knowledge of XML and XPath is at an equally low level.
  • After fighting with the magic of its interfaces, you reach a certain level of frustration sooner or later.
  • You discover that you can import SimpleXMLElements into DOM and vice versa. You learn more about DOM and how to use the extension to do things you were not able (or not able to find out how) to do with SimpleXMLElement.
  • You notice that you can load HTML documents with the DOM extension. And invalid XML. And do output formatting. Things SimpleXMLElement just can't do. Not even with the dirty tricks.
  • You probably even switch to the DOM extension fully, because at least you know that the interface is more differentiated and allows you to get things done. You also see a benefit in learning DOM Level 1, because you can use it in JavaScript and other languages as well (a huge benefit of the DOM extension for many).

You can have fun with both extensions and I think you should know both. The more the better. All the libxml-based extensions in PHP are very good and powerful extensions. And on Stack Overflow, under the tag, there is a good tradition of covering these libraries well and with detailed information.

玩心态 2024-10-21 18:38:02

As others have pointed out, the DOM and SimpleXML extensions are not strictly "XML parsers", rather they are different interfaces to the structure generated by the underlying libxml2 parser.

The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with emphasis on accessing elements by name, and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily using the children() and attributes() methods), and can search a document using an XPath expression. It also includes support for basic manipulation of the content - e.g. adding or overwriting elements or attributes with a new string.

The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to different types of "node", such as entities and CDATA sections, as well as some which are ignored by SimpleXML, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you to rearrange nodes and choose how to represent text content, for instance. The tradeoff is a fairly complex API, with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), there may be less of a "natural PHP" feel, but some programmers may be familiar with it from other contexts.

Both interfaces require the full document to be parsed into memory, and effectively wrap up pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the "pull-based" XMLReader or the "event-based" XML Parser may be more appropriate.
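A minimal sketch of switching wrappers, assuming the "missing" feature you want is a CDATA section. The `<post>` document is invented for illustration; `dom_import_simplexml()` and `simplexml_import_dom()` are the two conversion functions named above.

```php
<?php
// Hypothetical document built with SimpleXML.
$post = simplexml_load_string('<post><title>Hello</title></post>');
$body = $post->addChild('body');

// Borrow the DOM wrapper for the same underlying node to add a CDATA
// section, which SimpleXML cannot create by itself.
$domBody = dom_import_simplexml($body);
$domBody->appendChild(
    $domBody->ownerDocument->createCDATASection('<b>raw</b> & unescaped')
);

// The change is visible through SimpleXML, since both wrap one tree.
$serialized = $post->asXML();
$bodyText = (string) $post->body;

// And back again: wrap the DOM document as a SimpleXMLElement.
$roundTrip = simplexml_import_dom($domBody->ownerDocument);

echo $serialized;
```

No copying happens in either direction, which is why edits made through one wrapper show up in the other.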

羁客 2024-10-21 18:38:02

SimpleXML is, as the name states, a simple parser for XML content, and nothing else. You cannot parse, say, standard HTML content with it. It's easy and quick, and therefore a great tool for creating simple applications.

The DOM extension, on the other hand, is much more powerful. It enables you to parse almost any DOM document, including HTML, XHTML and XML. It enables you to open, write and even correct output code, supports XPath, and allows more manipulation overall.
Its usage is therefore much more complicated, because the library is quite complex, and that makes it a perfect tool for bigger projects where heavy data manipulation is needed.
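As a rough sketch of the HTML side, the snippet below feeds deliberately broken markup to `DOMDocument::loadHTML()`. The markup is made up; `libxml_use_internal_errors()` is used here only to keep the parser's warnings about the bad markup out of the output.

```php
<?php
// Hypothetical broken markup: neither <p> nor <b> is closed properly.
$html = '<p>First paragraph<p>Second <b>bold text</p>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// The HTML parser repairs the structure, so normal DOM queries work.
$paragraphs = $doc->getElementsByTagName('p');
foreach ($paragraphs as $p) {
    echo trim($p->textContent), "\n";
}

// Writing the document back out gives the corrected markup.
echo $doc->saveHTML();
```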

Hope that answers your question :)
