How do I get the first level of DOM elements with DOMDocument?

Posted 2024-11-05 07:31:09

How can I get the first level of DOM elements with PHP's DOMDocument?

Example of code that doesn't work, taken from the Q&A "How to get nodes in first level using PHP DOMDocument?":

<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
    var_dump($entry->firstChild->nodeValue);
}
?>
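The example above does not fail loudly. It simply does not select what it appears to. The XPath expression `/` matches the document node itself (the `DOMDocument`), not the elements beneath it. As a result, `$entry->firstChild` is whatever node comes first in the document. The sketch below only inspects what that query returns, reusing the question's markup.

```php
<?php
// Same markup as in the question.
$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// "/" matches exactly one node: the document node itself.
$entries = $xpath->query('/');
echo $entries->length, PHP_EOL;             // 1
echo get_class($entries->item(0)), PHP_EOL; // DOMDocument

// The document's root element is the <html> element added by the HTML parser,
// not one of the <div>s from the input string.
echo $entries->item(0)->documentElement->nodeName, PHP_EOL; // html
```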

情何以堪。 2024-11-12 07:31:09

The first level of elements below the root node can be accessed with

$dom->documentElement->childNodes

The childNodes property contains a DOMNodeList, which you can iterate with foreach.

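See DOMDocument::documentElement

This is a convenience attribute that allows direct access to the child node that is the document element of the document.

and DOMNode::childNodes

A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.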
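In a well-formed XML document with a single root element, `documentElement->childNodes` gives exactly the first-level nodes, because no HTML skeleton is added. The following minimal sketch illustrates this. The `<root>` document is made up for illustration and has no whitespace between tags, so no text nodes appear.

```php
<?php
$dom = new DOMDocument();
// Hypothetical XML input: one root element, no whitespace between tags.
$dom->loadXML('<root><header/><content><sidebar/></content><footer/></root>');

$names = [];
// documentElement is <root>; its childNodes are the first-level nodes only,
// so the nested <sidebar/> is not visited.
foreach ($dom->documentElement->childNodes as $node) {
    $names[] = $node->nodeName;
}

echo $dom->documentElement->childNodes->length, PHP_EOL; // 3
echo implode(', ', $names), PHP_EOL;                     // header, content, footer
```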

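Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. To get the first level of elements below a DOMElement, access that DOMElement's childNodes property.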
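Because every `DOMElement` has `childNodes`, the same loop works one level further down. The sketch below uses the question's markup and selects the `#content` div with XPath; that lookup is an illustrative choice, not part of the answer. It then keeps only the element children, since `childNodes` can also contain whitespace-only text nodes.

```php
<?php
$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Pick one element, then look only at its direct children.
$content = $xpath->query('//div[@id="content"]')->item(0);

$ids = [];
foreach ($content->childNodes as $child) {
    // Skip text nodes (e.g. whitespace between the tags) and other non-elements.
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $ids[] = $child->getAttribute('id');
    }
}

echo implode(', ', $ids), PHP_EOL; // sidebar, info
```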


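Note that if you use DOMDocument::loadHTML() on invalid HTML or a partial document, the HTML parser module adds an HTML skeleton with html and body tags. In the DOM tree, the HTML from your example therefore becomes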

<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div></body></html>
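To check the structure the parser actually built, a small recursive walk over element nodes can print the tree with indentation. This is a sketch only: `dumpElements` is a hypothetical helper name. Starting from the document node, it skips the DOCTYPE and text nodes and records each element's name plus its id, if any.

```php
<?php
$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

// Hypothetical helper: collects "name#id" for every element, indented by depth.
// Non-element nodes (DOCTYPE, text, comments) are skipped.
function dumpElements(DOMNode $node, int $depth = 0, array &$lines = []): array
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $id = $child->getAttribute('id');
        $lines[] = str_repeat('  ', $depth) . $child->nodeName . ($id !== '' ? "#$id" : '');
        dumpElements($child, $depth + 1, $lines);
    }
    return $lines;
}

$dom = new DOMDocument();
$dom->loadHTML($str);
$lines = dumpElements($dom);
echo implode(PHP_EOL, $lines), PHP_EOL;
```

The first two lines of the output should be `html` and `body`. The three top-level divs only appear at the third level of the tree.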

You have to take this into account when traversing the tree or using XPath. Consequently, using

$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
    echo $node->nodeName; // body
}

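will only iterate the <body> DOMElement node. Knowing that libxml adds the skeleton, you have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.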

$dom->getElementsByTagName('body')->item(0)->childNodes

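However, this also includes any whitespace text nodes. If you only want DOMElement nodes, you either have to set preserveWhiteSpace to false before loading the document or check each node's nodeType, e.g.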

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        echo $node->nodeName;
    }
}
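The nodeType check can be wrapped in a small reusable function. In this sketch, `firstLevelElements` is a hypothetical name. It uses `instanceof DOMElement`, which here is equivalent to comparing against `XML_ELEMENT_NODE`, and returns a plain array of elements.

```php
<?php
/**
 * Hypothetical helper: returns the direct element children of $parent,
 * ignoring text, comment and other non-element nodes.
 *
 * @return DOMElement[]
 */
function firstLevelElements(DOMNode $parent): array
{
    $elements = [];
    foreach ($parent->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $elements[] = $node;
        }
    }
    return $elements;
}

$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument();
$dom->loadHTML($str);
$body = $dom->getElementsByTagName('body')->item(0);

// Collect the id of each top-level div under <body>.
$ids = array_map(fn(DOMElement $el) => $el->getAttribute('id'), firstLevelElements($body));
echo implode(', ', $ids), PHP_EOL; // header, content, footer
```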

or use XPath:

$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/html/body/*') as $node) {
    echo $node->nodeName;
}
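On PHP 8.0 and later, DOMElement also exposes `firstElementChild` and `nextElementSibling`, which step over text nodes on their own. This is an alternative technique, not the one used in the answer. The sketch below assumes PHP 8.0+ and walks the element children of `<body>` with those two properties.

```php
<?php
$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument();
$dom->loadHTML($str);
$body = $dom->getElementsByTagName('body')->item(0);

$ids = [];
// Walk only element siblings (PHP 8.0+); whitespace text nodes are never visited.
for ($el = $body->firstElementChild; $el !== null; $el = $el->nextElementSibling) {
    $ids[] = $el->getAttribute('id');
}

echo implode(', ', $ids), PHP_EOL; // header, content, footer
```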

