XPath: getting only first-level child nodes

Posted 2024-08-16 17:11:26

Using DOMXPath::query, is it possible to get child nodes just one level deep?

For example, if I had a document like:

<div>
    <span>
        <cite>
        </cite>
    </span>
    <span>
        <cite>
        </cite>
    </span>
</div>

I want the NodeList to contain just the spans and not the cites.

I should also mention that it won't always be the same elements (div, span, etc.). I need it to work with any type of element.

This is what I tried, but it didn't seem to work:

//*[not(ancestor::div)]

Comments (2)

梦言归人 2024-08-23 17:11:27

If you use

/div/*

you get a list of all direct child elements of div, but each of those elements still contains its own children. I don't think you can strip the children from the returned nodes with XPath alone.

This expression uses the default axis, which is called child::. That axis selects only nodes one level below the current node.

* matches all elements, but neither attributes nor text() nodes.

You have to specify the path to your node. Be careful with //node: it is short for /descendant-or-self::node()/node, so it returns every element with that name anywhere in the tree.
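
To make the difference between the child axis and a descendant search concrete, here is a minimal sketch that runs both against the sample document from the question. The variable names are illustrative only, and every expression is written as an absolute path so the result does not depend on the context node that DOMXPath::query uses by default.

```php
<?php
// Sample document from the question.
$xml = <<<'XML'
<div>
    <span>
        <cite>
        </cite>
    </span>
    <span>
        <cite>
        </cite>
    </span>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Child axis: only the elements directly under the root <div>.
$children = $xpath->query('/div/*');

// Same selection without naming the root, so it works for any element type.
$anyRootChildren = $xpath->query('/*/*');

// Descendant search: every element at any depth, including the root itself.
$allElements = $xpath->query('//*');

foreach ($children as $node) {
    echo $node->nodeName, "\n"; // prints "span" twice
}

// Prints "2 2 5": two spans either way, versus div + 2 span + 2 cite.
echo $children->length, ' ', $anyRootChildren->length, ' ', $allElements->length, "\n";
```

The returned span nodes are still live nodes of the document, so their cite children remain reachable through them; the query only controls which nodes appear in the list.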

人生百味 2024-08-23 17:11:27

Your question is a bit under-specified, so there are several ways to interpret it. If you want all direct child elements of the current element (with all of their sub-elements), then use

*/*

For your example, this gives you

<span>
    <cite>
    </cite>
</span>

and

<span>
    <cite>
    </cite>
</span>

If you want all child nodes, then use node() instead of *:

*/node()

For your example, this gives you both sub-elements as above, along with the newline/indentation text() nodes.
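
The difference between * and node() can be checked with a short sketch like the one below. It is an illustration under some assumptions: the document is a compact version of the sample, and the context node is passed explicitly as the second argument of DOMXPath::query (here the document element) instead of relying on the default context, which the PHP manual describes as the root element.

```php
<?php
$doc = new DOMDocument();
// Compact version of the sample document; whitespace between the spans is kept.
$doc->loadXML("<div>\n    <span><cite/></span>\n    <span><cite/></span>\n</div>");
$xpath = new DOMXPath($doc);
$root = $doc->documentElement; // the <div>

// Element children only.
$elements = $xpath->query('*', $root);

// All child nodes: elements plus the whitespace text nodes around them.
$nodes = $xpath->query('node()', $root);

$kinds = [];
foreach ($nodes as $node) {
    $kinds[] = $node->nodeName; // "#text" for text nodes, the tag name for elements
}

echo $elements->length, "\n";    // 2
echo implode(' ', $kinds), "\n"; // #text span #text span #text
```

Passing the context node explicitly also covers the "any type of element" requirement: the same '*' expression works for whichever element is handed in.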

If, however, you want only the child nodes and not their children as well (i.e. only the span elements, but without their child elements), you must use two expressions:

  1. select the direct child elements via */*
  2. process those child elements and, via text(), select only their text nodes and not the grandchild elements

My PHP is a bit rusty, but it should work a bit like this:

$doc = new DOMDocument();
// set up $doc, e.g. with $doc->loadXML(...)
$xpath = new DOMXPath($doc);

// step 1: select the direct child elements
$childElements = $xpath->query('*/*');

$directChildren = array();
foreach ($childElements as $child) {
  // step 2: collect only the text nodes of each child element
  $textChildren = $xpath->query('text()', $child);
  foreach ($textChildren as $text) {
    $directChildren[] = $text;
  }
}
// now $directChildren contains the text nodes of the selected child elements
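
If the goal is the span elements themselves with their children left out, one alternative to collecting text nodes is a shallow copy of each child. This is a sketch of a different technique from the one above, not the answer's method: it assumes detached copies are acceptable, and it uses DOMNode::cloneNode(false), which copies the element without its child nodes. The variable names are illustrative.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<div><span><cite/></span><span><cite/></span></div>');
$xpath = new DOMXPath($doc);

$shallowChildren = [];
// '*' with an explicit context node selects only that node's element children.
foreach ($xpath->query('*', $doc->documentElement) as $child) {
    // Shallow clone: the element itself, without any of its child nodes.
    $shallowChildren[] = $child->cloneNode(false);
}

foreach ($shallowChildren as $copy) {
    echo $copy->nodeName, ' has children: ';
    echo $copy->hasChildNodes() ? 'yes' : 'no', "\n"; // "span has children: no" twice
}
```

The copies are separate nodes, so the original document is left unchanged and the cite elements remain in it.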