Getting only the following siblings of the same type with XPath and SimpleXML

Posted 2024-08-15 11:54:49 · 1135 characters · 3 views · 0 comments

I need to parse an HTML definition list like the following:

<dl>
    <dt>stuff</dt>
        <dd>junk</dd>
        <dd>things</dd>
        <dd>whatnot</dd>
    <dt>colors</dt>
        <dd>red</dd>
        <dd>green</dd>
        <dd>blue</dd>
</dl>

So that I can end up with an associative array like this:

[definition list] =>
    [stuff] =>
        [0] => junk
        [1] => things
        [2] => whatnot
    [colors] =>
        [0] => red
        [1] => green
        [2] => blue

I am using DOMDocument::loadHTML() to import the HTML string into an object, and then simplexml_import_dom() so that I can use the SimpleXML extension, specifically its xpath() method.
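
For reference, the loading step described here might look roughly like the sketch below. The variable names ($html, $doc, $root, $dl) are illustrative and not from the original post, and it assumes the fragment contains a single dl. Note that loadHTML() wraps a bare fragment in html and body elements, so the dl has to be located with a query rather than taken as the root.

```php
<?php
// Hypothetical input fragment, shortened from the list in the question.
$html = '<dl><dt>stuff</dt><dd>junk</dd><dd>things</dd>'
      . '<dt>colors</dt><dd>red</dd></dl>';

// Parse the HTML string into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML($html);

// Hand the DOM tree to SimpleXML; the result is the root <html> element.
$root = simplexml_import_dom($doc);

// Locate the definition list anywhere in the document.
$matches = $root->xpath('//dl');
$dl = $matches[0];
```

From here $dl can be used like any other SimpleXMLElement, including as the starting point for further xpath() calls.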

The problem I'm having is with the XPath syntax for querying all <dd> elements that are consecutive and not broken by a <dt>.

Since <dd> elements are not children of <dt> elements, I can't simply query for all the dts, loop through them, and query each one for its dds.

So I'm thinking I have to query for the first dd sibling of each dt, and then for all dd siblings of that first dd.

But it's not clear to me from the XPath tutorials whether this is possible. Can you say "consecutive matching siblings"? Or am I forced to loop through each child of the original dl and collect the dts and dds as they show up?

Comments (1)

一念一轮回 2024-08-22 11:54:49

There are certainly ways to find consecutive matching siblings in XPath, but it would be relatively complicated, and since you have to process every child anyway, you might as well just loop over them as you mentioned. It will be simpler and more efficient than looping over each <dt/> and then looking for its siblings.

$dl = simplexml_load_string(
    '<dl>
        <dt>stuff</dt>
            <dd>junk</dd>
            <dd>things</dd>
            <dd>whatnot</dd>
        <dt>colors</dt>
            <dd>red</dd>
            <dd>green</dd>
            <dd>blue</dd>
    </dl>'
);

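$list = array();
$k = null; // text of the most recent <dt>; null until the first one is seen
foreach ($dl->children() as $child)
{
    switch (dom_import_simplexml($child)->localName)
    {
        case 'dt':
            // a new term starts a new group
            $k = (string) $child;
            break;

        case 'dd':
            // attach the definition to the current term;
            // a <dd> that appears before any <dt> is skipped
            if ($k !== null) {
                $list[$k][] = (string) $child;
            }
            break;
    }
}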
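
For comparison, here is a minimal sketch of the XPath route mentioned above. It is one possible approach, not the only one: XPath 1.0 has no direct way to say "siblings up to the next dt", so this version selects the dd siblings that have exactly as many dt elements before them as the position of the current dt. It assumes every dt and dd is a direct child of the same dl; the variable names are illustrative.

```php
<?php
$dl = simplexml_load_string(
    '<dl>
        <dt>stuff</dt>
            <dd>junk</dd>
            <dd>things</dd>
            <dd>whatnot</dd>
        <dt>colors</dt>
            <dd>red</dd>
            <dd>green</dd>
            <dd>blue</dd>
    </dl>'
);

$list = array();
// xpath('dt') returns the <dt> children in document order, indexed from 0.
foreach ($dl->xpath('dt') as $i => $dt) {
    $n = $i + 1; // 1-based position of this <dt> among its <dt> siblings
    $key = (string) $dt;
    $list[$key] = array();

    // A <dd> belongs to this <dt> when exactly $n <dt> siblings precede it.
    $query = "following-sibling::dd[count(preceding-sibling::dt) = $n]";
    foreach ($dt->xpath($query) as $dd) {
        $list[$key][] = (string) $dd;
    }
}
```

This runs one extra query per dt, and each query re-counts preceding siblings, which is why the plain loop is the simpler and cheaper choice for a list like this.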