Selecting a deeply nested link using an XPath query

Posted on 2024-11-03 08:32:00

<body class="en-us">   <div id="wrapper">
    <div id="content">
      <div class="content-top">
        <div class="content-bot">
          <div id="profile-wrapper" class=
          "profile-wrapper profile-wrapper-horde">
            <div class="profile-sidebar-anchor">
              <div class="profile-sidebar-outer">
                <div class="profile-sidebar-inner">
                  <div class="profile-sidebar-contents">
                    <div class="profile-sidebar-crest">
                      <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
                      </a>

                      <div class="profile-sidebar-info">
                        <div class="name">
                          <a href="/wow/en/character/some-server/sometoon/"
                          rel="np">Glitchshot</a>
                        </div>

                        <div class="under-name color-c8">
                          <span class="level"><strong>85</strong></span>
                          <a href="/wow/en/game/race/somerace" class="race">somerace</a> 
                          <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                        </div>

                        <div class="guild">
                          <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                          Some Guild</a>
                        </div>

                        <div class="realm">
                          <span id="profile-info-realm" class="tip"
                          data-battlegroup="Stormstrike">Black
                          Dragonflight</span>
                        </div>
                      </div>
                    </div>

                    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                      <li><a href=
                      "/wow/en/character/some-server/sometoon/" class=
                      "back-to" rel="np"><span class="arrow"><span class=
                      "icon">Character Summary</span></span></a></li>

                      <li class="root-menu"><a href=
                      "/wow/en/character/some-server/sometoon/achievement"
                         class="back-to" rel="np"><span class=
                         "arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=" active"><a href=
                      "/wow/en/character/some-server/sometoon/achievement#summary"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">Achievements</span></span></a></li>

                      <li class=""><a href=
                      "/wow/en/character/some-server/sometoon/achievement#92"
                         class="" rel="np"><span class="arrow"><span class=
                         "icon">General</span></span></a></li>

I know that I have posted a lot of useless code here, but I wanted you to have an idea of what the DOM looks like.

From this:

<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>

I would like to extract this:

/wow/en/character/some-server/sometoon/achievement#92

which comes from the last anchor in the posted markup.

I have read as much as I can find on how to use an XPath query to extract the needed information, but I am clearly missing something. Below is the query that I thought should work but does not.

<?php
    $query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
    echo $query . "<br>";
    $achievementSubCategory = $xpath->query($query);

    $achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
    var_dump($achiSubArray);
    // Produces array(1) { ["URL"]=> NULL } which should look something more like:
    // array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>

Thank you in advance for your assistance and advice


一生独一 2024-11-10 08:32:00

*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href

There are a few problems with this XPath expression:

  1. It is looking for a ul element that is a grandchild of the current node and that has an attribute named class whose string value equals the string value of one of the ul's child elements named profile-sidebar-menu. Because profile-sidebar-menu is not quoted, XPath treats it as an element name rather than a string literal. The ul has no children named profile-sidebar-menu, so the whole expression selects no nodes.

  2. Another problem is the indexing. li[3] selects the third li child of the context node, but the wanted a element is a child of the fourth li child. This must be expressed as li[4]. XPath positions are 1-based, not 0-based.
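
Both points can be checked with a small PHP sketch. The markup below is a trimmed stand-in for the posted menu (an assumption, not the full page), and the queries start with `//` so that no particular context node is needed:

```php
<?php
// Trimmed stand-in for the posted menu markup (assumption: only the ul matters here).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Unquoted name: @class is compared with child elements called profile-sidebar-menu.
$unquoted = $xpath->query('//ul[@class=profile-sidebar-menu]');
// Quoted literal: @class is compared with the string "profile-sidebar-menu".
$quoted = $xpath->query('//ul[@class="profile-sidebar-menu"]');

// Positions are 1-based, so li[3] is the third item and li[4] is the fourth.
$base = '//ul[@class="profile-sidebar-menu"]';
$third = $xpath->query($base . '/li[3]/a/@href')->item(0)->nodeValue;
$fourth = $xpath->query($base . '/li[4]/a/@href')->item(0)->nodeValue;

echo "unquoted matches: ", $unquoted->length, "\n";
echo "quoted matches: ", $quoted->length, "\n";
echo "li[3]: ", $third, "\n";
echo "li[4]: ", $fourth, "\n";
```

With this stand-in markup, the unquoted predicate should match nothing, while the quoted one matches the menu; li[3] yields the #summary link and li[4] the #92 link.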

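If these two problems are corrected, and the extra ul and li steps that have no counterpart in the posted markup are removed (the li elements are direct children of the ul), I believe that the corrected expression should look like the following:

*/ul[@class="profile-sidebar-menu"]/li[4]/a/@href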

The absolute XPath expression that selects the wanted href attribute, starting from the top element body of the provided XML document, is:

/*/*/*/*/*/*/*/*/*/*/ul/li[4]/a/@href

Below is the XML document (the provided one, made well-formed by appending a number of missing end tags):

<body class="en-us">
    <div id="wrapper">
        <div id="content">
            <div class="content-top">
                <div class="content-bot">
                    <div id="profile-wrapper" class=
              "profile-wrapper profile-wrapper-horde">
                        <div class="profile-sidebar-anchor">
                            <div class="profile-sidebar-outer">
                                <div class="profile-sidebar-inner">
                                    <div class="profile-sidebar-contents">
                                        <div class="profile-sidebar-crest">
                                            <a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style=""></a>
                                            <div class="profile-sidebar-info">
                                                <div class="name">
                                                    <a href="/wow/en/character/some-server/sometoon/"
                              rel="np">Glitchshot</a>
                                                </div>
                                                <div class="under-name color-c8">
                                                    <span class="level">
                                                        <strong>85</strong>
                                                    </span>
                                                    <a href="/wow/en/game/race/somerace" class="race">somerace</a>
                                                    <a href="/wow/en/game/class/someclass" class="class">someclass</a>
                                                </div>
                                                <div class="guild">
                                                    <a href="/wow/en/guild/some-server/someguild/?character=sometoon">
                              Some Guild</a>
                                                </div>
                                                <div class="realm">
                                                    <span id="profile-info-realm" class="tip"
                              data-battlegroup="Stormstrike">Black
                              Dragonflight</span>
                                                </div>
                                            </div>
                                        </div>
                                        <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
                                            <li>
                                                <a href=
                          "/wow/en/character/some-server/sometoon/" class=
                          "back-to" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                          "icon">Character Summary</span></span>
                                                </a>
                                            </li>
                                            <li class="root-menu">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement"
                             class="back-to" rel="np">
                                                    <span class=
                             "arrow">
                                                        <span class=
                             "icon">Achievements</span></span>
                                                </a>
                                            </li>
                                            <li class=" active">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement#summary"
                             class="" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                             "icon">Achievements</span></span>
                                                </a>
                                            </li>
                                            <li class="">
                                                <a href=
                          "/wow/en/character/some-server/sometoon/achievement#92"
                             class="" rel="np">
                                                    <span class="arrow">
                                                        <span class=
                             "icon">General</span></span>
                                                </a>
                                            </li>
                                        </ul>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</body>

One can check that the above absolute XPath expression selects exactly the wanted href attribute by evaluating it with a tool like the XPath Visualizer.
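
The same check can also be sketched in PHP. This is only an illustration under assumptions: the ten element levels above the ul (body plus nine div elements) are rebuilt with `str_repeat` instead of repeating the full document, and the link text is flattened.

```php
<?php
// The menu from the well-formed document, with the span wrappers left out.
$menu = <<<'XML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
XML;

// body plus nine nested divs gives the ten element levels above the ul.
$xml = '<body>' . str_repeat('<div>', 9) . $menu . str_repeat('</div>', 9) . '</body>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Ten wildcard steps, one per ancestor level, then the path to the attribute.
$absolute = str_repeat('/*', 10) . '/ul/li[4]/a/@href';
$nodes = $xpath->query($absolute);

echo $absolute, "\n";
echo "matches: ", $nodes->length, "\n";
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue, "\n";
}
```

Note that this uses `loadXML`, so body is the document element. If the markup were parsed with `loadHTML` instead, the parser would wrap it in an html element, and an absolute path like this would need one more `/*` step.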

情仇皆在手 2024-11-10 08:32:00

If your DOM structure is consistent, then something like the following should work:

//ul[@class='profile-sidebar-menu']/li[last()]/a/@href

Your XPath statement does not match the markup. You have multiple ul steps in the path, but the sample is not structured that way. Also, indexing in XPath starts at 1, not 0.
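
As a rough sketch of how this could be wired into the PHP from the question: `firstValue` below is a hypothetical helper (not part of DOMXPath) that returns null instead of reading `nodeValue` from a missing node, and the markup is a trimmed stand-in for the posted menu.

```php
<?php
// Hypothetical helper (not part of DOMXPath): value of the first match, or null.
function firstValue(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // malformed expression or nothing selected
    }
    return $nodes->item(0)->nodeValue;
}

// Trimmed stand-in for the posted menu markup (assumption).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// li[last()] picks the final li child, whatever the number of items.
$last = firstValue($xpath, "//ul[@class='profile-sidebar-menu']/li[last()]/a/@href");
// The query from the question selects nothing, so the helper returns null.
$original = firstValue($xpath, '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href');

var_dump(array("URL" => $last));
var_dump($original);
```

Checking `length` before calling `item(0)` makes an empty result explicit, rather than leaving a NULL in the array as in the question's output.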

夜空下最亮的亮点 2024-11-10 08:32:00

Based on the HTML you show above (and assuming that the final tags are correctly closed), ewh's expression should work fine.

Maybe you omitted some important part of the document. Try being more specific:

//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href

I'm pretty sure it works; I tested it online with the XPath Query Expression Tool.

If you still do not get results, try showing all the HTML you are working on.
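
When the query runs against a full page rather than an excerpt, it can help to check each stage separately. The sketch below is one way to do that, assuming a trimmed stand-in for the posted markup: it collects libxml parser warnings instead of printing them, confirms that the ul exists in the parsed document, and only then runs the more specific expression.

```php
<?php
// Trimmed stand-in for the posted menu markup (assumption).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

// Collect parser warnings instead of printing them; real pages often trigger some.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$parseErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Step 1: confirm the menu is present in the parsed document at all.
$menuQuery = "//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']";
$menus = $xpath->query($menuQuery);

// Step 2: run the more specific expression, matching the anchor by its href.
$target = '/wow/en/character/some-server/sometoon/achievement#92';
$hrefs = $xpath->query($menuQuery . "/li/a[@href='" . $target . "']/@href");

printf(
    "parse warnings: %d, menus: %d, matches: %d\n",
    count($parseErrors),
    $menus->length,
    $hrefs->length
);
if ($hrefs->length > 0) {
    echo $hrefs->item(0)->nodeValue, "\n";
}
```

If the menu count is zero on the real page, the problem lies in the document that was actually loaded rather than in the final steps of the expression.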
