Selecting a deeply nested link with an XPath query
<body class="en-us"> <div id="wrapper">
<div id="content">
<div class="content-top">
<div class="content-bot">
<div id="profile-wrapper" class=
"profile-wrapper profile-wrapper-horde">
<div class="profile-sidebar-anchor">
<div class="profile-sidebar-outer">
<div class="profile-sidebar-inner">
<div class="profile-sidebar-contents">
<div class="profile-sidebar-crest">
<a href="/wow/en/character/some-server/sometoon/" rel="np" class="profile-sidebar-character-model" style="">
</a>
<div class="profile-sidebar-info">
<div class="name">
<a href="/wow/en/character/some-server/sometoon/"
rel="np">Glitchshot</a>
</div>
<div class="under-name color-c8">
<span class="level"><strong>85</strong></span>
<a href="/wow/en/game/race/somerace" class="race">somerace</a>
<a href="/wow/en/game/class/someclass" class="class">someclass</a>
</div>
<div class="guild">
<a href="/wow/en/guild/some-server/someguild/?character=sometoon">
Some Guild</a>
</div>
<div class="realm">
<span id="profile-info-realm" class="tip"
data-battlegroup="Stormstrike">Black
Dragonflight</span>
</div>
</div>
</div>
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
<li><a href=
"/wow/en/character/some-server/sometoon/" class=
"back-to" rel="np"><span class="arrow"><span class=
"icon">Character Summary</span></span></a></li>
<li class="root-menu"><a href=
"/wow/en/character/some-server/sometoon/achievement"
class="back-to" rel="np"><span class=
"arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=" active"><a href=
"/wow/en/character/some-server/sometoon/achievement#summary"
class="" rel="np"><span class="arrow"><span class=
"icon">Achievements</span></span></a></li>
<li class=""><a href=
"/wow/en/character/some-server/sometoon/achievement#92"
class="" rel="np"><span class="arrow"><span class=
"icon">General</span></span></a></li>
I know that I have posted a lot of useless code here, but I wanted you guys to have an idea of what the DOM would look like.
From this:
<a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np"><span class="arrow"><span class="icon">General</span></span></a>
I would like to extract this:
/wow/en/character/some-server/sometoon/achievement#92
which comes from the last anchor in the posted markup.
I have read as much as I can find on how to use XPath queries to extract the needed information, but I am clearly missing something. Below is the query that I thought should work but does not.
<?php
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
echo $query . "<br>";
$achievementSubCategory = $xpath->query($query);
$achiSubArray = array("URL" => $achievementSubCategory->item(0)->nodeValue);
var_dump($achiSubArray);
// Produces array(1) { ["URL"]=> NULL } which should look something more like:
// array(1) { ["URL"]=> /wow/en/character/some-server/sometoon/achievement#92 }
?>
Thank you in advance for your assistance and advice
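The post never shows how $xpath is built. A minimal reproduction in PHP, assuming $xpath is a DOMXPath over a DOMDocument filled with loadHTML() from a trimmed copy of the sidebar list: when the query matches nothing, query() returns an empty DOMNodeList whose item(0) is null, and reading nodeValue from null is where the NULL in the var_dump comes from. The firstValue() helper is hypothetical, not part of PHP's DOM extension.

```php
<?php
// Trimmed copy of the sidebar list from the question (assumption: the
// real page wraps it in much more markup).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/" class="back-to" rel="np">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement" class="back-to" rel="np">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary" class="" rel="np">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92" class="" rel="np">General</a></li>
</ul>
HTML;

// Real-world pages are rarely valid HTML; collect parser warnings
// instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Hypothetical helper: value of the first matching node, or null when the
// expression is malformed (query() returns false) or matches nothing.
function firstValue(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// The query from the question matches nothing, hence the NULL.
$query = '*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href';
var_dump(firstValue($xpath, $query)); // NULL
```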
There are a few problems with this XPath expression:

It is looking for a ul element that is a grandchild of the current node and that has an attribute named class whose string value is equal to the string value of one of the children elements of ul named profile-sidebar-menu (because the value is not quoted, profile-sidebar-menu is read as an element name, not as a string). However, the ul has no children named profile-sidebar-menu, and the whole expression doesn't select any node.

Another problem is the indexing. li[3] selects the third li element child of the context node. However, the wanted a element is a child of the fourth li child of the context node. This must be expressed as li[4]. XPath positions are 1-based, not 0-based.

If these two problems are corrected, I believe that the corrected expression should look like the following:

*/ul[@class='profile-sidebar-menu']/li[4]/a/@href
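A small PHP sketch that runs the question's query and */ul[@class='profile-sidebar-menu']/li[4]/a/@href against a trimmed, well-formed copy of the sidebar. It assumes the context node is the profile-sidebar-inner div: a relative path such as */ul is evaluated from whatever node is passed as the second argument of DOMXPath::query(), so the quoted, 1-based version only matches from that starting point.

```php
<?php
// Trimmed, well-formed copy of the relevant part of the question's markup.
$markup = <<<'XML'
<div class="profile-sidebar-inner">
  <div class="profile-sidebar-contents">
    <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
      <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
      <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
      <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
      <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
    </ul>
  </div>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);
$xpath = new DOMXPath($doc);
// Assumed context node: the profile-sidebar-inner div (the document element here).
$context = $doc->documentElement;

// Original query: profile-sidebar-menu is parsed as a child-element name test.
$broken = $xpath->query('*/ul[@class=profile-sidebar-menu]/ul/li[3]/ul/li[1]/a/@href', $context);
// Corrected: quoted string literal and the fourth (1-based) li.
$fixed = $xpath->query("*/ul[@class='profile-sidebar-menu']/li[4]/a/@href", $context);

echo $broken->length, "\n";            // 0
echo $fixed->item(0)->nodeValue, "\n"; // /wow/en/character/some-server/sometoon/achievement#92
```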
The absolute XPath expression that selects the wanted href attribute, starting from the top element body of the provided XML document, is:

/body/div/div/div/div/div/div/div/div/div/ul/li[4]/a/@href

One can check that this absolute XPath expression selects exactly the wanted href attribute by making the provided document well-formed (appending the missing end tags) and evaluating the expression against it with a tool like the XPath Visualizer.
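The same kind of check can be scripted in PHP. This sketch assumes a trimmed, well-formed copy of the markup with the nine nested div elements between body and the sidebar ul, as in the question. Loaded with DOMDocument::loadXML(), body is the document element; loaded with loadHTML(), libxml adds an implied html root, so the same path has to start with /html/body instead.

```php
<?php
// Trimmed, well-formed copy of the question's markup: nine nested divs
// between body and the sidebar ul.
$markup = <<<'XML'
<body class="en-us">
 <div id="wrapper"><div id="content"><div class="content-top"><div class="content-bot">
  <div id="profile-wrapper" class="profile-wrapper profile-wrapper-horde">
   <div class="profile-sidebar-anchor"><div class="profile-sidebar-outer">
    <div class="profile-sidebar-inner"><div class="profile-sidebar-contents">
     <ul class="profile-sidebar-menu" id="profile-sidebar-menu">
      <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
      <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
      <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
      <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
     </ul>
    </div></div>
   </div></div>
  </div>
 </div></div></div></div>
</body>
XML;

// Steps below body: nine divs, then the list, the fourth item, its link.
$path = '/div/div/div/div/div/div/div/div/div/ul/li[4]/a/@href';

// Parsed as XML, body is the document element.
$xml = new DOMDocument();
$xml->loadXML($markup);
$fromXml = (new DOMXPath($xml))->query('/body' . $path);

// Parsed as HTML, libxml adds an implied html root above body.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($markup);
$fromHtml = (new DOMXPath($html))->query('/html/body' . $path);

echo $fromXml->item(0)->nodeValue, "\n";  // /wow/en/character/some-server/sometoon/achievement#92
echo $fromHtml->item(0)->nodeValue, "\n"; // same value
```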
If your DOM structure is consistent, then something like the following should work:
Your XPath statement makes no sense. You have multiple ul elements in the path, but the sample is not structured that way. Also, indexing in XPath starts at 1, not 0.
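A sketch of that kind of query in PHP; this is an assumption about what was meant, not necessarily the expression the answer contained. It relies on the sidebar list keeping its id, uses the // descendant axis instead of spelling out every parent div, and shows that li[1] is the first item while li[0] matches nothing.

```php
<?php
// Trimmed copy of the sidebar list (an assumption for brevity).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class="root-menu"><a href="/wow/en/character/some-server/sometoon/achievement">Achievements</a></li>
  <li class=" active"><a href="/wow/en/character/some-server/sometoon/achievement#summary">Achievements</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

libxml_use_internal_errors(true); // tolerate messy real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "//" searches all descendants, so the surrounding divs do not matter.
$menu = "//ul[@id='profile-sidebar-menu']";

$first  = $xpath->query("$menu/li[1]/a/@href")->item(0)->nodeValue;
$zeroth = $xpath->query("$menu/li[0]/a/@href")->length;
$fourth = $xpath->query("$menu/li[4]/a/@href")->item(0)->nodeValue;

echo $first, "\n";  // /wow/en/character/some-server/sometoon/
echo $zeroth, "\n"; // 0 (positions start at 1)
echo $fourth, "\n"; // /wow/en/character/some-server/sometoon/achievement#92
```

The // search tolerates changes in the surrounding layout better than a fixed chain of div steps, at the cost of scanning the whole document.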
Based on the HTML you show above (and assuming that the final tags are correctly closed), ewh's expression should work fine.
Maybe you omitted some important part of the document there. Try being more specific:
//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href
I'm pretty sure it works; I tested it online with the XPath Query Expression Tool.
If you still do not get results, try posting all the HTML you are working on.
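A PHP sketch that runs this expression with DOMXPath::query() and also wraps it in XPath's string() function, so that DOMXPath::evaluate() returns the href as a plain PHP string (an empty string when nothing matches) instead of a node list. Since the predicate already names the href being selected, the expression is mostly useful as a sanity check that the parsed document contains that link. The fixture is a trimmed two-item copy of the list, an assumption for brevity.

```php
<?php
// Trimmed copy of the sidebar list (only two items kept).
$html = <<<'HTML'
<ul class="profile-sidebar-menu" id="profile-sidebar-menu">
  <li><a href="/wow/en/character/some-server/sometoon/">Character Summary</a></li>
  <li class=""><a href="/wow/en/character/some-server/sometoon/achievement#92">General</a></li>
</ul>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$expr = "//ul[@class='profile-sidebar-menu' and @id='profile-sidebar-menu']"
      . "/li/a[@href='/wow/en/character/some-server/sometoon/achievement#92']/@href";

// query() returns a DOMNodeList of attribute nodes.
$nodes = $xpath->query($expr);
echo $nodes->length, "\n"; // 1

// evaluate() with string() returns a PHP string: the first match's value, or "" if none.
$url = $xpath->evaluate("string($expr)");
$missing = $xpath->evaluate("string(//ul[@id='no-such-menu']/li/a/@href)");
echo $url, "\n";    // /wow/en/character/some-server/sometoon/achievement#92
var_dump($missing); // string(0) ""
```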