PHP DOMXPath & arrays
I'm trying to extract all relevant URLs and images from a page and put them into an array. The code below works, except that it outputs the first pair over and over, for the correct number of iterations. I thought I might be making mistakes when specifying the XPath expressions, but I've tested it on 3 different sites with the same result every time.
$dom = new DOMDocument();
$dom->loadHtml( $html );
$xpath = new DOMXPath( $dom );
$items = $xpath->query( "//div[@class=\"row\"]" );
foreach ( $items as $item ) {
    $value['url'] = $xpath->query( "//div[@class=\"productImg\"]/a/@href",$item)->item(0)->nodeValue;
    $value['img'] = $xpath->query("//div[@class=\"productImg\"]/a/img/@src",$item)->item(0)->nodeValue;
    $result[] = $value;
}
print_r($result);
Clearly the code isn't right, but I haven't been able to narrow it down to the offending portion. And before somebody suggests regex: that is something I'd usually do, but I'd prefer to use XPath now if possible.
Given
query("//div[@class=\"productImg\"]/a/img/@src",$item)
it looks like you want to perform a query relative to $item. You're very nearly there, just not quite.
Your query starts with //div, which means: look for any <div> nodes which are descendants of the document root and satisfy the remaining portion of the query. The key place where you're falling over is that this expression is, as mentioned, evaluated from the document root, regardless of the context node passed as the second argument. That is why every iteration returns the same first match.
In order to select relative to the context node, you should start the expression with . so that .//div matches any <div> nodes which are descendants of the context node (i.e. your $item).
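The relative-path fix described in the first answer can be sketched as follows. The sample markup in $html is hypothetical (the question never shows the real page), and the null checks are an addition for rows that lack a link or an image; treat this as an illustration under those assumptions rather than a drop-in solution.

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$html = <<<HTML
<html><body>
<div class="row"><div class="productImg"><a href="/p/1"><img src="/img/1.jpg"></a></div></div>
<div class="row"><div class="productImg"><a href="/p/2"><img src="/img/2.jpg"></a></div></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//div[@class="row"]') as $item) {
    // The leading "." anchors each expression at $item instead of the document root.
    $url = $xpath->query('.//div[@class="productImg"]/a/@href', $item)->item(0);
    $img = $xpath->query('.//div[@class="productImg"]/a/img/@src', $item)->item(0);
    if ($url === null || $img === null) {
        continue; // skip rows without a product link or image
    }
    $result[] = ['url' => $url->nodeValue, 'img' => $img->nodeValue];
}
print_r($result);
```

With the leading dot in place, each row contributes its own pair instead of repeating the first one.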
There are too many assumptions to make about what your HTML looks like, but one problem I can spot right off the bat is the ->item(0) portion. That 0 needs to reflect the iteration in question.
Assuming that $items will always have numerical keys:
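The snippet that followed this sentence is not preserved on this page. What follows is only a guess at the idea, not the answerer's code: the absolute queries are run once, and the loop key is used as the index instead of a fixed 0. The markup in $html is hypothetical.

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$html = <<<HTML
<html><body>
<div class="row"><div class="productImg"><a href="/p/1"><img src="/img/1.jpg"></a></div></div>
<div class="row"><div class="productImg"><a href="/p/2"><img src="/img/2.jpg"></a></div></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$items = $xpath->query('//div[@class="row"]');
// These expressions are absolute, so they match across the whole document.
$urls = $xpath->query('//div[@class="productImg"]/a/@href');
$imgs = $xpath->query('//div[@class="productImg"]/a/img/@src');

$result = [];
foreach ($items as $key => $item) {
    // Use the iteration index rather than a fixed 0.
    $url = $urls->item($key);
    $img = $imgs->item($key);
    if ($url === null || $img === null) {
        continue; // fewer matches than rows
    }
    $result[] = ['url' => $url->nodeValue, 'img' => $img->nodeValue];
}
print_r($result);
```

This only lines up when every row contains exactly one product link and one image. If a row is missing either, the indices drift and later rows pick up the wrong values, which is why a query relative to each row is the more robust choice.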