PHP DOMXPath & Array
I'm trying to extract all relevant URLs and images from a page and put them into an array. The code below works fine, except that it outputs the first pair over and over, repeated the numerically correct number of times. I thought I might be making mistakes when specifying the XPath expressions, but I've tested it on 3 different sites with the same result every time.
$dom = new DOMDocument();
$dom->loadHtml( $html );
$xpath = new DOMXPath( $dom );
$items = $xpath->query( "//div[@class=\"row\"]" );
foreach ( $items as $item ) {
    $value['url'] = $xpath->query( "//div[@class=\"productImg\"]/a/@href", $item )->item(0)->nodeValue;
    $value['img'] = $xpath->query( "//div[@class=\"productImg\"]/a/img/@src", $item )->item(0)->nodeValue;
    $result[] = $value;
}
print_r($result);
Clearly the code isn't right, but I haven't been able to narrow it down to the offending portion. And before somebody suggests using regex: that is what I'd usually do, but I'd prefer to use XPath now if possible.
Given
query("//div[@class=\"productImg\"]/a/img/@src",$item)
it looks like you want to perform a query relative to $item. You're very nearly there, just not quite.
Your query starts with //div, which means: look for any <div> nodes that are descendants of the document root and satisfy the remaining portion of the query. The key place where you're falling over is that this expression is, as mentioned, evaluated from the document root, even though you pass $item as the context node. That is why every iteration returns the same first match.
To make the query relative to the context node, start the expression with a dot (.), so that .//div matches any <div> nodes that are descendants of the context node (i.e. your $item).
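To make the difference concrete, here is a minimal sketch of the loop with the relative expressions. The sample markup, the /item-N and /img/N.jpg values, and the null fallbacks are assumptions for illustration, not taken from your pages; only the leading dot in the two inner queries is the actual fix.

```php
<?php
// Hypothetical sample markup, assumed for illustration only.
$html = <<<'HTML'
<div class="row">
  <div class="productImg"><a href="/item-1"><img src="/img/1.jpg"></a></div>
</div>
<div class="row">
  <div class="productImg"><a href="/item-2"><img src="/img/2.jpg"></a></div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$result = [];

foreach ($xpath->query('//div[@class="row"]') as $item) {
    // The leading "." anchors each expression at $item rather than the document root.
    $href = $xpath->query('.//div[@class="productImg"]/a/@href', $item)->item(0);
    $src  = $xpath->query('.//div[@class="productImg"]/a/img/@src', $item)->item(0);

    // item(0) returns null when nothing matches, so guard before reading nodeValue.
    $result[] = [
        'url' => $href ? $href->nodeValue : null,
        'img' => $src ? $src->nodeValue : null,
    ];
}

print_r($result);
```

Building a fresh array per iteration, rather than reusing $value, also avoids carrying a stale 'url' or 'img' over from the previous row when one of the queries finds nothing.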