PHP DOMXPATH & Arrays

Posted 2024-09-15 03:50:11

I'm trying to extract all relevant URLs and images from a page and put them into an array. The code below works fine, except that it outputs the first URL/image pair over and over, once for each matching row, so the count is right but the values are not. I thought I might be making mistakes when specifying the XPath expressions, but I've tested it on 3 different sites and get the same result every time.

$dom = new DOMDocument();
$dom->loadHtml( $html );
$xpath = new DOMXPath( $dom );

$items = $xpath->query( "//div[@class=\"row\"]" );

foreach ( $items as $item ) {
    $value['url'] = $xpath->query( "//div[@class=\"productImg\"]/a/@href",$item)->item(0)->nodeValue;
    $value['img'] = $xpath->query("//div[@class=\"productImg\"]/a/img/@src",$item)->item(0)->nodeValue;
    $result[] = $value;
}

print_r($result);

Clearly the code isn't right, but I haven't been able to narrow it down to the offending portion. Before somebody suggests regex: that is what I'd usually do, but I'd prefer to use XPath now if possible.

Comments (2)

浪漫之都 2024-09-22 03:50:11

Given query("//div[@class=\"productImg\"]/a/img/@src",$item), it looks like you want to perform a query relative to $item. You're very nearly there, just not quite.

Your query starts with //div, which means: look for any <div> nodes that are descendants of the document root and satisfy the remaining portion of the query. The key place where you're falling over is that this expression is, as mentioned, evaluated from the document root, even though you pass $item as the context node.

To select relative to the context node, start the expression with . so that .//div matches any <div> nodes that are descendants of the context node (i.e. your $item).
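As a minimal sketch of that change, here is the question's loop with a leading . on both inner queries. The sample markup, the /p/… and /img/… paths, and the null check are assumptions added for illustration; only the class names and the query structure come from the question.

```php
<?php
// Hypothetical sample markup; the class names follow the question.
$html = <<<HTML
<html><body>
<div class="row"><div class="productImg"><a href="/p/1"><img src="/img/1.jpg"></a></div></div>
<div class="row"><div class="productImg"><a href="/p/2"><img src="/img/2.jpg"></a></div></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = [];
foreach ($xpath->query('//div[@class="row"]') as $item) {
    // The leading "." anchors the search at $item instead of the document root.
    $url = $xpath->query('.//div[@class="productImg"]/a/@href', $item)->item(0);
    $img = $xpath->query('.//div[@class="productImg"]/a/img/@src', $item)->item(0);

    // Skip rows that lack either node rather than reading nodeValue from null.
    if ($url === null || $img === null) {
        continue;
    }
    $result[] = ['url' => $url->nodeValue, 'img' => $img->nodeValue];
}

print_r($result);
```

With this markup, each row should contribute its own pair instead of the first pair being repeated. Building a fresh array per row also avoids carrying values over from a previous iteration through a reused $value.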

拥有 2024-09-22 03:50:11

There are too many assumptions about what your HTML looks like, but one problem I can spot right off the bat is the ->item(0) portion. That 0 needs to reflect the current iteration.

Assuming that $items will always have numerical keys:

foreach( $items as $key => $item ) {
 ..... item)->item($key)->nodeValue;
}
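One thing worth checking before relying on the index: with the question's absolute //div query, $key indexes every match in the whole document, so it lines up with the rows only if each row contains exactly one productImg link. The sketch below compares both approaches on hypothetical markup in which the first row has no image; the markup and the $byIndex / $byContext names are assumptions made for illustration.

```php
<?php
// Hypothetical markup: the first row has no product image.
$html = <<<HTML
<html><body>
<div class="row"><p>No image here</p></div>
<div class="row"><div class="productImg"><a href="/p/2"><img src="/img/2.jpg"></a></div></div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$byIndex = [];
$byContext = [];
foreach ($xpath->query('//div[@class="row"]') as $key => $item) {
    // Absolute query: searches the whole document, so $key indexes all matches.
    $node = $xpath->query('//div[@class="productImg"]/a/@href', $item)->item($key);
    $byIndex[$key] = $node !== null ? $node->nodeValue : null;

    // Relative query: searches only inside this row, so item(0) is enough.
    $node = $xpath->query('.//div[@class="productImg"]/a/@href', $item)->item(0);
    $byContext[$key] = $node !== null ? $node->nodeValue : null;
}

print_r($byIndex);
print_r($byContext);
```

Here the index-based lookup attributes /p/2 to the first row and finds nothing for the second, while the relative query leaves the first row empty and assigns /p/2 to the second.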