How to get an img's src and the data from its sibling nodes

Posted on 2024-11-09 13:01:55

<?php
$htmlget = new DOMDocument();

// The URL has to be passed as a quoted string
@$htmlget->loadHTMLFile('http://www.amazon.com');

$xpath = new DOMXPath($htmlget);
$nodelist = $xpath->query("//img/@src");

foreach ($nodelist as $images) {
    $value = $images->nodeValue;
}
?>

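I get all the img tags, but how do I get the information surrounding each image, within the same element the image is in?

For example, amazon.com shows a Kindle. I now have the picture, but I also need the information around it, such as the price description.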

回忆追雨的时光 2024-11-16 13:01:55

It depends on the markup of the requested page. Here is an example that gets the price on Amazon:

<?php
$htmlget = new DOMDocument();

@$htmlget->loadHTMLFile('http://www.amazon.com');

$xpath = new DOMXPath($htmlget);
$nodelist = $xpath->query("//img/@src");

foreach ($nodelist as $imageSrc) {
    // $imageSrc is the src attribute: its parent is the <img>,
    // and the next parent is the element that contains the <img>.
    // Only handle images whose containing element has the class "imageContainer".
    if ($imageSrc->parentNode->parentNode->getAttribute('class') == 'imageContainer') {
        // skip dummy images
        if (strstr($imageSrc->nodeValue, 'transparent-pixel')) continue;

        // point to the common ancestor of the image and the product details
        $wrapper = $imageSrc->parentNode->parentNode->parentNode->parentNode->parentNode;

        // fetch the price (relative query: a span that is a direct child of $wrapper)
        $price = $xpath->query('span[@class="red t14"]', $wrapper);
        if ($price->length) {
            echo '<br/><img src="' . $imageSrc->nodeValue . '">' . $price->item(0)->nodeValue . '<br/>';
        }
    }
}
?>
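
The chained parentNode calls above can also be written as a relative XPath query that uses each img element as the context node. The following is only a minimal sketch of that idea, run against an inline HTML string so it does not depend on any live page. The markup and the class names "product", "details" and "price" are assumptions made up for illustration; they are not Amazon's actual markup.

```php
<?php
// Hypothetical markup standing in for a product listing (an assumption for this sketch).
$html = <<<'HTML'
<div class="product">
  <div class="imageContainer"><a href="/kindle"><img src="kindle.jpg" alt="Kindle"></a></div>
  <div class="details"><span class="price">$79.00</span></div>
</div>
<div class="product">
  <div class="imageContainer"><a href="/x"><img src="transparent-pixel.gif" alt=""></a></div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of suppressing them with @
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Select the <img> elements themselves so each one can serve as a context node.
foreach ($xpath->query('//div[@class="imageContainer"]//img') as $img) {
    $src = $img->getAttribute('src');
    if (strpos($src, 'transparent-pixel') !== false) {
        continue; // skip placeholder images
    }
    // Climb to the nearest enclosing product wrapper, then search inside it for the price.
    $price = $xpath->query('ancestor::div[@class="product"][1]//span[@class="price"]', $img);
    if ($price->length > 0) {
        $results[$src] = trim($price->item(0)->nodeValue);
    }
}

print_r($results);
```

With this sample markup, $results ends up mapping kindle.jpg to $79.00. The ancestor axis avoids counting parent levels by hand, but it is just as dependent on the page's class names as the original version.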

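That said, you shouldn't parse pages this way. If a site wants to provide you with information, it usually offers an API. If it doesn't, it doesn't want you to grab anything. Parsing like this is not reliable: the markup of the requested page can change at any moment (and you may open a door for exploits, too). It also may not be legal.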
