Using Zend_Dom as a screen scraper

Posted 2024-10-05 12:06:20


How?

More to the point...

this:

$url = 'http://php.net/manual/en/class.domelement.php';
$client = new Zend_Http_Client($url);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$result = $dom->query('div.note');
Zend_Debug::dump($result);

gives me this:

object(Zend_Dom_Query_Result)#867 (7) {
  ["_count":protected] => NULL
  ["_cssQuery":protected] => string(8) "div.note"
  ["_document":protected] => object(DOMDocument)#79 (0) {
  }
  ["_nodeList":protected] => object(DOMNodeList)#864 (0) {
  }
  ["_position":protected] => int(0)
  ["_xpath":protected] => NULL
  ["_xpathQuery":protected] => string(33) "//div[contains(@class, ' note ')]"
}

And I cannot for the life of me figure out how to do anything with this.

I want to extract the various parts of the retrieved data (the div with the class "note" and any elements inside it, such as the text and URLs), but I cannot get anything working.

Someone pointed me to the DOMElement class over at php.net, but when I try some of the methods mentioned there, I can't get them to work. How would I grab a chunk of HTML from a page and go through it, pulling out the various parts? And how do I inspect this object I am getting back so I can at least figure out what is in it?

Help?


薄暮涼年 2024-10-12 12:06:20

The Iterator implementation of Zend_Dom_Query_Result returns a DOMElement object for each iteration:

foreach ($result as $element) {
    var_dump($element instanceof DOMElement); // always true
}
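To see what the query yields, here is a minimal sketch that reproduces the lookup with PHP's built-in DOMDocument and DOMXPath classes instead of Zend_Dom_Query (an assumption made so the example runs without Zend Framework); the HTML string is a made-up stand-in for $response->getBody(). One difference is deliberate: the XPath in the question's dump, //div[contains(@class, ' note ')], only matches when "note" has a space on both sides inside the class attribute, so it would not match a plain class="note". The sketch uses the common concat/normalize-space idiom to match "note" as a whole class token.

```php
<?php
// Sketch: roughly what Zend_Dom_Query does, using only PHP's DOM extension.
// The HTML string is a hypothetical stand-in for the fetched page body.
$html = '<html><body>'
      . '<div class="note">First note <a href="/a">link A</a></div>'
      . '<div class="other">Not a note</div>'
      . '<div class="note">Second note</div>'
      . '</body></html>';

$document = new DOMDocument();
// Collect libxml parse errors internally instead of emitting PHP warnings;
// real-world HTML often triggers them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
// Padding the class attribute with spaces makes ' note ' match a whole token,
// including the case where "note" is the only class.
$nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]");

echo 'Matches: ' . $nodes->length . PHP_EOL;
foreach ($nodes as $node) {
    // Each item is a DOMElement, the same type Zend_Dom_Query_Result yields.
    echo get_class($node) . ': ' . trim($node->textContent) . PHP_EOL;
}
```

This should report two matches, skipping the div whose class is "other".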

From the $element variable, you can use any DOMElement method:

foreach ($result as $element) {
    echo 'Element Id: '.$element->getAttribute('id').PHP_EOL;
    if ($element->hasChildNodes()) {
        echo 'Element has child nodes'.PHP_EOL;
    }
    $aNodes = $element->getElementsByTagName('a');
    // etc
}
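To answer the "text and URLs" part concretely, the sketch below wraps the DOMElement calls in a helper. extractNote() is a hypothetical name introduced here, not part of Zend Framework; because it only takes a DOMElement, it should work on the elements yielded by a Zend_Dom_Query_Result as well as on the plain DOMXPath results used in the demo. The HTML is again an invented sample.

```php
<?php
// Hypothetical helper: collects the text and links of one matched element.
// It accepts any DOMElement, so it can be fed items from Zend_Dom_Query_Result
// or from a plain DOMXPath query.
function extractNote(DOMElement $element): array
{
    $links = [];
    // getElementsByTagName() searches only this element's descendants.
    foreach ($element->getElementsByTagName('a') as $anchor) {
        $links[] = [
            'text' => trim($anchor->textContent),
            'href' => $anchor->getAttribute('href'),
        ];
    }
    return [
        // textContent concatenates the text of every descendant node.
        'text'  => trim($element->textContent),
        'links' => $links,
    ];
}

// Demo with a hypothetical HTML sample instead of a live page.
$html = '<div class="note"><p>Use <a href="http://php.net/dom">DOM</a>'
      . ' or <a href="/xpath">XPath</a>.</p></div>';

$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$notes = [];
foreach ($xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]") as $element) {
    $notes[] = extractNote($element);
}

print_r($notes);
```

With Zend_Dom_Query, the collecting loop would instead be foreach ($result as $element) { $notes[] = extractNote($element); }.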

You can also get the owning DOMDocument, either from the element itself or from the Zend_Dom_Query_Result:

$document1 = $element->ownerDocument;
$document2 = $result->getDocument();
var_dump($document1 === $document2); // true
echo $document1->saveHTML();
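For the "how do I inspect this object" part of the question: dumping DOM objects with var_dump() or Zend_Debug::dump() reveals little, as the empty DOMDocument and DOMNodeList entries in the question's dump illustrate. A more useful check is to count the matches and serialize each matched element back to HTML; DOMDocument::saveHTML() accepts a node argument (PHP 5.3.6 and later) and then outputs only that subtree. This sketch again uses plain DOMXPath and a hypothetical HTML string rather than Zend_Dom_Query; inside the answer's loop the equivalent call would be $result->getDocument()->saveHTML($element).

```php
<?php
// Sketch: inspecting matches by printing each element's own markup.
// The HTML string is a hypothetical stand-in for the fetched page.
$html = '<html><body><div class="note">Hello <b>world</b></div></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
$nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]");

echo 'Number of matches: ' . $nodes->length . PHP_EOL;
for ($i = 0; $i < $nodes->length; $i++) {
    $node = $nodes->item($i);
    // Passing a node to saveHTML() serializes only that element and its
    // children, which is far easier to read than a dump of the DOM object.
    echo "#$i: " . $document->saveHTML($node) . PHP_EOL;
}
```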


About the author: 贵在坚持
