Using Zend_Dom as a screen scraper

Posted 2024-10-05 12:06:20

How?

More to the point...

this:

$url = 'http://php.net/manual/en/class.domelement.php';
$client = new Zend_Http_Client($url);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$result = $dom->query('div.note');
Zend_Debug::dump($result);

gives me this:

object(Zend_Dom_Query_Result)#867 (7) {
  ["_count":protected] => NULL
  ["_cssQuery":protected] => string(8) "div.note"
  ["_document":protected] => object(DOMDocument)#79 (0) {
  }
  ["_nodeList":protected] => object(DOMNodeList)#864 (0) {
  }
  ["_position":protected] => int(0)
  ["_xpath":protected] => NULL
  ["_xpathQuery":protected] => string(33) "//div[contains(@class, ' note ')]"
}

And I cannot for the life of me figure out how to do anything with this.

I want to extract the various parts of the retrieved data (the div elements with the class "note" and anything inside them, such as the text and URLs) but cannot get anything working.

Someone pointed me to the DOMElement class over at php.net but when I try using some of the methods mentioned, I can't get things to work. How would I grab a chunk of html from a page and go through it grabbing the various parts? How do I inspect this object I am getting back so I can at least figure out what is in it?

Hjälp?

Comments (1)

薄暮涼年 2024-10-12 12:06:20

The Iterator implementation of Zend_Dom_Query_Result returns a DOMElement object for each iteration:

foreach ($result as $element) {
    var_dump($element instanceof DOMElement); // always true
}

From the $element variable, you can use any DOMElement method:

foreach ($result as $element) {
    echo 'Element Id: '.$element->getAttribute('id').PHP_EOL;
    if ($element->hasChildNodes()) {
        echo 'Element has child nodes'.PHP_EOL;
    }
    $aNodes = $element->getElementsByTagName('a');
    // etc
}
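
To get at the text and link URLs the question asks about, something like the following might work. This is only a sketch: it uses PHP's built-in DOMDocument and DOMXPath on a made-up HTML snippet so that it runs without Zend Framework, and the XPath expression is a common idiom for matching a class token, not necessarily what Zend_Dom_Query generates. With Zend_Dom_Query you would loop over $result instead of $nodes; the body of the loop stays the same because each item is a DOMElement either way.

```php
<?php
// Stand-in HTML (an assumption); in the question this would be $response->getBody().
$html = <<<HTML
<html><body>
  <div class="note" id="n1">See <a href="http://example.com/a">the docs</a> first.</div>
  <div class="other">ignored</div>
  <div class="note" id="n2">No links here.</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match div elements whose class attribute contains "note" as a whole token.
$nodes = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' note ')]"
);

$notes = array();
foreach ($nodes as $element) {
    // Collect the href of every <a> nested anywhere inside this div.
    $links = array();
    foreach ($element->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
    $notes[] = array(
        'id'    => $element->getAttribute('id'),
        'text'  => trim($element->textContent),   // all nested text, tags stripped
        'links' => $links,
    );
}

print_r($notes);
```

textContent flattens everything inside the element into one string, which is usually what you want for scraping; if you need the markup as well, serialize the node instead.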

You can also get at the DOMDocument that the elements belong to, either through an element's ownerDocument property or through Zend_Dom_Query_Result::getDocument():

$document1 = $element->ownerDocument;
$document2 = $result->getDocument();
var_dump($document1 === $document2); // true
echo $document1->saveHTML();
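
As for inspecting what you got back: dumping DOM objects can show next to nothing, as the empty DOMDocument and DOMNodeList in the question's output suggest. One approach that may help is to serialize a single node back to HTML and list its children. The sketch below again uses plain DOMDocument with a made-up snippet so it runs on its own; passing a node to saveHTML() requires PHP 5.3.6 or later.

```php
<?php
// Made-up markup (an assumption) standing in for one matched element.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div class="note"><p>Hello <b>world</b></p></div></body></html>');

// In the Zend_Dom_Query case, $element would come from the foreach over $result.
$element = $doc->getElementsByTagName('div')->item(0);

// Serialize just this node, markup included.
$outerHtml = $element->ownerDocument->saveHTML($element);
echo $outerHtml, PHP_EOL;

// Walk the direct children to see what is inside.
$childNames = array();
foreach ($element->childNodes as $child) {
    $childNames[] = $child->nodeName;
    echo $child->nodeName, ': ', trim($child->textContent), PHP_EOL;
}
```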