PHP function to grab all links inside a <DIV> on a remote site using the scrape method

Posted on 2024-09-29 06:01:06

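Does anyone have a PHP function that can grab all the links inside a specific DIV on a remote site? Usage might be:

$links = grab_links($url, $divname);

It should return an array I can use. Grabbing the links I can figure out, but I'm not sure how to restrict it to a specific div.

Thanks! Scott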

丿*梦醉红颜 2024-10-06 06:01:06

Check out PHP's XPath support. It lets you query a document for the contents of specific tags and so on. The example on the PHP site is pretty straightforward:
http://php.net/manual/en/simplexmlelement.xpath.php

The following example grabs every link (a element) inside any DIV in a document:

$xml = new SimpleXMLElement($docAsString);

$result = $xml->xpath('//div//a');

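You can use this on HTML too, not just XML, as long as the markup is well-formed XML (XHTML, for example); the SimpleXMLElement constructor throws an exception on markup it cannot parse.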

Good XPath reference: http://msdn.microsoft.com/en-us/library/ms256086.aspx
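
To restrict that query to one particular DIV, an attribute predicate can be added to the XPath expression. The following is a minimal sketch rather than a definitive implementation: the id "news" and the sample markup are made up for illustration, and DOMDocument is used here to parse the HTML leniently before handing the tree to SimpleXML.

```php
<?php
// Hypothetical sample markup; a real page would be fetched first.
$docAsString = '<html><body>'
    . '<div id="nav"><a href="/home">Home</a></div>'
    . '<div id="news"><p><a href="/a">A</a> <a href="/b">B</a></p></div>'
    . '</body></html>';

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($docAsString);
libxml_clear_errors();

// Convert the parsed DOM tree into a SimpleXMLElement.
$xml = simplexml_import_dom($dom);

// Select only the href attributes of links nested inside the div
// whose id is "news" (an assumed id), and cast each one to a string.
$hrefs = array_map('strval', $xml->xpath("//div[@id='news']//a/@href"));

print_r($hrefs);
```

The link in the "nav" div is skipped because the predicate `[@id='news']` only matches the second div.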

維他命╮ 2024-10-06 06:01:06

In the past I have used the PHP Simple DOM library with success:

http://simplehtmldom.sourceforge.net/

Samples:

// Load the library (the path to simple_html_dom.php is an assumption)
require_once 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

魂归处 2024-10-06 06:01:06

I found something that seems to do what I wanted.

http://www.earthinfo.org/xpaths-with-php-by-example/

<?php

// Load the remote page; @ suppresses warnings from malformed real-world HTML.
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.bbc.com');
$xpath = new DOMXPath($html);

// href of every link inside the div with id "news_moreTopStories"
$nodelist = $xpath->query("//div[@id='news_moreTopStories']//a/@href");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}

// for images: src of every img inside the div with id "promo_area"
echo "<br><br>";
$nodelist = $xpath->query("//div[@id='promo_area']//img/@src");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}

?>

I also tried the PHP DOM method, and it seems faster...

http://w-shadow.com/blog/2009/10/20/how-to-extract-html-tags-and-their-attributes-with-php/

$html = file_get_contents('http://www.bbc.com');
//Create a new DOM document
$dom = new DOMDocument;

//Parse the HTML. The @ is used to suppress the warnings that
//loadHTML() emits when the $html string isn't valid markup.
@$dom->loadHTML($html);

//Find the container by id; getElementById() returns null if it is missing.
$container = $dom->getElementById('news_moreTopStories');

//Get all links inside it. You could also use any other tag name here,
//like 'img' or 'table', to extract other tags.
$links = $container ? $container->getElementsByTagName('a') : [];

//Iterate over the extracted links and display their URLs
foreach ($links as $link) {
    //Extract and show the "href" attribute.
    echo $link->getAttribute('href'), '<br>';
}
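
Wrapping the DOM approach into the grab_links($url, $divname) shape from the question might look roughly like this. It is only a sketch under a few assumptions: $divname is taken to be the div's id attribute, extract_div_links() is a hypothetical helper name introduced so the parsing can be tried without a network call, and fetching relies on file_get_contents() being allowed to open URLs (allow_url_fopen).

```php
<?php
/**
 * Return the href of every link inside <div id="$divId"> in an HTML string.
 * extract_div_links() is a hypothetical helper, not a built-in function.
 */
function extract_div_links(string $html, string $divId): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    // Collect parse warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $links = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Compare ids in PHP so the id never has to be quoted inside XPath.
        if ($div->getAttribute('id') !== $divId) {
            continue;
        }
        foreach ($div->getElementsByTagName('a') as $a) {
            if ($a->hasAttribute('href')) {
                $links[] = $a->getAttribute('href');
            }
        }
    }
    return $links;
}

/**
 * Fetch $url and return the links inside the div whose id is $divname.
 * Returns an empty array if the page cannot be fetched.
 */
function grab_links(string $url, string $divname): array
{
    $html = @file_get_contents($url); // assumes allow_url_fopen is enabled
    return $html === false ? [] : extract_div_links($html, $divname);
}
```

Usage would then be $links = grab_links('http://www.bbc.com', 'news_moreTopStories'); the hrefs come back exactly as written in the page, so relative URLs are not resolved against $url.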
