Parsing HTML via file_get_contents()

Posted 2024-12-17 19:37:57

I have been told that the best way to parse HTML is through DOM, like this:

<?php

$html = "<span>Text</span>";
$doc = new DOMDocument();
$doc->loadHTML($html);

$elements = $doc->getElementsByTagName("span");
foreach ($elements as $el) {
    echo $el->nodeValue . "\n";
}

?>

But in the above, the variable $html can't be a URL, or can it?
Wouldn't I have to use the function file_get_contents() to get the HTML of a page?

扶醉桌前 2024-12-24 19:37:57

To load HTML directly from a URL, use DOMDocument::loadHTMLFile, where $path is the file path or URL:

$doc = new DOMDocument();
$doc->loadHTMLFile($path);

DOMDocument::loadHTML parses a string of HTML, so to use it with a URL you have to fetch the contents first:

$doc = new DOMDocument();
$doc->loadHTML(file_get_contents($path));
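
As a minimal sketch of the two approaches above, the following applies both to the same document and collects the span text. To stay runnable without network access it writes the markup to a temporary file; the helper name span_texts() and the temp-file setup are illustrative assumptions, and with allow_url_fopen enabled $path could be a URL instead.

```php
<?php
// Hypothetical helper: collect the text of every <span> in a loaded document.
function span_texts(DOMDocument $doc): array
{
    $texts = [];
    foreach ($doc->getElementsByTagName('span') as $el) {
        $texts[] = $el->nodeValue;
    }
    return $texts;
}

// Stand-in for a remote page: a local temporary file (assumption for the demo).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<html><body><span>Text</span><span>More</span></body></html>');

// Collect markup warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Approach 1: let DOMDocument open the path (or URL) itself.
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);

// Approach 2: read the contents into a string, then parse the string.
$fromString = new DOMDocument();
$fromString->loadHTML(file_get_contents($path));

libxml_clear_errors();
unlink($path);

print_r(span_texts($fromFile));
print_r(span_texts($fromString));
```

Both documents should yield the same list, which is the point: the two calls differ only in who reads the source.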

毁梦 2024-12-24 19:37:57

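You can load from a URL, but that depends on allow_url_fopen being enabled in your PHP installation. Basically all of PHP's file-based functions can accept a URL as a source (or destination). Whether such a URL makes sense depends on what you're trying to do.

For example, file_put_contents('http://google.com', $data) is not going to work, because you'd be attempting an HTTP upload to Google, and they're not going to let you replace their homepage...

But $dom->loadHTMLFile('http://google.com'); would work, and would pull Google's homepage into the DOM for processing. Note that loadHTML() itself expects a string of markup, not a URL.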
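
A rough sketch of checking that setting before loading a URL might look like this. The helper names url_wrappers_enabled() and load_document() are hypothetical, and the http(s) prefix test is a simplifying assumption about what counts as a URL here.

```php
<?php
// Hypothetical helper: true when PHP's URL-aware file wrappers are enabled.
function url_wrappers_enabled(): bool
{
    // ini_get() returns a string such as "1", "0" or ""; normalise it to a bool.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: load a local path or URL into a DOMDocument,
// refusing http(s) sources when allow_url_fopen is off.
function load_document(string $source): DOMDocument
{
    $isUrl = preg_match('#^https?://#i', $source) === 1;
    if ($isUrl && !url_wrappers_enabled()) {
        throw new RuntimeException('allow_url_fopen is disabled; fetch the page another way, e.g. with cURL.');
    }

    libxml_use_internal_errors(true);   // keep markup warnings out of the output
    $doc = new DOMDocument();
    $ok = $doc->loadHTMLFile($source);
    libxml_clear_errors();

    if ($ok === false) {
        throw new RuntimeException("Could not load $source");
    }
    return $doc;
}

// Usage (not executed here because it needs network access):
// $doc = load_document('http://google.com');
```

Failing early with an exception makes the dependency on the PHP configuration explicit instead of leaving it as a warning from loadHTMLFile().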

南汐寒笙箫 2024-12-24 19:37:57

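If you're having trouble using DOM, you could fetch the page with cURL and pull the text out with a regular expression. For example:

$url = "http://www.davesdaily.com/";

// Return the page body as a string instead of printing it
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$input = curl_exec($curl);

// Capture the text inside each <span class=comment>...</span>
$regexp = "<span class=comment>([^<]*)<\/span>";
if ($input !== false && preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] is the whole tag, $match[1] is the captured text
        echo $match[1] . "\n";
    }
}

The script grabs the text between <span class=comment> and </span>. Each entry of $matches holds the full match at index 0 and the captured text at index 1. This should echo Entertainment.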
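
If the regular expression proves brittle (for instance when the class attribute is quoted, or the span contains nested markup), the string returned by cURL can be handed to DOMDocument instead. This is only a sketch, assuming the same span class=comment markup; fetch_html() and comment_texts() are hypothetical helper names, and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or null on failure.
function fetch_html(string $url): ?string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    $body = curl_exec($curl);
    return $body === false ? null : $body;
}

// Hypothetical helper: text of every <span class="comment"> in an HTML string.
function comment_texts(string $html): array
{
    libxml_use_internal_errors(true);   // real-world pages are rarely valid markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query("//span[@class='comment']") as $span) {
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

// Made-up sample markup standing in for a fetched page.
$sample = '<div><span class=comment>Entertainment</span><span>other</span></div>';
print_r(comment_texts($sample));

// With network access, the two helpers would combine like this:
// $html = fetch_html('http://www.davesdaily.com/');
// if ($html !== null) { print_r(comment_texts($html)); }
```

Here cURL only does the fetching and the DOM does the parsing, so the page can still be retrieved when allow_url_fopen is disabled.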
