Parsing HTML via file_get_contents()

Posted 2024-12-17 19:37:57

I have been told that the best way to parse HTML is through the DOM, like this:

<?php

$html = "<span>Text</span>";
$doc = new DOMDocument();
$doc->loadHTML($html);

// Print the text content of every <span> element.
$elements = $doc->getElementsByTagName("span");
foreach ($elements as $el) {
    echo $el->nodeValue . "\n";
}

?>

But in the above, the variable $html can't be a URL, or can it?
Wouldn't I have to use the function file_get_contents() to get the HTML of a page?

Comments (3)

扶醉桌前 2024-12-24 19:37:57

To load HTML directly from a URL, use DOMDocument::loadHTMLFile:

$doc = new DOMDocument();
$doc->loadHTMLFile($path);

DOMDocument::loadHTML parses a string of HTML, so the alternative is to fetch the page yourself and pass in the result:

$doc = new DOMDocument();
$doc->loadHTML(file_get_contents($path));
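
The two calls above can be wrapped in one helper that picks the right method. This is only a minimal sketch: the function name loadDom and the "starts with http:// or https://" test are assumptions made for illustration, not part of the answer. It also silences libxml warnings, since real-world pages are rarely valid HTML.

```php
<?php

/**
 * Hypothetical helper: build a DOMDocument from either a URL or a raw HTML string.
 */
function loadDom(string $source): DOMDocument
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    if (preg_match('#^https?://#i', $source) === 1) {
        // Looks like a URL: let DOMDocument fetch and parse it.
        $ok = $doc->loadHTMLFile($source);
    } else {
        // Anything else is treated as markup.
        $ok = $doc->loadHTML($source);
    }

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($ok === false) {
        throw new RuntimeException('Could not load HTML');
    }
    return $doc;
}

// Same markup as in the question, so no network access is needed here.
$doc = loadDom("<span>Text</span>");
foreach ($doc->getElementsByTagName("span") as $el) {
    echo $el->nodeValue . "\n";
}
```

Passing a URL instead of markup sends the call through loadHTMLFile, which still depends on the PHP configuration allowing URL access.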

毁梦 2024-12-24 19:37:57

It can be, but it depends on allow_url_fopen being enabled in your PHP configuration. Most of PHP's file-based functions can accept a URL as a source (or destination). Whether such a URL makes sense depends on what you're trying to do.

For example, file_put_contents('http://google.com', $data) is not going to work: you'd be attempting an HTTP upload to Google, and they're not going to allow you to replace their homepage.

But $dom->loadHTMLFile('http://google.com'); would work, and would pull Google's homepage into the DOM for processing. Note that $dom->loadHTML() expects a string of HTML, not a URL, so a URL passed to it is parsed as literal text.
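
Since everything here hinges on allow_url_fopen, it may be worth checking the setting before fetching. The following is a rough sketch under stated assumptions: fetchHtml is a hypothetical name, and the URL test only recognizes http:// and https:// prefixes.

```php
<?php

/**
 * Hypothetical helper: read HTML from a URL or local path, failing early
 * when URL access is disabled in php.ini.
 */
function fetchHtml(string $source): string
{
    $isUrl = preg_match('#^https?://#i', $source) === 1;

    // ini_get() returns the setting as a string such as "1", "0", "On" or "".
    $urlsAllowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

    if ($isUrl && !$urlsAllowed) {
        throw new RuntimeException('allow_url_fopen is disabled; enable it or use cURL');
    }

    // Suppress the warning on failure and report it as an exception instead.
    $html = @file_get_contents($source);
    if ($html === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $html;
}
```

The returned string can then be handed to $dom->loadHTML(), for example $dom->loadHTML(fetchHtml($url)).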

南汐寒笙箫 2024-12-24 19:37:57

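If you're having trouble using DOM, you could fetch the page with cURL and extract the text with a regular expression. For example:

$url = "http://www.davesdaily.com/";

// Return the page body as a string instead of printing it.
$curl = curl_init();
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_URL, $url);
$input = curl_exec($curl);

// Capture the text between <span class=comment> and </span>.
$regexp = "<span class=comment>([^<]*)<\/span>";
if ($input !== false && preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] is the whole tag; $match[1] is the captured text.
        echo $match[1] . "\n";
    }
}

The script grabs the text between <span class=comment> and </span> and stores each match in the array $matches. For that page, this should echo Entertainment.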
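
Regular expressions tend to break when attribute order or quoting changes, so the cURL fetch can also be combined with the DOM approach from the question. This is a sketch, not part of the original answer: extractComments is a hypothetical name, and the sample markup stands in for the string that curl_exec() would return.

```php
<?php

/**
 * Hypothetical helper: return the text of every <span class="comment"> in $html.
 */
function extractComments(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // Matches spans whose class attribute is exactly "comment".
    foreach ($xpath->query('//span[@class="comment"]') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample markup standing in for the string returned by curl_exec().
$input = '<div><span class="comment">Entertainment</span><span>Other</span></div>';
print_r(extractComments($input));
```

The XPath expression only matches an exact class value; a span carrying several classes would need a contains()-based test instead.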
