Parsing HTML with file_get_contents()
I have been told that the best way to parse HTML is through DOM, like this:

<?php
$html = "<span>Text</span>";
$doc = new DOMDocument();
$doc->loadHTML($html);
$elements = $doc->getElementsByTagName("span");
foreach ($elements as $el) {
    echo $el->nodeValue . "\n";
}
?>

But in the above, the variable $html can't be a URL, or can it? Wouldn't I have to use the function file_get_contents() to get the HTML of a page?

Comments (3)
You have to use DOMDocument::loadHTMLFile to load HTML from a URL. DOMDocument::loadHTML parses a string of HTML.
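
To make the difference between the two methods concrete, here is a minimal sketch. The helper spanTexts() is a hypothetical name introduced for illustration, and a temporary file stands in for the remote page so the example runs without network access; with allow_url_fopen enabled, a URL could be passed to loadHTMLFile() in the same position.

```php
<?php
// Hypothetical helper (not from the thread): collect the text of every <span>.
function spanTexts(DOMDocument $doc): array
{
    $texts = [];
    foreach ($doc->getElementsByTagName('span') as $el) {
        $texts[] = $el->nodeValue;
    }
    return $texts;
}

// loadHTML(): the argument is the markup itself.
$fromString = new DOMDocument();
$fromString->loadHTML('<span>Text</span>');

// loadHTMLFile(): the argument is a location (a path here, standing in for a URL).
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, '<span>Text</span>');
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
unlink($path);

print_r(spanTexts($fromString));
print_r(spanTexts($fromFile));
```

Both documents should end up with the same single span; only the way the markup reaches the parser differs.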

It can be, but it depends on allow_url_fopen being enabled in your PHP install. Most of PHP's file-based functions can accept a URL as a source (or destination). Whether such a URL makes sense depends on what you're trying to do.
For example, doing
file_put_contents('http://google.com', $data)
is not going to work, as you'd be attempting an HTTP upload to Google, and they're not going to allow you to replace their homepage. But doing
$dom->loadHTMLFile('http://google.com');
would work, and would pull Google's homepage into the DOM for processing. loadHTML(), by contrast, treats its argument as markup, not as a location.
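
The route the question asks about (fetch the page with file_get_contents(), then parse the string) could look roughly like the sketch below. loadDocument() is a hypothetical helper, not something from the thread, and the handling of allow_url_fopen and libxml warnings is one reasonable choice rather than the only one.

```php
<?php
// Hypothetical helper: read markup from $source (a URL or a local path) and parse it.
// Returns null if the source cannot be read.
function loadDocument(string $source): ?DOMDocument
{
    $isUrl = preg_match('#^https?://#i', $source) === 1;
    // ini_get() returns the setting as a string such as "1", "On", "0" or "".
    if ($isUrl && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null; // URL wrappers are disabled on this install
    }

    $html = @file_get_contents($source);
    if ($html === false || $html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}
```

A call such as loadDocument('http://example.com/') (a placeholder URL) would then return a DOMDocument that can be queried with getElementsByTagName() as in the question.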

If you're having trouble using DOM, you could fetch the page with cURL and extract the text from the response yourself. For example, the script should grab the text between <span class=comment> and </span> and store it inside an array $match. This should echo Entertainment.
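
The script this answer refers to did not survive in the thread, so the following is only a guess at its shape: cURL to download the page and preg_match() to fill $match. The helper names fetchPage() and firstComment(), the regular expression, and the sample markup are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: download a page with cURL and return its body, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: first text found between <span class=comment> and </span>.
function firstComment(string $html): ?string
{
    // Accepts the class attribute with or without quotes.
    $pattern = '#<span class=["\']?comment["\']?>(.*?)</span>#is';
    if (preg_match($pattern, $html, $match) === 1) {
        return trim($match[1]); // $match[0] is the whole element, $match[1] the captured text
    }
    return null;
}

// Sample markup standing in for the fetched page (the original page is not shown).
$html = '<p><span class=comment>Entertainment</span></p>';
echo firstComment($html), "\n";
```

In real use, $html would come from fetchPage($url). A regular expression is fragile against nested or unusual markup, which is why the DOM approach from the other answers is usually the safer choice.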