Most efficient way to retrieve a website's source code with PHP? (GET request)
I know that file_get_contents can be used to retrieve the source of a webpage, but I want to know the most efficient way.
I have an old class I made a long time ago that uses something like this:
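// Open a TCP connection to the web server and stop if it fails.
$this->socket = fsockopen($this->host, 80, $errno, $errstr);
if (!$this->socket) {
    throw new RuntimeException("Connection failed: $errstr ($errno)");
}
// HTTP header lines end with CRLF; an empty line terminates the request.
fputs($this->socket, 'GET ' . $this->target . ' HTTP/1.0' . "\r\n");
fputs($this->socket, 'Host: ' . $this->host . "\r\n");
fputs($this->socket, 'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5' . "\r\n");
fputs($this->socket, 'Connection: close' . "\r\n\r\n");
// Read the whole response (status line and headers included) until the server closes the connection.
$this->source = '';
while (!feof($this->socket))
{
    $this->source .= fgets($this->socket, 128);
}
fclose($this->socket);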
Is this the best way? By "most efficient" I mean whichever returns results fastest.
Comments (5)
file_get_contents() is the best and most efficient way. But either way, there is not much difference, because the bottleneck is the network, not the processor. Code readability should also be a concern. Consider this benchmark as well: http://www.ebrueggeman.com/php_benchmarking_fopen.php
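For readers who have not used it for HTTP, here is a minimal sketch of the file_get_contents() approach. The helper name fetch_source, the user-agent string, and the 10-second timeout are assumptions made for illustration and do not come from the thread. It also assumes allow_url_fopen is enabled, which http:// URLs require.

```php
<?php
// Hypothetical helper (not from the thread): fetch a page with file_get_contents().
function fetch_source(string $url, float $timeoutSeconds = 10.0): string
{
    // The 'http' options below only take effect for http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => 'ExampleFetcher/1.0', // placeholder value
            'timeout'    => $timeoutSeconds,       // read timeout in seconds
        ],
    ]);

    // @ suppresses the PHP warning so that failure is reported via the exception.
    $source = @file_get_contents($url, false, $context);
    if ($source === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $source;
}
```

Calling fetch_source('http://example.org/') would return the response body only, without the status line and headers that the raw socket loop in the question also collects.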
The code you have is probably the fastest and simplest way of doing what you're talking about. However, it isn't very flexible if you want to do more complex tasks (like posting, or supporting HTTP 1.1 stuff like Content-Encoding and Transfer-Encoding).
If you want something that will handle more complex cases, use PHP's cURL extension.
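As a rough illustration of the cURL route, the sketch below follows redirects and lets cURL negotiate and decode compressed responses. The helper name fetch_with_curl, the user-agent string, and the timeout and redirect limits are assumptions chosen for the example; it needs the curl extension to be installed.

```php
<?php
// Hypothetical helper (not from the thread): fetch a page with the cURL extension.
function fetch_with_curl(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // assumed limit for this sketch
        CURLOPT_ENCODING       => '',              // accept every encoding cURL supports (e.g. gzip)
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // time allowed to establish the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // time allowed for the whole transfer
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0', // placeholder value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

cURL speaks HTTP/1.1 and decodes chunked Transfer-Encoding on its own, which is the part that becomes tedious with a hand-written socket loop.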
Not sure? Let's test! The script below opens up example.org 10 times using both methods:
1st run:
2nd run:
3rd run:
So, since file_get_contents is faster and more concise, I'm going to declare it the winner!
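A comparison of this kind could be set up roughly as follows. This is not the script from the answer above; the helper names average_seconds and fetch_via_socket are hypothetical, and the socket variant uses stream_get_contents() rather than the 128-byte fgets() loop from the question.

```php
<?php
// Hypothetical timing helper: average wall-clock seconds per call of $fetch.
function average_seconds(callable $fetch, int $runs = 10): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fetch();
    }
    return (microtime(true) - $start) / $runs;
}

// Hypothetical raw-socket fetch; returns the status line and headers along with the body.
function fetch_via_socket(string $host, string $target = '/'): string
{
    $socket = @fsockopen($host, 80, $errno, $errstr, 10);
    if (!$socket) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    fwrite($socket, "GET $target HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $source = stream_get_contents($socket); // read until the server closes the connection
    fclose($socket);
    return $source;
}

// Example usage (needs network access, so it is left commented out):
// printf("file_get_contents: %.4f s\n", average_seconds(function () {
//     file_get_contents('http://example.org/');
// }));
// printf("fsockopen:         %.4f s\n", average_seconds(function () {
//     fetch_via_socket('example.org');
// }));
```

Because every run goes over the network, the numbers vary from run to run, so it is worth repeating the measurement several times before drawing a conclusion.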
Check also Zend Framework's Zend_Http_Client class. It supports redirects etc.
You won't get better performance than the built-in file_get_contents with homebrew code like this. Indeed, the constant concatenation on strings as short as 128 bytes (? why?) will perform rather badly.
For HTTP there are reasons to Do It Yourself or use an external library, for example:
you need control over network timeouts
you want to stream content directly from the socket instead of accumulating it
but performance isn't one of them; the simple built-in PHP function will be limited only by the network speed, which is something you can't do anything about.
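The two reasons listed above can be sketched together: a connect timeout and a read timeout on the socket, and a loop that copies the body to another stream chunk by chunk instead of accumulating it in a string. The function names stream_body and stream_url_to, the 8192-byte chunk size, and the 5-second timeout are assumptions made for this example.

```php
<?php
// Hypothetical helper: copy an HTTP response body from $in to $out without buffering all of it.
// Returns the number of body bytes written.
function stream_body($in, $out, int $chunkSize = 8192): int
{
    // Skip the status line and headers; an empty line ends the header block.
    while (($line = fgets($in)) !== false) {
        if (rtrim($line, "\r\n") === '') {
            break;
        }
    }

    $copied = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break; // read error, end of stream, or read timeout
        }
        $copied += fwrite($out, $chunk);
    }
    return $copied;
}

// Hypothetical wrapper: request $target from $host and stream the body into $out.
function stream_url_to(string $host, string $target, $out, int $timeoutSeconds = 5): int
{
    $socket = @fsockopen($host, 80, $errno, $errstr, $timeoutSeconds); // connect timeout
    if (!$socket) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    stream_set_timeout($socket, $timeoutSeconds); // read timeout

    // HTTP/1.0 keeps the server from answering with chunked Transfer-Encoding.
    fwrite($socket, "GET $target HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    try {
        return stream_body($socket, $out);
    } finally {
        fclose($socket);
    }
}
```

For example, stream_url_to('example.org', '/', fopen('php://output', 'w')) would send the page body straight to the output as it arrives.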