Most efficient way to retrieve a website's source code with PHP? (GET request)

Posted on 2024-07-15 22:17:37

I know that file_get_contents can be used to retrieve the source of a webpage, but I want to know the most efficient way.

I have an old class I made a long time ago that uses something like this:

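    // Open a TCP connection and fail early rather than writing to an invalid handle.
    $this->socket = fsockopen($this->host, 80, $errno, $errstr);
    if ($this->socket === false) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }

    // HTTP header lines end with CRLF; an empty line terminates the header block.
    fputs($this->socket, 'GET ' . $this->target . ' HTTP/1.0' . "\r\n");
    fputs($this->socket, 'Host: ' . $this->host . "\r\n");
    fputs($this->socket, 'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5' . "\r\n");
    fputs($this->socket, 'Connection: close' . "\r\n\r\n");

    // Read until the server closes the connection (response headers and body together).
    $this->source = '';

    while (!feof($this->socket))
    {
        $this->source .= fgets($this->socket, 128);
    }

    fclose($this->socket);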

Is this the best way? By "most efficient" I mean whichever method returns the result fastest.

Comments (5)

謌踐踏愛綪 2024-07-22 22:17:37

file_get_contents() is the best and most efficient way. But, either way, there is not much difference because the bottleneck is the network, not the processor. Code readability should also be a concern.

Consider this benchmark as well: http://www.ebrueggeman.com/php_benchmarking_fopen.php
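
For what it's worth, here is a minimal sketch of how file_get_contents() could be combined with a stream context to set a timeout and a User-Agent. The function names `build_http_context_options` and `fetch_source` and the `ExampleFetcher/1.0` User-Agent string are made up for illustration; they are not from the question.

```php
<?php
// Build the stream-context options for a plain HTTP GET (no network access here).
function build_http_context_options(float $timeoutSeconds, string $userAgent): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'timeout'       => $timeoutSeconds, // read timeout, in seconds
            'user_agent'    => $userAgent,
            'ignore_errors' => true,            // still return the body on 4xx/5xx responses
        ],
    ];
}

// Fetch a URL and return its body, or null if the request failed.
// 'ExampleFetcher/1.0' is a placeholder User-Agent (an assumption for this sketch).
function fetch_source(string $url, float $timeoutSeconds = 5.0): ?string
{
    $options = build_http_context_options($timeoutSeconds, 'ExampleFetcher/1.0');
    $context = stream_context_create($options);

    // Suppress the warning and report failure through the return value instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Calling `fetch_source('http://www.example.org')` would then return the page source as a string, or null on failure. This assumes `allow_url_fopen` is enabled in php.ini.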

南冥有猫 2024-07-22 22:17:37

The code you have is probably the fastest and simplest way of doing what you're talking about. However, it isn't very flexible if you want to do more complex tasks (like posting, or supporting HTTP 1.1 stuff like Content-Encoding and Transfer-Encoding).

If you want something that will handle more complex cases, use PHP's cURL extension.
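
A rough sketch of what a cURL-based fetch might look like, assuming the cURL extension is installed. The helper names `build_curl_options` and `curl_fetch` are hypothetical; the `CURLOPT_*` constants are the extension's own.

```php
<?php
// Assemble cURL options for a GET, or for a POST when $postFields is given.
function build_curl_options(string $url, int $timeoutSeconds, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_ENCODING       => '',    // accept and decode every encoding cURL supports
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ];

    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }

    return $options;
}

// Perform the request and return the response body, or null on failure.
function curl_fetch(string $url, int $timeoutSeconds = 10, ?array $postFields = null): ?string
{
    $handle = curl_init();
    if ($handle === false) {
        return null;
    }
    curl_setopt_array($handle, build_curl_options($url, $timeoutSeconds, $postFields));
    $body = curl_exec($handle);

    return $body === false ? null : $body;
}
```

For a plain GET, `curl_fetch('http://www.example.org')` would be enough; passing an array as the third argument turns the request into a form POST.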

梦醒时光 2024-07-22 22:17:37

Not sure? Let's test! The script below opens up example.org 10 times using both methods:

    // Time 10 fetches with file_get_contents().
    $t = microtime(true);
    for ($i = 0; $i < 10; $i++) {
        $source = file_get_contents('http://www.example.org');
    }
    print 'file_get_contents: ' . (microtime(true) - $t);
    print '<br>';

    // Time 10 fetches with a hand-written HTTP/1.0 request over fsockopen().
    $t = microtime(true);
    for ($i = 0; $i < 10; $i++) {
        $socket = fsockopen('www.example.org', 80);
        // HTTP header lines end with CRLF; an empty line terminates the header block.
        fputs($socket, 'GET / HTTP/1.0' . "\r\n");
        fputs($socket, 'Host: www.example.org' . "\r\n");
        fputs($socket, 'User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5' . "\r\n");
        fputs($socket, 'Connection: close' . "\r\n\r\n");
        $source = '';
        while (!feof($socket)) {
            $source .= fgets($socket, 128);
        }
        fclose($socket);
    }
    print 'fsockopen: ' . (microtime(true) - $t);

1st run (elapsed seconds for 10 requests):

file_get_contents: 3.4470698833466
fsockopen: 6.3937518596649

2nd run:

file_get_contents: 3.5667569637299
fsockopen: 6.4959270954132

3rd run:

file_get_contents: 3.4623680114746
fsockopen: 6.4249370098114

Since file_get_contents is both faster and more concise, I'm declaring it the winner!

小…红帽 2024-07-22 22:17:37

Also check out Zend Framework's Zend_Http_Client class. It supports redirects, among other things.

能否归途做我良人 2024-07-22 22:17:37

You won't get better performance than the built-in file_get_contents with homebrew code like this. Indeed, repeatedly concatenating chunks as short as 128 bytes (why so small?) will perform rather badly.

For HTTP there are reasons to Do It Yourself or use an external library, for example:

  • you need control over network timeouts

  • you want to stream content directly from the socket instead of accumulating it

but performance isn't one of them; the simple built-in PHP function will be limited only by the network speed, which is something you can't do anything about.
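
To illustrate the two cases in the list above, here is a small sketch that sets a read timeout and streams the response straight into another stream instead of building one big string. The function name `stream_url_to` is invented for this example; `stream_context_create`, `fopen` and `stream_copy_to_stream` are standard PHP functions.

```php
<?php
// Copy the contents of $url into an already-open writable stream.
// Returns the number of bytes copied, or null if the source could not be opened.
function stream_url_to(string $url, $destination, float $timeoutSeconds = 5.0): ?int
{
    // The 'http' options only apply when $url is an http:// URL.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds], // read timeout, in seconds
    ]);

    $source = @fopen($url, 'rb', false, $context);
    if ($source === false) {
        return null;
    }

    // PHP copies in internal chunks, so the whole body is never held in one string.
    $bytes = stream_copy_to_stream($source, $destination);
    fclose($source);

    return $bytes === false ? null : $bytes;
}
```

For example, `stream_url_to('http://www.example.org', fopen('page.html', 'wb'))` would write the page to disk as it arrives. With a raw `fsockopen` handle, `stream_set_timeout` gives similar control over read timeouts.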
