Ignoring the Content-Length header when using file_get_contents

Posted 2025-01-06 21:47:06


I need to get the contents of a page that always sends a Content-Length: 0 header, even though the page is never empty.

file_get_contents(url) just returns an empty string.

The whole header returned by the page is:

HTTP/1.1 200 OK
X-Powered-By: PHP/5.3.10
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Sat, 18 Feb 2012 18:14:59 GMT
Cache-Control: no-store, no-cache, must-revalidate
Cache-Control: post-check=0, pre-check=0
Pragma: no-cache
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Sat, 18 Feb 2012 18:14:59 GMT
Server: lighttpd

Is it possible to use file_get_contents and ignore the header, or do I need to use curl?

Edit

get_headers(url) output (using print_r):

Array
(
    [0] => HTTP/1.0 200 OK
    [1] => X-Powered-By: PHP/5.3.10
    [2] => Content-type: text/html
    [3] => Content-Length: 0
    [4] => Connection: close
    [5] => Date: Sat, 18 Feb 2012 22:39:52 GMT
    [6] => Server: lighttpd
)


Comments (2)

萤火眠眠 2025-01-13 21:47:06


I believe none of the HTTP-level functions can read such a response. It is an incorrect HTTP response: it effectively says "my body is empty, don't read it".

You definitely need your own function based on fread, which will physically read the socket. Something like this:

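// $sURL holds the address of the page to fetch
$aURL    = parse_url($sURL);
$sHost   = $aURL["host"];
$iPort   = isset($aURL["port"]) ? $aURL["port"] : 80;

// Rebuild the request target from the path and query string
$sQuery  = isset($aURL["path"]) ? $aURL["path"] : "/";
if (isset($aURL["query"]))
{
    $sQuery .= "?" . $aURL["query"];
}

$sResult = "";

if ($iHandle = fsockopen($sHost, $iPort, $iError, $sError, 10))
{
    // HTTP/1.0 keeps the server from answering with chunked transfer encoding
    $sOut   = "GET " . $sQuery . " HTTP/1.0\r\n";
    $sOut  .= "Host: " . $sHost . "\r\n";
    $sOut  .= "Connection: close\r\n\r\n";

    fputs($iHandle, $sOut);

    // Read until the server closes the connection, whatever Content-Length says
    while (!feof($iHandle))
    {
        $sResult .= fread($iHandle, 1024);
    }

    fclose($iHandle);
}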

Then just strip the headers from $sResult.
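Cutting the headers off the raw socket data can be sketched as below. This is a minimal illustration, assuming PHP 7 or later; the function name split_http_response is hypothetical. It splits at the first blank line ("\r\n\r\n") that ends the header block, so the Content-Length value is never consulted.

```php
<?php
/**
 * Hypothetical helper: split a raw HTTP response, as read from the
 * socket into $sResult, into its header block and its body.
 * The split is made at the first "\r\n\r\n"; Content-Length is ignored.
 */
function split_http_response(string $sRaw): array
{
    $aParts = explode("\r\n\r\n", $sRaw, 2);

    // No blank line found: treat everything as headers, with an empty body
    if (count($aParts) < 2) {
        return [$sRaw, ""];
    }

    return [$aParts[0], $aParts[1]];
}

$sSample = "HTTP/1.0 200 OK\r\nContent-Length: 0\r\n\r\n<html>hello</html>";
list($sHeaders, $sBody) = split_http_response($sSample);
echo $sBody; // prints <html>hello</html>
```

With the socket code above, calling split_http_response($sResult) would give the page body even though the server claims it is empty.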

甜`诱少女 2025-01-13 21:47:06


As noted by Optimist, the problem had nothing to do with the response headers; rather, I wasn't sending any User-Agent header to the server.

file_get_contents worked perfectly once I sent a User-Agent header, even though the server still returns Content-Length: 0.

Weird.
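Sending a User-Agent header with file_get_contents can be done through a stream context. The sketch below assumes PHP 7 or later; the function names, URL and agent string are hypothetical placeholders, since the answer does not say which User-Agent value was used.

```php
<?php
/**
 * Hypothetical helper: build a stream context whose HTTP requests
 * carry a User-Agent header.
 */
function make_user_agent_context(string $sUserAgent)
{
    return stream_context_create([
        "http" => [
            "method" => "GET",
            // Extra request headers, each terminated by CRLF
            "header" => "User-Agent: " . $sUserAgent . "\r\n",
        ],
    ]);
}

/**
 * Hypothetical helper: fetch a page with the given User-Agent.
 * Returns the body as a string, or false on failure.
 */
function fetch_page(string $sURL, string $sUserAgent)
{
    return file_get_contents($sURL, false, make_user_agent_context($sUserAgent));
}

// Example call (placeholder URL and agent string):
// $sPage = fetch_page("http://example.com/page.php", "Mozilla/5.0 (compatible; MyFetcher/1.0)");
```

Setting the user_agent directive globally, for example with ini_set("user_agent", "..."), should have the same effect for the http:// wrapper without passing a context on each call.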
