Sending browser headers with PHP
How can I send headers to a website as if PHP / Apache were a browser? I'm trying to scrape a site, but it seems to return a 404 error when the request comes from another server...
Alternatively, do you know of any other good ways to scrape content from a site?
Also, here is my current code:
<?php
$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,$_GET['url']);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)");
curl_setopt($curl_handle, CURLOPT_REFERER, "http://google.com");
curl_setopt($curl_handle,CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,1);
$buffer = curl_exec($curl_handle);
curl_close($curl_handle);
echo $buffer;
?>
So I'll be making an AJAX request like:
/spider.php?url=http://target.com
This returns an empty string. I know it is set up correctly, though, because if I switch the target to twitter.com it works... What am I missing to make this look like a full browser?
Comments (2)
For cURL, there is the CURLOPT_USERAGENT option for that.
However, the site may also check the Referer header, which you can set via
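As a rough illustration of the two options mentioned here, the sketch below sets a browser-like User-Agent and a Referer (using CURLOPT_REFERER, as the question's own code does) and also reports the HTTP status code, which can help explain an empty response. The helper names `browser_like_options` and `fetch_page`, the User-Agent string, and the default Referer are assumptions made for this example; nothing here is known about what the target site actually checks.

```php
<?php
// Hypothetical helper: builds cURL options with a browser-like
// User-Agent and a Referer. The values are illustrative only.
function browser_like_options(string $url, string $referer = 'http://google.com'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
        CURLOPT_REFERER        => $referer,
        CURLOPT_CONNECTTIMEOUT => 2,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: performs the request and returns the status code,
// the body, and any cURL error message, so an empty body can be diagnosed.
function fetch_page(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_options($url));

    $body   = curl_exec($ch);                          // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response was received
    $error  = curl_error($ch);                         // '' if no error occurred

    // The handle is released when $ch goes out of scope.
    return [
        'status' => $status,
        'body'   => $body === false ? '' : $body,
        'error'  => $error,
    ];
}
```

Calling `fetch_page($url)` and inspecting `status` and `error` before echoing `body` shows whether the empty string comes from a timeout, a redirect, or an error response from the server.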
If you're using cURL, you can use the CURLOPT_HTTPHEADER option, which takes an array of headers you wish to send with the request. If you're using file_get_contents(), you can pass it a stream context created with stream_context_create().
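Both approaches from this answer can be sketched as follows. The header values and the helper names `browser_headers`, `fetch_with_curl`, and `make_stream_context` are assumptions made for illustration; which headers a given site inspects is not known.

```php
<?php
// Illustrative request headers; adjust to whatever the target expects.
function browser_headers(): array
{
    return [
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Referer: http://google.com',
    ];
}

// cURL variant: CURLOPT_HTTPHEADER takes an array of "Name: value" strings.
// Returns the response body as a string, or false on failure.
function fetch_with_curl(string $url, array $headers)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return curl_exec($ch);
}

// file_get_contents() variant: the headers go into an "http" stream context.
function make_stream_context(array $headers, string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'header'     => implode("\r\n", $headers),
            'user_agent' => $userAgent,
            'timeout'    => 2,
        ],
    ]);
}
```

With the context built, a request would look like `file_get_contents($url, false, make_stream_context(browser_headers(), 'Mozilla/5.0 ...'))`, where the second argument is the `use_include_path` flag and the third is the context.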