Why can't curl fetch the page content?
I am using a cURL script to go to a link and get its content for further manipulation. The link and the cURL script are as follows:
<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';
//curl script to get content of given url
$ch = curl_init();
// set the target url
curl_setopt($ch, CURLOPT_URL,$url);
// request as if Firefox
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") );
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;
?>
However, the website does not accept the request when it comes from the script: the result contains a user exception. If I paste the same URL into a browser, the page opens without any problem.
Please help. What am I doing wrong here?
Thanks and regards
Comments (4)
I ran the following program/script and the page was downloaded correctly. This most likely means the server you're running your script from can't reach the server at "criminaljustice.state.ny.us". Either your server is misconfigured, or their server is explicitly blocking you, which is a common result of aggressive screen scraping.
Additional troubleshooting tip: if you have shell access to the machine your PHP script is running on, run the following command
This will output the response headers, which may contain some clue as to why your request is failing.
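The command referred to above is not preserved in this copy. As a rough PHP-side alternative, the sketch below captures the response headers with `CURLOPT_HEADER` and separates them from the body using `CURLINFO_HEADER_SIZE`. The helper names `split_response` and `fetch_with_headers` are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: split a raw cURL response (headers + body) into parts.
function split_response(string $raw, int $headerSize): array
{
    $headerBlock = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);
    // Drop empty lines; when redirects are followed there may be
    // several header blocks separated by blank lines.
    $lines = array_values(array_filter(
        preg_split('/\r\n|\n/', $headerBlock),
        function ($line) {
            return $line !== '';
        }
    ));
    return ['headers' => $lines, 'body' => $body];
}

// Hypothetical helper: fetch a URL and return status, error text, headers, and body.
function fetch_with_headers(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the returned string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $raw = curl_exec($ch);
    $info = [
        'error'  => curl_error($ch),                        // empty string when no transport error occurred
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),  // 0 when no response was received
    ];
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);
    if ($raw === false) {
        return $info + ['headers' => [], 'body' => ''];
    }
    return $info + split_response($raw, $headerSize);
}
```

Calling `fetch_with_headers($url)` and printing the `status`, `error`, and `headers` entries should show whether the request failed at the network level (non-empty `error`) or was answered by the server with an error status.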
For the user agent, I think you want to use the CURLOPT_USERAGENT constant.
I had the same issue, which turned out to be caused by the followlocation option not being set. I thought curl would set it to true by default, but I guess not!?
Once I set it, it fetched the full site with no problem.
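In PHP's cURL extension, `CURLOPT_FOLLOWLOCATION` is indeed off unless it is set explicitly. The script in the question already sets it, so this is more likely to help readers whose own script omits it. A minimal sketch of enabling redirects and checking where the request ended up is shown below; the helper names `redirect_options` and `fetch_following_redirects`, and the limit of 5 redirects, are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: options for a GET request that follows redirects.
function redirect_options(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
    ];
}

// Hypothetical helper: fetch a URL and report the final URL and redirect count.
function fetch_following_redirects(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, redirect_options($url));
    $body = curl_exec($ch);
    $result = [
        'body'      => $body === false ? null : $body,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'error'     => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}
```

If `redirects` is greater than zero and `final_url` differs from the requested URL, the server was redirecting, and a script without `CURLOPT_FOLLOWLOCATION` would only have received the redirect response.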
Is the user agent meant to be in an array like that? I haven't seen it done like that before.
Try just using a plain string, i.e.
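The snippet that followed this comment did not survive in this copy. As a sketch of what passing the user agent as a plain string could look like, the example below uses `CURLOPT_USERAGENT` (the constant named in another comment) instead of a `User-Agent:` entry in `CURLOPT_HTTPHEADER`. The helper name `build_request_options` is hypothetical, and the user agent string is copied from the question.

```php
<?php
// Hypothetical helper: request options with the user agent given as a plain string.
function build_request_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // plain string, no header array needed
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: perform the request and return the body, or null on failure.
function fetch_as_browser(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_request_options($url, $userAgent));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result === false ? null : $result;
}

// User agent string taken from the script in the question.
$firefoxAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15';
```

A call such as `fetch_as_browser($url, $firefoxAgent)` would then send the same header as the original script. Whether this changes the server's response is not certain, since another commenter reports that the original script downloaded the page correctly from their machine.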