curl 无法获取网页内容，为什么？

发布于 2024-07-18 18:49:11 字数 897 浏览 14 评论 0原文

我正在使用curl 脚本转到链接并获取其内容以进行进一步操作。以下是链接和curl脚本：

<?php 
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&amp;templateName=detail.htm&amp;requestingHandler=WebNSORDetailHandler&amp;ID=368343543';

//curl script to get content of given url

$ch = curl_init();

// set the target url

curl_setopt($ch, CURLOPT_URL,$url);

// request as if Firefox

curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;
?>

但是网站并没有通过脚本排除它，它在结果中给用户带来异常，但是如果我们通常将url粘贴到浏览器中，它就可以完美地打开页面。

请帮忙，我在这里做错了什么。

感谢致敬

原文

i am using a curl script to go to a link and get its content for further manipulation. following is the link and curl script:

<?php 
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';

//curl script to get content of given url

$ch = curl_init();

// set the target url

curl_setopt($ch, CURLOPT_URL,$url);

// request as if Firefox

curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;
?>

but the website is not excepting it through script it is giving user exception in result, but if we normally paste the url in browser it is opening the page perfectly alright.

Please help, what i am doing wrong here.

Thanks and regards

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

眼泪都笑了 2024-07-25 18:49:11

我运行了以下程序/脚本并且页面已正确下载。这很可能意味着您运行脚本的服务器无法到达“criminaljustice.state.ny.us”处的服务器。这要么是因为您的服务器配置错误，要么是他们的服务器明确阻止您，这是积极的屏幕抓取的常见结果。

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;

其他故障排除提示 - 如果您对运行 PHP 脚本的计算机有 shell 访问权限，请运行以下命令。

curl -I 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543'

这将输出响应标头，其中可能包含一些有关请求失败原因的线索。

I ran the following program/script and the page was downloaded correctly. This most likely means the server you're running your script from can't reach the server at "criminaljustice.state.ny.us". This is either because your server is mis-configured, or their server is explicitly blocking you, which is a common result of aggressive screen scraping.

<?php
$url = 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPHEADER, Array("User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15") ); 
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result= curl_exec ($ch);
curl_close ($ch);
echo $result;

Additional troubleshooting tip -- if you have shell access to the machine your PHP script is running from, run the following command

curl -I 'http://criminaljustice.state.ny.us/cgi/internet/nsor/fortecgi?serviceName=WebNSOR&templateName=detail.htm&requestingHandler=WebNSORDetailHandler&ID=368343543'

This will output the response headers, which may contain some clue as to why your request is failing.

回复收藏 0 原文

难以启齿的温柔 2024-07-25 18:49:11

对于 useragent 我认为你想使用 CURLOPT_USERAGENT 常量

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");

For useragent i think you want to use the CURLOPT_USERAGENT constant

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");

回复收藏 0 原文

木森分化 2024-07-25 18:49:11

我遇到了同样的问题，最终导致 followlocation 选项未设置。我认为默认情况下，curl 会将其设置为 true，但我想不会！？
一旦我设置它，它就得到了完整的网站没有问题

回复收藏 0 原文

冰雪之触 2024-07-25 18:49:11

用户代理是否应该位于这样的数组中？我以前没见过这样做的。

尝试只使用纯字符串，即

curl_setopt($ch, CURLOPT_HTTPHEADER, 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15');

Is the user agent meant to be in an array like that? I haven't seen it done like that before.

Try just using a plain string, i.e.

curl_setopt($ch, CURLOPT_HTTPHEADER, 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15');

回复收藏 0 原文

~没有更多了~

关于作者

烟火散人牵绊

暂无简介

文章

24 人气

关注发私信

友情链接

文江博客

curl 无法获取网页内容，为什么？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

泪是无色的血

yriii2

1649543945

g红火

嘿哥们儿

旧城烟雨

友情链接

curl 无法获取网页内容，为什么？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（4）

关于作者

相关话题

热门标签

推荐作者

泪是无色的血

yriii2

1649543945

g红火

嘿哥们儿

旧城烟雨

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。