Trying to fetch a web page's source using get in Perl
I'm trying to fetch the source of a web page in my Perl code. The site is on a local server, and the link is http://gold.star.com/isos/preFCS5.4/LASTESTDMS/. I can ping the server, but the get call in my code doesn't seem to retrieve the page source. Here is the code I'm trying:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $dmsurl = 'http://gold.star.com/isos/preFCS5.4/LATESTDMS/';
my $page = get($dmsurl) or die "cannot\n";
print $page;
Every time I run this code I get the message "cannot". The same link opens fine when I try it in my browser, but it doesn't work from the code.
Answers (2)
It's possible your site is blocking your script because it thinks it's a bot. To find out, look at the status code LWP is getting from your site. Unfortunately, you can't do that with get, which returns only the page content (or undef on failure). You can with getprint and getstore, both of which return the HTTP status code. getprint will display the status code itself if the request fails, so printing the status message as well is a bit redundant. For more on the returned status code ($rc), see HTTP::Status.
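As a rough sketch of that diagnostic, the script below uses getstore and reports the status code it returns. The output file name dms_page.html and the helper describe_status are made up for illustration; getstore, is_success, and status_message are the functions exported by LWP::Simple and HTTP::Status.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;                      # exports getstore and is_success
use HTTP::Status qw(status_message);  # maps a status code to its text

# Hypothetical helper: turn a numeric status code into "404 Not Found" etc.
sub describe_status {
    my ($rc) = @_;
    return sprintf '%d %s', $rc, status_message($rc) // 'Unknown';
}

my $dmsurl = 'http://gold.star.com/isos/preFCS5.4/LATESTDMS/';
my $file   = 'dms_page.html';         # assumed output file name

# getstore returns the HTTP status code instead of the page content
my $rc = getstore($dmsurl, $file);

if (is_success($rc)) {
    print "Saved $dmsurl to $file\n";
} else {
    warn 'Request failed: ', describe_status($rc), "\n";
}
```

A 403 or 404 here, while the browser loads the page fine, would point toward the server treating the script differently from a browser. Note that LWP also reports 500 when it cannot connect at all, so that code alone does not prove the server answered.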
The target site may check the User-Agent header and respond with an HTTP error, 404 for example, if it doesn't like what it sees. I recommend setting the User-Agent explicitly, for example using WWW::Mechanize.
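A minimal sketch of that suggestion, assuming WWW::Mechanize is installed. The User-Agent string is just a placeholder picked for illustration; substitute whatever your browser sends. autocheck is turned off so the script can report the status line itself rather than dying inside get.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $dmsurl = 'http://gold.star.com/isos/preFCS5.4/LATESTDMS/';

my $mech = WWW::Mechanize->new(
    agent     => 'Mozilla/5.0 (X11; Linux x86_64)',  # assumed browser-like UA string
    autocheck => 0,                                  # handle errors ourselves
);

# get returns an HTTP::Response object
my $response = $mech->get($dmsurl);

if ($mech->success) {
    print $mech->content;
} else {
    warn 'Request failed: ', $response->status_line, "\n";
}
```

If the page comes back with this User-Agent but not with the LWP default, the server is most likely filtering on that header.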