How do I crawl past a welcome page using Perl LWP?
I'm trying to crawl this page using Perl LWP:
http://livingsocial.com/cities/86/deals/138811-hour-long-photo-session-cd-and-more
I had code that used to handle LivingSocial, but it seems to have stopped working. The idea was to fetch the page once, extract its cookies, set the cookie jar on the UserAgent, and then fetch the page twice more. Doing this used to get you past the welcome page:
$response = $browser->get($url);
$cookie_jar->extract_cookies($response);
$browser->cookie_jar($cookie_jar);
$response = $browser->get($url);
$response = $browser->get($url);
This seems to have stopped working for normal LivingSocial pages, but it still seems to work for LivingSocial Escapes pages. For example:
http://livingsocial.com/escapes/148029-cook-islands-hotel-+-airfare
Any tips on how to get past the welcome page?
Comments (1)
It looks like this page only works in a JavaScript-enabled browser, which LWP::UserAgent is not. You could try WWW::Mechanize::Firefox instead. Note that you must have Firefox and the mozrepl extension installed for this module to work.
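A minimal sketch of that suggestion, not a tested fix for this particular site: it assumes Firefox is already running with the mozrepl extension started, and that letting the page's JavaScript run is enough to get past the welcome page. The helper name fetch_rendered_html is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: load a URL in a real Firefox instance (via mozrepl)
# so that the page's JavaScript runs, then return the resulting HTML.
sub fetch_rendered_html {
    my ($url) = @_;

    # Loaded at run time so the script still compiles where the module is absent.
    require WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new();
    $mech->get($url);

    # content() returns the current content of the browser tab.
    return $mech->content;
}

# Usage: perl fetch_rendered.pl <url>
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_rendered_html($ARGV[0]);
}
```

Because the browser itself keeps the cookies between requests, this approach should not need the manual cookie jar handling from the question. Whether one page load is enough to clear the welcome page is an assumption you would have to check against the site.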