Perl WWW::Mechanize cookie problem
I am trying to automate the collection of links from a site that asks for a captcha first.
For this, I capture the captcha image so it can be solved externally, and then submit the solution as part of the form fields.
Somehow it doesn't work. I suspect a cookie problem, but I'm not sure, and I would appreciate it if anyone could figure this out.
Here is the code. First I create the mech object along with its cookie jar:
$cookie_jar = HTTP::Cookies->new;
$agent = WWW::Mechanize->new(cookie_jar => $cookie_jar);
$agent->get("http://www.site.com/page.html");
I find the link of interest:
$link = $agent->find_link(tag => "a", text_regex => qr{regex});
$url = $link->url;
$agent->get($url);
At this stage the site presents a captcha. I extract the image and save it so it can be solved by a human, who then enters the solution to continue:
$captcha = $agent->find_image(url_regex => qr{captcha\.php});
$agent->get($captcha->url, ':content_file' => 'captcha.jpg');
print "Please solve captcha at http://my.own.site/captcha.jpg\n";
$agent->back;
print "Enter answer: ";
$solved = <>;
Now that the script has the captcha solution entered manually, it can continue by submitting the form:
$agent->form_with_fields('code');
$agent->set_fields(code => $solved, action => 'download');
$agent->submit;
However, this doesn't work. The result is the page asking for the captcha again, rather than the expected page with the info I'm after.
I am wondering if the cookie gets lost/reset when I do the $agent->back after saving the captcha image?
Thanks for any hints!
2 Answers
I found a much easier way to handle this problem. Here it is:
Works like a charm.
It is highly possible that the site you are accessing has some means to detect and hinder free surfing, that is, going back one or more pages and then forward again. This is usually done by associating a unique id with each page, so that when you submit the same id twice it is clear that you surfed back and then moved forward again from there. As you say, this is related to using back().
What I wonder is whether you really need to go back at all. The key is to download the image outside of the agent, so that the agent state does not get modified. You could use a second agent or curl for that, since you have the direct URL to the image...