How to get a plain-text verification code when automating site access in Perl?
I'm playing around with Win32::IE::Mechanize to try to access some authentication-required sites automatically. So far I've had moderate success; for example, I can automatically log in to my Yahoo mailbox. But I find that many sites use some kind of image verification mechanism, which I believe is called CAPTCHA, and I can do nothing about those. One of the sites I'm trying to access automatically, however, uses a plain-text verification code. It is composed of four digits, which are selectable and copyable. But the digits are not in the page source that can be fetched using
$mech->content;
I searched all the files in Temporary Internet Files for the keyword that appears on the webpage but not in the source, and still couldn't find it.
Any idea what's going on? I suspected that the verification code was somehow hidden in a cookie file, but I can't seem to find it :(
The following code fills in every required field except the verification code:
use strict;
use warnings;
use Win32::IE::Mechanize;
my $url = "http://www.zjsmap.com/smap/smap_login.jsp";
my $eccode = "myeccode";
my $username = "myaccountname";
my $password = "mypassword";
# Placeholder: the script has no way to read the verification code yet
my $verify = "I can't figure out how to let the script get the code yet";
my $mech = Win32::IE::Mechanize->new(visible=>1);
$mech->get($url);
sleep(1); # avoids undefined value error
# Fill in the login form and submit it
$mech->form_name("BaseForm");
$mech->field(ECCODE => $eccode);
$mech->field(MEMBERACCOUNT => $username);
$mech->field(PASSWORD => $password);
$mech->field(verify => $verify);
$mech->click();
As always, any suggestions or comments would be greatly appreciated :)
UPDATE
I've figured out a not-so-smart way to solve this problem. Please comment on my own answer posted below. Thanks, as always :)
This is the reason they are there: to stop programs like yours from doing automated stuff ;-)
This appears to be an irrelevant number. The page uses it in three places: generating it, displaying it on the form next to its input field, and checking that the input value equals the random number chosen. That is, it is a client-side-only check. Still, I'm guessing that if you disable JavaScript, important cookies don't get set. If you can execute JavaScript in the context of the page (you should be able to with a get method call and a javascript: URI), you could change the value of random_number to, for example, 42 and fill that in on the form.
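A minimal sketch of that idea, under stated assumptions: the page keeps the code in a global JavaScript variable named random_number (the name used above), and the Mechanize object's get method will pass a javascript: URI through to the browser, which has not been verified here. The helper js_assign_uri is hypothetical and only builds the URI string.

```perl
use strict;
use warnings;

# Hypothetical helper: build a javascript: URI that assigns a numeric value
# to a global variable on the current page. void(...) keeps the browser from
# replacing the page with the result of the expression.
sub js_assign_uri {
    my ($name, $value) = @_;
    die "value must be digits only\n" unless $value =~ /\A\d+\z/;
    return "javascript:void($name=$value)";
}

# 42 is an arbitrary value, as in the suggestion above.
my $forced = 42;
my $uri    = js_assign_uri('random_number', $forced);
print "$uri\n";

# With a live browser session (assumption: get accepts javascript: URIs),
# the URI would be used roughly like this:
#   $mech->get($uri);
#   $mech->field(verify => $forced);
```

Because the check runs only in the browser, overwriting the variable and typing the same value into the verify field should satisfy it, provided the assumptions above hold.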
The code is inserted by JavaScript – disable JS, reload the page and see it disappear. You have to hunt through the JS code to get an idea where it comes from and how to replicate it.
Thanks to james2vegas, zoul and Shoban.
I've finally figured out on my own a not-so-smart but at least workable way to solve the problem I described here, and I'd like to share it. I think the approach suggested by @james2vegas is probably much better, but anyway I'm learning along the way.
My approach is this:
Although the verification code is not in the page source, it is still selectable and copyable, so I can let my script copy everything on the login page and then extract the verification code.
To do this, I use the SendKeys function in the Win32::GuiTest module to perform "Select All" and "Copy" on the login page.
Then I use Win32::Clipboard to get the clipboard content and a regexp to extract the code. Something like this:
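The snippet that followed is not reproduced here. A minimal sketch of the steps just described, not the author's actual code, might look like the following. It assumes the browser window has keyboard focus and that the verification code is the first standalone four-digit number in the copied text; the sub names extract_code and copy_page_text are hypothetical.

```perl
use strict;
use warnings;

# Return the first standalone run of exactly four digits, or undef.
# Assumption: no other four-digit number (a year, say) appears earlier
# in the copied page text.
sub extract_code {
    my ($text) = @_;
    return $text =~ /(?<!\d)(\d{4})(?!\d)/ ? $1 : undef;
}

# Windows only: "Select All" and "Copy" in the focused window, then read
# the clipboard. The modules are loaded lazily so that the rest of the
# file still compiles on other platforms.
sub copy_page_text {
    require Win32::GuiTest;
    require Win32::Clipboard;
    Win32::GuiTest::SendKeys("^a");   # Ctrl+A
    Win32::GuiTest::SendKeys("^c");   # Ctrl+C
    sleep 1;                          # give the clipboard time to fill
    return Win32::Clipboard::GetText();
}

# Demonstration on sample text instead of a live page.
my $sample = "ECCODE MEMBERACCOUNT PASSWORD verify 4827 Login";
my $code   = extract_code($sample);
print defined $code ? "code: $code\n" : "no code found\n";
```

In the login script, the value would then be obtained with something like my $verify = extract_code(copy_page_text()); before the call that fills in the verify field. If the page contains other four-digit numbers, the pattern would need to be anchored on the text next to the code.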
A few thoughts:
The random number is generated by something that would look like this in Perl:
my $random_number = int(rand(8999)) + 1000; #var random_number = rand(1000,10000);
The page then checks whether $verify == $random_number. I don't know how to capture the value of $random_number, which exists for one session only. I think it is stored somewhere in memory. If I could capture the value directly, I wouldn't have had to go to the trouble of using these extra modules.