Small, runnable WWW::Mechanize examples for a beginner

Posted 2024-10-03 07:22:16


I am very new to programming and am just reading the documentation.
For this little project I have read some Perl books and a PHP cookbook.
But I tried some of the recipes and, believe it or not, the output looks pretty awful on the screen. I guess I now need some assistance.

With my limited knowledge it is hard to get this working... I need some WWW::Mechanize recipes that actually work, since some of the following examples are outdated:

see the CPAN site for the Mechanize examples

I would love to learn more - with real-life examples - do you have more?

I'd love to hear from you.


Comments (2)

落日海湾 2024-10-10 07:22:16


Could you be a little more specific about what exactly you are after? For instance, this is a script to log into a website:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "http://www.test.com";

# pre-set a cookie before fetching the page
$mech->cookie_jar->set_cookie(0, "start", 1, "/", ".test.com");
$mech->get($url);

# select the login form by name, fill it in and submit it
$mech->form_name("frmLogin");
$mech->set_fields(user => 'test', passwrd => 'test');
$mech->click();
$mech->save_content("logged_in.html");
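As an aside, WWW::Mechanize can combine the form selection, field filling and click into a single `submit_form` call. A sketch of the same login, using the same assumed form and field names as above:

```perl
# one-call equivalent of form_name + set_fields + click;
# "frmLogin", "user" and "passwrd" are this example's assumptions
$mech->submit_form(
    form_name => "frmLogin",
    fields    => { user => 'test', passwrd => 'test' },
);
$mech->save_content("logged_in.html");
```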

This is a script to perform Google searches:

use strict;
use warnings;
use 5.10.0;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# maximum number of results to fetch, taken from the command line
my $option = $ARGV[$#ARGV];

# you may customize your Google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';

my @dork = ( "inurl:dude", "cheese" );

# start the main loop, one iteration for every Google search
for my $i ( 0 .. $#dork ) {

    # loop until the chosen maximum number of results is reached;
    # reset the counter for each new search term
    my $max = 0;
    while ( $max <= $option ) {
        $mech->get( $google . $dork[$i] . "&start=" . $max );

        # print all the result links, skipping Google's own urls
        foreach my $link ( $mech->links() ) {
            my $google_url = $link->url;
            if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
                say $google_url;
            }
        }
        $max += 10;
    }
}

Simple site crawler extracting information (HTML comments) from every page:

    use strict;
    use warnings;
    use 5.10.0;
    use WWW::Mechanize;

    # create the mechanize object with autocheck switched off,
    # so we don't die when a bad/malformed url is requested
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    my %comments;
    my %links;
    my @comment;

    my $target = "http://google.com";
    # store the first target url as not yet checked
    $links{$target} = 0;
    # initiate the search
    my $url = get_url();

    # start the main loop
    while ( $url ne "" ) {
        # fetch the target url
        $mech->get($url);

        # search the source for any html comments
        # (the /s flag lets a comment span multiple lines)
        my $res = $mech->content;
        @comment = $res =~ /<!--.*?-->/gs;

        # store comments in the %comments hash and print them, if any were found
        if (@comment) {
            $comments{$url} = "@comment";
            say "\n$url \n---------------->\n $comments{$url}";
        }

        # loop through all the links on the current page (only urls contained in html anchors)
        foreach my $link ( $mech->links() ) {
            $link = $link->url();

            # exclude irrelevant stuff, such as javascript functions or external links;
            # you might want to check the domain name too, so relevant links aren't excluded
            if ( $link !~ /^(#|mailto:|(f|ht)tp(s)?:|www\.|javascript:)/ ) {
                # check whether the link has a leading slash so we can build the whole url properly
                $link = $link =~ /^\// ? $target . $link : $target . "/" . $link;
                # store it in our hash of links to be searched, unless it's already present
                $links{$link} = 0 unless exists $links{$link};
            }
        }

        # mark this url as searched and start over
        $links{$url} = 1;
        $url = get_url();
    }

    # return the next unsearched url from %links;
    # once every url has been searched, return "" to end the main loop
    sub get_url {
        while ( my ( $key, $value ) = each(%links) ) {
            return $key if $value == 0;
        }
        return "";
    }
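Incidentally, the hand-rolled leading-slash check above breaks on relative paths like `foo/bar.html`. WWW::Mechanize's link objects have a `url_abs` method that resolves the href against the current page for you, so the whole branch can be replaced with something like this sketch (untested):

```perl
foreach my $link ( $mech->links() ) {
    # url_abs() returns a URI object resolved against the current page,
    # handling both absolute and relative hrefs
    my $abs = $link->url_abs->as_string;
    $links{$abs} = 0 unless exists $links{$abs};
}
```

You would still want to filter out `mailto:`/`javascript:` schemes and off-site hosts before queueing each link.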

It really depends on what you are after, but if you want more examples I would refer you to perlmonks.org, where you can find plenty of material to get you going.

Definitely bookmark the WWW::Mechanize module man page though; it is the ultimate resource...

深海夜未眠 2024-10-10 07:22:16

What is wrong with the WWW::Mechanize::Cookbook and WWW::Mechanize::Examples pages that the author provides?
