Small, runnable WWW::Mechanize examples for a beginner

Posted 2024-10-03 07:22:16


I am very new to programming and am just reading the documentation.
For this little project I have read some Perl books and a PHP cookbook.
But I tried some of the recipes and, believe it or not, the output looks pretty awful on the screen. I guess I now need some assistance.

With my limited knowledge it is hard to get this working... I need some WWW::Mechanize recipes that actually work, since some of the following examples are outdated:

see the CPAN site for the Mechanize examples

I would love to learn more - with real-life examples - do you have more?

I'd love to hear from you.


Comments (2)

落日海湾 2024-10-10 07:22:16


Could you be a little more specific about what exactly you are after? For instance, this is a script to log into a website:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "http://www.test.com";

# pre-set a cookie before fetching the page
$mech->cookie_jar->set_cookie(0, "start", 1, "/", ".test.com");
$mech->get($url);

# select the login form by name, fill it in and submit it
$mech->form_name("frmLogin");
$mech->set_fields(user => 'test', passwrd => 'test');
$mech->click();
$mech->save_content("logged_in.html");
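As an aside, WWW::Mechanize can combine the form selection, field filling and click into a single `submit_form` call. A sketch of the same login, using the same assumed form and field names as above:

```perl
# one-call equivalent of form_name + set_fields + click;
# "frmLogin", "user" and "passwrd" are this example's assumptions
$mech->submit_form(
    form_name => "frmLogin",
    fields    => { user => 'test', passwrd => 'test' },
);
$mech->save_content("logged_in.html");
```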

This is a script to perform Google searches:

use strict;
use warnings;
use 5.10.0;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# maximum number of results to fetch, taken from the command line
my $option = $ARGV[$#ARGV];

# you may customize your Google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';

my @dork = ( "inurl:dude", "cheese" );

# start the main loop, one iteration for every Google search
for my $i ( 0 .. $#dork ) {

    # loop until the chosen maximum number of results is reached;
    # reset the counter for each new search term
    my $max = 0;
    while ( $max <= $option ) {
        $mech->get( $google . $dork[$i] . "&start=" . $max );

        # print all the result links, skipping Google's own urls
        foreach my $link ( $mech->links() ) {
            my $google_url = $link->url;
            if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
                say $google_url;
            }
        }
        $max += 10;
    }
}

Simple site crawler extracting information (HTML comments) from every page:

    use strict;
    use warnings;
    use 5.10.0;
    use WWW::Mechanize;

    # create the mechanize object with autocheck switched off,
    # so we don't die when a bad/malformed url is requested
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    my %comments;
    my %links;
    my @comment;

    my $target = "http://google.com";
    # store the first target url as not yet checked
    $links{$target} = 0;
    # initiate the search
    my $url = get_url();

    # start the main loop
    while ( $url ne "" ) {
        # fetch the target url
        $mech->get($url);

        # search the source for any html comments
        # (the /s flag lets a comment span multiple lines)
        my $res = $mech->content;
        @comment = $res =~ /<!--.*?-->/gs;

        # store comments in the %comments hash and print them, if any were found
        if (@comment) {
            $comments{$url} = "@comment";
            say "\n$url \n---------------->\n $comments{$url}";
        }

        # loop through all the links on the current page (only urls contained in html anchors)
        foreach my $link ( $mech->links() ) {
            $link = $link->url();

            # exclude irrelevant stuff, such as javascript functions or external links;
            # you might want to check the domain name too, so relevant links aren't excluded
            if ( $link !~ /^(#|mailto:|(f|ht)tp(s)?:|www\.|javascript:)/ ) {
                # check whether the link has a leading slash so we can build the whole url properly
                $link = $link =~ /^\// ? $target . $link : $target . "/" . $link;
                # store it in our hash of links to be searched, unless it's already present
                $links{$link} = 0 unless exists $links{$link};
            }
        }

        # mark this url as searched and start over
        $links{$url} = 1;
        $url = get_url();
    }

    # return the next unsearched url from %links;
    # once every url has been searched, return "" to end the main loop
    sub get_url {
        while ( my ( $key, $value ) = each(%links) ) {
            return $key if $value == 0;
        }
        return "";
    }
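Incidentally, the hand-rolled leading-slash check above breaks on relative paths like `foo/bar.html`. WWW::Mechanize's link objects have a `url_abs` method that resolves the href against the current page for you, so the whole branch can be replaced with something like this sketch (untested):

```perl
foreach my $link ( $mech->links() ) {
    # url_abs() returns a URI object resolved against the current page,
    # handling both absolute and relative hrefs
    my $abs = $link->url_abs->as_string;
    $links{$abs} = 0 unless exists $links{$abs};
}
```

You would still want to filter out `mailto:`/`javascript:` schemes and off-site hosts before queueing each link.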

It really depends on what you are after, but if you want more examples I would refer you to perlmonks.org, where you can find plenty of material to get you going.

Definitely bookmark the WWW::Mechanize module man page though; it is the ultimate resource...

深海夜未眠 2024-10-10 07:22:16

What is wrong with the WWW::Mechanize::Cookbook and WWW::Mechanize::Examples pages that the author provides?
