Mechanize example - quite simple but too complex for me: need explanations

Posted on 2024-10-04 00:10:31

Good day, dear community. I am new to programming, and I want to dig deeper into Perl.
So I have a Mechanize example: quite simple, but too complex for me, and I need explanations. I need your help here!

use strict;
$|++;

use WWW::Mechanize;
use File::Basename;

my $m = WWW::Mechanize->new;

$m->get("http://www.despair.com/indem.html");

my @top_links = @{$m->links};

for my $top_link_num (0..$#top_links) {
    next unless $top_links[$top_link_num][0] =~ /^http:/;

    $m->follow_link( n=>$top_link_num ) or die "can't follow $top_link_num";

    print $m->uri, "\n";
    for my $image (grep m{^http://store4}, map $_->[0], @{$m->links}) {
        my $local = basename $image;
        print " $image...", $m->mirror($image, $local)->message, "\n"
    }

    $m->back or die "can't go back";
}

Can anybody give me a line-by-line explanation?


Answers (1)

孤星 2024-10-11 00:10:31

I've tried to explain the first couple of lines.

However, you need to make sure to first read and understand the following documentation:

1) Perl Intro (perlintro), especially the part on variable scoping

2) Perl data (perldata)

3) Perl Data Structures Cookbook (perldsc)
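To make items 2 and 3 concrete, here is a small offline sketch of the data-structure ideas the script relies on: dereferencing an array reference with `@{...}`, `$#array`, the `$a[$i][0]` shorthand, and a regex filter. The link data is made up; it only imitates the shape of what `$m->links` returns (as far as I know, WWW::Mechanize::Link objects are blessed array references with the URL at index 0 and the link text at index 1), so no network access or WWW::Mechanize install is needed.

```perl
use strict;
use warnings;

# Made-up data imitating the shape of $m->links in scalar context:
# a reference to an array of array references, each holding
# [ URL, link text ].
my $links_ref = [
    [ 'http://www.example.com/about.html', 'About'   ],
    [ '/contact.html',                     'Contact' ],
    [ 'mailto:someone@example.com',        'Mail us' ],
];

# @{ ... } dereferences the outer array reference into a real array.
my @top_links = @{$links_ref};

# $#top_links is the last valid index (2 here), so 0 .. $#top_links
# visits indexes 0, 1 and 2.
my @http_indexes;
for my $i (0 .. $#top_links) {
    # $top_links[$i][0] is shorthand for $top_links[$i]->[0]:
    # the arrow may be dropped between two subscripts.
    my $url = $top_links[$i][0];

    # Skip anything that does not start with "http:" (the relative
    # link and the mailto: link both fail this test).
    next unless $url =~ /^http:/;

    push @http_indexes, $i;
    print "$i: $url ($top_links[$i][1])\n";
}
```

Running it prints only `0: http://www.example.com/about.html (About)`; the relative and `mailto:` links are skipped, which is also what happens to relative links in the original loop.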

P.S. As Eric said in the comments, this code is definitely NOT a good example for someone just starting out. It has far too many non-trivial ideas, concepts and moving parts.

use strict;
use warnings;
  # strict forbids undeclared global variables and other unsafe constructs;
  # warnings makes Perl report suspicious code (e.g. using an undefined value).
  # You should ALWAYS code with "use strict; use warnings"
  # See http://perldoc.perl.org/strict.html
$|++;
  # Turn on autoflush for the currently selected output filehandle
  # (STDOUT unless you select() another one), so every print shows up
  # immediately instead of sitting in a buffer.
  # See "http://perldoc.perl.org/perlvar.html" for "$|" and other special variables.
  # P.S. This "++" is a hack; "$| = 1;" would be a lot more readable,
  #      since Perl only cares whether $| is zero or non-zero.

use WWW::Mechanize;  # Load the module that fetches web pages and follows links like a browser.
use File::Basename;  # Load the module providing basename(), used below to turn
                     # an image URL into a local file name.

my $m = WWW::Mechanize->new; # Create a new WWW::Mechanize object via its constructor (new)

$m->get("http://www.despair.com/indem.html"); 
   # Retrieve the contents of the URL. 
   # See http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
   # for the module's documentation (aka POD)

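my @top_links = @{$m->links};
   # Declare the @top_links array.
   # $m->links returns the links found on the page fetched above; inside
   # @{ ... } it is called in scalar context, where it returns an array reference.
   # @{ ... } de-references that array reference, and the resulting list
   # is stored in the @top_links array.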

for my $top_link_num (0..$#top_links) { 
    # Loop over all integers between 0 and the last index of the @top_links array
    # (e.g. if there were 3 links, loop over 0, 1, 2)
    # Assign the current loop value to the $top_link_num variable

    next unless $top_links[$top_link_num][0] =~ /^http:/;
    # Go to the next iteration of the loop unless the current link's URL uses HTTP.
    # The current link is the element of the array at the current index:
    #    $top_links[$top_link_num]
    # Each link is a WWW::Mechanize::Link object, stored as an array reference
    # with the link URL as its first element
    # ($top_links[$top_link_num]->url is the more readable accessor for it).
    # Therefore $top_links[$top_link_num][0], which is shorthand
    #    for $top_links[$top_link_num]->[0] as you learned
    #    from reading the Data Structures Cookbook I linked, is the URL.
    # To check whether the URL uses HTTP, we check whether it starts with "http:"
    # via a regular expression; see "http://perldoc.perl.org/perlre.html".
    # The URL is as written in the page, so relative links (e.g. "/about.html")
    # and "https:" links fail this test and are skipped too.

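    $m->follow_link( n => $top_link_num + 1 ) or die "can't follow $top_link_num";
    # Follow ("click") a link on the current page, fetching its target.
    # follow_link's n => ... counts from 1 (n => 1 is the first link), while
    # $top_link_num is a 0-based array index, so "+ 1" is needed to follow
    # the same link that was tested above. The original n => $top_link_num
    # follows the previous link instead (for every index except 0).
    # "or die" stops the script with a message if following the link fails.

    print $m->uri, "\n";
    # Print the URI of the page we are now on (the link's target).

    for my $image (grep m{^http://store4}, map $_->[0], @{$m->links}) {
        # Read the list expression right to left:
        #   @{$m->links}                - all links on the newly fetched page
        #   map $_->[0], ...            - turn each link into its URL
        #   grep m{^http://store4}, ... - keep URLs starting with http://store4
        # Each remaining URL is assigned to $image in turn.
        # (links() collects link targets such as <a href>, not <img> tags;
        #  WWW::Mechanize has a separate images() method for those.)

        my $local = basename $image;
        # Keep the part of the URL after the last "/" as a local file name.

        print " $image...", $m->mirror($image, $local)->message, "\n";
        # mirror() comes from LWP::UserAgent, WWW::Mechanize's parent class:
        # it downloads $image into the file $local, skipping the download if
        # the local copy is already up to date. It returns an HTTP::Response
        # whose message() is the status text, such as "OK" or "Not Modified".
    }

    $m->back or die "can't go back";
    # Go back to the previous page (like a browser's Back button), so the
    # next loop iteration follows a link from the original page again.
}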
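The nested `grep`/`map` expression in the inner loop is the densest line of the script. The sketch below runs the same pipeline on made-up links (the `store4.example.com` host and the file names are hypothetical), so you can see what ends up in `$image` and `$local` without touching the network or needing WWW::Mechanize.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Made-up links standing in for @{$m->links} on a followed page;
# the hosts and file names are hypothetical.
my @page_links = (
    [ 'http://store4.example.com/images/poster.jpg', 'Poster' ],
    [ 'http://www.example.com/index.html',           'Home'   ],
    [ 'http://store4.example.com/images/mug.png',    'Mug'    ],
);

# Same pipeline as "grep m{^http://store4}, map $_->[0], @{$m->links}",
# read right to left: map turns each link into its URL, grep keeps the
# URLs that start with http://store4.
my @images = grep { m{^http://store4} } map { $_->[0] } @page_links;

# basename() keeps the part after the last "/", which the script uses
# as the local file name passed to mirror().
my @local_names = map { basename($_) } @images;

for my $i (0 .. $#images) {
    print " $images[$i] -> $local_names[$i]\n";
}
```

The block form `grep { ... } LIST` used here and the expression form `grep EXPR, LIST` in the original behave the same; the block form is just easier to read once the condition grows.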