从“Yahoo!”运行 Perl hack Google 目录 Mindshare”

发布于 2025-01-03 14:20:30 字数 2033 浏览 1 评论 0原文

有没有人运行 http://oreilly.com/pub/h/974#code< 上给出的 perl 脚本/a>？

这是一个著名的，用于从 Yahoo! 获取 URL。目录并且许多人已经成功使用它。

我试图获取网址。我创建了自己的 Google API 密钥并在代码中替换了它。除此之外我没有做任何改变。

脚本既不会产生任何错误，也不会产生任何 URL。

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = "your API key goes here";
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/";

# download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or die $!;

# create our Google object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
my %urls; # where we keep our counts and titles.

# extract all the links and parse 'em.
HTML::LinkExtor->new(\&mindshare)->parse($data);

sub mindshare { # for each link we find...

  my ($tag, %attr) = @_;

  print "$tag\n";   

  # continue on only if the tag was a link,

  # and the URL matches Yahoo!'s redirectory.

  return if $tag ne 'a';   

  return unless $attr{href} =~ /srd.yahoo/;

  return unless $attr{href} =~ /\*http/;



  # now get our real URL.

  $attr{href} =~ /\*(http.*)/; my $url = $1;

  print "hi";

  # and process each URL through Google.

  my $results = $google_search->doGoogleSearch(

                      $google_key,"link:$url", 0, 1,

                      "true", "", "false", "", "", ""

                ); # wheee, that was easy, guvner.

  $urls{$url} = $results->{estimatedTotalResultsCount};

  print "1\n";

} 

# now sort and display.

my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;

foreach my $url (@sorted_urls) { print "$urls{$url}: $url\n"; }

程序进入循环，并在第一次迭代时出现“my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;”。

我对 perl 没有任何了解，但这个任务应该是微不足道的。

当然，我错过了一些非常明显的东西，因为这个脚本已被许多人成功使用。

提前致谢。

原文

Has anyone run perl script given at http://oreilly.com/pub/h/974#code ?

This is a famous one, used to get URLs from Yahoo! directory and many people have successfully used it.

I was trying to get URLs. I created my own Google API key and replaced that in the code.
Apart from that I did not make any change.

Script is neither producing any error nor any URL.

#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;

my $google_key  = "your API key goes here";
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir   = shift || "/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/";

# download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or die $!;

# create our Google object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
my %urls; # where we keep our counts and titles.

# extract all the links and parse 'em.
HTML::LinkExtor->new(\&mindshare)->parse($data);

sub mindshare { # for each link we find...

  my ($tag, %attr) = @_;

  print "$tag\n";   

  # continue on only if the tag was a link,

  # and the URL matches Yahoo!'s redirectory.

  return if $tag ne 'a';   

  return unless $attr{href} =~ /srd.yahoo/;

  return unless $attr{href} =~ /\*http/;



  # now get our real URL.

  $attr{href} =~ /\*(http.*)/; my $url = $1;

  print "hi";

  # and process each URL through Google.

  my $results = $google_search->doGoogleSearch(

                      $google_key,"link:$url", 0, 1,

                      "true", "", "false", "", "", ""

                ); # wheee, that was easy, guvner.

  $urls{$url} = $results->{estimatedTotalResultsCount};

  print "1\n";

} 

# now sort and display.

my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;

foreach my $url (@sorted_urls) { print "$urls{$url}: $url\n"; }

Program goes into the loop, and comes out at first iteration to "my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;".

I don't have any understanding about perl but this task should have been trivial.

Surely,I am missing something very obvious, because this script has been successfully used by many.

Thanks in advance.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

双手揣兜 2025-01-10 14:20:30

您是否向脚本提供目录？因为如果您不是，并且脚本中的这一行

"/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/"

不是格式化工件，那么您正在尝试抓取不存在的页面。

Are you supplying a directory to the script? Because if you are not, and this line in your script

"/Computers_and_Internet/Data_Formats/XML_  _".
              "eXtensible_Markup_Language_/RSS/News_Aggregators/"

is not a formatting artefact, then you're trying to scrape a non-existent page.

回复收藏 0 原文

~没有更多了~

关于作者

无边思念无边月

暂无简介

文章

523 人气

关注发私信

西西弗的石头怪

文章 0 评论 0

关注

5397313

文章 0 评论 0

关注

烟沫凡尘

文章 0 评论 0

关注

一个破名字

文章 0 评论 0

关注

萌︼了一个春

文章 0 评论 0

关注

当爱已成负担

文章 0 评论 0

友情链接

文江博客

从“Yahoo!”运行 Perl hack Google 目录 Mindshare”

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

西西弗的石头怪

5397313

烟沫凡尘

一个破名字

萌︼了一个春

当爱已成负担

友情链接

从“Yahoo!”运行 Perl hack Google 目录 Mindshare”

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

西西弗的石头怪

5397313

烟沫凡尘

一个破名字

萌︼了一个春

当爱已成负担

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。