Searching from a Perl script?

Posted on 2024-10-14 04:14:26

Comments (3)

草莓酥 2024-10-21 04:14:26

Before proceeding, please be aware of the Google Terms of Service.

You agree not to access (or attempt to access) any of the Services by any means other than through the interface that is provided by Google, unless you have been specifically allowed to do so in a separate agreement with Google. You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) and shall ensure that you comply with the instructions set out in any robots.txt file present on the Services.


That being said, there exists an official API to query web search programmatically.

The JSON/Atom Custom Search API lets you develop websites and programs to retrieve and display search results from your Google Custom Search programmatically. With this API, you can use RESTful requests to get search results in either JSON or Atom format.

You can use XML::Atom::Client or LWP+JSON::Any or many other libraries to perform the REST calls.
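As a rough illustration of the LWP + JSON route, here is a minimal sketch of one Custom Search request. It assumes you already have an API key and a custom search engine ID; the environment variable names `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` and the helper names `build_search_url` and `result_links` are made up for this example, and JSON::PP is swapped in for JSON::Any only because it ships with Perl.

```perl
use 5.010;
use strict;
use warnings;
use LWP::UserAgent;
use URI;
use JSON::PP qw(decode_json);

# Build the request URL for the Custom Search JSON API.
# key = API key, cx = custom search engine ID, q = query,
# start = 1-based index of the first result to return.
sub build_search_url {
    my (%args) = @_;
    my $uri = URI->new('https://www.googleapis.com/customsearch/v1');
    $uri->query_form(
        key   => $args{key},
        cx    => $args{cx},
        q     => $args{query},
        start => $args{start} // 1,
    );
    return $uri->as_string;
}

# Pull the result URLs out of a JSON response body.
# A response without an "items" array yields an empty list.
sub result_links {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return map { $_->{link} } @{ $data->{items} // [] };
}

# Only touch the network when both (hypothetical) variables are set.
if ( $ENV{GOOGLE_API_KEY} && $ENV{GOOGLE_CSE_ID} ) {
    my $ua  = LWP::UserAgent->new( timeout => 10 );
    my $res = $ua->get(
        build_search_url(
            key   => $ENV{GOOGLE_API_KEY},
            cx    => $ENV{GOOGLE_CSE_ID},
            query => 'perl web search',
        )
    );
    die 'Request failed: ' . $res->status_line . "\n" unless $res->is_success;
    say for result_links( $res->content );
}
```

Fetching further pages would mean calling `build_search_url` again with a larger `start` value; HTTPS support in LWP additionally needs LWP::Protocol::https installed.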

(You may still find references to the older Google Web Search API but it's deprecated and limited.)

如果没结果 2024-10-21 04:14:26

Take a look at the Google Custom search API:
http://code.google.com/apis/customsearch/

If you need to search over a wider variety of hosts, you'll need to use the older, deprecated Websearch API, but that will limit the number of queries you can make per day.

Barring that, you'll need to do a lot of HTML scraping and parsing.
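To give an idea of what the parsing half involves, here is a small sketch that pulls anchor targets out of an HTML string with HTML::LinkExtor (part of the HTML::Parser distribution). The helper name `anchor_links` and the sample markup are invented for illustration; real result pages are far messier and change without notice.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the href of every <a> tag found in the given HTML string.
sub anchor_links {
    my ($html) = @_;
    my $parser = HTML::LinkExtor->new;    # no callback: links are collected
    $parser->parse($html);
    $parser->eof;

    my @urls;
    for my $link ( $parser->links ) {
        # Each entry is [ $tag, attribute => url, ... ]
        my ( $tag, %attrs ) = @$link;
        push @urls, $attrs{href} if $tag eq 'a' && defined $attrs{href};
    }
    return @urls;
}

# Sample markup, made up for this example.
my $sample = '<p><a href="http://example.com/one">one</a> '
           . '<img src="/logo.png"> <a href="/two">two</a></p>';
print "$_\n" for anchor_links($sample);
```

The `img` link is skipped because only `a` tags are kept; relative URLs such as `/two` come back unchanged unless a base URL is passed to the constructor.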

鲜肉鲜肉永远不皱 2024-10-21 04:14:26

Here is what a simple script could look like (and yes, it violates the TOS, so it's just a PoC and you shouldn't use it...)

use WWW::Mechanize;
use URI::Escape qw(uri_escape_utf8);
use 5.010;
use strict;
use warnings;

my $mech = WWW::Mechanize->new;

# highest result offset to fetch per search (first command-line argument;
# defaults to 0, i.e. only the first page)
my $option = shift // 0;

# you may customize your google search by editing this url (always end it with "q=" though)
my $google = 'http://www.google.co.uk/search?q=';
my @dork   = ( "this is my search one", "this is my search two" );

# start the main loop, one iteration for every google search
for my $query (@dork) {

    # result offset, reset so that every search starts at the first page
    my $max = 0;

    # loop until the chosen maximum offset is reached
    while ( $max <= $option ) {
        #say $google . uri_escape_utf8($query) . "&start=" . $max;
        $mech->get( $google . uri_escape_utf8($query) . "&start=" . $max );

        # print every link that is neither relative nor a google url
        foreach my $link ( $mech->links() ) {
            my $google_url = $link->url;
            if ( $google_url !~ /^\// && $google_url !~ /google/ ) {
                say $google_url;
            }
        }
        $max += 10;
    }
}

By the way, I wrote this a while back, so it's not exactly up to par, but it does the job, and I am too lazy to boot Linux to find the newer version of this...
