Using Perl LWP::Simple to process an online price-lookup website

Posted on 2024-10-18 00:17:36

In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages and check the prices of products (I'm somewhat of a Perl noob). The script also keeps a very simple backlog of the last price seen for each item (since the prices change frequently).

I was wondering if there is any way I could further automate the script so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run a search query against Amazon to find the page or price). Is there any way I could do this that doesn't involve just copying Amazon's search URL and substituting in my keywords? (I'm aware that processing HTML with regex is generally bad form; I only used it because I need just one small piece of data.)


#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my %oldPrice;
my %nameURL = (
    "Archer Season 1" => "http://www.amazon.com/Archer-Season-H-Jon-Benjamin/dp/B00475B0G2/ref=sr_1_1?ie=UTF8&qid=1297282236&sr=8-1",
    "Code Complete" => "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/ref=sr_1_1?ie=UTF8&qid=1296841986&sr=8-1",
    "Intermediate Perl" => "http://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062/ref=sr_1_1?s=books&ie=UTF8&qid=1297283720&sr=1-1",
    "Inglorious Basterds (2-Disc)" => "http://www.amazon.com/Inglourious-Basterds-Two-Disc-Special-Brad/dp/B002T9H2LK/ref=sr_1_3?ie=UTF8&qid=1297283816&sr=8-3"
);

if (-e "backlog.txt"){
    open (LOG, "backlog.txt");
    while(<LOG>){
        chomp;
        my @temp = split(/:\s/);
        $oldPrice{$temp[0]} = $temp[1];
    }
    close(LOG);
}

print "\nChecking Daily Amazon Prices:\n";
open(LOG, ">backlog.txt");
foreach my $key (sort keys %nameURL){
    my $content = get $nameURL{$key} or die;
    # Match the first dollar amount on the page; the dot is escaped so it only matches a literal "."
    $content =~  m{\s*\$(\d+\.\d+)} || die;
    if (exists $oldPrice{$key} && $oldPrice{$key} != $1){
        print "$key: \$$1 (Was $oldPrice{$key})\n";
    }
    else{
        print "\n$key: $1\n";
    }
    print LOG "$key: $1\n";
}
close(LOG);
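
For comparison, here is a hedged sketch of the backlog handling described above, not the original script. It uses lexical filehandles with three-argument open, checks each file operation for errors, and pulls the price out with the same kind of pattern. The helper names read_backlog, write_backlog, and extract_price, the file name backlog_demo.txt, and the sample HTML string (standing in for a fetched page) are all assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load "name: price" lines into a hash reference.
sub read_backlog {
    my ($file) = @_;
    my %old;
    return \%old unless -e $file;
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($name, $price) = split /:\s/, $line, 2;
        next unless defined $price;    # skip malformed lines
        $old{$name} = $price;
    }
    close($in);
    return \%old;
}

# Hypothetical helper: write the hash back out, one "name: price" per line.
sub write_backlog {
    my ($file, $prices) = @_;
    open(my $out, '>', $file) or die "Cannot write $file: $!";
    print {$out} "$_: $prices->{$_}\n" for sort keys %$prices;
    close($out) or die "Cannot close $file: $!";
}

# Hypothetical helper: return the first dollar amount in the text, or undef.
sub extract_price {
    my ($content) = @_;
    return $content =~ m{\$(\d+\.\d+)} ? $1 : undef;
}

# Stand-in for a fetched page; a real run would call LWP::Simple's get().
my $sample = '<span class="price">$19.99</span>';
my $price  = extract_price($sample);

my $file = "backlog_demo.txt";    # assumed file name for this demo
write_backlog($file, { "Code Complete" => $price });
my $old = read_backlog($file);
print "Code Complete: \$$old->{'Code Complete'}\n";
unlink $file;
```

Returning undef from extract_price instead of dying lets the caller decide whether one missing price should abort the whole run.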


Comments (2)

药祭#氼 2024-10-25 00:17:36

Yes, the design can be improved. It's probably best to delete everything and start over with an existing full-featured web scraping application or framework, but since you want to learn:

  1. The name-to-URL map is configuration data. Retrieve it from outside of the program.
  2. Store the historic data in a database.
  3. Learn XPath and use it to extract data from HTML. It's easy if you already grok CSS selectors.

Other stackers, if you want to amend my post with the rationale for each piece of advice, go ahead and edit it.
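
As an illustration of the first point only, here is a minimal sketch that reads the name-to-URL map from external data instead of hard-coding it in the script. The file name products.conf, the tab-separated "name<TAB>URL" layout, and the load_products helper are assumptions for this example, not something the answer prescribes. The URLs are the question's own, trimmed of their tracking suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "name<TAB>URL" lines from a filehandle.
sub load_products {
    my ($fh) = @_;
    my %name_url;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($name, $url) = split /\t/, $line, 2;
        next unless defined $url;          # skip lines without a tab
        $name_url{$name} = $url;
    }
    return \%name_url;
}

# Demo input read from an in-memory filehandle; a real script would
# open the (assumed) products.conf file instead.
my $config = join "\n",
    "# name<TAB>URL",
    "Code Complete\thttp://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670",
    "Intermediate Perl\thttp://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062",
    "";
open(my $fh, '<', \$config) or die "Cannot open in-memory config: $!";
my $products = load_products($fh);
close($fh);

print "$_ => $products->{$_}\n" for sort keys %$products;
```

With the map in its own file, adding or removing a product no longer means editing the script.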

尬尬 2024-10-25 00:17:36

I made a simple script to demonstrate Amazon search automation. It reuses the search URL for all departments, with the escaped search term appended. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML in question can easily be examined with the dump method (see the commented-out line).

use strict; use warnings;

use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;

my $look_for = "Archer Season 1";

my $contents
  = get "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
        . uri_escape($look_for)
    or die "Could not fetch search results for '$look_for'\n";

my $html = HTML::TreeBuilder->new_from_content($contents);
for my $item ($html->look_down(id => qr/result_\d+/)) {
    # $item->dump;      # find out structure of HTML
    my $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
    my $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };

    print "$title\n$price\n\n";
}
$html->delete;
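
To connect this back to the question's idea of keeping an array of key terms, here is a small sketch that builds one search URL per term with uri_escape. The search_url helper and the term list are illustrative assumptions. The URL prefix is copied from the script above and may no longer match Amazon's current site. No request is made; the sketch only prints the URLs.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape);

# Search URL prefix copied from the answer's script; it may be outdated.
my $search_base
    = "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=";

# Hypothetical helper: turn one search term into a search URL.
sub search_url {
    my ($term) = @_;
    return $search_base . uri_escape($term);
}

# Assumed term list, taken from the product names in the question.
my @terms = ("Archer Season 1", "Code Complete", "Intermediate Perl");
print "$_ => ", search_url($_), "\n" for @terms;
```

Each URL could then be passed to get and parsed as in the loop above.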