Using Perl's LWP::Simple to poll a website for product prices
In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages and check the prices of products (I'm somewhat of a Perl noob). The script also keeps a very simple backlog of the last price seen for each item (since the prices change frequently).
I was wondering if there is any way to automate the script further so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run a search query against Amazon to find the page or price). Is there any way to do this that doesn't involve just copying Amazon's search URL and substituting in my keywords? (I'm aware that processing HTML with a regex is generally bad form; I used it only because I need one small piece of data.)
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my %oldPrice;
my %nameURL = (
    "Archer Season 1" => "http://www.amazon.com/Archer-Season-H-Jon-Benjamin/dp/B00475B0G2/ref=sr_1_1?ie=UTF8&qid=1297282236&sr=8-1",
    "Code Complete" => "http://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/ref=sr_1_1?ie=UTF8&qid=1296841986&sr=8-1",
    "Intermediate Perl" => "http://www.amazon.com/Intermediate-Perl-Randal-L-Schwartz/dp/0596102062/ref=sr_1_1?s=books&ie=UTF8&qid=1297283720&sr=1-1",
    "Inglorious Basterds (2-Disc)" => "http://www.amazon.com/Inglourious-Basterds-Two-Disc-Special-Brad/dp/B002T9H2LK/ref=sr_1_3?ie=UTF8&qid=1297283816&sr=8-3"
);

# Load the prices recorded on the previous run, if there was one.
if (-e "backlog.txt") {
    open(my $log, '<', "backlog.txt") or die "Cannot read backlog.txt: $!";
    while (<$log>) {
        chomp;
        my @temp = split(/:\s/);
        $oldPrice{$temp[0]} = $temp[1];
    }
    close($log);
}

print "\nChecking Daily Amazon Prices:\n";
open(my $log, '>', "backlog.txt") or die "Cannot write backlog.txt: $!";
foreach my $key (sort keys %nameURL) {
    my $content = get($nameURL{$key}) or die "Could not fetch the page for $key\n";
    # Take the first dollar amount found in the page source.
    $content =~ m{\s*\$(\d+\.\d+)} or die "No price found for $key\n";
    my $price = $1;
    if (exists $oldPrice{$key} && $oldPrice{$key} != $price) {
        print "$key: \$$price (Was $oldPrice{$key})\n";
    }
    else {
        print "$key: \$$price\n";
    }
    # Record the current price for the next run.
    print {$log} "$key: $price\n";
}
close($log);
Comments (2)
Yes, the design can be improved. It's probably best to delete everything and start over with an existing full-featured web scraping application or framework, but since you want to learn:
Other stackers, if you want to amend my post with the rationale for each piece of advice, go ahead and edit it.
I made a simple script to demonstrate Amazon search automation. The search URL for all departments was modified to take the escaped search term. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML in question can be easily examined with the dump method (see the commented-out line).
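The script this answer refers to is not reproduced here, so what follows is only a minimal sketch of the approach it describes, not the original code. It assumes the URI::Escape and HTML::TreeBuilder modules are installed. The search URL form, the search_url, first_result and lookup helper names, and the div/span class names are all assumptions made for illustration; the real results page will need its own selectors, which the dump call can help find.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);
use HTML::TreeBuilder;

# ASSUMPTION: illustrative "all departments" search URL; the live site may differ.
my $SEARCH_BASE = 'http://www.amazon.com/s';

# Build a search URL from a plain keyword string, escaping it for the query.
sub search_url {
    my ($term) = @_;
    return $SEARCH_BASE
         . '?url=' . uri_escape_utf8('search-alias=aps')
         . '&field-keywords=' . uri_escape_utf8($term);
}

# Pull the first hit's title, link and price out of a results page.
# ASSUMPTION: the tag and class names below are hypothetical placeholders.
sub first_result {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # $tree->dump;    # uncomment to print the parsed structure
    my %hit;
    if (my $div = $tree->look_down(_tag => 'div', class => 'result')) {
        if (my $link = $div->look_down(_tag => 'a')) {
            $hit{title} = $link->as_text;
            $hit{url}   = $link->attr('href');
        }
        if (my $span = $div->look_down(_tag => 'span', class => 'price')) {
            ($hit{price}) = $span->as_text =~ /\$(\d+(?:\.\d+)?)/;
        }
    }
    $tree->delete;    # release the parse tree
    return \%hit;
}

# Fetch and parse in one step (needs network access; not called below).
sub lookup {
    my ($term) = @_;
    require LWP::Simple;
    my $html = LWP::Simple::get(search_url($term))
        or die "Could not fetch results for '$term'\n";
    return first_result($html);
}

# Offline demonstration on a made-up results fragment.
my $sample_html = <<'HTML';
<div class="result">
  <a href="http://www.amazon.com/dp/0735619670">Code Complete</a>
  <span class="price">$25.99</span>
</div>
HTML

my $hit = first_result($sample_html);
print search_url('Code Complete'), "\n";
print "$hit->{title}: \$$hit->{price} ($hit->{url})\n";
```

With something like this in place, the hash of hand-copied URLs in the question could in principle be replaced by a plain list of search terms passed to lookup, though taking the first hit blindly may pick the wrong edition or product.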