How can I use Perl's LWP::UserAgent to fetch the same URL with different query strings?
I have a working LWP::UserAgent script that should be applied to the following URL:
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5503
The same page exists for many similar targets; only the ending of the URL changes, for example:
html?show_school=5503
html?show_school=9002
html?show_school=5512
I want to do this with LWP::UserAgent:
for my $i (0..10000) {
    $ua->get(' [here the URL should be applied] ', id => 21, extern_uid => $i);
    # process reply
}
In any case, a plain loop like this seems a reasonable way to do the job. I assume LWP's API is not meant to replace core Perl functionality, so I can use ordinary Perl loops to query multiple URLs.
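One detail worth checking in the loop above: in LWP::UserAgent, key/value pairs passed to get after the URL are sent as request header fields, not as query parameters, so the varying value has to be part of the URL itself. The following is a minimal sketch of building such URLs with the URI module (installed alongside LWP). The helper name build_url is hypothetical, and it assumes show_school is the only parameter that changes, as in the example endings above.

```perl
use strict;
use warnings;
use URI;

# Base URL taken from the question; only the show_school value changes.
my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# build_url is a hypothetical helper, not part of LWP or URI.
sub build_url {
    my ($id) = @_;
    my $uri = URI->new($base);
    $uri->query_form(show_school => $id);   # sets the query string, URL-escaped
    return $uri->as_string;
}

# Print the three example URLs from the question.
print build_url($_), "\n" for 5503, 9002, 5512;
```

Each resulting string can then be passed as the single argument to $ua->get.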
Here is the code, which does not run yet because the loop still has to be integrated:
#use strict;
use DBI;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTML::TreeBuilder::XPath;
# first get a list of all schools
my ($url = '[here the url should be applied] =',id);
for my $id (0..10000) {
$ua->get(' [here the url should be applied ] ', id => 21, extern_uid => $i);
# process reply
}
#my $request = POST $url,
# [
# Schulsuche=> "Ergebnisse anzeigen",
# order => "schule_ort",
# schulname => undef,
# schulort => undef,
# typid => "11",
# verbinder => "AND"
# ];
my $ua = LWP::UserAgent->new;
print "getting all schools - this could take some time\n";
my $response = $ua->request($request);
# extract the ids
my @ids = $response->content =~ /getSchoolDetail\((\d+)/gs;
print "found " . scalar @ids . " schools\n";
# for this demo we only do the first 5
my @ids_to_do = @ids[0..4];
# use your own user and password
my $dbh = DBI->connect("DBI:mysql:database=schulen", "user", "pass", { AutoCommit => 0 }) or die $!;
my $sth = $dbh->prepare(<<sqlend);
insert into schulen ( name , plz , ort, strasse , tel, fax , mail, quelle , original_id )
values ( ?, ?, ?, ?, ?, ?, ?, ?, ? )
sqlend
# now loop over ids
for my $id (@ids_to_do) {
# get detail information for id
my $res = $ua->get("[url]=> &gid=$id");
# parse the response
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($res->content);
my $xpath = q|//div[@id='MCinhview']//div[@class='contentitem']//table|;
my ($adress_table, $tel_table) = $tree->findnodes($xpath);
my ($adr) = $adress_table->find("td");
my ($name, $city, $street) = map { s/^\s*//; s/\s*$//; $_ } ($adr->content_list)[2,4,6];
my($plz, $ort) = $city =~ /^(\d+)\s*(.*)/;
my ($tel, $fax, $mail) = map { s/^\s*//; s/\s*$//; $_ } map { ($_->content_list)[1] } $tel_table->find("td");
$sth->execute($name, $plz, $ort, $street, $tel, $fax, $mail, "SA", $id);
$dbh->commit;
$tree->delete;
print "$name done\n";
}
Update on Sunday, October 25th: I have applied the advice from OmnipotentEntity.
#!/usr/bin/perl -W
use strict;
use warnings; # give out some warnings if something does not run well
use diagnostics; # tell me when something is wrong
use DBI;
use LWP::UserAgent;
use HTTP::Request::Common;
use HTML::TreeBuilder::XPath;
# first get a list of all schools
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
#pretending to be firefox on linux.
for my $i (0..10000) {
my $request = HTTP::Request->new(GET => sprintf(" here to put the URL into =%d", $i));
$request->header('Accept' => 'text/html');
my $response = $ua->request($request);
if ($response->is_success) {
$pagecontent = $response -> content;
}
# now we can do whatever with the $pagecontent
}
my $request = POST $url,
[
order => "schule_ort",
schulname => undef,
Basisdaten => undef,
Profil => undef,
Schulort => undef,
typid => "11",
Fax =>
Homepage => undef,
verbinder => "AND"
];
print "getting all schools - this could take some time\n";
my $response = $ua->request($request);
# extract the ids
my @ids = $response->content =~ /getSchoolDetail\((\d+)/gs;
print "found " . scalar @ids . " schools\n";
# for this demo we only do the first 5
my @ids_to_do = @ids[0..4];
# use your own user and password
my $dbh = DBI->connect("DBI:mysql:database=schulen", "user", "pass", { AutoCommit => 0 }) or die $!;
my $sth = $dbh->prepare(<<sqlend);
insert into schulen ( name , plz , ort, strasse , tel, fax , mail, quelle , original_id )
values ( ?, ?, ?, ?, ?, ?, ?, ?, ? )
sqlend
# now loop over ids
for my $id (@ids_to_do) {
# get detail information for id
my $res = $ua->get(" here to put the URL into => &gid=$id");
# parse the response
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($res->content);
my $xpath = q|//div[@id='MCinhview']//div[@class='floatbox']//table|;
my ($adress_table, $tel_table) = $tree->findnodes($xpath);
my ($adr) = $adress_table->find("td");
my ($name, $city, $street) = map { s/^\s*//; s/\s*$//; $_ } ($adr->content_list)[2,4,6];
my($plz, $ort) = $city =~ /^(\d+)\s*(.*)/;
my ($tel, $fax, $mail) = map { s/^\s*//; s/\s*$//; $_ } map { ($_->content_list)[1] } $tel_table->find("td");
$sth->execute($name, $plz, $ort, $street, $tel, $fax, $mail, "SA", $id);
$dbh->commit;
$tree->delete;
print "$name done\n";
}
I want to loop over the results and therefore I tried to apply the corresponding URLs but I got a bunch of errors:
suse-linux:/usr/perl # perl perl_mecha_example_two.pl
Global symbol "$pagecontent" requires explicit package name at perl_mecha_example_two.pl line 24.
Global symbol "$url" requires explicit package name at perl_mecha_example_two.pl line 29.
Execution of perl_mecha_example_two.pl aborted due to compilation errors (#1)
(F) You've said "use strict" or "use strict vars", which indicates that all variables must either be lexically scoped (using "my" or "state"), declared beforehand using "our", or explicitly qualified to say which package the global variable is in (using "::").
Uncaught exception from user code:
Global symbol "$pagecontent" requires explicit package name at perl_mecha_example_two.pl line 24.
Global symbol "$url" requires explicit package name at perl_mecha_example_two.pl line 29.
Execution of perl_mecha_example_two.pl aborted due to compilation errors.
at perl_mecha_example_two.pl line 86
Now for the debugging part: what do I need to change, and how do I apply the URLs correctly?
When I use strict, I am not allowed to use a variable before declaring it. The usual fix is to prepend my (e.g. my $url and my $pagecontent) at the variable's first appearance.
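As a small illustration of that fix, here is a sketch that compiles cleanly under strict. It declares $url once at the top and $pagecontent inside the loop body. The %fake_body hash is made-up stand-in data so the example runs without any network access; a real script would take the content from the HTTP response instead.

```perl
use strict;
use warnings;

# Declared once, before any use, so strict accepts every later reference.
my $url = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# Made-up page bodies standing in for real responses (assumption for the demo).
my %fake_body = (
    5503 => '<html>school 5503</html>',
    5512 => '<html>school 5512</html>',
);

my @pages;
for my $id (5503, 9002, 5512) {
    # "my" on first appearance; the variable is scoped to this iteration.
    my $pagecontent = $fake_body{$id};
    next unless defined $pagecontent;    # skip ids with no content
    push @pages, "$url?show_school=$id => $pagecontent";
}

print "$_\n" for @pages;
```

Declaring $pagecontent inside the loop also means a failed fetch cannot silently reuse the content of the previous iteration.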
Comments (1)
It's as simple as:
This assumes you want to fetch all 10000 pages. If you only want to fetch particular ones, put those numbers in an array and have for walk that array rather than 1..10000.
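A minimal sketch of the array-based variant described above, under a few assumptions: the ids are supplied on the command line, the helper url_for is hypothetical, and the one-second pause is an added courtesy to the server rather than something the question asks for. To fetch the whole range instead, replace @ids in the loop with 1..10000.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html';

# url_for is a hypothetical helper; the ids are plain integers, so simple
# string interpolation is sufficient here.
sub url_for {
    my ($id) = @_;
    return "$base?show_school=$id";
}

# Ids come from the command line, e.g.: perl fetch.pl 5503 9002 5512
# With no arguments the loop body never runs and nothing is fetched.
my @ids = @ARGV;

my $ua = LWP::UserAgent->new(timeout => 30);

for my $id (@ids) {
    my $response = $ua->get(url_for($id));
    if ($response->is_success) {
        my $pagecontent = $response->content;
        # process $pagecontent here; this sketch only reports its size
        printf "%s: %d bytes\n", $id, length $pagecontent;
    }
    else {
        warn "$id failed: ", $response->status_line, "\n";
    }
    sleep 1;    # short pause between requests
}
```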