How can I use Perl's LWP::UserAgent to fetch the same URL with different query strings?
I have read articles about using LWP, but I am still lost. This site has a list of many schools; see the overview page, follow some of the links, and you get result pages.
I want to fetch the pages with LWP::UserAgent and parse them with either HTML::TreeBuilder::XPath or HTML::TokeParser.
At the moment I am thinking about how to choose the right GET request.
I have some issues with LWP::UserAgent. The subpages of the overview can be reached via direct links, but note that each one has its own content, for example the result pages mentioned above.
As a new user here I cannot post the full URLs, but these are the query strings they end with:
id=21&extern_eid=709
id=21&extern_eid=789
id=21&extern_eid=1297
id=21&extern_eid=761
There are many URLs that differ only at the end. The question is: how do I run LWP::UserAgent? I want to fetch and parse all of the roughly 1000 pages.
Does LWP do this job automatically, or do I have to set up LWP::UserAgent so that it looks up the different URLs itself?
Possible solution: perhaps we have to count up from zero to 10000 here:
extern_eid=709 (count from zero to 100000)
www-db.sn.schule.de/index.php?id=21&extern_eid=709
BTW, here is the relevant part of the LWP::UserAgent documentation:

REQUEST METHODS

The methods described in this section are used to dispatch requests via the user agent. The following request methods are provided:

$ua->get( $url )
$ua->get( $url, $field_name => $value, ... )

This method will dispatch a GET request on the given $url. Further arguments can be given to initialize the headers of the request. These are given as separate name/value pairs. The return value is a response object. See HTTP::Response for a description of the interface it provides. There will still be a response object returned when LWP can't connect to the server specified in the URL or when other failures in protocol handlers occur.
The question is: how do I use LWP::UserAgent on the site mentioned above the right way, and efficiently?
I look forward to any and all help!
Comments (2)
If I understand your question correctly, you are trying to use LWP::UserAgent on the same URL with different query arguments, and you are wondering whether LWP::UserAgent provides a way to loop through the query arguments.
I don't think LWP::UserAgent has a method for you to do that. However, you can have a loop constructing the URLs and use LWP::UserAgent repeatedly:
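A minimal sketch of such a loop, under some assumptions: the http:// scheme, the sub names school_url and fetch_range, the 30 second timeout, the one second pause, and the id range passed on the command line are all illustrative choices, not something the site or the question prescribes. Only the host, path, and the id and extern_eid parameters come from the question.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Host and path are taken from the question; the http:// scheme is an assumption.
my $BASE = 'http://www-db.sn.schule.de/index.php';

# Build the URL for one extern_eid value (hypothetical helper).
sub school_url {
    my ($eid) = @_;
    my $uri = URI->new($BASE);
    $uri->query_form(id => 21, extern_eid => $eid);
    return $uri->as_string;
}

# Fetch every id in the range with a single, reused user agent.
# Returns a hash reference mapping extern_eid to the page's HTML.
sub fetch_range {
    my ($from, $to) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    my %html;
    for my $eid ($from .. $to) {
        my $res = $ua->get(school_url($eid));
        if ($res->is_success) {
            $html{$eid} = $res->decoded_content;
        }
        else {
            # Ids that do not exist or fail are reported and skipped.
            warn "extern_eid=$eid: ", $res->status_line, "\n";
        }
        sleep 1;    # pause between requests to go easy on the server
    }
    return \%html;
}

# Run only when called as a script with a range, e.g.: perl fetch.pl 700 710
unless (caller) {
    my ($from, $to) = @ARGV;
    if (defined $to) {
        my $pages = fetch_range($from, $to);
        printf "fetched %d pages\n", scalar keys %$pages;
    }
}
```

Each value in the returned hash can then be handed to HTML::TreeBuilder::XPath or HTML::TokeParser for parsing.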
Alternatively, you can add a request_prepare handler that computes and adds the query arguments before the request is sent.
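A rough sketch of the handler approach, assuming a simple counter is enough to compute the next id. The sub name add_eid_handler, the http:// scheme, and the command-line arguments are hypothetical. Note that the handler runs for every request the agent prepares, so this user agent should be used only for these pages.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Host and path are taken from the question; the http:// scheme is an assumption.
my $BASE = 'http://www-db.sn.schule.de/index.php';

# Install a request_prepare handler that rewrites the query string of each
# outgoing request, counting extern_eid upward from $start (hypothetical helper).
sub add_eid_handler {
    my ($ua, $start) = @_;
    my $next = $start;
    $ua->add_handler(request_prepare => sub {
        my ($request, $agent, $handler) = @_;
        my $uri = $request->uri->clone;
        $uri->query_form(id => 21, extern_eid => $next++);
        $request->uri($uri);
        return;    # the return value of this handler is ignored by LWP
    });
    return $ua;
}

# Run only when called as a script, e.g.: perl handler.pl 709 5
unless (caller) {
    my ($start, $count) = @ARGV;
    if (defined $count) {
        my $ua = LWP::UserAgent->new(timeout => 30);
        add_eid_handler($ua, $start);
        for (1 .. $count) {
            # The same base URL every time; the handler fills in the query.
            my $res = $ua->get($BASE);
            print $res->request->uri, ' ', $res->status_line, "\n";
            sleep 1;
        }
    }
}
```

The explicit loop is usually easier to follow; the handler mainly helps when the calling code should not need to know how the query arguments are derived.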
You describe following links for the purpose of web scraping. The LWP subclass WWW::Mechanize does this more easily than your current attempt.
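As a sketch of what that could look like, assuming the overview page links to the result pages with extern_eid in the href: the overview URL is not given in the question, so it is passed in as an argument, and the sub names extern_eid_of and scrape_schools are hypothetical.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use URI;

# Pull the extern_eid value out of a result-page URL (hypothetical helper).
sub extern_eid_of {
    my ($url) = @_;
    my %query = URI->new($url)->query_form;
    return $query{extern_eid};
}

# Load the overview page, collect every link whose URL carries an
# extern_eid parameter, and fetch each linked page once.
# Returns a hash reference mapping extern_eid to the page's HTML.
sub scrape_schools {
    my ($overview_url) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 0);
    $mech->get($overview_url);
    die "overview page: ", $mech->res->status_line, "\n" unless $mech->success;

    my (%seen, %html);
    my @links = $mech->find_all_links(url_regex => qr/extern_eid=\d+/);
    for my $link (@links) {
        my $url = $link->url_abs;
        my $eid = extern_eid_of($url);
        next if !defined $eid || $seen{$eid}++;    # skip duplicate links
        $mech->get($url);
        $html{$eid} = $mech->content if $mech->success;
        sleep 1;    # pause between requests to go easy on the server
    }
    return \%html;
}

# Run only when called as a script: perl mech.pl <overview-url>
unless (caller) {
    my ($overview_url) = @ARGV;
    if (defined $overview_url) {
        my $pages = scrape_schools($overview_url);
        printf "fetched %d pages\n", scalar keys %$pages;
    }
}
```

Compared with counting ids upward, this only requests pages that the overview actually links to, so no guesses about the id range are needed.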