What do I gain by filtering a URL through Perl's URI module?
Do I gain anything when I transform my $url like this: $url = URI->new( $url )?
#!/usr/bin/env perl
use warnings; use strict;
use 5.012;
use URI;
use XML::LibXML;
my $url = 'http://stackoverflow.com/';
$url = URI->new( $url );
my $doc = XML::LibXML->load_html( location => $url, recover => 2 );
my @nodes = $doc->getElementsByTagName( 'a' );
say scalar @nodes;
Comments (3)
The URI module's constructor cleans up the URI for you; for example, it correctly escapes characters that are not valid in a URI (see URI::Escape).
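A minimal sketch of the escaping described above. The URL is a made-up example (not from the question) that contains a literal space, which is not a legal URI character:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# Hypothetical input: a path containing a literal space.
my $raw = 'http://example.com/some path/index.html';

# The constructor escapes characters that are not allowed in a URI.
my $uri = URI->new($raw);

# Stringify the object to see the escaped form.
my $escaped = $uri->as_string;
print "$escaped\n";    # http://example.com/some%20path/index.html
```

Note that this only escapes what is unsafe; it does not validate that the URL points anywhere useful.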
The URI module has several benefits:
The benefit you get from the small piece of code you show is minimal, but as you continue to work on the problem, perhaps spidering the site, URI becomes handier as you decide what to do next.
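As one illustration of how URI might help once you start spidering, the sketch below resolves link targets against a base URL and keeps only those on the same host. The href values and the "same host only" policy are assumptions made for the example, not something the question specifies:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# Base URL of the page the links were (hypothetically) found on.
my $base = URI->new('http://stackoverflow.com/questions/');

# Hypothetical href values, as they might appear in <a> elements.
my @hrefs = (
    '/tags',
    'ask',
    'http://example.com/elsewhere',
    'mailto:someone@example.com',
);

my @same_site;
for my $href (@hrefs) {
    # Turn a possibly relative reference into an absolute URI.
    my $abs = URI->new_abs( $href, $base );

    # Skip anything that is not http(s); such URIs may have no host method.
    next unless defined $abs->scheme && $abs->scheme =~ /^https?$/;

    # Keep only links that stay on the base URL's host.
    next unless $abs->host eq $base->host;

    push @same_site, $abs->as_string;
}

print "$_\n" for @same_site;
```

Doing the same with plain strings would mean hand-writing the relative-reference resolution and the host comparison.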
I'm surprised nobody has mentioned it yet, but $url = URI->new( $url ); doesn't clean up your $url and hand it back to you. It creates a new object of class URI (or, rather, of one of its subclasses), which can then be passed to other code that requires a URI object. That's not particularly important in this case, since XML::LibXML appears to be happy to accept locations as either strings or objects, but some other modules require you to give them a URI object and will reject URLs presented as plain strings.
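A small sketch of the point about objects, using the URL from the question. It shows that the constructor returns an object of a URI subclass that still behaves like a string when interpolated:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

my $url = URI->new('http://stackoverflow.com/');

# The result is an object, blessed into a scheme-specific subclass.
my $class = ref $url;
print "Class: $class\n";

# It passes an isa check, which is what object-expecting code would rely on.
print "Is a URI object\n" if $url->isa('URI');

# Stringification is overloaded, so string-expecting code still works.
my $as_text = "$url";
print "As text: $as_text\n";

# Being an object, it also offers accessors a plain string lacks.
my $host = $url->host;
print "Host: $host\n";
```

The overloaded stringification is likely why passing the object where a string location is expected works without complaint.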