Can't use string as a HASH ref..?
I'm trying to parse an HTML document for a web indexing program. To do this I'm using HTML::TokeParser.
I'm getting an error on the last line of my first if statement:
if ( $token->[1] eq 'a' ) {
    #href attribute of tag A
    my $suffix = $token->[2]{href};
that says Can't use string ("</a>") as a HASH ref while "strict refs" in use at ./indexer.pl line 270, <PAGE_DIR> line 1.
Is my problem that something (the suffix? or </a>?) is a string and needs to be turned into a hash ref? I looked at other posts with similar errors, but I'm still not at all sure about this. Thanks for any help.
sub parse_document {
    #passed from input
    my $html_filename = $_[0];
    #base url for links
    my $base_url = $_[1];
    #created to hold tokens
    my @tokens = ();
    #created for doc links
    my @links = ();
    #creates parser
    my $p = HTML::TokeParser->new($html_filename);
    #loops through doc tags
    while (my $token = $p->get_token()) {
        #code for retrieving links
        if ( $token->[1] eq 'a' ) {
            # href attribute of tag A
            my $suffix = $token->[2]{href};
            #if href exists & isn't an email link
            if ( defined($suffix) && !($suffix =~ "^mailto:") ) {
                #make the url absolute
                my $new_url = make_absolute_url $base_url, $suffix;
                #make sure it's of the http:// scheme
                if ($new_url =~ "^http://"){
                    #normalize the url
                    my $new_normalized_url = normalize_url $new_url;
                    #add it to links array
                    push(@links, $new_normalized_url);
                }
            }
        }
        #code for text words
        if ($token->[0] eq 'T') {
            my $text = $token->[1];
            #add words to end of array
            #(split by non-letter chars)
            my @words = split(/\P{L}+/, $text);
        }
    }
    return (\@tokens, \@links);
}
Answers (3)
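The get_token() method returns an array reference in which $token->[2] is a hash reference containing your href only if $token->[0] is an S (that is, a start tag). Here you are matching an end tag (where $token->[0] is an E), and for an end tag $token->[2] is just the tag's original text. See the perldoc for HTML::TokeParser for details. To fix it, check that $token->[0] is 'S' before you use $token->[2] as a hash reference.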
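A minimal sketch of that check, using only core Perl. The three tokens are hand-built to mirror the layouts the HTML::TokeParser documentation gives for get_token() (start, text and end tokens for one <a> element); they are not output from a real parse. hrefs_from_tokens is a hypothetical helper name. The second loop reproduces the question's unguarded lookup inside eval, so you can see the same error being raised on the end tag.

```perl
use strict;
use warnings;

# Hand-built tokens mirroring get_token()'s documented layouts for
# <a href="http://example.com/">home</a>:
#   start tag: ["S", $tag, $attr_hashref, $attrseq_arrayref, $text]
#   text:      ["T", $text, $is_data]
#   end tag:   ["E", $tag, $text]
my @tokens = (
    [ 'S', 'a', { href => 'http://example.com/' }, ['href'], '<a href="http://example.com/">' ],
    [ 'T', 'home', 0 ],
    [ 'E', 'a', '</a>' ],
);

# Collect href values, looking at $token->[2] only for <a> start tags.
sub hrefs_from_tokens {
    my @hrefs;
    for my $token (@_) {
        # Only "S" tokens carry an attribute hash in slot 2; for the "E"
        # token slot 2 is the literal text "</a>".
        next unless $token->[0] eq 'S' && $token->[1] eq 'a';
        my $href = $token->[2]{href};
        push @hrefs, $href if defined $href;
    }
    return @hrefs;
}

my @hrefs = hrefs_from_tokens(@tokens);
print "$_\n" for @hrefs;

# The question's unguarded lookup dies on the end tag under strict refs.
my $ok = eval {
    for my $token (@tokens) {
        if ($token->[1] eq 'a') {
            my $suffix = $token->[2]{href};
        }
    }
    1;
};
my $err = $ok ? '' : $@;
print "unguarded loop died: $err" unless $ok;
```

The guarded version prints http://example.com/, while the unguarded loop stops with the "Can't use string ("</a>") as a HASH ref" error from the question.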
is a string, not a hash reference.Do a
print $token->[2]
and you'll see that it is a string containing</a>
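Besides printing the value, ref() tells you what kind of value a slot holds: it returns 'HASH' for a hash reference and the empty string for a plain scalar. A small sketch, again with hand-built stand-ins for get_token() results (not a real parse) and a hypothetical helper named describe_slot2:

```perl
use strict;
use warnings;

# Hand-built stand-ins for the start and end tokens of <a href="page.html">.
my @tokens = (
    [ 'S', 'a', { href => 'page.html' }, ['href'], '<a href="page.html">' ],
    [ 'E', 'a', '</a>' ],
);

# Report what slot 2 of a token actually holds.
sub describe_slot2 {
    my ($token) = @_;
    my $kind = ref $token->[2];    # 'HASH' for a hash ref, '' for a plain string
    return $kind eq 'HASH'
        ? "$token->[0] token: slot 2 is a HASH ref"
        : "$token->[0] token: slot 2 is the string '$token->[2]'";
}

print describe_slot2($_), "\n" for @tokens;
```

It prints one line per token: a HASH ref for the start tag and the string '</a>' for the end tag.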
Apparently $token->[2] is being dereferenced as a hash reference, but its value is the string "</a>". Certainly not what you want!
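Putting the answers together, here is a sketch of the question's loop with the start-tag check added. It assumes HTML::Parser (the distribution that provides HTML::TokeParser) is installed. It also parses an inline HTML string through a scalar reference instead of a file, and leaves out the question's make_absolute_url and normalize_url helpers. Finally, it pushes each text token's words onto @words, which the original "add words to end of array" comment suggests but its code does not do. parse_html is a hypothetical name.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sketch of the question's loop with a start-tag check. Link resolution
# (make_absolute_url / normalize_url in the question) is left out.
sub parse_html {
    my ($html) = @_;
    my (@words, @links);
    # new() accepts a file name, a file handle, or a reference to a scalar.
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser: $!";
    while (my $token = $p->get_token) {
        if ($token->[0] eq 'S' && $token->[1] eq 'a') {
            # Only start-tag tokens have the attribute hash in slot 2.
            my $href = $token->[2]{href};
            push @links, $href if defined $href && $href !~ /^mailto:/;
        }
        elsif ($token->[0] eq 'T') {
            # Split on runs of non-letters and drop any empty fields.
            push @words, grep { length } split /\P{L}+/, $token->[1];
        }
    }
    return (\@words, \@links);
}

my ($words, $links) = parse_html(
    '<p>Hello <a href="a.html">world</a> <a href="mailto:x@example.com">mail</a></p>'
);
print "links: @$links\n";
print "words: @$words\n";
```

Testing $token->[0] in the same condition as the tag name keeps the text-token branch working. Skipping every non-S token at the top of the loop would also skip the T tokens the word-splitting code needs.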