Can't use string as a HASH ref..?

Posted 2024-12-13 06:05:07

I'm trying to parse an HTML document for a web indexing program. To do this I'm using HTML::TokeParser.

I'm getting an error on the last line of my first if statement:

 if ( $token->[1] eq 'a' ) {
     #href attribute of tag A
     my $suffix = $token->[2]{href};

that says Can't use string ("</a>") as a HASH ref while "strict refs" in use at ./indexer.pl line 270, <PAGE_DIR> line 1.

Is my problem that something (the suffix? or "</a>"?) is a string and needs to be turned into a hash ref? I looked at other posts with similar errors, but I'm still not at all sure about this. Thanks for any help.

sub parse_document {

    #passed from input
    my $html_filename = $_[0];

    #base url for links
    my $base_url = $_[1];

    #created to hold tokens
    my @tokens = ();

    #created for doc links
    my @links = ();

    #creates parser
    my $p = HTML::TokeParser->new($html_filename);

    #loops through doc tags
    while (my $token = $p->get_token()) {
        #code for retrieving links
        if ( $token->[1] eq 'a' ) {
            # href attribute of tag A
            my $suffix = $token->[2]{href};

            #if href exists & isn't an email link
            if ( defined($suffix) && !($suffix =~ "^mailto:") ) {
                #make the url absolute
                my $new_url = make_absolute_url $base_url, $suffix;

                #make sure it's of the http:// scheme
                if ($new_url =~ "^http://"){
                    #normalize the url
                    my $new_normalized_url = normalize_url $new_url;

                    #add it to links array
                    push(@links, $new_normalized_url);
                }
            }
        }

        #code for text words
        if ($token->[0] eq 'T') {
            my $text =  $token->[1];

            #add words to end of array
            #(split by non-letter chars)
            my @words = split(/\P{L}+/, $text);
        }
    }

    return (\@tokens, \@links);
}

Comments (3)

◇流星雨 2024-12-20 06:05:07

The get_token() method returns an array reference. $token->[2] is a hash reference containing your href only if $token->[0] is 'S' (that is, a start tag). In this case you are matching an end tag (where $token->[0] is 'E'), and for end tags $token->[2] is the literal tag text, a plain string. See the HTML::TokeParser documentation for details.

To fix, add a

next if $token->[0] ne 'S';

at the top of your loop.
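
One caveat: placed at the very top of the loop in the question, that `next` would also skip the text ('T') tokens handled by the second `if`. A minimal sketch of an alternative is to check the token type inside the link condition itself. It assumes the HTML::Parser distribution (which provides HTML::TokeParser) is installed; `extract_hrefs` and the sample HTML are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collect href values from <a> start tags only.
sub extract_hrefs {
    my ($html) = @_;
    my @hrefs;

    # Passing a reference to a scalar makes the parser treat it
    # as the literal document rather than a filename.
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    while (my $token = $p->get_token) {
        # Start tag tokens look like ["S", $tag, \%attr, \@attrseq, $text],
        # so $token->[2] is a hash reference only when the type is 'S'.
        if ($token->[0] eq 'S' && $token->[1] eq 'a') {
            my $href = $token->[2]{href};
            push @hrefs, $href
                if defined $href && $href !~ /^mailto:/;
        }
        # End tag tokens look like ["E", $tag, $text]; text tokens look
        # like ["T", $text, $is_data]. Both fall through untouched here,
        # so later branches in the same loop can still handle them.
    }
    return @hrefs;
}

my $sample = '<p><a href="http://example.com/">x</a> '
           . '<a href="mailto:me@example.com">m</a></p>';
print "$_\n" for extract_hrefs($sample);
```

Because the type check lives in the condition rather than in an early `next`, the text-handling branch from the question would keep working unchanged.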

笨笨の傻瓜 2024-12-20 06:05:07

$token->[2] is a string, not a hash reference.

Do a print $token->[2] and you'll see that it is a string containing </a>
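
A small diagnostic sketch along the same lines: rather than printing the value, `ref()` reports whether the third element of each token is a hash reference or not. It assumes the HTML::Parser distribution is installed; `describe_tokens` and the sample HTML are hypothetical names for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: for each token, report its type and whether
# element [2] is a reference (e.g. HASH) or a non-reference value.
sub describe_tokens {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @out;
    while (my $token = $p->get_token) {
        # ref() returns 'HASH' for a hash reference and an empty
        # string for plain scalars (including undef).
        my $kind = ref($token->[2]) || 'non-ref';
        push @out, join(':', $token->[0], $kind);
    }
    return @out;
}

print "$_\n" for describe_tokens('<a href="/x">link</a>');
```

Run against a start tag followed by its end tag, this makes the difference visible: only the 'S' token carries a HASH in position 2.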

不醒的梦 2024-12-20 06:05:07

Apparently $token->[2] is being dereferenced as a hash reference, but its value is the string "</a>". Certainly not what you want!
