Perl Treebuilder HTML 解析,似乎无法解析到 DIV,出现错误“在模式匹配中使用未初始化的值”

发布于 2024-09-09 22:17:03 字数 2275 浏览 2 评论 0原文

我是使用 Perl 树构建器模块进行 HTML 解析的新手,无法弄清楚这个问题是什么。我花了几个小时试图让它工作并查看了一些教程,但我仍然得到这个错误:“在模式匹配中使用未初始化的值”,指的是我的代码中的这一行:

sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
        );

此错误在终端中多次打印,我已经一遍又一遍地检查了所有内容,并且它肯定会获取输入,因为 $downloaded 页面是包含我在下面给出的字符串的完整 HTML 文件...非常感谢任何建议。

示例字符串,包含在我的代码的 $downloadedpage 变量中

        <div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

I'm new to using the Perl treebuilder module for HTML parsing and can't figure out what the issue is with this.. I have spent a few hours trying to get this to work and looked at a few tutorials but am still getting this error: "Use of uninitialized value in pattern match ", referring to this line in my code:

sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
        );

This error prints out many times in the terminal, I have checked everything over and over and its definitely getting the input as the $downloaded page is a full HTML file that contains the string I give below... any advice is greatly appreciated.

sample string, contained within the $downloadedpage variable

        <div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world’s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>

my code:

    my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

乖乖公主 2024-09-16 22:17:03

使用您提供的确切代码和示例,

use warnings;
use strict;
use HTML::TreeBuilder;
my $downloadedpage=<<EOF;
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world’s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
EOF

my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    my @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

我根本没有收到任何错误。我的第一个猜测是 HTML 中有一些没有 class 属性的

也许你需要

sub{
     $_[0]-> tag() eq 'div' and 
     $_[0]->attr('class') and 
     ($_[0]->attr('class') =~ /snap_preview/)
}

在那里写?

Using the exact code and example you gave,

use warnings;
use strict;
use HTML::TreeBuilder;
my $downloadedpage=<<EOF;
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world’s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
EOF

my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    my @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

I don't get any errors at all. My first guess would be that there are some <div>s in the HTML which don't have a class attribute.

Maybe you need to write

sub{
     $_[0]-> tag() eq 'div' and 
     $_[0]->attr('class') and 
     ($_[0]->attr('class') =~ /snap_preview/)
}

there?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文