Perl HTML::TreeBuilder parsing: can't seem to match the DIV, getting error "Use of uninitialized value in pattern match"
I'm new to using the Perl HTML::TreeBuilder module for HTML parsing and can't figure out what the issue is here. I have spent a few hours trying to get this to work and looked at a few tutorials, but I am still getting this error: "Use of uninitialized value in pattern match", referring to this line in my code:
sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
);
This error prints out many times in the terminal. I have checked everything over and over, and it's definitely getting the input, as $downloadedpage is a full HTML file that contains the string I give below. Any advice is greatly appreciated.
Sample string, contained within the $downloadedpage variable:
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos, quesadillas, enchiladas and barbacoa are consistently explored for options by some of the world’s foremost gourmet chefs. A celebration of spices and unique culinary trends, Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
My code:
my $tree = HTML::TreeBuilder->new();
$tree->parse($downloadedpage);
$tree->eof();
#the article is in the div with class "snap_preview"
@article = $tree->look_down(
sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
);
Comments (1)
Using the exact code and example you gave, I don't get any errors at all. My first guess would be that there are some <div>s in the HTML which don't have a class attribute. Maybe you need to check that the attribute is defined before matching against it there?
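To illustrate that guess, here is a minimal sketch of a guarded version of the filter. The HTML in it is a made-up stand-in for $downloadedpage (the div with id="header" is hypothetical, added only so that one div lacks a class attribute), and the variable @same is just an illustrative name. It assumes the warning comes from attr('class') returning undef for elements that have no class attribute, which is a guess about the real page rather than something confirmed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for $downloadedpage: one div without a class
# attribute, plus the div we actually want.
my $downloadedpage = <<'HTML';
<html><body>
<div id="header">no class attribute here</div>
<div class='snap_preview'><p>Mexican dishes recipes</p></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new();
$tree->parse($downloadedpage);
$tree->eof();

# attr('class') returns undef when the element has no class attribute,
# so check definedness before applying the regex.
my @article = $tree->look_down(
    sub {
        my $class = $_[0]->attr('class');
        $_[0]->tag() eq 'div' and defined $class and $class =~ /snap_preview/;
    }
);

# Alternative: let look_down do the filtering. A regex criterion only
# matches elements where the attribute is defined.
my @same = $tree->look_down( _tag => 'div', class => qr/snap_preview/ );

print scalar(@article), " div(s) matched by the guarded sub\n";
print scalar(@same),    " div(s) matched by the attribute criteria\n";
```

If the real page still produces the warning with the guarded sub, the cause is probably somewhere other than a missing class attribute.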