Why does Perl's XML::RSS::Parser complain about "End tag mismatch"?
I'm a complete noob at all of this, but some time ago I wrote a little script in Perl to parse an RSS feed. It starts like this:
use strict;
use XML::RSS::Parser;
use Data::Dumper;
my $url = "http://www.livenation.co.uk/Venue/159/Southampton-Guildhall-tickets/RSS";
my $parser = XML::RSS::Parser->new();
my $feed = $parser->parse_uri($url);
print Dumper( $feed );
print $parser->errstr();
It used to work (I can't remember when I last checked, but it seemed to work a few weeks ago), but today it no longer does. The RSS feed is alive and validates at feedvalidator.org. The errstr() call returns this:
End tag mismatch (title != description) [Ln: 67, Col: 95]
I'm not really sure how this happened or what this means. The source of the RSS reads:
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
I don't know if it was different before. I tried a few other Atom feeds and the parser seems to break on all of them. The problem is that the sysadmin is not back until after the deadline, so I have to use what's available.
UPDATE:
Interesting. It breaks on both my Windows 7 64-bit (ActivePerl) and Ubuntu 9.10 (32-bit) installs. It works fine on my friend's Ubuntu though (same version, 9.10). I tried reinstalling the modules, but that doesn't seem to change anything.
Comments (3)
Works for me just now. Perhaps the RSS feed had corrupt XML in it for a while? The error seems to point to mismatched tags in the feed at the line indicated.
If it is still happening, try using curl (or similar) to display the raw XML and check it for errors.
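If you would rather stay in Perl than use curl, something along these lines might do. It is only a rough sketch: it fetches the raw feed with HTTP::Tiny (a core module in newer Perls, and my choice here, not something the original script uses) and prints the lines around the one named in the error. The sub name context_lines and the default of line 67 are made up for this example, and the parser may count lines slightly differently than a plain split on newlines does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Return the lines around $line_no (1-based), each prefixed with its number.
# (Hypothetical helper, named for this example only.)
sub context_lines {
    my ($xml, $line_no, $radius) = @_;
    my @lines = split /\n/, $xml, -1;
    my $first = $line_no - $radius;
    $first = 1 if $first < 1;
    my $last = $line_no + $radius;
    $last = scalar @lines if $last > @lines;
    return map { sprintf "%4d: %s", $_, $lines[$_ - 1] } $first .. $last;
}

# Usage: perl show_context.pl URL [LINE]
if (@ARGV) {
    my ($url, $line_no) = @ARGV;
    $line_no ||= 67;    # the line reported in the error message in the question
    my $res = HTTP::Tiny->new->get($url);
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    print "$_\n" for context_lines($res->{content}, $line_no, 2);
}
```

Run it with the feed URL as the first argument. If the lines it prints look fine, the script is probably being served something different from what this request gets.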
I'm getting the same error (same message and line number) with a fresh install of XML::RSS::Parser and the modules it uses (it's just a wrapper for feed structure over XML::Elemental, which uses XML::SAX to parse, etc).
Firefox, however, indicates that the file is valid.
XML::Tiny seems to be able to parse the file, so it may be enough, with a little work to transform its output into what you need.
You need to look at the actual source to see what's going on. Not just "go to the website in a browser", but look at the actual source the program is seeing. Who knows what happened? Some kind of glitch where only half the document got sent? Different source sent because it's not the same client?
I would dump the XML every time the program runs and examine it when there are errors.
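A minimal sketch of that idea, with some assumptions: it fetches the feed itself with HTTP::Tiny, saves the bytes to a timestamped file, and then hands the same string to the parser through parse_string, which XML::RSS::Parser documents alongside parse_uri as far as I know. The helper names dump_name and save_raw and the "feed" file prefix are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use POSIX qw(strftime);

# Build a timestamped file name such as feed-20100115-093000.xml (UTC).
# (Hypothetical helper, named for this example only.)
sub dump_name {
    my ($prefix, $epoch) = @_;
    return sprintf "%s-%s.xml", $prefix,
        strftime("%Y%m%d-%H%M%S", gmtime($epoch));
}

# Write the raw bytes to disk untouched so they can be inspected later.
sub save_raw {
    my ($path, $bytes) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Usage: perl dump_and_parse.pl URL
if (@ARGV) {
    require XML::RSS::Parser;    # loaded only when actually parsing
    my $url = shift @ARGV;
    my $res = HTTP::Tiny->new->get($url);
    die "Fetch failed: $res->{status} $res->{reason}\n" unless $res->{success};

    my $file   = save_raw(dump_name('feed', time), $res->{content});
    my $parser = XML::RSS::Parser->new;
    # Parse the exact bytes that were saved, not a second download.
    my $feed = $parser->parse_string($res->{content});
    if ($feed) {
        print "Parsed OK; raw XML kept in $file\n";
    }
    else {
        print "Parse failed: ", $parser->errstr, "\nInspect $file\n";
    }
}
```

Parsing the saved string instead of calling parse_uri means the file on disk is exactly what the parser saw, so a half-sent document or a response tailored to a different client shows up in the dump.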