I get an "Out of memory" error while parsing a large (100 MB) XML file with the following Perl script:
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new();
my $data = XML::Twig->new
    ->parsefile("divisionhouserooms-v3.xml")
    ->simplify( keyattr => [] );

my @good_division_numbers = qw( 30 31 32 35 38 );

foreach my $property ( @{ $data->{DivisionHouseRoom} } ) {
    my $house_code = $property->{HouseCode};
    print $house_code, "\n";

    my $amount_of_bedrooms = 0;
    foreach my $division ( @{ $property->{Divisions}->{Division} } ) {
        next unless grep { $_ eq $division->{DivisionNumber} } @good_division_numbers;
        $amount_of_bedrooms += $division->{DivisionQuantity};
    }

    open my $fh, ">>", "Result.csv" or die $!;
    print $fh join( "\t", $house_code, $amount_of_bedrooms ), "\n";
    close $fh;
}
What can I do to fix this error?
Handling large XML files that don't fit in memory is exactly what XML::Twig advertises. The code posted in the question isn't using XML::Twig's strengths at all: calling the simplify method makes it no better than XML::Simple, which loads the whole document into memory.

What's missing from the code are 'twig_handlers' or 'twig_roots', which let the parser focus on the relevant portions of the XML document in a memory-efficient way. Without seeing the XML it's hard to say whether processing the document chunk by chunk or processing just selected parts is the way to go, but either one should solve this issue.

So the code should look something like the following (chunk-by-chunk demo):
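This sketch assumes each record in the file is a <DivisionHouseRoom> element; that element name is a guess inferred from the simplify output in the question, so adjust it to match the actual XML structure. A twig_handlers callback processes one record at a time and then purges it, so the whole tree is never in memory at once:

```perl
use strict;
use warnings;
use XML::Twig;

# Division numbers we care about, as in the original script.
my %good_division = map { $_ => 1 } qw( 30 31 32 35 38 );

# Open the output once instead of re-opening it for every record.
open my $out, '>>', 'Result.csv' or die $!;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <DivisionHouseRoom> element (assumed record
        # name); the rest of the tree is not kept in memory.
        DivisionHouseRoom => \&process_property,
    },
);

# Parse the file if it is present in the current directory.
$twig->parsefile('divisionhouserooms-v3.xml')
    if -e 'divisionhouserooms-v3.xml';

close $out;

sub process_property {
    my ( $t, $property ) = @_;

    my $house_code = $property->first_child_text('HouseCode');

    my $amount_of_bedrooms = 0;
    my $divisions = $property->first_child('Divisions');
    for my $division ( $divisions ? $divisions->children('Division') : () ) {
        next unless $good_division{ $division->first_child_text('DivisionNumber') };
        $amount_of_bedrooms += $division->first_child_text('DivisionQuantity');
    }

    print {$out} join( "\t", $house_code, $amount_of_bedrooms ), "\n";

    $t->purge;    # discard the parts of the tree processed so far
}
```

The call to purge at the end of the handler is what keeps memory flat; without it XML::Twig still builds the full tree. If only a small part of each record is needed, twig_roots can be used instead to skip building the uninteresting parts entirely.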
See the "Processing an XML document chunk by chunk" section of the XML::Twig documentation; it specifically discusses how to process a document part by part, which makes handling large XML files possible.