Perl out-of-memory message after processing only 64 XML files of 2 MB each - Unix
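I have tried making the variables global and calling undef on them, increasing the data segment size on Unix, and localising the variables, but I still get the same error. I need to process around 750 files. Can anyone help? Thanks. I know that reading the entire file into a string may be a problem, but I am not sure of any other way. Still, since I declare the string as a global and reset it to "" on each pass, shouldn't that release the memory for the next iteration?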
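foreach my $file_name (@dir_contents)
{
    if (-f "rawdata/$file_name")
    {
        $xmlres = "";
        eval {
            # open() was not shown in the original post; the path is assumed from the -f test
            open(FILE, "<", "rawdata/$file_name") or die "Cannot open rawdata/$file_name: $!";
            while (<FILE>)
            {
                $xmlres .= $_;
            }
            close FILE;
            $doc = $parser->parsestring($xmlres);    # line highlighted in the original post
            foreach my $node ($doc->getElementsByTagName("nam1"))
            {
                foreach my $tnode ($node->getElementsByTagName("name2"))
                {
                    # processing
                }
            }
        };
    }
}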
Comments (2)
First of all, the style comments are useful and correct, and would help. However, if you need to process 1.5 GB of XML, you're going to need to manage memory a little better.
XML::DOM doesn't automatically free the space it has used. This is a sign of its age; newer modules manage memory much better and tend to do this automatically (I also use XML::LibXML, which does this, and I'd recommend it highly).
Mainly, you need to call the dispose method to clean out a DOM tree when you have finished with it. This is fairly clear in the pod synopsis for XML::DOM. Just calling it may be enough to get your memory issues resolved. (Technically, DOM trees tend to contain cyclical references, and these are not reclaimed automatically by simple reference-counting garbage collection. Perl has weak references to help with this, but it looks like they haven't been fully integrated into XML::DOM. Simply unreferencing the tree is not enough.)
I'd certainly look to improve style elsewhere. Some other style issues: I'd try Try::Tiny in place of the eval {}, as you seem to be using it mainly for exception handling. Also, several bad experiences have taught me that using a solid date/time parser is always a good idea. I use the ones in DateTime::Format::*. There are many odd cases in date and time parsing, and this will save you lines of code and make the handling more reliable.
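As a rough illustration of the two suggestions in the first answer (calling dispose on each XML::DOM tree and using Try::Tiny instead of a bare eval {}), here is a minimal sketch. The helper name count_names and the counting logic are hypothetical stand-ins for the real processing; the rawdata directory and the tag names nam1 and name2 are taken from the question.

```perl
use strict;
use warnings;
use XML::DOM;
use Try::Tiny;

# Hypothetical helper (not from the original post): parse one file, count the
# <name2> elements found under <nam1> elements, and always release the tree.
sub count_names {
    my ($parser, $path) = @_;
    my ($doc, $count);
    try {
        $doc   = $parser->parsefile($path);
        $count = 0;
        for my $node ($doc->getElementsByTagName('nam1')) {
            for my $tnode ($node->getElementsByTagName('name2')) {
                $count++;    # the real processing would go here
            }
        }
    }
    catch {
        warn "Failed to process $path: $_";
    }
    finally {
        # XML::DOM trees hold circular references, so free them explicitly,
        # whether or not the processing above succeeded.
        $doc->dispose if $doc;
    };
    return $count;    # undef if parsing failed
}

my $parser = XML::DOM::Parser->new;
for my $path (grep { -f $_ } glob('rawdata/*')) {
    my $count = count_names($parser, $path);
    print "$path: $count\n" if defined $count;
}
```

Using parsefile also avoids building the whole file as a string first, although the DOM tree itself is still held in memory until dispose is called.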
XML::DOM is old and limited (not to mention that I don't think it's maintained any more). Try XML::LibXML, which is very similar (it also implements a DOM), except faster, more memory-frugal, more powerful (full XPath implementation...), maintained...
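For comparison, the same loop might look like this with XML::LibXML. This is only a sketch under the same assumptions as the question (files in rawdata, elements named nam1 and name2); the helper name count_names_libxml and the counting are hypothetical placeholders for the real processing.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not from the original post): count <name2> elements
# that sit anywhere below a <nam1> element, using a single XPath query.
sub count_names_libxml {
    my ($path) = @_;
    my $doc    = XML::LibXML->load_xml(location => $path);    # dies on parse errors
    my $count  = 0;
    for my $tnode ($doc->findnodes('//nam1//name2')) {
        $count++;    # the real processing would go here
    }
    return $count;   # $doc goes out of scope here and its memory is released
}

for my $path (grep { -f $_ } glob('rawdata/*')) {
    my $count = eval { count_names_libxml($path) };
    if (defined $count) {
        print "$path: $count\n";
    }
    else {
        warn "Failed to process $path: $@";
    }
}
```

No explicit dispose call is needed here, provided no nodes from the document are kept alive outside the helper. Note that the XPath query visits each matching name2 element once, whereas nested getElementsByTagName loops could visit an element more than once if nam1 elements are nested inside each other.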