Perl "out of memory" error after processing only 64 XML files of 2 MB each - unix

Posted on 2024-11-19 12:29:08

I tried making the variables global and undef-ing them, increasing the data segment size on Unix, and localising the variables, but I still get the same error. I need to process around 750 files. Can anyone help? Thanks. I know that reading the entire file into a string may be a problem, but I am not sure of any other way. Still, since I declare the string as a global and reset it to "" on every pass, shouldn't that release the memory in the next iteration?

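foreach my $file_name (@dir_contents)
{
    if (-f "rawdata/$file_name")
    {
        $xmlres = "";
        eval {
            # The open call was not shown in the original post; this line is an assumption.
            open(FILE, '<', "rawdata/$file_name") or die "Cannot open rawdata/$file_name: $!";
            while (<FILE>)
            {
                $xmlres .= $_;
            }
            close FILE;

            # This is the line highlighted in the question.
            $doc = $parser->parsestring($xmlres);
            foreach my $node ($doc->getElementsByTagName("nam1"))
            {
                foreach my $tnode ($node->getElementsByTagName("name2"))
                {
                    # processing
                }
            }
        };
        warn "rawdata/$file_name: $@" if $@;
    }
}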

Comments (2)

○愚か者の日 2024-11-26 12:29:08

First of all, the style comments are useful and correct, and would help. However, if you need to process 1.5 GB of XML, you're going to need to manage memory a little better.

XML::DOM doesn't automatically free space it used. This is a sign of its age, and newer modules manage memory much better, and tend to do this automatically (I also use XML::LibXML, which does this, and I'd also recommend it highly).

Mainly, you need to call the dispose method to clean out a DOM tree when you have finished with it. This is fairly clear in the POD synopsis for XML::DOM. Just calling it may be enough to resolve your memory issues. (Technically, DOM trees tend to contain cyclical references, and these are not freed automatically by simple reference-counting garbage collection. Perl has weak references to help with this, but it looks like they haven't been fully integrated into XML::DOM. Simply dropping your reference to the tree is not enough.)
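
As a rough sketch of what that could look like, the loop below parses each file with XML::DOM and calls dispose before moving on to the next one. The helper name count_nested_elements and its directory argument are made up for illustration, and counting the elements stands in for the real processing; only parsefile, getElementsByTagName and dispose come from XML::DOM itself.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: count <name2> elements nested under <nam1> in every
# regular file of $dir, releasing each DOM tree before parsing the next file.
sub count_nested_elements {
    my ($dir) = @_;
    my $parser = XML::DOM::Parser->new;
    my $total  = 0;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;

    foreach my $file_name (@files) {
        # parsefile reads from disk, so no separate string copy is kept.
        my $doc = eval { $parser->parsefile("$dir/$file_name") };
        if (!$doc) {
            warn "Skipping $dir/$file_name: $@";
            next;
        }

        foreach my $node ($doc->getElementsByTagName("nam1")) {
            my @inner = $node->getElementsByTagName("name2");
            $total += scalar @inner;    # stand-in for the real processing
        }

        # Break the circular references so Perl can reclaim the tree.
        $doc->dispose;
    }
    return $total;
}
```

Calling it as count_nested_elements("rawdata") would mirror the directory used in the question. Note that if the processing itself dies, this sketch skips the dispose call for that file.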

I'd certainly look to improve style elsewhere. Some other style issues: I'd use Try::Tiny in place of the eval {}, as you seem to be using it mainly for exception handling. Also, several bad experiences have taught me that using a solid date/time parser is always a good idea. I use the ones in DateTime::Format::*. There are many odd cases in date and time parsing, and this will save you lines of code and make the handling more reliable.
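
For the Try::Tiny suggestion, a minimal sketch might look like the following. The wrapper process_safely and its callback argument are hypothetical names, not anything from the question; the try/catch syntax is Try::Tiny's.

```perl
use strict;
use warnings;
use Try::Tiny;

# Hypothetical wrapper: run $work->($file_name) and report whether it succeeded.
sub process_safely {
    my ($file_name, $work) = @_;
    my $ok = 1;
    try {
        $work->($file_name);
    }
    catch {
        # Try::Tiny puts the exception in $_ rather than $@.
        warn "Failed on $file_name: $_";
        $ok = 0;
    };    # the trailing semicolon is required
    return $ok;
}
```

Unlike a bare eval {}, this does not depend on $@ surviving until it is checked, which is the usual reason for preferring it.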

琉璃梦幻 2024-11-26 12:29:08

XML::DOM is old and limited (not to mention that I don't think it's maintained any more). Try XML::LibXML, which is very similar (it also implements a DOM), except faster, more memory-frugal, more powerful (full XPath implementation...), maintained...
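
A small sketch of the same nested lookup with XML::LibXML, assuming the element names from the question (nam1, name2) and documents without namespaces; count_with_xpath is a made-up helper name:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: count <name2> elements that sit anywhere below a <nam1>.
sub count_with_xpath {
    my ($path) = @_;

    # load_xml parses straight from the file; no manual slurping is needed.
    my $doc = XML::LibXML->load_xml(location => $path);

    # One XPath expression replaces the two nested getElementsByTagName loops.
    my @nodes = $doc->findnodes('//nam1//name2');
    return scalar @nodes;
}
```

The document is freed when $doc goes out of scope, so there is no dispose call to remember.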
