Parsing an XML file with XML::Twig
I have a large XML file that contains entities in the format below.
Could someone help me process it with XML::Twig?
<root>
  <entity id="1" last_modified="2011-10-1">
    <entity_title> title</entity_title>
    <entity_description>description </entity_description>
    <entity_x> x </entity_x>
    <entity_y> x </entity_y>
    <entity_childs>
      <child flag="1">
        <child_name>name</child_name>
        <child_type>type1</child_type>
        <child_x> some_text</child_x>
      </child>
      <child flag="1">
        <child_name>name1</child_name>
        <child_type>type2</child_type>
        <child_x> some_text</child_x>
      </child>
    </entity_childs>
    <entity_sibling>
      <family value="1" name="xc">fed</family>
      <family value="1" name="df">ff</family>
    </entity_sibling>
  </entity>
</root>
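I run the code below and it runs out of memory:
use XML::Twig;
use Data::Dumper;
my $file = shift or die "Usage: $0 file.xml\n";
my $twig = XML::Twig->new();
# parsefile() builds the whole tree in memory before simplify() is called
my $config = $twig->parsefile( $file )->simplify();
print Dumper( $config );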
Comments (2)
XML::Twig can run in two modes: one for small documents and one for large documents. You say your file is large, so you want the second approach listed in the synopsis of the documentation, the one for processing huge documents.
That approach registers handlers and frees each part of the tree once it has been processed. Use it instead of the method you are using now, which the documentation notes is only for small documents.
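As a rough illustration of the large-document mode, here is a minimal sketch that updates each entity and flushes it as soon as it has been parsed. It is not the synopsis code itself. The sample document, the `processed` attribute, and the in-memory output buffer are assumptions made only to keep the example short and self-contained; for a real file you would call `parsefile` and flush to a real filehandle (or to STDOUT, which is the default).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Small stand-in for the large file (assumption for the demo).
my $xml = <<'XML';
<root>
  <entity id="1"><entity_title> title</entity_title></entity>
  <entity id="2"><entity_title>other </entity_title></entity>
</root>
XML

# Write the output to an in-memory buffer so the result can be inspected.
my $buffer = '';
open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time an <entity> element has been fully parsed.
        entity => sub {
            my ( $t, $entity ) = @_;
            $entity->set_att( processed => 1 );    # hypothetical update
            $t->flush($out);    # output what has been parsed so far and free it
        },
    },
);

$twig->parse($xml);    # use $twig->parsefile($file) for a real file
$twig->flush($out);    # one last flush to output the end of the document
close $out;

print $buffer;
```

With this pattern, roughly one entity at a time is held in memory, instead of the whole tree.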
Yep, there is no magic in XML::Twig: if you write
$twig->parsefile( $file )->simplify();
then it will load the entire document into memory. I am afraid you will have to put some work into it to get just the bits you want and discard the rest. Look at the synopsis or the XML::Twig 101 section at the top of the docs for more information.
This is becoming a FAQ, so I have added the blurb above to the docs of the module.
In this particular case you probably want to set a handler (using the twig_handlers option) on entity, process each entity, and then discard it, using flush if you are updating the file or purge if you just want to extract data from it. So the code should be organized around that handler: parse the file once, and let the handler deal with each entity and free it before the next one is read.
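A minimal sketch of that architecture for the extraction case, assuming you only need the id and title of each entity. The `process_entity` name, the `@rows` array, and the inline sample document are hypothetical and only there to make the example runnable; on the real file you would call `parsefile( $file )` instead of `parse`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Small stand-in for the large file (assumption for the demo).
my $xml = <<'XML';
<root>
  <entity id="1" last_modified="2011-10-1">
    <entity_title> title</entity_title>
    <entity_description>description </entity_description>
  </entity>
  <entity id="2" last_modified="2011-10-2">
    <entity_title>other </entity_title>
  </entity>
</root>
XML

my @rows;    # extracted data: one [ id, title ] pair per entity

my $twig = XML::Twig->new(
    twig_handlers => { entity => \&process_entity },
);
$twig->parse($xml);    # use $twig->parsefile($file) for a real file

print join( "\t", @$_ ), "\n" for @rows;

# Called each time an <entity> element has been fully parsed.
sub process_entity {
    my ( $t, $entity ) = @_;

    my $id    = $entity->att('id');
    my $title = $entity->first_child_text('entity_title');
    $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    push @rows, [ $id, $title ];

    $t->purge;    # free everything parsed so far, nothing is output
}
```

Only the extracted values are kept; the parsed elements are released after each entity, so memory use should stay roughly constant regardless of file size.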