Processing large XML files with Perl
I have an XML file of about 200MB, and I wish to extract selected information from it on a line-by-line basis.
I have written a Perl script that uses the XML::LibXML module to parse the file, then loops over the contents and extracts the information line by line. This is inefficient because it reads the whole file into memory, but I like LibXML because I can address the information I need by its XPath location.
Can I get suggestions for ways to make my code more efficient?
Through searching I have become aware of XML::SAX and XML::LibXML::SAX, but I cannot find documentation that explains their usage, and they don't seem to include any kind of XPath addressing.
Comments (2)
Have you considered the XML::Twig module? Its CPAN module description presents it as much more efficient for processing large files.
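A minimal sketch of how that could look, assuming a hypothetical layout of `<catalog>` containing many `<record id="...">` elements, each with a `<title>` child (none of these names come from the question). The handler fires once per completed record, and `purge` discards what has already been processed, so the whole file never needs to sit in memory at once. The sample is parsed from a string to keep it self-contained; for the real 200MB file you would call `parsefile` with its path instead.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data standing in for the large file.
my $xml = <<'XML';
<catalog>
  <record id="1"><title>First</title></record>
  <record id="2"><title>Second</title></record>
</catalog>
XML

my @rows;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <record> inside <catalog> has been fully parsed.
        'catalog/record' => sub {
            my ($t, $record) = @_;
            push @rows, join "\t",
                $record->att('id'),
                $record->first_child_text('title');
            $t->purge;    # release the parts of the tree already handled
        },
    },
);

$twig->parse($xml);    # for a file on disk: $twig->parsefile('big.xml')

print "$_\n" for @rows;
```

The handler key is an XPath-like path, so you keep some of the addressing convenience you liked in LibXML.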
I had some luck with XML::Twig, but ended up with XML::LibXML::Reader, which is much faster. You may also want to check XML::LibXML::Pattern if you need to use XPath.
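Here is a rough sketch of combining the two, again assuming a hypothetical `<catalog>`/`<record>`/`<title>` layout that is not taken from the question. The reader streams through the document node by node; only elements matching the compiled pattern are copied into small DOM fragments, on which the usual XPath methods work. For a file on disk you would pass `location => 'big.xml'` rather than `string => ...`.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Hypothetical sample data standing in for the large file.
my $xml = <<'XML';
<catalog>
  <record id="1"><title>First</title></record>
  <record id="2"><title>Second</title></record>
</catalog>
XML

my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "cannot create reader\n";

# Patterns use a restricted subset of XPath.
my $pattern = XML::LibXML::Pattern->new('/catalog/record');

my @rows;
while ($reader->read) {
    # Only look at opening element nodes that match the pattern.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    next unless $reader->matchesPattern($pattern);

    # Copy just this record (with its subtree) into a DOM node.
    my $record = $reader->copyCurrentNode(1);
    push @rows, join "\t",
        $record->getAttribute('id'),
        $record->findvalue('title');
}

print "$_\n" for @rows;
```

Memory use then depends on the size of one record rather than the size of the file.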