Importing a large XML file (100k records) into a database
I am having a problem parsing the XML. It consumes 47% of the CPU and is very slow. It seems that DOM loads the whole XML document into memory and then reads the tree node by node.
I read each node and write it to the database.
I want a solution that can read the XML without loading all of it into memory.
I am using JDK 1.4.2_05.
Comments (3)
Look at a SAX parser. It lets you process XML without building the full DOM in memory. There are some limitations, but it may suit your needs.
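As a rough illustration of the SAX approach, here is a minimal sketch using only the JAXP classes that ship with the JDK. The `<persons>/<person>/<name>` layout, the class name `SaxRecordImport`, and the `RecordSink` callback are assumptions made up for this example; a real sink would insert each record into the database. It is written for a current JDK, so on 1.4.2 the generics and `StringBuilder` would have to be replaced.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordImport {

    /** Hypothetical callback; a real one would write the record to the database. */
    public interface RecordSink {
        void accept(String name);
    }

    /** Streams the document and hands the text of every <name> element to the sink. */
    public static void importPersons(InputSource source, final RecordSink sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <name>

            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            public void endElement(String uri, String local, String qName) {
                if ("name".equals(qName)) {
                    sink.accept(text.toString()); // record complete, nothing else is retained
                    text = null;
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<persons><person><name>Ann</name></person>"
                   + "<person><name>Bob</name></person></persons>";
        final List<String> seen = new ArrayList<String>();
        importPersons(new InputSource(new StringReader(xml)), new RecordSink() {
            public void accept(String name) { seen.add(name); }
        });
        System.out.println(seen); // prints [Ann, Bob]
    }
}
```

For a real file, pass `new InputSource(new FileInputStream(path))` instead of the string reader. Only the text of the current record is held in memory at any time.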
Try StAX or SAX.
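For comparison, a StAX version might look like the sketch below. StAX (`javax.xml.stream`) has been part of the JDK only since Java 6, so on JDK 1.4.2 it would require a separate implementation on the classpath. The class name `StaxNameReader` and the `<person>/<name>` layout are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxNameReader {

    /** Pulls parse events one at a time and collects the text of each <name> element. */
    public static List<String> readNames(String xml) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag.
                    names.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<persons><person><name>Ann</name></person>"
                   + "<person><name>Bob</name></person></persons>";
        System.out.println(readNames(xml)); // prints [Ann, Bob]
    }
}
```

The list is used here only to keep the example small. In an import job you would write each record to the database inside the loop rather than collecting them. Unlike SAX, the application drives the loop, which some people find easier to follow than callbacks.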
The Nux project includes the StreamingPathFilter class. With this class you can combine the streaming facilities and low memory footprint of SAX with the ease of use of DOM.
But this works only if your XML document has a record-like structure, e.g. lots of <person/> elements. (The following examples are taken from the Nux website and modified by me.)
First you define how to handle one record. Then you create a StreamingPathFilter, passing an XPath expression that matches your record nodes.
The Nux library no longer seems to be maintained, but it is still useful.