Importing a large XML file (100k records) into a database

Posted on 2024-12-03 05:49:20

I am facing a problem while parsing XML. It consumes 47% of the CPU and is very slow. It seems that DOM loads the whole XML document into memory and then reads the XML tree node by node.

I read a node and dump it to the database.

I want a solution that lets me read the XML without loading all of it into memory.

I am using JDK 1.4.2_05.

Comments (3)

入画浅相思 2024-12-10 05:49:20

Look for a SAX parser. It lets you process the XML without building a full DOM in memory. It has some limitations, but it may suit your needs.
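
To illustrate the SAX approach, here is a minimal sketch. It assumes a hypothetical document layout, <persons><person id="..."><name>...</name></person></persons>, and a hypothetical PersonSink callback that stands in for the database insert; neither comes from the question. It avoids generics and annotations, so it should also compile on JDK 1.4.

```java
import java.io.Reader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical callback that receives one record at a time (e.g. to insert it into a database). */
interface PersonSink {
    void store(String id, String name);
}

/** SAX handler that keeps only the current <person> record in memory. */
class PersonHandler extends DefaultHandler {
    private final PersonSink sink;
    private final StringBuffer text = new StringBuffer(); // StringBuffer exists on JDK 1.4
    private String id;
    private String name;

    PersonHandler(PersonSink sink) {
        this.sink = sink;
    }

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the element that just opened
        if ("person".equals(qName)) {
            id = atts.getValue("id");
            name = null;
        }
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // the parser may deliver one text node in several chunks
    }

    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            name = text.toString().trim();
        } else if ("person".equals(qName)) {
            sink.store(id, name); // record is complete: hand it off, nothing else is retained
        }
    }
}

class SaxImport {
    /** Streams the document through the handler; checked exceptions are wrapped to keep the sketch short. */
    static void importPersons(Reader xml, PersonSink sink) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(xml), new PersonHandler(sink));
        } catch (Exception e) {
            throw new RuntimeException("XML import failed", e);
        }
    }
}
```

The handler holds only the record currently being read, so memory use does not grow with the size of the file. A PersonSink implementation could, for instance, add each record to a JDBC PreparedStatement batch instead of issuing one insert per node.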

时光病人 2024-12-10 05:49:20

Try StAX or SAX.
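
For comparison, a rough sketch of the same kind of import with StAX's pull API. The javax.xml.stream package is bundled with Java 6 and later, so on JDK 1.4.2 a separate StAX implementation would have to be on the classpath. The document layout (<person id="..."><name>...</name></person>) and the PersonConsumer interface are assumptions made for illustration, not something from the question.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical callback standing in for the database insert. */
interface PersonConsumer {
    void accept(String id, String name);
}

class StaxImport {
    /** Pulls parse events one at a time and returns the number of <person> records handled. */
    static int importPersons(Reader xml, PersonConsumer consumer) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                String id = null;
                String name = null;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String tag = reader.getLocalName();
                        if ("person".equals(tag)) {
                            id = reader.getAttributeValue(null, "id");
                            name = null;
                        } else if ("name".equals(tag)) {
                            // Reads the text content and leaves the reader on </name>
                            name = reader.getElementText().trim();
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && "person".equals(reader.getLocalName())) {
                        consumer.accept(id, name); // record complete: hand it off
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException("XML import failed", e);
        }
        return count;
    }
}
```

With StAX the application drives the loop and asks for the next event, whereas SAX calls back into a handler. Both keep only the current record in memory.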

埋情葬爱 2024-12-10 05:49:20

The Nux project includes the StreamingPathFilter class. With this class you can combine the streaming facilities and low memory footprint of SAX with the ease of use of DOM.

But this works only if your XML document has a record-like structure, for example many <person/> elements.

(The following examples are taken from the Nux website and modified by me.)

First, define how to handle one record:

import nu.xom.Element;
import nu.xom.Nodes;
import nux.xom.xquery.StreamingTransform;

StreamingTransform myTransform = new StreamingTransform() {
  public Nodes transform(Element person) {
    // Process the person element, e.g. store it in a database
    return new Nodes(); // empty result: the element is dropped and can be garbage collected
  }
};

Then create a StreamingPathFilter, passing an XPath expression that matches your record nodes.

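import java.io.File;
import nu.xom.Builder;
import nu.xom.NodeFactory;
import nux.xom.xquery.StreamingPathFilter;
// parse the document with a filtering Builder; myTransform is called once per matching element
NodeFactory factory = new StreamingPathFilter("/persons/person", null).
                            createNodeFactory(null, myTransform);
new Builder(factory).build(new File("/tmp/persons.xml"));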

The Nux library no longer seems to be maintained, but it is still useful.
