Importing a large XML file (100k records) into a database

Posted on 2024-12-03 05:49:20

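I am having a problem parsing XML. It consumes 47% of the CPU and is very slow. It looks like DOM loads the whole XML document into memory and then reads the XML tree node by node from there.

I read one node at a time and dump it into the database.

I want a solution that can read the XML without loading it all into memory.

I am using JDK 1.4.2_05.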

入画浅相思 2024-12-10 05:49:20

Look into a SAX parser. It lets you process XML without building a complete DOM in memory. It has some limitations, but it may fit your needs.
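
To illustrate the SAX approach, here is a minimal sketch using the JAXP SAX API (`javax.xml.parsers` and `org.xml.sax`). The `<persons>/<person>/<name>` layout and the `PersonHandler` class are assumptions made up for this example, not something from the question. It avoids generics and annotations on purpose, so it should also compile on an older JDK such as 1.4.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: picks up the text of every <name> element as the parser streams by.
class PersonHandler extends DefaultHandler {
    // Collected only so the sketch has a visible result; a real import
    // would write each record to the database and then forget it.
    final List names = new ArrayList();
    private StringBuffer text; // non-null only while inside a <name> element

    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("name".equals(qName)) {
            text = new StringBuffer();
        }
    }

    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if ("name".equals(qName)) {
            names.add(text.toString()); // e.g. add a row to a JDBC batch here instead
            text = null;
        }
    }

    // Streams the input through the handler; no document tree is built.
    static List parse(InputStream in) throws Exception {
        PersonHandler handler = new PersonHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.names;
    }
}
```

Calling `PersonHandler.parse(new FileInputStream("persons.xml"))` (file name hypothetical) would walk the file once from start to end. The main limitation is that the handler has to track its own state, since it never sees more than one event at a time.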

时光病人 2024-12-10 05:49:20

Try StAX or SAX.
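
For comparison with SAX, here is a rough sketch of the StAX (pull parser) style using `javax.xml.stream`. Note the assumption: that package is part of the standard library only from Java 6 onwards, so on JDK 1.4.2 it would have to come from a separate StAX implementation, if one supports that JDK. The `person` element name and the `StaxPersonCounter` class are hypothetical.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: counts <person> records by pulling events one at a time.
class StaxPersonCounter {
    static int countPersons(InputStream in) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                // next() advances to the following event; only that event is held in memory.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "person".equals(reader.getLocalName())) {
                    count++; // a real import would read the record's fields and insert a row here
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

The difference from SAX is mostly one of control flow: here your code asks for the next event in a loop, instead of the parser calling back into a handler.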

埋情葬爱 2024-12-10 05:49:20

The Nux project includes the StreamingPathFilter class. With this class you can combine the streaming facilities and low memory footprint of SAX with the ease of use of DOM.

But this only works if your XML document has a record-like structure, for example lots of <person/> elements.

(The following examples are taken from the Nux website and modified by me.)

First you define how to handle one record:

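import nu.xom.Element;
import nu.xom.Nodes;
import nux.xom.io.StreamingTransform;

StreamingTransform myTransform = new StreamingTransform() {
  public Nodes transform(Element person) {
    // Process the person element here, e.g. store it in a database
    return new Nodes(); // return no nodes, so the element is dropped and can be garbage collected
  }
};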

Then you create a StreamingPathFilter, passing it an XPath expression that matches your record nodes.

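import java.io.File;
import nu.xom.Builder;
import nu.xom.NodeFactory;
import nux.xom.io.StreamingPathFilter;
// parse the document with a filtering Builder
NodeFactory factory = new StreamingPathFilter("/persons/person", null).
                            createNodeFactory(null, myTransform);
new Builder(factory).build(new File("/tmp/persons.xml"));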

The Nux library no longer seems to be maintained, but it is still useful.
