从大表中检索所有记录时如何避免 OOM(内存不足)错误?
我的任务是将一个巨大的表转换为自定义 XML 文件。 我将使用 Java 来完成这项工作。
如果我只是发出“SELECT * FROM customer”,它可能会返回大量数据,最终导致 OOM。 我想知道,有没有一种方法可以在记录可用后立即处理该记录,并在 sql 检索过程中从内存中删除该记录?
--- 2009 年 7 月 13 日编辑
让我详细阐述我的问题。 我有 1 台数据库服务器和 1 台应用程序服务器。 当我在应用程序中发出选择查询时,数据将从数据库服务器传输到应用程序服务器。
我相信(如果我错了,请纠正我)ResultSet 将需要等到收到查询中的所有记录。 即使我们将获取大小设置为 4,对于 1000 条记录的表,我们最终在应用程序服务器的堆内存中仍然有 1000 条记录,这是正确的吗? 获取大小仅影响从/到数据库服务器的往返次数。
我的问题是,如何在 4 个(或任何数量)记录到达应用程序服务器后立即开始处理它,并对其进行处理以释放应用程序服务器中的内存?
I am given a task to convert a huge table to custom XML file. I will be using Java for this job.
If I simply issue a "SELECT * FROM customer", it may return huge amount of data that eventually causing OOM. I wonder, is there a way i can process the record immediately once it become available, and remove the record from memory after that during sql retrieving process?
--- edited on 13 Jul 2009
Let me elaborate my question. I have 1 db server and 1 application server.
When I issue a select query in application, the data will travel from db server to app server.
I believe (correct me if I am wrong) ResultSet will need to wait until receiving all records in the query. Even if we set fetch size as 4, for a 1000-record table, we still end up having 1000 records in heap memory of app server, is it correct? Fetch size only affect the number of round trip from/to db server.
My question is, how to start processing on that 4 (or any number) records immediately after it's arrival to app server, and dispose it to free up memory in app server?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
通过更多信息,我可以获得更有用的答案。
如果您使用的是 MySQL:
来自 http://www.oracle .com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html:
With a little more information I can get a more helpful answer.
If you are using MySQL:
from http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html:
我认为您可以使用与这个相同的解决方案。 可滚动的结果集。
I think you could use the same solution as this one. A scrollable resultset.
如果您使用 JDBC,则可以使用带有游标的 ResultSet,一次遍历一条记录。 您需要确保将 XML 一次写入一个文件,而不是使用 DOM 构建 XML。
If you are using JDBC you can use a ResultSet with a cursor which you iterate through one record at a time. You need to makes sure then that you write your XML out to a file one record at a time rather than using DOM to build the XML.
我从经验中学到的一条经验法则是,永远不要将数据库中的所有数据带到应用程序服务器。 您可以做的一件事是实施一个过程来分页数据。
您可以获取包含大约 1000-5000 条记录的一页数据,对其进行处理,然后再次获取下一页的数据。
One rule of thumb that I've learnt from my experience is that you NEVER bring ALL the data from the database to your application server. One thing you can do is implement a procedure to page your data.
You can bring one page of data containing around 1000-5000 records, process them, then again fetch the data for the next page.
导出整个表的概念。
(专家注意:我知道它的缺点。)编辑 缺点(感谢@JS):
A concept for exporting the entire table.
(Note to experts: I'm aware of its shortcomings.)Edit The shortcomings (thanks @J.S.):
OOM 错误发生在哪个阶段,是在数据检索时还是在将数据处理为 XML 文件时?
如果是数据检索,则批量获取数据。 首先获取总行数,按主键对选择进行排序,并将所选行限制为可咀嚼的大小。
如果是在创建XML文件时,则将每个客户的XML节点发送到System.out.println,不要将其保存在内存中。 通过命令行启动程序并将所有输出重定向到文件;
当您循环浏览记录时,所有内容都会保存在文件中。
At which stage is the OOM error occurring, is it on data retrieval or processing data to XML file?
If its data retrieval, get the data in batches. Get the total number of rows first, order the selects by the primary key and limit the rows selected to chewable sizes.
If its at creating the XML file, send the XML node of each customer to System.out.println, don't hold it in memory. Launch the program via commad line and redirect all output to a file;
As you loop through the record all is saved in the file.