Problem exporting a lot of data from a database to .csv using Java
Hi, thanks for your attention.
I want to export a lot of data, really a lot of data (6 million rows), to a .csv file using Java. The app is a Swing application with JPA, using TopLink (ojdbc14).
I have tried to use:
BufferedWriter
RandomAccessFile
FileChannel
etc., but memory consumption remains very high, causing a Java heap OutOfMemoryError, even though I set the maximum heap size to 800 MB (-Xmx800m).
The latest version of my source code:
...(more lines of code)
FileChannel channel = getRandomAccessFile(tempFile).getChannel();
Object[][] data = pag.getRawData(); // database data in a multidimensional array
for (int j = 0; j < data.length; j++) {
    write(data[j], channel);  // write data[j] (one row) to the channel
    freeStringLine(data[j]);  // sets every position of the row array to null
    data[j] = null;           // drop the reference to the row
}
channel.force(false); // force the written data out to the file system
channel.close();      // close the channel
pag = null;
...(more lines of code)
private void write(Object[] row, FileChannel channel) throws DatabaseException {
    if (byteBuff == null) {
        byteBuff = ByteBuffer.allocateDirect(1024 * 1024);
    }
    // Append each column, separated by the delimiter; the last column gets no trailing delimiter.
    for (int j = 0; j < row.length; j++) {
        if (row[j] != null) {
            byteBuff.put(row[j].toString().getBytes());
        }
        if (j < row.length - 1) {
            byteBuff.put(SPLITER_BYTES);
        }
    }
    byteBuff.put("\n".getBytes());
    byteBuff.flip();
    try {
        channel.write(byteBuff);
    } catch (IOException ex) {
        throw new DatabaseException("Imposible escribir en archivo temporal de exportación : " + ex.getMessage(), ex.getCause());
    }
    byteBuff.clear();
}
With 6 million rows, I don't want to hold all that data in memory while the file is created. So I write many temp files (with 5000 rows each), and at the end of the process I append all of them into a single file using two FileChannels. However, the out-of-memory exception is thrown before the joining even starts.
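The joining step itself is straightforward; simplified, it looks roughly like this (the class name, file names, and variables here are placeholders):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.List;

// Sketch: append all temp files into one output file with two FileChannels.
// transferTo copies at the OS level, so nothing is buffered in the Java heap.
public class TempFileJoiner {

    public static void join(List<File> tempFiles, File target) throws IOException {
        FileChannel out = new FileOutputStream(target).getChannel();
        try {
            for (File temp : tempFiles) {
                FileChannel in = new FileInputStream(temp).getChannel();
                try {
                    long pos = 0;
                    long size = in.size();
                    // transferTo may copy fewer bytes than requested, so loop until done
                    while (pos < size) {
                        pos += in.transferTo(pos, size - pos, out);
                    }
                } finally {
                    in.close();
                }
            }
        } finally {
            out.close();
        }
    }
}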
Do you know of another strategy for exporting this much data?
Thanks a lot for any answer. Sorry for my English, I'm improving xD
1 Answer
The answer is to use a "stream" approach - i.e. read one row, write one row as you scroll through the dataset. You'll need to get the query result as a cursor and iterate through it, rather than fetching the whole result set.
In JPA, use code something like this:
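(A minimal sketch of the idea using standard JPA pagination with setFirstResult/setMaxResults; the entity Record, its toCsvLine() helper, and the page size are placeholders. A provider-specific scrollable cursor would work the same way.)

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Query;

// Fetch a small page at a time, write it straight to disk, then clear the
// persistence context so nothing accumulates in memory.
public class StreamingCsvExport {

    private static final int PAGE_SIZE = 1000;

    public void export(EntityManager em, String fileName) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(fileName));
        try {
            int first = 0;
            while (true) {
                Query q = em.createQuery("SELECT r FROM Record r ORDER BY r.id");
                q.setFirstResult(first);
                q.setMaxResults(PAGE_SIZE);
                List<?> page = q.getResultList();
                if (page.isEmpty()) {
                    break;
                }
                for (Object o : page) {
                    out.write(((Record) o).toCsvLine()); // placeholder helper
                    out.newLine();
                }
                em.clear(); // detach the page so it can be garbage collected
                first += PAGE_SIZE;
            }
        } finally {
            out.close();
        }
    }
}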
This means you only have one row in memory at a time, which is totally scalable to any number of rows and uses minimal memory (it's faster anyway).
Getting all rows at once in a result set is a convenience approach which works for small result sets (which is most of the time), but as usual, convenience comes at a cost and it doesn't work in all situations.