在 Java 中从数据库写入 (zip) 文件的最节省内存的方法是什么?
我的程序足够快,但我宁愿为了内存优化而放弃这个速度,因为一个用户的最大内存使用量高达 300 MB,这意味着很少有人会不断导致应用程序崩溃。我发现的大多数答案都与速度优化有关,其他答案只是一般性的(“如果直接从数据库写入内存,则不应该有太多内存使用”)。好吧,看来是有:)我正在考虑不发布代码,这样我就不会“锁定”某人的想法,但另一方面,如果你看不到我已经做了什么,我可能会浪费你的时间所以这里是:
// First I get the data from the database in a way that I think can't be more
// optimized since i've done some testing and it seems to me that the problem
// isn't in the RS and setting FetchSize and/or direction does not help.
public static void generateAndWriteXML(String query, String oznaka, BufferedOutputStream bos, Connection conn)
throws Exception
{
ResultSet rs = null;
Statement stmt = null;
try
{
stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
rs = stmt.executeQuery(query);
writeToZip(rs, oznaka, bos);
} finally
{
ConnectionManager.close(rs, stmt, conn);
}
}
// then I open up my streams. In the next method I'll generate an XML from the
// ResultSet and I want that XML to be saved in an XML, but since its size takes up
// to 300MB, I want it to be saved in a ZIP. I'm thinking that maybe by writing
// first to file, then to zip I could get a slower but more efficient program.
private static void writeToZip(ResultSet rs, String oznaka, BufferedOutputStream bos)
throws SAXException, SQLException, IOException
{
ZipEntry ze = new ZipEntry(oznaka + ".xml");
ZipOutputStream zos = new ZipOutputStream(bos);
zos.putNextEntry(ze);
OutputStreamWriter writer = new OutputStreamWriter(zos, "UTF8");
writeXMLToWriter(rs, writer);
try
{
writer.close();
} catch (IOException e)
{
}
try
{
zos.closeEntry();
} catch (IOException e)
{
}
try
{
zos.flush();
} catch (IOException e)
{
}
try
{
bos.close();
} catch (IOException e)
{
}
}
// And finally, the method that does the actual generating and writing.
// This is the second point I think I could do the memory optimization since the
// DataWriter is custom and it extends a custom XMLWriter that extends the standard
// org.xml.sax.helpers.XMLFilterImpl I've tried with flushing at points in program,
// but the memory that is occupied remains the same, it only takes longer.
public static void writeXMLToWriter(ResultSet rs, Writer writer) throws SAXException, SQLException, IOException
{
//Set up XML
DataWriter w = new DataWriter(writer);
w.startDocument();
w.setIndentStep(2);
w.startElement(startingXMLElement);
// Get the metadata
ResultSetMetaData meta = rs.getMetaData();
int count = meta.getColumnCount();
// Iterate over the set
while (rs.next())
{
w.startElement(rowElement);
for (int i = 0; i < count; i++)
{
Object ob = rs.getObject(i + 1);
if (rs.wasNull())
{
ob = null;
}
// XML elements are repeated so they could benefit from caching
String colName = meta.getColumnLabel(i + 1).intern();
if (ob != null)
{
if (ob instanceof Timestamp)
{
w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
}
else if (ob instanceof BigDecimal)
{
// Possible benefit from writing ints as strings and interning them
w.dataElement(colName, Util.transformToHTML(new Integer(((BigDecimal) ob).intValue())));
}
else
{ // there's enough of data that's repeated to validate the use of interning
w.dataElement(colName, ob.toString().intern());
}
}
else
{
w.emptyElement(colName);
}
}
w.endElement(rowElement);
}
w.endElement(startingXMLElement);
w.endDocument();
}
编辑:这是一个内存使用情况的示例(使用 VisualVM 获取):
编辑2:数据库是Oracle 10.2.0.4。我设置了 ResultSet.TYPE_FORWARD_ONLY 并获得了最大 50MB 的使用量!正如我在评论中所说,我会密切关注这一点,但它确实很有希望。
EDIT3:似乎还有另一种可能的优化可用。正如我所说,我正在生成一个 XML,这意味着大量数据被重复(如果没有别的,那么就是标签),这意味着 String.intern() 可以在这里帮助我,当我测试这个时我会发回来。
My program is fast enough, but I'd rather give up that speed for memory optimization since one user's maximum memory usage goes up to 300 MB meaning few of them could constantly crash the application. Most of the answers I found were related to speed optimization, and other were just general ("if you write directly from a database to memory there shouldn't be much memory usage"). Well, it seems there is :) I was thinking about not posting code so I wouldn't "lock" someone's ideas, but on the other hand, I could be wasting your time if you don't see what I've already done so here it is:
// First I get the data from the database in a way that I think can't be more
// optimized since i've done some testing and it seems to me that the problem
// isn't in the RS and setting FetchSize and/or direction does not help.
public static void generateAndWriteXML(String query, String oznaka, BufferedOutputStream bos, Connection conn)
throws Exception
{
ResultSet rs = null;
Statement stmt = null;
try
{
stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
rs = stmt.executeQuery(query);
writeToZip(rs, oznaka, bos);
} finally
{
ConnectionManager.close(rs, stmt, conn);
}
}
// then I open up my streams. In the next method I'll generate an XML from the
// ResultSet and I want that XML to be saved in an XML, but since its size takes up
// to 300MB, I want it to be saved in a ZIP. I'm thinking that maybe by writing
// first to file, then to zip I could get a slower but more efficient program.
private static void writeToZip(ResultSet rs, String oznaka, BufferedOutputStream bos)
throws SAXException, SQLException, IOException
{
ZipEntry ze = new ZipEntry(oznaka + ".xml");
ZipOutputStream zos = new ZipOutputStream(bos);
zos.putNextEntry(ze);
OutputStreamWriter writer = new OutputStreamWriter(zos, "UTF8");
writeXMLToWriter(rs, writer);
try
{
writer.close();
} catch (IOException e)
{
}
try
{
zos.closeEntry();
} catch (IOException e)
{
}
try
{
zos.flush();
} catch (IOException e)
{
}
try
{
bos.close();
} catch (IOException e)
{
}
}
// And finally, the method that does the actual generating and writing.
// This is the second point I think I could do the memory optimization since the
// DataWriter is custom and it extends a custom XMLWriter that extends the standard
// org.xml.sax.helpers.XMLFilterImpl I've tried with flushing at points in program,
// but the memory that is occupied remains the same, it only takes longer.
public static void writeXMLToWriter(ResultSet rs, Writer writer) throws SAXException, SQLException, IOException
{
//Set up XML
DataWriter w = new DataWriter(writer);
w.startDocument();
w.setIndentStep(2);
w.startElement(startingXMLElement);
// Get the metadata
ResultSetMetaData meta = rs.getMetaData();
int count = meta.getColumnCount();
// Iterate over the set
while (rs.next())
{
w.startElement(rowElement);
for (int i = 0; i < count; i++)
{
Object ob = rs.getObject(i + 1);
if (rs.wasNull())
{
ob = null;
}
// XML elements are repeated so they could benefit from caching
String colName = meta.getColumnLabel(i + 1).intern();
if (ob != null)
{
if (ob instanceof Timestamp)
{
w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
}
else if (ob instanceof BigDecimal)
{
// Possible benefit from writing ints as strings and interning them
w.dataElement(colName, Util.transformToHTML(new Integer(((BigDecimal) ob).intValue())));
}
else
{ // there's enough of data that's repeated to validate the use of interning
w.dataElement(colName, ob.toString().intern());
}
}
else
{
w.emptyElement(colName);
}
}
w.endElement(rowElement);
}
w.endElement(startingXMLElement);
w.endDocument();
}
EDIT: Here is an example of memory usage (taken with visualVM):
EDIT2: The database is Oracle 10.2.0.4. and I've set ResultSet.TYPE_FORWARD_ONLY and got a maximum of 50MB usage! As I said in the comments, I'll keep an eye on this, but it's really promising.
EDIT3: It seems there's another possible optimization available. As I said, I'm generating an XML, meaning lots of data is repeated (if nothing else, then tags), meaning String.intern() could help me here, I'll post back when I test this.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
是否可以使用 ResultSet.TYPE_FORWARD_ONLY?
您使用了 ResultSet.TYPE_SCROLL_INSENSITIVE。我相信对于某些数据库(您没有说您使用哪一个)这会导致整个结果集加载到内存中。
Is it possible to use ResultSet.TYPE_FORWARD_ONLY?
You have used ResultSet.TYPE_SCROLL_INSENSITIVE. I believe for some databases (you didn't say which one you use) this causes the whole result set to be loaded in memory.
由于它是Java,内存应该只会暂时激增,除非您泄漏引用,例如,如果您将事物推入一个列表,该列表是具有整个程序生命周期的单例成员,或者根据我的经验,更有可能是资源泄漏,当使用文件句柄等非托管资源的对象从不调用其清理代码时(我假设这适用于 Java,尽管我正在考虑 C#),就会发生这种情况,这种情况通常是由不重新抛出的空异常处理程序引起的到父堆栈框架,这具有绕过finally块的最终效果......
Since it's Java, the memory should only spike temporarily, unless you are leaking references, like if you push things onto a list that is a member of a singleton that has life span of the entire program, or in my experience more likely is resource leaking, which happens when (and this I'm assuming applies to Java although I'm thinking of C#) objects that use unmanaged resources like file handles never call their cleanup code, a condition commonly caused by empty exception handlers that do not re-throw to the parent stack frame, which has the net effect of circumventing the finally block...
我运行了更多测试,结论是:
这是优化 2 和 3 的用法(如果没有 String.intern(),图表将是相同的,您应该只为每个点添加 5 MB)
这是没有它们的用法(最后使用量较少是由于程序内存不足:))
感谢大家的帮助。
I've ran some more tests and the conclusions are:
This is the usage with the optimizations 2 and 3 (if there wasn't String.intern() the graph would be the same, you should only add 5 MB more to every point)
and this is the usage without them (the lesser usage at the end is due to the program going out of memory :) )
Thank you everyone for your assistance.