在 Java 中从数据库写入 (zip) 文件的最节省内存的方法是什么?

发布于 2024-09-24 00:13:13 字数 4466 浏览 4 评论 0原文

我的程序足够快,但我宁愿为了内存优化而放弃这个速度,因为一个用户的最大内存使用量高达 300 MB,这意味着很少有人会不断导致应用程序崩溃。我发现的大多数答案都与速度优化有关,其他答案只是一般性的(“如果直接从数据库写入内存,则不应该有太多内存使用”)。好吧,看来是有:)我正在考虑不发布代码,这样我就不会“锁定”某人的想法,但另一方面,如果你看不到我已经做了什么,我可能会浪费你的时间所以这里是:

// First I get the data from the database in a way that I think can't be more 
// optimized since i've done some testing and it seems to me that the problem 
// isn't in the RS and setting FetchSize and/or direction does not help.

public static void generateAndWriteXML(String query, String oznaka, BufferedOutputStream bos, Connection conn)
        throws Exception
{
    ResultSet rs = null;
    Statement stmt = null;
    try
    {
        stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        rs = stmt.executeQuery(query);
        writeToZip(rs, oznaka, bos);
    } finally
    {
        ConnectionManager.close(rs, stmt, conn);
    }
}

// then I open up my streams. In the next method I'll generate an XML from the
// ResultSet and I want that XML to be saved in an XML, but since its size takes up
// to 300MB, I want it to be saved in a ZIP. I'm thinking that maybe by writing 
// first to file, then to zip I could get a slower but more efficient program.

private static void writeToZip(ResultSet rs, String oznaka, BufferedOutputStream bos)
        throws SAXException, SQLException, IOException
{
    ZipEntry ze = new ZipEntry(oznaka + ".xml");
    ZipOutputStream zos = new ZipOutputStream(bos);
    zos.putNextEntry(ze);
    OutputStreamWriter writer = new OutputStreamWriter(zos, "UTF8");
    writeXMLToWriter(rs, writer);
    try
    {
        writer.close();
    } catch (IOException e)
    {
    }
    try
    {
        zos.closeEntry();
    } catch (IOException e)
    {
    }
    try
    {
        zos.flush();
    } catch (IOException e)
    {
    }
    try
    {
        bos.close();
    } catch (IOException e)
    {
    }
}

// And finally, the method that does the actual generating and writing. 
// This is the second point I think I could do the memory optimization since the
// DataWriter is custom and it extends a custom XMLWriter that extends the standard
// org.xml.sax.helpers.XMLFilterImpl I've tried with flushing at points in program,
// but the memory that is occupied remains the same, it only takes longer.

public static void writeXMLToWriter(ResultSet rs, Writer writer) throws SAXException, SQLException, IOException
{
    //Set up XML
    DataWriter w = new DataWriter(writer);
    w.startDocument();
    w.setIndentStep(2);
    w.startElement(startingXMLElement);
    // Get the metadata
    ResultSetMetaData meta = rs.getMetaData();
    int count = meta.getColumnCount();
    // Iterate over the set
    while (rs.next())
    {
        w.startElement(rowElement);
        for (int i = 0; i < count; i++)
        {
            Object ob = rs.getObject(i + 1);
            if (rs.wasNull())
            {
                ob = null;
            }
            // XML elements are repeated so they could benefit from caching
            String colName = meta.getColumnLabel(i + 1).intern();
            if (ob != null)
            {
                if (ob instanceof Timestamp)
                {
                    w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
                }
                else if (ob instanceof BigDecimal)
                {
                    // Possible benefit from writing ints as strings and interning them
                    w.dataElement(colName, Util.transformToHTML(new Integer(((BigDecimal) ob).intValue())));
                }
                else
                {   // there's enough of data that's repeated to validate the use of interning
                    w.dataElement(colName, ob.toString().intern());
                }

            }
            else
            {
                w.emptyElement(colName);
            }
        }
        w.endElement(rowElement);
    }
    w.endElement(startingXMLElement);
    w.endDocument();
}

编辑:这是一个内存使用情况的示例(使用 VisualVM 获取):

内存使用情况截图

编辑2:数据库是Oracle 10.2.0.4。我设置了 ResultSet.TYPE_FORWARD_ONLY 并获得了最大 50MB 的使用量!正如我在评论中所说,我会密切关注这一点,但它确实很有希望。

添加 ResultSet.TYPE_FORWARD_ONLY 后的内存使用情况

EDIT3:似乎还有另一种可能的优化可用。正如我所说,我正在生成一个 XML,这意味着大量数据被重复(如果没有别的,那么就是标签),这意味着 String.intern() 可以在这里帮助我,当我测试这个时我会发回来。

My program is fast enough, but I'd rather give up that speed for memory optimization since one user's maximum memory usage goes up to 300 MB meaning few of them could constantly crash the application. Most of the answers I found were related to speed optimization, and other were just general ("if you write directly from a database to memory there shouldn't be much memory usage"). Well, it seems there is :) I was thinking about not posting code so I wouldn't "lock" someone's ideas, but on the other hand, I could be wasting your time if you don't see what I've already done so here it is:

// First I get the data from the database in a way that I think can't be more 
// optimized since i've done some testing and it seems to me that the problem 
// isn't in the RS and setting FetchSize and/or direction does not help.

public static void generateAndWriteXML(String query, String oznaka, BufferedOutputStream bos, Connection conn)
        throws Exception
{
    ResultSet rs = null;
    Statement stmt = null;
    try
    {
        stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        rs = stmt.executeQuery(query);
        writeToZip(rs, oznaka, bos);
    } finally
    {
        ConnectionManager.close(rs, stmt, conn);
    }
}

// then I open up my streams. In the next method I'll generate an XML from the
// ResultSet and I want that XML to be saved in an XML, but since its size takes up
// to 300MB, I want it to be saved in a ZIP. I'm thinking that maybe by writing 
// first to file, then to zip I could get a slower but more efficient program.

private static void writeToZip(ResultSet rs, String oznaka, BufferedOutputStream bos)
        throws SAXException, SQLException, IOException
{
    ZipEntry ze = new ZipEntry(oznaka + ".xml");
    ZipOutputStream zos = new ZipOutputStream(bos);
    zos.putNextEntry(ze);
    OutputStreamWriter writer = new OutputStreamWriter(zos, "UTF8");
    writeXMLToWriter(rs, writer);
    try
    {
        writer.close();
    } catch (IOException e)
    {
    }
    try
    {
        zos.closeEntry();
    } catch (IOException e)
    {
    }
    try
    {
        zos.flush();
    } catch (IOException e)
    {
    }
    try
    {
        bos.close();
    } catch (IOException e)
    {
    }
}

// And finally, the method that does the actual generating and writing. 
// This is the second point I think I could do the memory optimization since the
// DataWriter is custom and it extends a custom XMLWriter that extends the standard
// org.xml.sax.helpers.XMLFilterImpl I've tried with flushing at points in program,
// but the memory that is occupied remains the same, it only takes longer.

public static void writeXMLToWriter(ResultSet rs, Writer writer) throws SAXException, SQLException, IOException
{
    //Set up XML
    DataWriter w = new DataWriter(writer);
    w.startDocument();
    w.setIndentStep(2);
    w.startElement(startingXMLElement);
    // Get the metadata
    ResultSetMetaData meta = rs.getMetaData();
    int count = meta.getColumnCount();
    // Iterate over the set
    while (rs.next())
    {
        w.startElement(rowElement);
        for (int i = 0; i < count; i++)
        {
            Object ob = rs.getObject(i + 1);
            if (rs.wasNull())
            {
                ob = null;
            }
            // XML elements are repeated so they could benefit from caching
            String colName = meta.getColumnLabel(i + 1).intern();
            if (ob != null)
            {
                if (ob instanceof Timestamp)
                {
                    w.dataElement(colName, Util.formatDate((Timestamp) ob, dateFormat));
                }
                else if (ob instanceof BigDecimal)
                {
                    // Possible benefit from writing ints as strings and interning them
                    w.dataElement(colName, Util.transformToHTML(new Integer(((BigDecimal) ob).intValue())));
                }
                else
                {   // there's enough of data that's repeated to validate the use of interning
                    w.dataElement(colName, ob.toString().intern());
                }

            }
            else
            {
                w.emptyElement(colName);
            }
        }
        w.endElement(rowElement);
    }
    w.endElement(startingXMLElement);
    w.endDocument();
}

EDIT: Here is an example of memory usage (taken with visualVM):

Memory usage screenshot

EDIT2: The database is Oracle 10.2.0.4. and I've set ResultSet.TYPE_FORWARD_ONLY and got a maximum of 50MB usage! As I said in the comments, I'll keep an eye on this, but it's really promising.

Memory usage after adding  ResultSet.TYPE_FORWARD_ONLY

EDIT3: It seems there's another possible optimization available. As I said, I'm generating an XML, meaning lots of data is repeated (if nothing else, then tags), meaning String.intern() could help me here, I'll post back when I test this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

时间你老了 2024-10-01 00:13:13

是否可以使用 ResultSet.TYPE_FORWARD_ONLY?

您使用了 ResultSet.TYPE_SCROLL_INSENSITIVE。我相信对于某些数据库(您没有说您使用哪一个)这会导致整个结果集加载到内存中。

Is it possible to use ResultSet.TYPE_FORWARD_ONLY?

You have used ResultSet.TYPE_SCROLL_INSENSITIVE. I believe for some databases (you didn't say which one you use) this causes the whole result set to be loaded in memory.

把回忆走一遍 2024-10-01 00:13:13

由于它是Java,内存应该只会暂时激增,除非您泄漏引用,例如,如果您将事物推入一个列表,该列表是具有整个程序生命周期的单例成员,或者根据我的经验,更有可能是资源泄漏,当使用文件句柄等非托管资源的对象从不调用其清理代码时(我假设这适用于 Java,尽管我正在考虑 C#),就会发生这种情况,这种情况通常是由不重新抛出的空异常处理程序引起的到父堆栈框架,这具有绕过finally块的最终效果......

Since it's Java, the memory should only spike temporarily, unless you are leaking references, like if you push things onto a list that is a member of a singleton that has life span of the entire program, or in my experience more likely is resource leaking, which happens when (and this I'm assuming applies to Java although I'm thinking of C#) objects that use unmanaged resources like file handles never call their cleanup code, a condition commonly caused by empty exception handlers that do not re-throw to the parent stack frame, which has the net effect of circumventing the finally block...

烏雲後面有陽光 2024-10-01 00:13:13

我运行了更多测试,结论是:

  1. 最大的收获是在 JVM 中(或者 VisualVM 在监视 Java 5 堆空间时出现问题:)。当我第一次报告 ResultSet.TYPE_FORWARD_ONLY 给我带来了显着的收益时,我错了。最大的收获是使用 Java 5,在 Java 5 中,相同的程序使用多达 50 MB 的堆空间,而在 Java 6 中,相同的代码占用多达 150 MB 的堆空间。
  2. 第二个好处是 ResultSet.TYPE_FORWARD_ONLY,这使得程序占用尽可能少的内存。
  3. 第三个好处是 Sting.intern() ,它使程序占用的内存更少,因为它缓存字符串而不是创建新字符串。

这是优化 2 和 3 的用法(如果没有 String.intern(),图表将是相同的,您应该只为每个点添加 5 MB)

alt text

这是没有它们的用法(最后使用量较少是由于程序内存不足:))
alt text

感谢大家的帮助。

I've ran some more tests and the conclusions are:

  1. The biggest gain is in JVM (or visualvm has problems monitoring Java 5 Heap space:). When I first reported that ResultSet.TYPE_FORWARD_ONLY got me a significant gain, I was wrong. The biggest gain was by using Java 5 under which the same program used up to 50MB of heapspace, as opposed to Java 6 under which the same code took up to 150 MB.
  2. Second gain is in ResultSet.TYPE_FORWARD_ONLY which made the program take as small amount of memory as possible.
  3. Third gain is in Sting.intern() which made the program take a bit less memory since it caches strings instead of creating new ones.

This is the usage with the optimizations 2 and 3 (if there wasn't String.intern() the graph would be the same, you should only add 5 MB more to every point)

alt text

and this is the usage without them (the lesser usage at the end is due to the program going out of memory :) )
alt text

Thank you everyone for your assistance.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文