OutOfMemory when reading a large amount of data with Hibernate

Posted 2024-08-20 21:02:08

I need to export a large amount of data from the database. Here is the class that represents my data:

public class Product{
...

    @OneToMany
    @JoinColumn(name = "product_id")
    @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
    List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

...
}

ProductHtmlSource contains the big string that I actually need to export.

Since the size of the exported data is bigger than the JVM heap, I'm reading the data in chunks, like this:

final int batchSize = 1000;      
for (int i = 0; i < 50; i++) {
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
  while (iterator.getScrollableResults().next()) {
     Product product = (Product) iterator.getScrollableResults().get(0); 
     List<String> htmls = product.getHtmlSources();
     <some processing>
  }

}

The code of getProductIterator:

public ScrollableResults getProductIterator(int offset, int limit) {
        Session session = getSession(true);
        session.setCacheMode(CacheMode.IGNORE);
        ScrollableResults iterator = session
                .createCriteria(Product.class)
                .add(Restrictions.eq("status", Product.Status.DONE))
                .setFirstResult(offset)
                .setMaxResults(limit)
                .scroll(ScrollMode.FORWARD_ONLY);
        session.flush();
        session.clear();

        return iterator;
    }

The problem is that, even though I clear the session after reading each chunk of data, Product objects accumulate somewhere and I get an OutOfMemory exception. The problem is not in the processing block of code; I get the memory error even without it. The batch size is not the problem either, since 1000 objects easily fit into memory.

The profiler showed that objects accumulate in the org.hibernate.engine.StatefulPersistenceContext class.

The stacktrace:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
    at java.lang.StringBuffer.append(StringBuffer.java:307)
    at org.hibernate.type.TextType.get(TextType.java:41)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
    at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
    at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
    at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
    at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
    at org.hibernate.loader.Loader.getRow(Loader.java:1206)
    at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
    at org.hibernate.loader.Loader.doQuery(Loader.java:701)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
    at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
    at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
    at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
    at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
    at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
    at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
    **at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
    at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
    at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)**

Comments (5)

ㄖ落Θ余辉 2024-08-27 21:02:08

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() is expecting the starting row and a row count. As your "upper limit" gets higher you are reading data in bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().
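
A minimal sketch of the corrected call, keeping the question's batch loop (the fix is to pass batchSize itself as the second argument, since setMaxResults expects a row count):

final int batchSize = 1000;
for (int i = 0; i < 50; i++) {
  // second argument is a row count, not an ending row number
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize);
  ...
}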

哎呦我呸! 2024-08-27 21:02:08

Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.
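
A minimal sketch of that approach, assuming a sessionFactory field is in scope (the name is an assumption). Note that a StatelessSession never lazy-loads, so product.getHtmlSources() cannot be navigated afterwards; the sources would need to be fetched by the query itself or loaded separately:

StatelessSession session = sessionFactory.openStatelessSession();
try {
    ScrollableResults results = session
            .createQuery("from Product p where p.status = :status")
            .setParameter("status", Product.Status.DONE)
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Product product = (Product) results.get(0);
        // ... export processing; there is no persistence context,
        // so nothing accumulates between rows ...
    }
    results.close();
} finally {
    session.close();
}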

自控 2024-08-27 21:02:08

KeithL is right - you're passing an ever-increasing limit. But breaking it up that way doesn't make sense anyway. The whole point of a scroll cursor is that you process a row at a time so there's no need to break it up into chunks. The fetch size reduces the trips to the database at the cost of using up more memory. The general pattern should be:

Criteria q = session.createCriteria(Product.class)   // no offset or limit
        .add(Restrictions.eq("status", Product.Status.DONE));
q.setCacheMode(CacheMode.IGNORE); // prevent query or second-level caching
q.setFetchSize(1000);  // experiment with this to optimize performance vs. memory
ScrollableResults iterator = q.scroll(ScrollMode.FORWARD_ONLY);
while (iterator.next()) {
  Product p = (Product) iterator.get(0);
  ...
  session.evict(p);  // required to keep objects from accumulating in the session
}

That said, the error is in getHtmlSources, so the problem may be completely unrelated to the session/cursor/scroll issue. If those HTML strings are huge and they're being referenced the entire time, you may just be running out of contiguous memory.

Btw, I don't see a getScrollableResults method on ScrollableResults.

心碎的声音 2024-08-27 21:02:08

At the risk of appearing stupid - have you considered doing this another way?

Personally, I would avoid doing batch processing that "far away" from the database. I don't know what database you're using, but there's usually a mechanism for efficiently pulling a dataset out of the database and into a file, even if it involves moderately simple manipulation on the way out: stored procedures, dedicated export utilities. Investigate what else is available from your database vendor.

樱娆 2024-08-27 21:02:08

Can you post the Exception stacktrace?
It may be solved by passing suitable JVM options for GC.

I think this is related - Java StringBuilder huge overhead.

It looks from the stacktrace like a very large String is being created, and that is what causes the exception.
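
If a bigger heap turns out to be the quick fix, a minimal sketch of the launch command (the jar name is hypothetical, and -Xmx should be sized to the actual data):

java -Xmx4g -jar exporter.jar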
