OutOfMemory when reading a large amount of data with Hibernate

Posted 2024-08-20 21:02:08

I need to export a large amount of data from the database. Here is the class that represents my data:

public class Product{
...

    @OneToMany
    @JoinColumn(name = "product_id")
    @Cascade({SAVE_UPDATE, DELETE_ORPHAN})
    List<ProductHtmlSource> htmlSources = new ArrayList<ProductHtmlSource>();

...
}

ProductHtmlSource contains the big string that I actually need to export.

Since the size of the exported data is bigger than the JVM heap, I'm reading the data in chunks, like this:

final int batchSize = 1000;      
for (int i = 0; i < 50; i++) {
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize * (i + 1));
  while (iterator.getScrollableResults().next()) {
     Product product = (Product) iterator.getScrollableResults().get(0); 
     List<String> htmls = product.getHtmlSources();
     <some processing>
  }

}

The code of getProductIterator:

public ScrollableResults getProductIterator(int offset, int limit) {
        Session session = getSession(true);
        session.setCacheMode(CacheMode.IGNORE);
        ScrollableResults iterator = session
                .createCriteria(Product.class)
                .add(Restrictions.eq("status", Product.Status.DONE))
                .setFirstResult(offset)
                .setMaxResults(limit)
                .scroll(ScrollMode.FORWARD_ONLY);
        session.flush();
        session.clear();

        return iterator;
    }

The problem is that, even though I clear the session after reading each chunk of data, Product objects accumulate somewhere and I get an OutOfMemory exception. The problem is not in the processing block of code; I get the memory error even without it. The batch size is not the problem either, since 1000 objects easily fit into memory.

The profiler showed that objects accumulate in the org.hibernate.engine.StatefulPersistenceContext class.

The stacktrace:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
    at java.lang.StringBuffer.append(StringBuffer.java:307)
    at org.hibernate.type.TextType.get(TextType.java:41)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:163)
    at org.hibernate.type.NullableType.nullSafeGet(NullableType.java:154)
    at org.hibernate.type.AbstractType.hydrate(AbstractType.java:81)
    at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2101)
    at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1380)
    at org.hibernate.loader.Loader.instanceNotYetLoaded(Loader.java:1308)
    at org.hibernate.loader.Loader.getRow(Loader.java:1206)
    at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:580)
    at org.hibernate.loader.Loader.doQuery(Loader.java:701)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    at org.hibernate.loader.collection.CollectionLoader.initialize(CollectionLoader.java:36)
    at org.hibernate.persister.collection.AbstractCollectionPersister.initialize(AbstractCollectionPersister.java:565)
    at org.hibernate.event.def.DefaultInitializeCollectionEventListener.onInitializeCollection(DefaultInitializeCollectionEventListener.java:63)
    at org.hibernate.impl.SessionImpl.initializeCollection(SessionImpl.java:1716)
    at org.hibernate.collection.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:344)
    at org.hibernate.collection.AbstractPersistentCollection.read(AbstractPersistentCollection.java:86)
    at org.hibernate.collection.AbstractPersistentCollection.readSize(AbstractPersistentCollection.java:109)
    at org.hibernate.collection.PersistentBag.size(PersistentBag.java:225)
    **at com.rivalwatch.plum.model.Product.getHtmlSource(Product.java:76)
    at com.rivalwatch.plum.model.Product.getHtmlSourceText(Product.java:80)
    at com.rivalwatch.plum.readers.AbstractDataReader.getData(AbstractDataReader.java:64)**

Comments (5)

ㄖ落Θ余辉 2024-08-27 21:02:08

It looks like you are calling getProductIterator() with the starting and ending row numbers, while getProductIterator() is expecting the starting row and a row count. As your "upper limit" gets higher you are reading data in bigger chunks. I think you mean to pass batchSize as the second argument to getProductIterator().
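
A minimal sketch of the corrected call, keeping the question's batch loop (the fix is to pass batchSize itself as the second argument, since setMaxResults expects a row count):

final int batchSize = 1000;
for (int i = 0; i < 50; i++) {
  // second argument is a row count, not an ending row number
  ScrollableResults iterator = getProductIterator(batchSize * i, batchSize);
  ...
}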

哎呦我呸! 2024-08-27 21:02:08

Not a direct answer but for this kind of data manipulation, I would use the StatelessSession interface.
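
A minimal sketch of that approach, assuming a sessionFactory field is in scope (the name is an assumption). Note that a StatelessSession never lazy-loads, so product.getHtmlSources() cannot be navigated afterwards; the sources would need to be fetched by the query itself or loaded separately:

StatelessSession session = sessionFactory.openStatelessSession();
try {
    ScrollableResults results = session
            .createQuery("from Product p where p.status = :status")
            .setParameter("status", Product.Status.DONE)
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        Product product = (Product) results.get(0);
        // ... export processing; there is no persistence context,
        // so nothing accumulates between rows ...
    }
    results.close();
} finally {
    session.close();
}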

自控 2024-08-27 21:02:08

KeithL is right - you're passing an ever-increasing limit. But breaking it up that way doesn't make sense anyway. The whole point of a scroll cursor is that you process a row at a time so there's no need to break it up into chunks. The fetch size reduces the trips to the database at the cost of using up more memory. The general pattern should be:

Criteria q = session.createCriteria(Product.class)   // no offset or limit
        .add(Restrictions.eq("status", Product.Status.DONE));
q.setCacheMode(CacheMode.IGNORE); // prevent query or second-level caching
q.setFetchSize(1000);  // experiment with this to optimize performance vs. memory
ScrollableResults iterator = q.scroll(ScrollMode.FORWARD_ONLY);
while (iterator.next()) {
  Product p = (Product) iterator.get(0);
  ...
  session.evict(p);  // required to keep objects from accumulating in the session
}

That said, the error is in getHtmlSources, so the problem may be completely unrelated to the session/cursor/scroll issue. If those HTML strings are huge and they're being referenced the entire time, you may just be running out of contiguous memory.

Btw, I don't see a getScrollableResults method on ScrollableResults.

心碎的声音 2024-08-27 21:02:08

At the risk of appearing stupid - have you considered doing this another way?

Personally, I would avoid doing batch processing that "far away" from the database. I don't know what database you're using, but there's usually a mechanism for efficiently pulling a dataset out of the database and into a file, even if it involves moderately simple manipulation on the way out: stored procedures, dedicated export utilities. Investigate what else is available from your database vendor.

樱娆 2024-08-27 21:02:08

Can you post the Exception stacktrace?
It may be solved by passing suitable JVM options for GC.

I think this is related - Java StringBuilder huge overhead.

It looks from the stacktrace like a very large String is being created, and that is what causes the exception.
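
If a bigger heap turns out to be the quick fix, a minimal sketch of the launch command (the jar name is hypothetical, and -Xmx should be sized to the actual data):

java -Xmx4g -jar exporter.jar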
