How to correctly iterate over database records with Hibernate

Posted on 2024-12-22 13:21:38

I want to iterate over records in the database and update them. However, since the update both takes some time and is prone to errors, I need to a) not keep the DB waiting (as happens, e.g., with ScrollableResults) and b) commit after each update.
Second, this is done in multiple threads, so I need to ensure that while thread A is handling a record, thread B gets a different one.
How can I implement this sensibly with Hibernate?

To give a better idea, the following code would be executed by several threads, where all threads share a single instance of the RecordIterator:

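```java
// Executed by several threads; all of them share one RecordIterator instance.
Iterator<Record> iter = db.getRecordIterator();
while (iter.hasNext()) {
    // hasNext() and next() are two separate calls, so on a shared iterator
    // another thread may take the last record in between them.
    Record rec = iter.next();
    // do something lengthy here
    db.save(rec);
}
```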

So my question is how to implement the RecordIterator. If I perform a query on every next(), how do I ensure that I don't return the same record twice? If I don't, which query should I use to return detached objects? Is there a flaw in the general approach (e.g., should I instead use one RecordIterator per thread and let the DB handle synchronization somehow)? Additional info: there are way too many records to keep track of locally (e.g., in a set of already processed records).
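One possible shape for a shared RecordIterator, sketched under assumptions: instead of entities it hands out record ids, fetching them page by page with keyset pagination ("id greater than the last id seen") and merging hasNext()/next() into a single synchronized nextId() call. The page loader is a hypothetical in-memory stand-in for a query like the HQL in the comment, and SharedIdIterator/SharedIdIteratorDemo are illustrative names, not part of Hibernate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BiFunction;

// Shared, thread-safe source of record ids. hasNext()/next() are merged into one
// synchronized call so two threads cannot race between the check and the take.
class SharedIdIterator {
    // Hypothetical page loader: (lastSeenId, pageSize) -> next ids in ascending order.
    // With Hibernate this might be the HQL query
    //   "select r.id from Record r where r.id > :lastId order by r.id"
    // run with setMaxResults(pageSize) in its own short-lived session.
    private final BiFunction<Long, Integer, List<Long>> pageLoader;
    private final int pageSize;
    private final Deque<Long> buffer = new ArrayDeque<>();
    private long lastSeenId = Long.MIN_VALUE;
    private boolean exhausted = false;

    SharedIdIterator(BiFunction<Long, Integer, List<Long>> pageLoader, int pageSize) {
        this.pageLoader = pageLoader;
        this.pageSize = pageSize;
    }

    // Returns the next unhandled id, or null once every id has been handed out.
    synchronized Long nextId() {
        if (buffer.isEmpty() && !exhausted) {
            List<Long> page = pageLoader.apply(lastSeenId, pageSize);
            if (page.isEmpty()) {
                exhausted = true;
            } else {
                buffer.addAll(page);
                lastSeenId = page.get(page.size() - 1); // keyset: resume after the last id seen
            }
        }
        return buffer.poll();
    }
}

class SharedIdIteratorDemo {
    // Runs `threads` workers against an in-memory "table" of ids 1..totalIds
    // and returns every id the workers received.
    static List<Long> drainConcurrently(int totalIds, int pageSize, int threads) {
        NavigableSet<Long> table = new TreeSet<>();
        for (long id = 1; id <= totalIds; id++) table.add(id);
        SharedIdIterator iter = new SharedIdIterator((lastId, max) -> {
            List<Long> page = new ArrayList<>();
            for (Long id : table.tailSet(lastId, false)) {
                if (page.size() == max) break;
                page.add(id);
            }
            return page;
        }, pageSize);

        ConcurrentLinkedQueue<Long> seen = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                Long id;
                while ((id = iter.nextId()) != null) {
                    // A real worker would load, update and commit record `id` in its own session.
                    seen.add(id);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Long> ids = drainConcurrently(25, 10, 4);
        System.out.println(ids.size() + " ids handed out"); // prints "25 ids handed out"
    }
}
```

Because each id is handed out once and the keyset only moves forward, no record is returned twice even if its status changes mid-run, and each worker can load the record by id in its own session instead of receiving a detached object from another thread.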

Update: Because the overall process takes some time, the status of records can change while it runs. As a result, the ordering of a query's results can change. I guess that to solve this, I have to mark records in the database once I return them for processing...
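The marking idea above can be sketched as an atomic claim: a worker only processes a record if it manages to switch its status from NEW to IN_PROGRESS. Here ConcurrentHashMap.replace(key, oldValue, newValue) stands in for a conditional database UPDATE; the table, the Status values and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimByStatusDemo {
    enum Status { NEW, IN_PROGRESS, DONE }

    // In-memory stand-in for the records table: id -> status column.
    private final Map<Long, Status> table = new ConcurrentHashMap<>();

    ClaimByStatusDemo(int n) {
        for (long id = 1; id <= n; id++) table.put(id, Status.NEW);
    }

    // Atomic claim. In the database this would be a conditional update such as
    //   update Record set status = 'IN_PROGRESS' where id = :id and status = 'NEW'
    // where executeUpdate() returning 1 means this thread won the record.
    boolean tryClaim(long id) {
        return table.replace(id, Status.NEW, Status.IN_PROGRESS);
    }

    void markDone(long id) {
        table.put(id, Status.DONE);
    }

    long count(Status s) {
        return table.values().stream().filter(v -> v == s).count();
    }

    // Every worker looks at every id, as several threads running the same query
    // would; only the worker whose claim succeeds processes the record.
    static int runWorkers(ClaimByStatusDemo db, int n, int threads) {
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (long id = 1; id <= n; id++) {
                    if (db.tryClaim(id)) {
                        // lengthy work and commit would happen here
                        db.markDone(id);
                        processed.incrementAndGet();
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        ClaimByStatusDemo db = new ClaimByStatusDemo(100);
        int processed = runWorkers(db, 100, 4);
        System.out.println(processed + " processed, " + db.count(Status.DONE) + " done"); // prints "100 processed, 100 done"
    }
}
```

If a worker can fail mid-record, a real job would also need some way to return records stuck in IN_PROGRESS to NEW (for example a claim timestamp checked by a cleanup step); that part is not shown.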

Comments (2)

梦旅人picnic 2024-12-29 13:21:39

Since you're sharing one instance of the master iterator, my suggestion would be to run all of your threads under a shared Hibernate transaction, with one load at the beginning and one big save at the end. Load all of your data into a single Set that your threads iterate over (be careful with locking: you might want to split off a section for each thread, or otherwise manage the shared resource so that the threads don't overlap).

The beauty of the Hibernate solution is that, because you're using a transaction, the records aren't saved to the database immediately; they are kept in Hibernate's cache and all written back to the database at once at the end. This saves on the expensive database writes you're worried about, and it gives you an actual object to work with on each iteration instead of just a database row.
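A minimal plain-Java sketch of the "split off a section for each thread" idea. A Hibernate Session is not thread-safe, so this assumes the load and the final save happen on a single thread and only the in-memory processing is parallel; the String elements and the toUpperCase call are hypothetical stand-ins for loaded records and the lengthy update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedProcessing {
    // Splits `items` into `parts` contiguous, non-overlapping slices.
    static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = items.size();
        for (int p = 0; p < parts; p++) {
            int from = (int) ((long) size * p / parts);
            int to = (int) ((long) size * (p + 1) / parts);
            slices.add(items.subList(from, to));
        }
        return slices;
    }

    // Each worker gets its own slice, so no two threads ever touch the same element
    // and the shared collection needs no locking while it is only read.
    static List<String> processAll(List<String> loaded, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> slice : partition(loaded, threads)) {
                futures.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (String rec : slice) {
                        out.add(rec.toUpperCase(Locale.ROOT)); // stand-in for the lengthy work
                    }
                    return out;
                }));
            }
            // Collect in slice order, so results line up with the loaded order.
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : futures) results.addAll(f.get());
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a", "b", "c", "d", "e"), 2)); // prints "[A, B, C, D, E]"
    }
}
```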

I see in your update that the status of the records may change during processing, which can always cause problems. If this is a constantly running or long-running process, my advice for a Hibernate solution would be to work in smaller sets and, yes, add a flag to mark records that have been updated, so that when you move on to the next set you can pick up the ones that haven't been touched.
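The "smaller sets plus a flag" approach can be sketched as a loop that keeps fetching the first batch of unflagged records until none are left. The Rec class, its updated flag and the in-memory table are hypothetical; the comment shows roughly what the corresponding HQL might look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class FlaggedBatches {
    // Hypothetical stand-in for a Record entity with an "updated" flag column.
    static final class Rec {
        final long id;
        boolean updated;

        Rec(long id) {
            this.id = id;
        }
    }

    static List<Rec> makeTable(int n) {
        List<Rec> table = new ArrayList<>();
        for (long id = 1; id <= n; id++) table.add(new Rec(id));
        return table;
    }

    // Roughly "from Record r where r.updated = false order by r.id" with
    // setMaxResults(batchSize); the table list is already ordered by id.
    static List<Rec> nextBatch(List<Rec> table, int batchSize) {
        return table.stream().filter(r -> !r.updated).limit(batchSize).collect(Collectors.toList());
    }

    // Processes every record and returns how many batches that took.
    static int processInBatches(List<Rec> table, int batchSize) {
        int batches = 0;
        List<Rec> batch;
        while (!(batch = nextBatch(table, batchSize)).isEmpty()) {
            for (Rec r : batch) {
                // the lengthy update would go here, then set the flag and commit
                r.updated = true;
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(processInBatches(makeTable(23), 10) + " batches"); // prints "3 batches"
    }
}
```

Since flagged rows drop out of the query, the loop never needs setFirstResult: every batch is simply the first page of what is still unprocessed.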

久光 2024-12-29 13:21:38

Hmm, what about pushing your objects from a reader thread into a bounded blocking queue and letting your updater threads read from that queue?

In your reader, page through the results with setFirstResult/setMaxResults. For example, if your queue holds at most 1000 elements, fill it 500 at a time. When the queue is full, the next push (e.g. BlockingQueue.put) automatically waits until the updaters take elements off the queue.
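A minimal sketch of this reader/updater pipeline using java.util.concurrent.ArrayBlockingQueue. The fetchPage helper is a hypothetical in-memory stand-in for a Hibernate query paged with setFirstResult/setMaxResults, the calling thread plays the reader, and a sentinel value tells each updater to stop. Offset paging like this assumes the query has a stable ORDER BY; if rows drop out of or move within the result while the reader pages, some may be skipped or read twice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueuePipeline {
    // Sentinel telling an updater to stop (assumes real ids are never -1).
    private static final Long POISON = -1L;

    // Hypothetical stand-in for query.setFirstResult(first).setMaxResults(max).list()
    static List<Long> fetchPage(List<Long> table, int first, int max) {
        int from = Math.min(first, table.size());
        int to = Math.min(first + max, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    static List<Long> run(List<Long> table, int capacity, int pageSize, int updaters) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(capacity);
        ConcurrentLinkedQueue<Long> done = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < updaters; i++) {
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        Long id = queue.take(); // blocks while the queue is empty
                        if (id.equals(POISON)) return;
                        done.add(id); // lengthy update + commit would happen here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(w);
            w.start();
        }
        try {
            // Reader: page through the table; put() blocks while the queue is full.
            for (int first = 0; ; first += pageSize) {
                List<Long> page = fetchPage(table, first, pageSize);
                if (page.isEmpty()) break;
                for (Long id : page) queue.put(id);
            }
            for (int i = 0; i < updaters; i++) queue.put(POISON); // one stop signal per updater
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(done);
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>();
        for (long id = 1; id <= 1234; id++) table.add(id);
        System.out.println(run(table, 1000, 500, 3).size() + " records updated"); // prints "1234 records updated"
    }
}
```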
