Processing large amounts of data using multithreading

Posted on 2024-10-22 23:33:12

I need to write a C# service (it could be a Windows service or a console app) that needs to process a large amount of data (100,000 records) stored in a database.
Processing each record is also a fairly complex operation. I need to perform a lot of inserts and updates as part of the processing.

We are using NHibernate as the ORM.

One way is to load all the records and process them sequentially... which could turn out to be quite slow.
I was looking at multithreading options and was thinking of having multiple threads process chunks of records simultaneously.

Could anyone give me some pointers on how I should approach this, considering that I'm using NHibernate? What are the possible gotchas, such as deadlocks?

Thanks a lot.

Comments (4)

本宫微胖 2024-10-29 23:33:12

You should consider Task Parallel Library.

谷夏 2024-10-29 23:33:12

Assuming you are using .NET 4.0, you can use the Task Parallel Library (as has been mentioned) to do something like this:

Parallel.ForEach(sourceCollection, item => Process(item));

Your source collection would be an IEnumerable of the loaded records. The library will handle everything for you:

The source collection is partitioned and the work is scheduled on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.

It may help to read a tutorial on using Parallel.ForEach(). Also, be aware of potential pitfalls.
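
For what it's worth, here is a minimal sketch of the Parallel.ForEach call above with a cap on the number of worker threads. The class name RecordProcessor, the Process body, and the use of plain int IDs are placeholders I made up for illustration; the real per-record work would go where Process is.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RecordProcessor
{
    // Hypothetical stand-in for the real (complex) per-record work.
    public static int Process(int recordId)
    {
        return recordId * 2;
    }

    // Processes every record in parallel and returns how many were handled.
    public static int ProcessAll(IEnumerable<int> recordIds, int maxThreads)
    {
        // Cap the parallelism instead of letting the scheduler decide.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        int processed = 0;

        Parallel.ForEach(recordIds, options, id =>
        {
            Process(id);
            // The lambda runs on several threads, so update shared state atomically.
            Interlocked.Increment(ref processed);
        });

        return processed;
    }
}
```

Since the work here is mostly database calls, a modest MaxDegreeOfParallelism is probably a safer starting point than the default; the right number is something to measure rather than assume.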

绻影浮沉 2024-10-29 23:33:12

Sounds like PLINQ is the best solution (Chapter 5 in this article). But since each calculation works heavily with the database, you should create a separate session for each thread.
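
One possible way to get a separate session per worker is the Parallel.ForEach overload that takes localInit and localFinally delegates. This is a sketch rather than a PLINQ example, and it is written generically so it runs without NHibernate: SessionPerWorker, Run, openSession, and processOne are names I invented. With NHibernate, openSession would presumably be () => sessionFactory.OpenSession(), since ISession is not thread-safe while ISessionFactory is.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SessionPerWorker
{
    // Runs processOne for every record ID, giving each worker task its own session.
    // TSession is any disposable session-like object (assumption: NHibernate's ISession fits).
    public static void Run<TSession>(
        IEnumerable<int> recordIds,
        Func<TSession> openSession,
        Action<TSession, int> processOne,
        int maxWorkers)
        where TSession : IDisposable
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.ForEach(
            recordIds,
            options,
            openSession,                      // localInit: called once per worker task
            (id, loopState, session) =>
            {
                processOne(session, id);      // the session is never shared between tasks
                return session;               // hand the same session to the next iteration
            },
            session => session.Dispose());    // localFinally: close the worker's session
    }
}
```

Note that localInit runs once per worker task, not strictly once per OS thread, so the number of sessions opened can exceed maxWorkers. Each session is still used by only one task at a time. Transactions and error handling are left out and would need to go inside processOne.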

狼性发作 2024-10-29 23:33:12

Use IStatelessSession if possible and experiment with the adonet.batch_size property.

Also, how performant does it need to be? I'm a fan of NH, but this is one scenario where stored procedures might be better.
