Processing large amounts of data using multithreading
I need to write a C# service (it could be a Windows service or a console app) that processes a large amount of data (100,000 records) stored in a database.
Processing each record is also a fairly complex operation. I need to perform a lot of inserts and updates as part of the processing.
We are using NHibernate as the ORM.
One way is to load all the records and process them sequentially, which could turn out to be quite slow.
I was looking at multithreading options and was thinking of having multiple threads process chunks of records simultaneously.
Could anyone give me some pointers on how I should approach this, considering that I'm using NHibernate, and what the possible gotchas are (deadlocks, etc.)?
Thanks a lot.
Comments (4)
You should consider the Task Parallel Library.
Assuming you are using .NET 4.0, you can use the Task Parallel Library (as has been mentioned). Your source collection would be an IEnumerable of the loaded records, and the library will handle distributing the work across threads for you.
It may help to read a tutorial on using Parallel.ForEach(). Also, be aware of potential pitfalls.
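A minimal sketch of what a Parallel.ForEach() loop over the loaded records could look like. This is not the answerer's original code. ProcessRecord, ProcessAll and the use of plain int ids are hypothetical stand-ins for the real record type and processing logic, and the thread cap is an assumed tuning knob to keep the database from being flooded with concurrent work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Hypothetical stand-in for the expensive per-record work.
    public static int ProcessRecord(int recordId)
    {
        return recordId * 2;
    }

    // Processes every id in parallel, using at most maxThreads workers.
    public static IDictionary<int, int> ProcessAll(IEnumerable<int> recordIds, int maxThreads)
    {
        // A thread-safe collection is needed because the loop body
        // runs on several threads at once.
        var results = new ConcurrentDictionary<int, int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(recordIds, options, id =>
        {
            results[id] = ProcessRecord(id);
        });

        return results;
    }
}
```

Any state shared between iterations (here the results dictionary) has to be thread-safe, which is one of the pitfalls to watch for.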
Sounds like PLINQ is the best solution (Chapter 5 in this article). But since each calculation works heavily with the database, you should create a separate session for each thread rather than sharing one.
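One way to keep a session from ever being shared between threads is to split the records into chunks and open one session per chunk inside a PLINQ query. The sketch below assumes this approach. FakeSession is a hypothetical stand-in so the example runs without a database; in real code it would be replaced by a session obtained from NHibernate's ISessionFactory.OpenSession(), with a transaction around each chunk. The chunk size and thread count are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for an NHibernate session (which is not thread-safe).
public sealed class FakeSession : IDisposable
{
    public int Processed;
    public void Process(int recordId) { Processed++; }
    public void Dispose() { }
}

public static class PlinqChunkSketch
{
    // Splits the ids into consecutive chunks of at most chunkSize items.
    public static List<List<int>> Chunk(IEnumerable<int> ids, int chunkSize)
    {
        return ids.Select((id, i) => new { id, i })
                  .GroupBy(x => x.i / chunkSize)
                  .Select(g => g.Select(x => x.id).ToList())
                  .ToList();
    }

    // Processes the chunks in parallel and returns the total number of records handled.
    public static int ProcessInChunks(IEnumerable<int> ids, int chunkSize, int maxThreads)
    {
        int total = 0;

        Chunk(ids, chunkSize)
            .AsParallel()
            .WithDegreeOfParallelism(maxThreads)
            .ForAll(chunk =>
            {
                // One session per chunk: created, used and disposed on a single thread.
                using (var session = new FakeSession())
                {
                    foreach (var id in chunk) session.Process(id);
                    Interlocked.Add(ref total, session.Processed);
                }
            });

        return total;
    }
}
```

Keeping chunks disjoint also reduces the chance that two threads update the same rows, which is a common source of deadlocks.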
Use IStatelessSession if possible and experiment with the adonet.batch_size property.
Also, how performant does it need to be? I'm a fan of NHibernate, but this is one scenario where stored procedures might be better.
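A rough sketch of how the stateless session and batch size suggestions might be combined. It requires the NHibernate package and a working hibernate.cfg.xml with mappings, so it will not run on its own. The batch size of 50 is an arbitrary starting value to experiment with, and StatelessBatchSketch, BuildFactory and SaveChunk are hypothetical names. Whether batching actually takes effect depends on the database driver and the mapping in use.

```csharp
using System.Collections.Generic;
using NHibernate;
using NHibernate.Cfg;

public static class StatelessBatchSketch
{
    // Builds a session factory with ADO.NET batching enabled.
    public static ISessionFactory BuildFactory()
    {
        // Assumes hibernate.cfg.xml exists and contains the connection settings and mappings.
        var cfg = new Configuration().Configure();

        // Equivalent to setting the adonet.batch_size property in the config file.
        // 50 is an assumed value, not a recommendation.
        cfg.SetProperty(NHibernate.Cfg.Environment.BatchSize, "50");

        return cfg.BuildSessionFactory();
    }

    // Inserts one chunk of mapped entities through a stateless session.
    public static void SaveChunk(ISessionFactory factory, IEnumerable<object> entities)
    {
        // A stateless session has no first-level cache and does no dirty checking,
        // so memory use stays flat during bulk work.
        using (IStatelessSession session = factory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (var entity in entities)
            {
                session.Insert(entity);
            }
            tx.Commit();
        }
    }
}
```

The session factory is thread-safe and should be built once and shared, while each worker thread opens its own session from it.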