Using Parallel Extensions or Parallel LINQ with LINQ Take
I have a database with about 5 million rows in it. I am trying to generate XML strings for the database and push them to a service. Instead of doing this one at a time, the service supports taking 1000 records at a time. At the moment, this is quite slow, taking upwards of 10 seconds per 1000 records (including writing back to the database and uploading to the service).
I tried to get the following code working, but have failed... I get a crash when I try it. Any ideas?
var data = <insert LINQ query here>
int take = 1000;
int left = data.Count();
Parallel.For(0, left / take, i =>
{
    data.Skip(i * take).Take(take)...
    //Generate XML here.
    //Write to service here...
    //Mark items in database as generated.
});
//Get companies which are still marked as not generated.
//Create XML.
//Write to Service.
I get a crash telling me that the index is out of bounds. If left is 5 million, the number in the loop should be no more than 5000. If I multiply that again by 1000, I should not get more than 5 million. I wouldn't mind if it worked for a bit, and then failed, but it just fails after the SQL query!
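The bounds arithmetic above can be checked in isolation. The following is only a sketch: BoundsCheck and MaxEndOffset are hypothetical names, and no database is involved. It relies on the fact that the second argument of Parallel.For is exclusive, so i runs from 0 through left / take - 1, and that integer division drops any remainder, which is why leftover rows need a separate pass.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class BoundsCheck
{
    // Hypothetical helper: returns the largest end offset (skip + take)
    // that any loop iteration would reach for the given row count.
    public static int MaxEndOffset(int left, int take)
    {
        var ends = new ConcurrentBag<int>();

        // The upper bound of Parallel.For is exclusive, so the last
        // index visited is (left / take) - 1, not left / take.
        Parallel.For(0, left / take, i => ends.Add(i * take + take));

        return ends.IsEmpty ? 0 : ends.Max();
    }
}
```

With left = 5000000 and take = 1000, no iteration reaches past offset 5000000. With left = 5000500 the result is still 5000000, so the last 500 rows are left for the follow-up step.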
I think it doesn't like your last index value - it should be left / 1000 - 1, not left / 1000:
I suspect the index out of bounds error is caused by code other than what is currently being displayed.
That being said, this could be handled in a much cleaner manner. Instead of using this approach, you should consider switching to using a custom partitioner. This will be dramatically more efficient, as each call to Skip/Take is going to force a re-evaluation of your sequence.
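As a rough illustration of the partitioner idea, here is a minimal sketch using the range overload of Partitioner.Create together with Parallel.ForEach. It assumes the query results have already been materialized into an in-memory list (for example with ToList()), which may not be practical for 5 million rows. BatchDemo, ProcessInBatches and handleBatch are hypothetical names, and the XML generation, service upload and database write-back are left to the caller.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchDemo
{
    // Hypothetical helper: hands the items to handleBatch in chunks of at
    // most batchSize, in parallel, and returns the size of each chunk
    // (in no particular order).
    public static List<int> ProcessInBatches<T>(
        IList<T> items, int batchSize, Action<IReadOnlyList<T>> handleBatch)
    {
        var sizes = new ConcurrentBag<int>();

        // Partitioner.Create throws if the range is empty, so guard first.
        if (items.Count == 0) return sizes.ToList();

        // Produces Tuple<int, int> ranges [Item1, Item2) of at most batchSize,
        // including a shorter final range for any remainder.
        var ranges = Partitioner.Create(0, items.Count, batchSize);

        Parallel.ForEach(ranges, range =>
        {
            // Copy this range once instead of calling Skip/Take on the source.
            var batch = new List<T>(range.Item2 - range.Item1);
            for (int i = range.Item1; i < range.Item2; i++)
            {
                batch.Add(items[i]);
            }

            handleBatch(batch); // e.g. generate XML and send it to the service
            sizes.Add(batch.Count);
        });

        return sizes.ToList();
    }
}
```

Because the last range is simply shorter, no separate pass is needed for the rows left over after dividing by the batch size. The delegate passed as handleBatch runs on several threads at once, so anything it touches (a database context, a service client) has to be safe for concurrent use or created per batch.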