Linq 和二分搜索 - 改进这个缓慢的Where 语句?
我有两个藏品,每个藏品大约有 40,000 件。
列表 2 中的元素通过外键链接到列表 1 的元素。
对于列表一的每个元素,我想在列表二中找到相应的元素。
像这样的事情:
foreach(var item in list1)
{
var match = list2.Where(child => child.ID == item.ChildID).FirstOrDefault();
item.Child = match;
}
这可行,但速度慢得要命。
现在,list1 和 list 2 都根据数据库中的这些键进行排序。因此 list1 按 ChildID 排序,list2 按 ID 排序(相同的值)。
我认为二分搜索会大大加快速度,但我在某处读到 Linq 会为Where 子句中的列表选择最合适的策略。也许我需要显式转换为排序列表?或者也许我需要使用比较器实现自定义二分搜索算法?
任何见解表示赞赏。
谢谢。
I've got two collections, each containing about 40,000 items.
The elements in list 2 are linked to the elements of list one via a foreign key.
For each element of list one, I want to find the corresponding element in list two.
Something like this:
foreach(var item in list1)
{
var match = list2.Where(child => child.ID == item.ChildID).FirstOrDefault();
item.Child = match;
}
This works but it's slow as hell.
Now, both list1 and list 2 are sorted by these keys from the database. So list1 is ordered by ChildID and list2 is ordered by ID (same value).
I think a Binary search would dramatically speed this up, but I read somewhere that Linq would choose the most appropriate strategy for the list in the Where clause. Maybe I need to explicitly cast to a sorted list? Or maybe I need to implement a custom Binary Search algorithm w/ a comparer?
Any insights are appreciated.
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
为什么不使用联接?
Why not use a join?
这个怎么样:
?
What about this:
?
我以前遇到过这个问题,与基于 DB 的搜索相比,基于 LINQ 的搜索速度非常慢,因为它没有使用任何索引。
您是否考虑过使用 字典 而不是列表?
您可以实现一个字典,然后不使用Where,而是使用 ContainsKey 如果确实存在,请使用 索引访问器</a>。
示例代码:
使用索引进行访问比搜索列表要快得多,但代价是索引需要额外的内存。
I had this problem before, LINQ-based searching is extremely slow compared to DB-based because it's not utilizing any index.
Have you considered using a Dictionary instead of List?
You can implement a Dictionary and then instead of using Where, you can use ContainsKey and if it does exist, get the value using index accessor.
Sample code:
Access using index would be significantly faster than searching a list, on the cost of extra memory required for the index.
由于两个列表都按相同的值排序,因此您可以并行循环它们:
或者:
另一种有效的替代方案,但不利用列表的顺序,是通过放置其中一个列表来创建索引在字典中:
As both lists are sorted on the same value, you can just loop through them in parallel:
or:
Another efficient alternative, but which doesn't take advantage of the order of the lists, is to create an index by putting one of the lists in a dictionary:
我忍不住回答这个问题:-)
你的代码变慢的主要原因是你的项目将被多次读取。速度的艺术是:只在需要时才读取内存,如果需要读取,则尽可能少地读取。
下面是一个示例:
代码
示例代码:
主要目标是扫描列表并比较当前索引上的 item 和 itemdetail。当parentid等于它的parents id时。将其添加到列表中并继续下一个详细信息。如果不同,则继续寻找下一位父母。
结果
我多花了 10 倍的项目才能看到更好的结果:
添加项目:
总毫秒数:140ms
添加项目详细信息:
总毫秒数:203ms
项目列表数量:400000
项目详细列表计数:800000
配套项目:
总毫秒数:265ms
打字速度很快,而且还可以更简洁。所以我希望你能读一下。玩它。
问候,
杰罗恩.
I couldn't resist to answer this :-)
The main reason your code gets slow, is that your items will be readed many many times. The art of speed is: Read memory only if you need to and if you need to read it, read it as less as possible.
Here is an example:
code
example code:
The main goal is to scan the lists and compare the item and itemdetail on the current indexes. When the parentid is equal to it's parents id. add it to the list and go on to the next detail. If it is different go on to the next parent.
results
I took 10 times more items to see a better result:
Adding items:
Total milliseconds: 140ms
Adding itemdetails:
Total milliseconds: 203ms
ItemList Count: 400000
ItemDetailList Count: 800000
Matching items:
Total milliseconds: 265ms
This was typed fast and could be cleaner. So i hope you can read it. Play with it.
Greetz,
Jeroen.
不确定这是否真的会加快速度,但您可以将谓词移动到 FirstOrDefault() 子句中并完全删除 where 。
它实际上可能没有帮助,因为这可能只是隐式调用Where()。
但这里有一个问题,该方法实际上很慢还是仅在编译后第一次运行时才慢?查看对此的讨论发布。
Not sure if this would actually speed it up any, but you can move the predicate into the FirstOrDefault() clause and get rid of where entirely.
It might not actually help since this might implicitly just call Where().
Here's a question though, is the method actually slow or is it slow only the first time it is run after a compilation? Check out the discussion on this post.