LINQ 到 SortedList
我是一个完整的 LINQ 新手,所以我不知道我的 LINQ 是否不适合我需要做的事情,或者我对性能的期望是否太高。
我有一个对象的 SortedList,以 int 为键; SortedList 与 SortedDictionary 相反,因为我将使用预先排序的数据填充集合。我的任务是找到确切的密钥,或者如果没有确切的密钥,则找到具有下一个更高值的密钥。如果搜索对于列表来说太高(例如最高键是 100,但搜索 105),则返回 null。
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
var items =
(from kv in list
where kv.Key >= i
select kv.Key);
if (items.Any())
{
return list[items.Min()];
}
return null;
}
给定一个包含 50,000 条记录的列表,调用 getItem 500 次大约需要一秒半的时间。调用 50,000 次需要 2 分钟以上。这个性能看起来很差。我的 LINQ 不好吗?我是不是期待太多了?我应该推出自己的二分搜索功能吗?
I'm a complete LINQ newbie, so I don't know if my LINQ is incorrect for what I need to do or if my expectations of performance are too high.
I've got a SortedList of objects, keyed by int; SortedList as opposed to SortedDictionary because I'll be populating the collection with pre-sorted data. My task is to find either the exact key or, if there is no exact key, the one with the next higher value. If the search is too high for the list (e.g. highest key is 100, but search for 105), return null.
// The structure of this class is unimportant. Just using
// it as an illustration.
public class CX
{
public int KEY;
public DateTime DT;
}
static CX getItem(int i, SortedList<int, CX> list)
{
var items =
(from kv in list
where kv.Key >= i
select kv.Key);
if (items.Any())
{
return list[items.Min()];
}
return null;
}
Given a list of 50,000 records, calling getItem 500 times takes about a second and a half. Calling it 50,000 times takes over 2 minutes. This performance seems very poor. Is my LINQ bad? Am I expecting too much? Should I be rolling my own binary search function?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
首先,您的查询将被评估两次(一次针对
Any
,一次针对Min
)。其次,Min
要求它迭代整个列表,即使它已排序的事实意味着第一项将是最小值。您应该能够更改此设置:改为:
UPDATE
因为您正在选择值类型,所以
FirstOrDefault
不会返回可为 null 的对象。我已更改您的查询,将所选值转换为int?
,从而允许检查结果值是否为null
。我建议使用ContainsKey
,因为如果您的列表包含0
的值,则会返回true
。例如,假设您有以下值0 2 4 6 8
如果您传入小于或等于 8 的任何值,那么您将获得正确的值。但是,如果您传入 9,您将得到 0 (
default(int)
),它在列表中,但不在 有效的结果。First, your query is being evaluated twice (once for
Any
, and once forMin
). Second,Min
requires that it iterate over the entire list, even though the fact that it's sorted means that the first item will be the minimum. You should be able to change this:To this:
UPDATE
Because you're selecting a value type,
FirstOrDefault
doesn't return a nullable object. I have altered your query to cast the selected value to anint?
instead, allowing the resulting value to be checked fornull
. I would advocate this over usingContainsKey
, as that would returntrue
if your list contained a value for0
. For example, say you have the following values0 2 4 6 8
If you were to pass in anything less than or equal to 8, then you would get the correct value. However, if you were to pass in 9, you would get 0 (
default(int)
), which is in the list but isn't a valid result.自己编写二分搜索可能很困难。
幸运的是,微软已经编写了一个非常强大的:
Array.BinarySearch
。 这实际上是SortedList.IndexOfKey
内部使用的方法。唯一的问题是,它需要一个T[]
参数,而不是任何IList
(如SortedList.Keys
)。但你知道吗?有一个名为 Reflector 的出色工具,可让您查看 .NET 源代码。 代码
看看:
IList
上的通用BinarySearch
扩展方法,直接取自 Microsoft 的Array.BinarySearch
的反射 > 实施。这将让您调用 list.Keys.BinarySearch 并获取所需索引的负位按位补码,以防找不到所需的键(以下内容基本上直接取自 tzaman 的答案):
Writing a binary search on your own can be tough.
Fortunately, Microsoft already wrote a pretty robust one:
Array.BinarySearch<T>
. This is, in fact, the method thatSortedList<TKey, TValue>.IndexOfKey
uses internally. Only problem is, it takes aT[]
argument, instead of anyIList<T>
(likeSortedList<TKey, TValue>.Keys
).You know what, though? There's this great tool called Reflector that lets you look at .NET source code...
Check it out: a generic
BinarySearch
extension method onIList<T>
, taken straight from the reflected code of Microsoft'sArray.BinarySearch<T>
implementation.This will let you call
list.Keys.BinarySearch
and get the negative bitwise complement of the index you want in case the desired key isn't found (the below is taken basically straight from tzaman's answer):在
SortedList
上使用 LINQ 不会给您带来排序的好处。为了获得最佳性能,您应该编写自己的二分搜索。
Using LINQ on a
SortedList
will not give you the benefit of the sort.For optimal performance, you should write your own binary search.
好的,只是为了让这一点更加可见 - 这是 Adam Robinson 答案的更简洁版本:
FirstOrDefault
函数有一个接受谓词的重载,它选择满足条件的第一个元素 - 您可以使用它可以直接获取所需的元素,如果不存在则返回 null。OK, just to give this a little more visibility - here's a more concise version of Adam Robinson's answer:
The
FirstOrDefault
function has an overload that accepts a predicate, which selects the first element satisfying a condition - you can use that to directly get the element you want, ornull
if it doesn't exist.为什么不使用
List
类中内置的BinarySearch
呢?如果搜索目标不在列表中,则
BinarySearch
返回下一个较高项的按位补码;如果结果为负,我们可以使用它来重新补充结果,从而直接得到您想要的结果。如果它等于Count
,则您的搜索键比列表中的任何内容都大。这应该比执行 LINQwhere
快得多,因为它已经排序了...正如评论所指出的,
ToList
调用将强制对整个列表进行评估,因此只有当您在不更改底层SortedList
的情况下进行多次搜索时,这才有用,并且您单独保留keys
列表。Why not use the
BinarySearch
that's built into theList
class?If the search target isn't in the list,
BinarySearch
returns the bit-wise complement of the next-higher item; we can use that to directly get you what you want by re-complementing the result if it's negative. If it becomes equal to theCount
, your search key was bigger than anything in the list.This should be much faster than doing a LINQwhere
, since it's already sorted...As comments have pointed out, the
ToList
call will force an evaluation of the whole list, so this is only beneficial if you do multiple searches without altering the underlyingSortedList
, and you keep thekeys
list around separately.在 PowerCollections 中使用 OrderedDictionary,您可以获得一个枚举器,该枚举器从您要查找的关键位置开始...如果它不存在,您将获得下一个最近的节点,然后可以从 O(log N 中的节点向前/向后导航) 每次导航呼叫的时间。
这样做的优点是您不必编写自己的搜索,甚至不必在 SortedList 之上管理自己的搜索。
Using OrderedDictionary in PowerCollections you can get an enumerator that starts where they key you are looking for should be... if it's not there, you'll get the next closest node and can then navigate forwards/backwards from that in O(log N) time per nav call.
This has the advantage of you not having to write your own search or even manage your own searches on top of a SortedList.