LINQ to Objects 并通过索引改进性能?

发布于 2024-12-08 09:49:06 字数 843 浏览 0 评论 0 原文

我正在使用 LINQ to Objects,想知道是否可以通过使用我拥有的索引来提高查询性能。最好用一个例子来解释这一点。想象一个简单的类型...

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

以及我将针对它进行的一个简单查询...

List<Person> people = new List<Person>();

// 'people' populated with 50,000 instances...

var x = from t in people
        where t.Age > 18 && t.Age < 21
        select t;

如果我正确理解 LINQ to Objects,那么Where 扩展方法的实现将枚举 people 集合中的所有 50,000 个实例,以便找到该类型的 100 个实例。实际上匹配。碰巧我已经有了一个按年龄排序的人员集合的索引。像这样...

SortedList<int, Person> ageSorted = new SortedList<int, Person>();

显然,如果我可以获得在何处使用 SortedList,这样它就不再需要枚举所有 50,000 个实例,而是查找 100 个匹配条目的范围,从而节省时间,这是有意义的。

是否可以扩展 LINQ to Objects 来实现我的情况?这已经可能了,但我缺少这项技术吗?

I am using LINQ to Objects and wonder if it is possible to improve the performance of my queries by making use of an index that I have. This is best explained with an example. Imagine a simple type...

public class Person
{
    public int Age;
    public string FirstName;
    public string LastName;
}

And a simple query I would make against it...

List<Person> people = new List<Person>();

// 'people' populated with 50,000 instances...

var x = from t in people
        where t.Age > 18 && t.Age < 21
        select t;

If I understand LINQ to Objects correctly then the implementation of the Where extension method will enumerate all 50,000 instances in the people collection in order to find the 100 that actually match. As it happens I already have an index of the people collection that is sorted by Age. Like this...

SortedList<int, Person> ageSorted = new SortedList<int, Person>();

Clearly it would make sense if I could get the Where to use the SortedList so that it no longer has to enumerate all 50,000 instances, instead finding the range of 100 matching entries and so saving time.

Is it possible to extend LINQ to Objects to enable my situation? Is it already possible but I am missing the technique?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

云胡 2024-12-15 09:49:06

我相信已经有一个项目可以做到这一点 - i4o。我不能说我自己使用过它,但这听起来像是您想要的东西......您可能需要稍微调整一下现有的代码,但它确实值得一看。

如果这没有帮助,您至少可以在 SortedList 上编写自己的扩展方法。您可能无法轻松使用实际的where子句,但您可以使用自己的方法获取最小值和最大值。您可能希望将它们应用于IList,您断言您已经对值进行了适当的排序(根据某些比较器)。

例如(完全未经测试):(

public static IEnumerable<T> Between<T, TKey>(this IList<T> source,
                                              Func<T, TKey> projection,
                                              TKey minKeyInclusive,
                                              TKey maxKeyExclusive,
                                              IComparer<TKey> comparer)
{
    comparer = comparer ?? Comparer<TKey>.Default;

    // TODO: Find the index of the lower bound via a binary search :)
    // (It's too late for me to jot it down tonight :)
    int index = ...; // Find minimum index

    while (index < source.Count &&
           comparer.Compare(projection(source[index]), maxKeyExclusive) < 0)
    {
        yield return source[index];
        index++;
    }
}

如果您只有 List 而不是 IList,您可以使用 List.BinarySearch,尽管您需要构建自定义 IComparer。)

另外,请查看 SortedSet

There's already a project which I believe does exactly that - i4o. I can't say I've used it myself, but it sounds like the kind of thing you want... you may need to juggle your existing code a bit, but it's certainly worth looking at.

If that doesn't help, you could at least write your own extension methods on SortedList<TKey, TValue>. You probably wouldn't be able to easily use your actual where clause, but you could use your own methods taking a minimum and a maximum value. You might also want to make them apply to IList<T> where you assert that you've already sorted the values appropriately (according to some comparer).

For example (completely untested):

public static IEnumerable<T> Between<T, TKey>(this IList<T> source,
                                              Func<T, TKey> projection,
                                              TKey minKeyInclusive,
                                              TKey maxKeyExclusive,
                                              IComparer<TKey> comparer)
{
    comparer = comparer ?? Comparer<TKey>.Default;

    // TODO: Find the index of the lower bound via a binary search :)
    // (It's too late for me to jot it down tonight :)
    int index = ...; // Find minimum index

    while (index < source.Count &&
           comparer.Compare(projection(source[index]), maxKeyExclusive) < 0)
    {
        yield return source[index];
        index++;
    }
}

(If you only have List<T> instead of IList<T>, you could use List<T>.BinarySearch, although you'd need to build a custom IComparer<T>.)

Also, have a look at SortedSet<T> in .NET 4.

万水千山粽是情ミ 2024-12-15 09:49:06

你是对的,你编写的查询将枚举整个列表,因为显然 LINQ 无法假设有关你的数据的任何信息。

如果您有一个 SortedList,您可以使用 SkipWhile/TakeWhile linq 方法来利用它:

 var x = x.SkipWhile(kv => kv.Key <= 18).TakeWhile(kv => kv.Key < 21)

编辑

@Davy8 当然是正确的,最坏的情况下它仍然具有相同的性能。请参阅其他答案,了解更快速找到第一个值的方法。

如果您需要针对不同的年龄范围多次执行此操作,那么您也可以通过按年龄分组来加快速度:

var byAge = people.GroupBy(p => p.Age);

var x = from grp in byAge 
        where grp.Key > 18 && grp.Key < 21
        from person in grp
        select person;

You're right that the query you wrote will enumerate the whole list as obviously LINQ can't assume anything about your data.

If you have a SortedList, you can exploit that using the SkipWhile/TakeWhile linq methods:

 var x = x.SkipWhile(kv => kv.Key <= 18).TakeWhile(kv => kv.Key < 21)

EDIT

@Davy8 is right of course that worst case this still has the same performance. See the other answers for a way to more quickly find the first value.

If you need to do this operation more than once for different age ranges then you can probably also speed it up by grouping on age:

var byAge = people.GroupBy(p => p.Age);

var x = from grp in byAge 
        where grp.Key > 18 && grp.Key < 21
        from person in grp
        select person;
纵山崖 2024-12-15 09:49:06

LINQ 查询语法实际上使用与签名匹配的任何名为 Where 的扩展方法,因此您始终可以编写自己的扩展方法来按照您想要的方式处理特定类型。

    public static IEnumerable<Person> Where(this IEnumerable<Person> collection, Func<Person, bool> condition )
    {
        Console.WriteLine("My Custom 'Where' method called");
        return System.Linq.Enumerable.Where(collection, condition);
    }

...

        var x = from t in people
                where t.Age > 18 && t.Age < 21
                select t; //Will print "My Custom 'Where' method called"

然后你可以应用任何你想要的逻辑。我相信正常的方法重载规则适用于确定将调用哪个 Where 扩展方法。

The LINQ query syntax actually uses any extension method named Where that matches the signature, so you can always write your own that handles your specific type the way you want.

    public static IEnumerable<Person> Where(this IEnumerable<Person> collection, Func<Person, bool> condition )
    {
        Console.WriteLine("My Custom 'Where' method called");
        return System.Linq.Enumerable.Where(collection, condition);
    }

...

        var x = from t in people
                where t.Age > 18 && t.Age < 21
                select t; //Will print "My Custom 'Where' method called"

Then you can apply any logic you want. I believe the normal method overload rules apply for determining which Where extension method would be called.

心在旅行 2024-12-15 09:49:06

在预排序容器中,通过快速找到第一个元素来实现效率。找到第一个元素后,只需线性检索以下元素,直到找到范围的末尾。

假设您的 SortedListPerson.Age 排序,您可以使用 SortedList.IndexOfKey 找到该范围的第一个元素,它是一个 二分搜索算法;因此,该方法是一个 O(log n) 操作。

我认为您无法更改代码,因此 Enumerable.Where 突然变得更加智能,并通过使用二分搜索找到范围开始。

-- - 编辑 ---

实际上,您真正需要的是 List.BinarySearchArray.BinarySearch。如果不存在精确匹配,SortedList.IndexOfKey 不会让您获取最接近匹配的索引。或者您可以自己实现二分搜索。

In a pre-sorted container, the efficiency is achieved by finding the first element quickly. Once you find the first element, just linearly retrieve the following elements until you find the end of your range.

Assuming your SortedList is sorted by Person.Age, you can find the first element of the range using SortedList.IndexOfKey, which is a binary search algorithm; therefore, this method is an O(log n) operation.

(I don't think you can change your code so the Enumerable.Where suddenly becomes more intelligent and finds the range start by using binary search.)

--- EDIT ---

Actually, what you really need is List.BinarySearch or Array.BinarySearch. The SortedList.IndexOfKey won't let you get the index of the closest match in case exact match does not exist. Or you can just implement the binary search yourself.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文