Extension methods on IEnumerable: how is the performance?

Posted on 2024-11-06 04:49:37

From my mentor: Prefer native methods (implemented directly on the collection) over extension methods of IEnumerable, because:

The LINQ-to-Objects extension methods are implemented on IEnumerable, meaning that in the worst-case scenario (when the item you search for does not exist in the collection) you will have to enumerate through all elements. If you have a Contains or Exists method implemented directly on the collection, it could make use of internal knowledge and maybe just do a hash table lookup or some other quick operation.

I was deeply confused, because I thought Microsoft would already have implemented a hash table lookup for the IEnumerable Contains/Exists methods. A quick benchmark comparing List<T>.Contains with the IEnumerable<T> extension method shows no difference:

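using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
static void Main(string[] args)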
{
    Console.Write("input the number of elements: ");
    int count = Convert.ToInt32(Console.ReadLine());
    Console.Write("input the number of loops: ");
    int loop = Convert.ToInt32(Console.ReadLine());

    Random r = new Random();

    Stopwatch sw = new Stopwatch();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContains(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContainsEnumerable(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContains(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContainsEnumerable(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);


    Console.ReadKey();
}

static List<int> CreateListOfInt(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next());
    }
    return numbers;
}

static bool DoContains(List<int> list, int number)
{
    return list.Contains(number);
}

static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
    return list.Contains(number);
}


// restrict the generated numbers to [0, 10000) so that lookups in [20000, 50000) are never found in the list
static List<int> CreateListOfInt2(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next(0,10000));
    }
    return numbers;
}

}

Edit: I tried a HashSet implementation, which greatly improves performance:

sw.Reset();
for (int i = 0; i < loop; i++)
{
    var list = CreateListOfInt2(count);
    HashSet<int> hashSet = new HashSet<int>(list);
    sw.Start();
    for (int j = 0; j < count; j++)
    {
        // lookups fall in [20000, 50000), outside the list's range, so the element is never found
        hashSet.Contains(r.Next(20000, 50000));
    }
    sw.Stop();
}
Console.WriteLine("HashSet<T> native method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

Still, what is your opinion of what my mentor said?

Can anyone clear this up for me? Is my mentor right? If he is right, what is wrong with my code?

Thank you very much

如日中天 2024-11-13 04:49:37

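List<T>.Contains calls simply iterate over the list, so they won't be any faster than the extension method. If you use a HashSet<T> and try a series of Contains() operations, you will see a marked improvement.

Edit: The reason Microsoft did not use hashing for the IEnumerable extension methods is that they cannot guarantee the implementing class uses a hash or anything similar. They had to go with the naive approach, because the IEnumerable interface only guarantees that the implementing class can be enumerated.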
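
To make the "only enumeration is guaranteed" point concrete, here is a minimal sketch. The names EnumerationDemo, CountingRange and Yielded are made up for this illustration. The sequence is a lazy iterator, so it exposes nothing except enumeration, and the counter shows how many elements Enumerable.Contains has to visit on a miss and on a hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Hypothetical counter: elements produced by the most recent enumeration.
    public static int Yielded;

    // A lazy sequence. It is not an ICollection<int>, so the only thing
    // a caller can do with it is enumerate it.
    public static IEnumerable<int> CountingRange(int count)
    {
        Yielded = 0;
        for (int i = 0; i < count; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    public static void Main()
    {
        // Miss: every element has to be visited before Contains can say "no".
        bool miss = CountingRange(1000).Contains(-1);
        Console.WriteLine("found={0}, elements visited={1}", miss, Yielded);

        // Hit: enumeration stops as soon as the element is seen.
        bool hit = CountingRange(1000).Contains(9);
        Console.WriteLine("found={0}, elements visited={1}", hit, Yielded);
    }
}
```

The miss visits all 1000 elements, while the hit on 9 stops after 10. A HashSet<int> holding the same values would answer both questions with a hash probe instead of a scan.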

可爱暴击 2024-11-13 04:49:37

If the object has a quicker native implementation of a LINQ operation, LINQ uses that quicker implementation instead.

For example, Count is implemented roughly like so:

if (source is ICollection<TSource> collectionOfT)
    return collectionOfT.Count;
if (source is ICollection collection)
    return collection.Count;
// else iterate through all the items and count them.

Contains roughly like so:

if (source is ICollection<TSource> collection)
    return collection.Contains(item);
// else iterate through the enumerable, and see if item exists

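Since HashSet<T> implements ICollection<T>, its native Contains is used. List<T> implements ICollection<T> as well, which is why the benchmark in the question shows no difference between the two calls.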
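
One way to check this dispatch yourself is a small wrapper that records which of its members get called. This is only a sketch: TrackingCollection and DispatchDemo are hypothetical names invented for the example, not part of the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper, for illustration only: forwards to a HashSet<T>
// and counts calls to Contains and GetEnumerator.
public sealed class TrackingCollection<T> : ICollection<T>
{
    private readonly HashSet<T> _items = new HashSet<T>();

    public int ContainsCalls { get; private set; }
    public int EnumeratorCalls { get; private set; }

    public int Count { get { return _items.Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(T item) { _items.Add(item); }
    public void Clear() { _items.Clear(); }
    public bool Remove(T item) { return _items.Remove(item); }
    public void CopyTo(T[] array, int arrayIndex) { _items.CopyTo(array, arrayIndex); }

    public bool Contains(T item)
    {
        ContainsCalls++;
        return _items.Contains(item);
    }

    public IEnumerator<T> GetEnumerator()
    {
        EnumeratorCalls++;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class DispatchDemo
{
    public static void Main()
    {
        var tracked = new TrackingCollection<int> { 1, 2, 3 };

        // The static type is IEnumerable<int>, so this call binds to the
        // LINQ extension method Enumerable.Contains, not to the class member.
        IEnumerable<int> sequence = tracked;
        bool found = sequence.Contains(2);

        Console.WriteLine("found={0}, Contains calls={1}, GetEnumerator calls={2}",
            found, tracked.ContainsCalls, tracked.EnumeratorCalls);
    }
}
```

If the extension method forwards to ICollection<T>.Contains as described, the wrapper's Contains is called once and GetEnumerator is never called.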

So, LINQ has been optimized for the standard interfaces. However, if you have a custom type that has a native call that isn't part of the default interface then the LINQ call may be slower.
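
The slower case can be sketched the same way. IndexedSequence below is a hypothetical type made up for this example: it has a fast lookup of its own but implements only IEnumerable<int>, so the LINQ extension method has no interface through which to reach that lookup.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: has a hash-based lookup, but exposes it only as its
// own method because it does not implement ICollection<int>.
public sealed class IndexedSequence : IEnumerable<int>
{
    private readonly HashSet<int> _index;

    public int ElementsEnumerated { get; private set; }

    public IndexedSequence(IEnumerable<int> items)
    {
        _index = new HashSet<int>(items);
    }

    // Native lookup: a single hash probe, no enumeration.
    public bool Contains(int item) { return _index.Contains(item); }

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in _index)
        {
            ElementsEnumerated++;
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CustomTypeDemo
{
    public static void Main()
    {
        var sequence = new IndexedSequence(Enumerable.Range(0, 1000));

        // Static type IndexedSequence: binds to the instance method.
        bool native = sequence.Contains(-1);
        Console.WriteLine("native: found={0}, enumerated={1}", native, sequence.ElementsEnumerated);

        // Static type IEnumerable<int>: binds to Enumerable.Contains,
        // which can only fall back to walking the whole sequence.
        bool viaLinq = ((IEnumerable<int>)sequence).Contains(-1);
        Console.WriteLine("LINQ:   found={0}, enumerated={1}", viaLinq, sequence.ElementsEnumerated);
    }
}
```

The native call touches no elements, while the LINQ call on the same object enumerates all 1000 before reporting a miss. Implementing ICollection<int> on such a type would let LINQ pick up the fast path.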
