为什么列表＆lt; t＆gt; .contains（）太慢了？

发布于 2025-02-09 13:26:43 字数 663 浏览 1 评论 0原文

我有一个对象列表，我想通过删除某些具有ids的对象来过滤列表，即单独的列表中，即

List<Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
List<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

现在使用removeall（） >它采用Asound 25到30秒。

vehicles.RemoveAll(e => vehiclesIds.Contains(e.Id));

但是，如果我将用于循环，则仅需8至20毫秒。

foreach (var vehiclesId in vehiclesIds)
{
    vehicles.RemoveAll(e=> vehiclesId == e.Id);
}

我多次运行此代码，并在使用stopwatch list.contains（）是进行缓慢处理的原因。

原文

I have a list of objects and I want to filter the list by removing some objects that have ids present in a separate list i.e.

List<Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
List<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

Now for removing if i'm using RemoveAll() it takes areound 25 to 30 seconds.

vehicles.RemoveAll(e => vehiclesIds.Contains(e.Id));

But if I'm using for loop as the following it takes only 8 to 20 milliseconds.

foreach (var vehiclesId in vehiclesIds)
{
    vehicles.RemoveAll(e=> vehiclesId == e.Id);
}

I multiple times run this code and analyse this after using StopWatch that List.Contains() is the reason for slow processing.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

℉絮湮 2025-02-16 13:26:44

这两个版本之间的逻辑不同。

第一个版本首先循环车辆，寻找ID中的比赛。 包含使用默认比较，这可能是一种缓慢的比较方式，因为它使用了接口。

您可以使用hashset＆lt;查找匹配ID来改进此版本。这应该更快（即使它也使用包含），因为它可以内部使用HashTable。

List<Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
HashSet<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

vehicles.RemoveAll(e => vehiclesIds.Contains(e.Id));

第二个版本首先循环较小的ID列表。您可以通过使用字典为车辆来改进此版本，

Dictionary<int, Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
List<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

foreach (var vehiclesId in vehiclesIds)
{
    vehicles.Remove(vehiclesId);
}

我想这是最后一个版本是最快的。

The logic between these two versions is different.

The first version loops the vehicles first, looking for matches in the IDs. Contains uses the default comparer, which might be a slow way of comparing because it uses interfaces.

You can improve this version by using a HashSet<int> to find matching IDs. This should be faster (even though it also uses Contains) as it uses a hashtable internally.

List<Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
HashSet<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

vehicles.RemoveAll(e => vehiclesIds.Contains(e.Id));

The second version instead loops the smaller list of IDs first. You can improve this version by using a Dictionary for the vehicles

Dictionary<int, Vehicle> vehicles;  //All Vehicles List (260 vehicles data)
List<int> vehiclesIds;  //Ids List to remove from Original List (Length < 260)

foreach (var vehiclesId in vehiclesIds)
{
    vehicles.Remove(vehiclesId);
}

i would imagine this last version to be the fastest.

回复收藏 0 原文

思念绕指尖 2025-02-16 13:26:44

TLDR：您问题的答案“为什么list.contains（）太慢？”是：不是。您没有向我们展示的其他地方一定有问题。

更长的答案：

第一个版本应该更快，因为它仅在列表中删除一次而不是多次删除元素。（事实并不能更快地指向其他地方的某些问题。）

但是，第一个版本确实调用了eartilesIds.contains（e.id）（e.id） 每辆车一次，这是O（n）手术。

您可以通过从车辆ID中创建HashSet＆gt;来改进它，然后将其用于查找。

我尝试了三种方法（您的两种方法hashset One）：

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace Demo;

static class Program
{
    public static void Main()
    {
        var summary = BenchmarkRunner.Run<UnderTest>();
    }
}

sealed class Vehicle
{
    public int Id { get; set; }
}

public class UnderTest
{
    public UnderTest()
    {
        var rng = new Random(34655);

        for (int i = 0; i < NUM_IDS; ++i)
            _vehicleIds.Add(rng.Next(0, MAX_ID));

        _idsHash = _vehicleIds.ToHashSet();
    }

    [IterationSetup]
    public void IterationSetup()
    {
        Console.WriteLine("IterationSetup");
        createVehicleList();
    }

    [Benchmark]
    public void RemoveItemsUsingRemoveAll()
    {
        _vehicles.RemoveAll(e => _vehicleIds.Contains(e.Id));
    }

    [Benchmark]
    public void RemoveItemsUsingLoop()
    {
        foreach (int vehiclesId in _vehicleIds)
        {
            _vehicles.RemoveAll(e => vehiclesId == e.Id);
        }
    }

    [Benchmark]
    public void RemoveItemsUsingHashSet()
    {
        _vehicles.RemoveAll(e => _idsHash.Contains(e.Id));
    }

    void createVehicleList()
    {
        _vehicles.Clear();
        var rng = new Random(78712);

        for (int i = 0; i < NUM_VEHICLES; ++i)
        {
            _vehicles.Add(new Vehicle {Id = rng.Next(MAX_ID)});
        }
    }

    const int NUM_VEHICLES = 200_000;
    const int MAX_ID       = 800_000;
    const int NUM_IDS      =   1_000;

    readonly List<int>     _vehicleIds = new ();
    readonly List<Vehicle> _vehicles   = new ();
    readonly HashSet<int>  _idsHash;
}

我在PC上获得的结果如下：

|                     Method |       Mean |      Error |     StdDev |     Median |
|--------------------------- |-----------:|-----------:|-----------:|-----------:|
|  RemoveItemsUsingRemoveAll |  68.360 ms |  1.4030 ms |  4.0926 ms |  67.828 ms |
|       RemoveItemsUsingLoop | 625.148 ms | 19.4181 ms | 56.3354 ms | 607.409 ms |
|    RemoveItemsUsingHashSet |   5.293 ms |  0.2643 ms |  0.7626 ms |   5.323 ms |

如您所见，使用第一个版本明显比第二个版本要快得多。版本（我预期），但是使用HashSet甚至更快。

我们必须得出的结论是，您看到的放缓是由于您没有向我们展示的代码。

TLDR: The answer to your question "Why List.Contains() is too slow?" is: It isn't. There must be something wrong elsewhere that you haven't shown us.

Longer answer:

The first version should be faster because it's only removing elements from the list once rather than multiple times. (The fact that it isn't faster points to some problem elsewhere.)

However, the first version does call vehiclesIds.Contains(e.Id) once for each vehicle, which is an O(N) operation.

You could improve that by creating a HashSet<int> from the vehicle IDs and then use that for the lookup.

I tried the three approaches (your two plus the HashSet one) in a benchmark:

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace Demo;

static class Program
{
    public static void Main()
    {
        var summary = BenchmarkRunner.Run<UnderTest>();
    }
}

sealed class Vehicle
{
    public int Id { get; set; }
}

public class UnderTest
{
    public UnderTest()
    {
        var rng = new Random(34655);

        for (int i = 0; i < NUM_IDS; ++i)
            _vehicleIds.Add(rng.Next(0, MAX_ID));

        _idsHash = _vehicleIds.ToHashSet();
    }

    [IterationSetup]
    public void IterationSetup()
    {
        Console.WriteLine("IterationSetup");
        createVehicleList();
    }

    [Benchmark]
    public void RemoveItemsUsingRemoveAll()
    {
        _vehicles.RemoveAll(e => _vehicleIds.Contains(e.Id));
    }

    [Benchmark]
    public void RemoveItemsUsingLoop()
    {
        foreach (int vehiclesId in _vehicleIds)
        {
            _vehicles.RemoveAll(e => vehiclesId == e.Id);
        }
    }

    [Benchmark]
    public void RemoveItemsUsingHashSet()
    {
        _vehicles.RemoveAll(e => _idsHash.Contains(e.Id));
    }

    void createVehicleList()
    {
        _vehicles.Clear();
        var rng = new Random(78712);

        for (int i = 0; i < NUM_VEHICLES; ++i)
        {
            _vehicles.Add(new Vehicle {Id = rng.Next(MAX_ID)});
        }
    }

    const int NUM_VEHICLES = 200_000;
    const int MAX_ID       = 800_000;
    const int NUM_IDS      =   1_000;

    readonly List<int>     _vehicleIds = new ();
    readonly List<Vehicle> _vehicles   = new ();
    readonly HashSet<int>  _idsHash;
}

The results I obtained on my PC were as follows:

|                     Method |       Mean |      Error |     StdDev |     Median |
|--------------------------- |-----------:|-----------:|-----------:|-----------:|
|  RemoveItemsUsingRemoveAll |  68.360 ms |  1.4030 ms |  4.0926 ms |  67.828 ms |
|       RemoveItemsUsingLoop | 625.148 ms | 19.4181 ms | 56.3354 ms | 607.409 ms |
|    RemoveItemsUsingHashSet |   5.293 ms |  0.2643 ms |  0.7626 ms |   5.323 ms |

As you can see, using your first version is significantly faster than your second version (as I expected), but using a HashSet is even faster.

The conclusion that we must draw is that the slowdown you're seeing is due to code that you haven't shown us.

回复收藏 0 原文

~没有更多了~