HashSet vs. Dictionary: search time to find whether an item exists

Posted on 2024-08-30 13:18:01

HashSet<T> t = new HashSet<T>();
// add 10 million items


Dictionary<K, V> t = new Dictionary<K, V>();
// add 10 million items.

Whose .Contains method will return quicker?

Just to clarify, my requirement is I have 10 million objects (well, strings really) that I need to check if they exist in the data structure. I will NEVER iterate.

Comments (5)

旧时模样 2024-09-06 13:18:01

HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Contains check for half the objects of a collection of 10000

Remove half the objects of a collection of 10000

眼角的笑意。 2024-09-06 13:18:01

I assume you mean Dictionary<TKey, TValue> in the second case? HashTable is a non-generic class.

You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.

I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to basically perform the same - they're using the same algorithm, fundamentally. I guess with the entries in Dictionary<,> being larger you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.
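For the string case in the question, the two comparable calls might look like the sketch below. The names `MembershipSketch`, `BuildSet` and `BuildMap` are made up for illustration, and the `bool` stored in the dictionary is only a placeholder value, which is itself a hint that a set is the more natural fit when there is nothing to map to.

```csharp
using System.Collections.Generic;

// Illustrative only: these names are hypothetical, not from the question.
public static class MembershipSketch
{
    // A set stores just the items; a natural fit for "does it exist?".
    public static HashSet<string> BuildSet(IEnumerable<string> items)
    {
        return new HashSet<string>(items);
    }

    // A dictionary has to store some value per key; here a dummy bool.
    public static Dictionary<string, bool> BuildMap(IEnumerable<string> items)
    {
        var map = new Dictionary<string, bool>();
        foreach (var item in items)
        {
            map[item] = true; // indexer assignment tolerates repeated keys
        }
        return map;
    }
}
```

With these, `BuildSet(items).Contains("foo")` and `BuildMap(items).ContainsKey("foo")` are both hash lookups. `ContainsValue`, by contrast, is a linear scan over the dictionary, so it is not the call to compare against.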

━╋う一瞬間旳綻放 2024-09-06 13:18:01

From the MSDN documentation for Dictionary<TKey,TValue>:

"Retrieving a value by using its key is very fast, close to O(1), because the Dictionary<TKey, TValue> class is implemented as a hash table."

With a note:

"The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey"
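As a rough illustration of that note (nothing here was measured), the sketch below builds two sets over the same strings: one with a deliberately bad comparer whose hash is constant, and one with the built-in `StringComparer.Ordinal`. `ConstantHashComparer` and `HashQualitySketch` are hypothetical names introduced for this example. Both sets return the same answers; the constant hash just forces every item into one bucket, so lookups on it would be expected to degrade toward a linear scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer, for illustration only: equality is correct,
// but every string gets the same hash code, so all items collide.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        return 0; // worst case: no spreading across buckets at all
    }
}

public static class HashQualitySketch
{
    // Same contents, poor hash distribution.
    public static HashSet<string> WithPoorHash(IEnumerable<string> items)
    {
        return new HashSet<string>(items, new ConstantHashComparer());
    }

    // Same contents, using the framework's ordinal string comparer.
    public static HashSet<string> WithOrdinalHash(IEnumerable<string> items)
    {
        return new HashSet<string>(items, StringComparer.Ordinal);
    }
}
```

For plain `string` keys the default comparer is already a reasonable choice; the concern in the note mostly applies to custom key types that override `GetHashCode` themselves.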

黎夕旧梦 2024-09-06 13:18:01

The accepted answer to this question does NOT validly answer the question! It happens to give the correct answer, but that answer isn't shown by the evidence they provided.

What that answer shows is that Key lookups on a Dictionary or HashSet are vastly quicker than looking up in a List. Which is true, but not interesting, nor surprising, nor proof that they have the same speed.

I've run the code below to compare the lookup times, and my conclusion is that they ARE in fact the same speed. (Or at least, if there is any difference, then the difference is well within the Standard Deviation of that speed)

Specifically, in this test 100,000,000 lookups took between 10 and 11.5 seconds for each collection on my machine.

Test Code:

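using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using NUnit.Framework; // assumed from the [Test] attribute and Assert.That

public class LookupTimingTests // class name added only so the snippet compiles
{
    private const int TestReps = 100_000_000;

    [Test]
    public void CompareHashSetContainsVersusDictionaryContainsKey()
    {
        for (int j = 0; j < 10; j++)
        {
            var rand = new Random();
            var dict = new Dictionary<int, int>();
            var hash = new HashSet<int>();

            // Fill both collections with the same random keys.
            for (int i = 0; i < TestReps; i++)
            {
                var key = rand.Next();
                var value = rand.Next();
                hash.Add(key);
                dict.TryAdd(key, value);
            }

            // Pre-generate the lookup keys so Random is not part of the timed loops.
            var testPoints = Enumerable.Repeat(1, TestReps).Select(_ => rand.Next()).ToArray();
            var timer = new Stopwatch();
            var total = 0;

            // Time HashSet<int>.Contains.
            timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (hash.Contains(newKey))
                {
                    total++;
                }
            }
            Console.WriteLine(timer.Elapsed);

            var target = total;
            Assert.That(total == target);

            // Time Dictionary<int, int>.ContainsKey over the same lookup keys.
            timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (dict.ContainsKey(newKey))
                {
                    total++;
                }
            }
            Console.WriteLine(timer.Elapsed);

            // Both collections hold the same keys, so the hit count must have doubled.
            Assert.That(total == target * 2);
            Console.WriteLine("Set");
        }
    }
}

我偏爱纯白色 2024-09-06 13:18:01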

These are different data structures. Also, there is no generic version of HashTable.

HashSet<T> contains values of type T, whereas HashTable (or Dictionary) contains key-value pairs. So you should choose the collection based on the data you need to store.
