Optimizing lookups: Dictionary key lookups vs. array index lookups

Posted 2024-07-22 04:25:00

I'm writing a 7 card poker hand evaluator as one of my pet projects. While trying to optimize its speed (I like the challenge), I was shocked to find that the performance of Dictionary key lookups was quite slow compared to array index lookups.

For example, I ran this sample code that enumerates over all 52 choose 7 = 133,784,560 possible 7 card hands:

var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i++)
{
    intDict.Add(i, i);
    intList.Add(i);
}

int result;

var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Reset();

sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);

which outputs:

time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms

Is this type of behavior expected (performance decrease by a factor of 8)? IIRC, a Dictionary has, on average, O(1) lookups, while an array has worst-case O(1) lookups, so I do expect the array lookups to be faster, but not by this much!

I am currently storing poker hand rankings in a Dictionary. I suppose if this is as fast as the dictionary lookups can be, I have to rethink my approach and use arrays instead, although indexing the rankings will get a little tricky and I'll probably have to ask another question about it.

安稳善良 2024-07-29 04:25:00

Don't forget that Big-O notation only says how the complexity grows with respect to the size (etc.) - it doesn't give any indication of the constant factors involved. That's why a linear search for keys is sometimes faster than a dictionary lookup, when there are sufficiently few keys. In this case, though, you're not even doing a search with the array - just a straight indexing operation.
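To make the "linear search over a few keys" point concrete, here is a minimal sketch. `SmallLookup` and `TryFind` are hypothetical names invented for illustration, not part of any library. Whether this actually beats a Dictionary for a given key count depends on the machine and runtime, so it would need measuring.

```csharp
public static class SmallLookup
{
    // Hypothetical helper: scans parallel key/value arrays from the front.
    // O(n) in the number of keys, but each step is just a compare and an increment.
    public static bool TryFind(int[] keys, int[] values, int key, out int value)
    {
        for (int i = 0; i < keys.Length; i++)
        {
            if (keys[i] == key)
            {
                value = values[i];
                return true;
            }
        }

        value = 0; // key not present
        return false;
    }
}
```

There is no hashing and no bucket logic here, which is why the constant factor can be small enough to win when the key set is tiny.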

For straight index lookups, arrays are basically ideal - it's just a case of

pointer_into_array = base_pointer + offset * size

(And then a pointer dereference.)

Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.
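The steps above can be sketched with a deliberately simplified chained hash table. This is an illustration only: `TinyIntMap` is a hypothetical class, not the actual `Dictionary<TKey, TValue>` implementation, and it assumes a positive fixed capacity, no resizing, and no duplicate keys.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified hash map for int keys; NOT the real Dictionary source.
public sealed class TinyIntMap
{
    private struct Entry
    {
        public int Key;
        public int Value;
        public int Next; // index of the next entry in the same bucket, or -1
    }

    private readonly int[] _buckets;   // bucket -> index of its first entry, or -1
    private readonly Entry[] _entries;
    private int _count;

    // Assumes capacity > 0; Add throws once more than 'capacity' items are stored.
    public TinyIntMap(int capacity)
    {
        _buckets = new int[capacity];
        for (int i = 0; i < capacity; i++) _buckets[i] = -1;
        _entries = new Entry[capacity];
    }

    public void Add(int key, int value)
    {
        int bucket = (key.GetHashCode() & 0x7FFFFFFF) % _buckets.Length;
        // Prepend the new entry to the bucket's chain.
        _entries[_count] = new Entry { Key = key, Value = value, Next = _buckets[bucket] };
        _buckets[bucket] = _count++;
    }

    public int Get(int key)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;  // 1. compute the hash of the key
        int bucket = hash % _buckets.Length;        // 2. work out which bucket it maps to
        for (int i = _buckets[bucket]; i >= 0; i = _entries[i].Next) // 3. walk colliding entries
        {
            if (_entries[i].Key == key)             // 4. check for equality
                return _entries[i].Value;
        }
        throw new KeyNotFoundException();
    }
}
```

Even in this stripped-down form, `Get` does a mask, a modulo, at least two array reads and a comparison, compared with the single address computation and dereference of a direct array index.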

As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List<T>) then yes, that will be blindingly fast.
