优化查找:字典键查找与数组索引查找

发布于 2024-07-22 04:25:00 字数 1875 浏览 4 评论 0原文

我正在编写一个 7 张牌扑克手牌评估器,作为我的最爱项目之一。 在尝试优化其速度时(我喜欢挑战),我惊讶地发现字典键查找的性能与数组索引查找相比相当慢。

例如,我运行了这个示例代码,它枚举了所有 52 选择 7 = 133,784,560 可能的 7 张牌:

var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i ++)
{
    intDict.Add(i, i);  
    intList.Add(i);
}

int result;

var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Reset();

sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);

哪个输出:

time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms

这种类型的行为是否符合预期(性能下降 8 倍)? IIRC,字典平均有 O(1) 次查找,而数组在最坏情况下有 O(1) 次查找,所以我确实希望数组查找速度更快,但不会快这么多!

我目前将扑克手牌排名存储在字典中。 我想如果这与字典查找一样快,我必须重新考虑我的方法并使用数组来代替,尽管索引排名会变得有点棘手,我可能不得不问另一个关于它的问题。

I'm writing a 7 card poker hand evaluator as one of my pet projects. While trying to optimize its speed (I like the challenge), I was shocked to find that the performance of Dictionary key lookups was quite slow compared to array index lookups.

For example, I ran this sample code that enumerates over all 52 choose 7 = 133,784,560 possible 7 card hands:

var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i ++)
{
    intDict.Add(i, i);  
    intList.Add(i);
}

int result;

var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Reset();

sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);

which outputs:

time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms

Is this type of behavior expected (performance decrease by a factor of 8)? IIRC, a Dictionary has, on average, O(1) lookups, while an array has worst-case O(1) lookups, so I do expect the array lookups to be faster, but not by this much!

I am currently storing poker hand rankings in a Dictionary. I suppose if this is as fast as the dictionary lookups can be, I have to rethink my approach and use arrays instead, although indexing the rankings will get a little tricky and I'll probably have to ask another question about it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

安稳善良 2024-07-29 04:25:00

不要忘记 Big-O 符号仅表示复杂性如何随大小(等)而增长 - 它没有给出所涉及的常数因素的任何指示。 这就是为什么有时当键足够少时,对键的线性搜索比字典查找更快。 在这种情况下,您甚至没有对数组进行搜索 - 只是直接索引操作。

的情况

pointer_into_array = base_pointer + offset * size

对于直接索引查找,数组基本上是理想的 - 这只是(然后是指针取消引用)

。执行字典查找相对复杂 - 当有很多键时,与(例如)按键进行线性查找相比非常快,但比直接数组查找复杂得多。 它必须计算密钥的哈希值,然后计算出应该位于哪个存储桶中,可能处理重复的哈希值(或重复的存储桶),然后检查是否相等。

与往常一样,为作业选择正确的数据结构 - 如果您真的可以只索引到数组(或 List),那么是的,这将快得令人眼花缭乱。

Don't forget that Big-O notations only says how the complexity grows with respect to the size (etc) - it doesn't give any indication of the constant factors involved. That's why sometimes even a linear search for keys is faster than a dictionary lookup, when there are sufficiently few keys. In this case you're not even doing a search with the array though - just a straight indexing operation.

For straight index lookups, arrays are basically ideal - it's just a case of

pointer_into_array = base_pointer + offset * size

(And then a pointer dereference.)

Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.

As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List<T>) then yes, that will be blindingly fast.

青瓷清茶倾城歌 2024-07-29 04:25:00

这种行为是否符合预期(性能下降 8 倍)?

为什么不? 每个数组查找几乎是瞬时/可忽略的,而字典查找可能至少需要额外的子例程调用。

它们都是 O(1) 的点意味着即使每个集合中有 50 倍以上的项目,性能下降仍然只是一个因素 (8)。

Is this type of behavior expected (performance decrease by a factor of 8)?

Why not? Each array lookup is almost intantaneous/negligeable, whereas a dictionary lookup may need at least an extra subroutine call.

The point of their both being O(1) means that even if you have 50 times more items in each collection, the performance decrease is still only a factor of whatever it is (8).

无法回应 2024-07-29 04:25:00

有些事情可能需要一千年,但仍然是 O(1)。

如果您在反汇编窗口中单步执行此代码,您很快就会明白其中的区别。

Something could take a millenium, and still be O(1).

If you single-step through this code in the disassembly window, you will quickly come to understand what the difference is.

各自安好 2024-07-29 04:25:00

当键空间非常大并且无法映射到稳定的有序顺序时,字典结构最有用。 如果您可以将键转换为相对较小范围内的简单整数,那么您将很难找到比数组性能更好的数据结构。

关于实施说明; 在 .NET 中,字典本质上是可哈希的。 您可以通过确保您的键散列到一个大的唯一值空间中来在一定程度上提高它们的键查找性能。 在你的情况下,你使用一个简单的整数作为键(我相信它会散列到它自己的值) - 所以这可能是你能做的最好的事情。

Dictionary structures are most useful when the key space is very large and cannot be mapped into a stable, sequenced order. If you can convert your keys into a simple integer in a relatively small range, you will be hard-pressed to find a data structure that will perform better than an array.

On an implementation note; in .NET, dictionaries are essentially hashables. You can somewhat improve their key-lookup performance by ensuring that your keys hash into a large space of unique values. It looks like in your case, you are using a simple integer as a key (which I believe hashes to its own value) - so that may be the best you can do.

醉梦枕江山 2024-07-29 04:25:00

数组查找是您能做的最快的事情 - 本质上它只是从数组开头到您想要查找的元素的一位指针算术。 另一方面,字典查找可能会慢一些,因为它需要进行散列并关注找到正确的存储桶。 尽管预期的运行时间也是 O(1) - 算法常数更大,所以速度会更慢。

An array lookup is about the fastest thing you can do - essentially all it is is a single bit of pointer arithmetic to go from the start of the array to the element you wanted to find. On the other hand, the dictionary lookup is likely to be somewhat slower since it needs to do hashing and concern itself with finding the correct bucket. Although the expected runtime is also O(1) - the algorithmic constants are greater so it will be slower.

自演自醉 2024-07-29 04:25:00

欢迎使用 Big-O 表示法。 您始终必须考虑其中涉及一个恒定因素。

进行一次字典查找当然比数组查找要昂贵得多。

Big-O 只告诉您算法如何扩展。 将查找次数加倍,看看数字如何变化:两者都应该花费大约两倍的时间。

Welcome to Big-O notation. You always have to consider that there is a constant factor involved.

Doing one Dict-Lookup is of course much more expensive than an array lookup.

Big-O only tells you how algorithms scale. Double the amount of lookups and see how the numbers change: Both should take around the twice time.

鯉魚旗 2024-07-29 04:25:00

字典检索元素的成本为 O(1) ,但那是因为字典是作为哈希表实现的 - 因此您必须首先计算哈希值才能知道要返回哪个元素。 哈希表通常效率不高,但它们适用于大型数据集或具有大量唯一哈希值的数据集。

List(除了是一个用来描述数组而不是链表的垃圾词!)会更快,因为它将通过直接计算您想要返回的元素来返回值。

The cost of retrieving an element from a Dictionary is O(1), but that's because a dictionary is implemented as a hashtable - so you have to first calculate the hash value to know which element to return. Hashtables are often not that efficient - but they are good for large datasets, or datasets that have a lot of unique-hash values.

The List (apart from being a rubbish word used to dercribe an array rather than a linked list!) will be faster as it will return the value by directly calculating the element you want returned.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文